Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revisiting cmake code for generating coverage reports #2047

Merged
merged 9 commits into from
Jan 6, 2022

Conversation

piponazo
Copy link
Collaborator

@piponazo piponazo commented Jan 5, 2022

I started taking a look to the existing CMake code we have to generate Code Coverage reports with gcovr. Since we started using ctest, I decided to update the notes we had in the file cmake/gcovr.cmake. I also made some changes there to exclude the folders that are not interesting for the coverage report.

Afterwards I also added an additional "unrelated" change which came to my mind after reviewing the code coverage report. I noticed that the BasicIO::seek function had an overload that could be easily simplified. I removed the overload and the compilation and tests seem to stay green.

What I have in mind at the moment for the beginning of 2022 is to increase the code coverage by adding more unit tests, so that I can learn more about the code base. Hopefully this will also lead me to find more refactoring opportunities.

src/basicio.cpp Outdated
@@ -1931,32 +1883,14 @@ namespace Exiv2 {
case BasicIo::end: newIdx = p_->size_ + offset; break;
}

/// \todo in previous ::seek() overload there was the following comment:
/// #1198. Don't return 1 when asked to seek past EOF. Stay calm and set eof_
/// And the following line was commented. Check out this part of the code
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only change that I think it might be a bit risky. The previous 2 RemoteIo::seek() overloads had slightly different code.

Any ideas about how we could test this? I think somebody added some python tests to exercise the RemoteIo code, but I did not pay too much attention there.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@piponazo I thought @kevinbackhouse removed isMalloced_ from the code base last week.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed a different isMalloced_:

bool isMalloced_; //!< True if this entry owns the value data

I think the isMalloced_ in basicio.cpp looks ok because I don't see any copy constructors or operators. In TiffEntryBase I was worried that the ownership of the buffer was unclear.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be safer to make sure that 0 <= idx_ and idx_ <= size_ are always true. Because code like this doesn't look like it was designed to handle the case where idx_ is out-of-bounds:

exiv2/src/basicio.cpp

Lines 1858 to 1866 in c28e501

long RemoteIo::read(byte* buf, long rcount)
{
assert(p_->isMalloced_);
if (p_->eof_) return 0;
p_->totalRead_ += rcount;
size_t allow = std::min(rcount, (long)( p_->size_ - p_->idx_));
size_t lowBlock = p_->idx_ /p_->blockSize_;
size_t highBlock = (p_->idx_ + allow)/p_->blockSize_;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took your suggestion and added the check for 0 <= idx_ and also set the EOF variable to true in case the newIdx is bigger than the size.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any ideas about how we could test this? I think somebody added some python tests to exercise the RemoteIo code, but I did not pay too much attention there.

@piponazo There was some python to test RemoteIo. I think it was focused on getting complete files and the like and could exercise different http verbs (the request methods: get/head/etc). We've deprecated the WebReady ssh code. I don't know the status of ftp. Nobody has ever reported any issue about ssh or ftp, so I don't think it's being used.

http is tested in the bash tests by powering up the python web-server. There's code in tests/bash_tests/utils.py to start/stop the python web server.

Can you write a C++ unit test (in bin/unit_tests) to test BasicIo::seek() on local storage? When that's working, we can discuss how best to power up the python web-server and perform the test on a remote file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Robin. We already have such unit tests for MemIo and FileIo. You can see those tests in the following files:

exiv2/unitTests/test_basicio.cpp
exiv2/unitTests/test_FileIo.cpp

I have actually revisited some of those tests, made some improvements there (specially in terms of test naming) and add few new tests.

Copy link
Collaborator

@clanmills clanmills Jan 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Careful API analysis and code elimination in classes FileIo and MemIo is of great value. Probably 99.9% of BasicIo use occurs in class FileIo and class MemIo.

How about opening an issue: "Unit Test RemoteIo". Just punt your observations about RemoteIo and one day somebody will have fun dealing with them.

@codecov
Copy link

codecov bot commented Jan 5, 2022

Codecov Report

Merging #2047 (d6f8649) into main (a094dde) will increase coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2047      +/-   ##
==========================================
+ Coverage   61.39%   61.43%   +0.03%     
==========================================
  Files          96       96              
  Lines       19205    19203       -2     
  Branches     9849     9847       -2     
==========================================
+ Hits        11791    11797       +6     
+ Misses       5093     5087       -6     
+ Partials     2321     2319       -2     
Impacted Files Coverage Δ
include/exiv2/basicio.hpp 100.00% <ø> (+20.00%) ⬆️
src/basicio.cpp 49.75% <100.00%> (+0.78%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a094dde...d6f8649. Read the comment docs.

@piponazo piponazo self-assigned this Jan 5, 2022
Copy link
Collaborator

@clanmills clanmills left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Our test suite is focused on user functionality with increasing security tests. Being certain that the plumbing is minimised and works correctly is a very good idea.

I see you've expanded some of the BasicIo functions to use int64_t. For historical reasons, BasicIo API uses longs and ints. We didn't "take the hit" when we moved to 64bit to analyse the API carefully and ensure it uses uint64_t everywhere (we ignored this and didn't deal with it at all).

It's another matter is to test that the code really works on huge files. I created 10GB tiffs while working on my book and exiv2 successfully read the files. Can src/tiff*.cpp really descend into 64 bit files? One day we'll have a project to support BigTiff and multi-page Tiff. That is when we must test really big files. Imperial College have 100GB medical imaging files which have thousands of pages. For now, let's ensure that BasicIo uses 64bit.

src/basicio.cpp Outdated
Comment on lines 1886 to 1887
if (newIdx < 0 || static_cast<size_t>(newIdx) > p_->size_)
return 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (newIdx < 0 || static_cast<size_t>(newIdx) > p_->size_)
return 1;
if (newIdx < 0)
return 1;
if (static_cast<size_t>(newIdx) > p_->size_) {
p_->eof_ = true;
return 1;
}

src/basicio.cpp Outdated
if (p_->idx_ > static_cast<long>(p_->size_))
p_->idx_ = static_cast<long>(p_->size_);
p_->idx_ = static_cast<long>(newIdx);
p_->eof_ = static_cast<size_t>(newIdx) > p_->size_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
p_->eof_ = static_cast<size_t>(newIdx) > p_->size_;
p_->eof_ = false;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thoughts, I think my suggestion is wrong. I noticed that your assignment p_->eof_ = static_cast<size_t>(newIdx) > p_->size_ was equivalent to p_->eof_ = false due to the early return. But it also doesn't make sense to change p_->eof_ if p_->idx_ doesn't change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered your suggestions and thanks to that I realised that MemIo and RemoteIo were sharing the same implementation for the seek method. Therefore, I have factored out that code and reuse it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does fseek() set eof_ in the standard C library?

Emulating FILE* semantics in BasicIo is a good design. @piponazo, you'll remember I didn't do that with RemoteIo and got into trouble like billions of fstat() calls.

I don't like the early return. As horrible as goto. I'd only allow return on the initial and last statement of a function. (I've never met anybody who agrees with me about this. That doesn't change my opinion!).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::fseek does not seems to set EOF in the stream, other operations like std::fgetc or std::fread sets EOF when one tries to read beyond the limits of the stream. After that, you can check the EOF status with std::feof. That's actually what we do in FileIo.

Regarding the multiple return calls, I do not have strong opinions about it. I normally would agree with you (early return or return at the end). However, even in the Cpp Core Guidelines, people have make a note about this "old rule":
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#nr2-dont-insist-to-have-only-a-single-return-statement-in-a-function

I think I will leave the seek implementations as they were before, because it was not the main intention of this PR to refactor that code. I do not think there is a major issue there (problably the RemoteIO class is not used too much). I just noticed that the implementations of MemIo::seek and RemoteIo::seek were almost similar and it could make sense to reuse the same code for both.

@clanmills
Copy link
Collaborator

clanmills commented Jan 5, 2022

@piponazo. What you've described sounds correct. feof() is set when you fread()/fgetc() past end of file and not set by an attempt to fseek() past the end.

I think you've done really good work here. I'll leave @kevinbackhouse to approve it.

I would write this code:

template<class T>
//  requires Number<T>
string sign(T x)
{
    if (x < 0)
        return "negative";
    if (x > 0)
        return "positive";
    return "zero";
}

With much simpler/clearer code which many people would hate:

template<class T>
//  requires Number<T>
string sign(T x)
{
    return x < 0  ? "negative"
         : x > 0  ? "positive"
         : "zero"
         ;
}

@piponazo piponazo merged commit 8135665 into main Jan 6, 2022
@piponazo piponazo deleted the main_coverageNotes branch January 6, 2022 11:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants