-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Setup simple fuzzing for unrar. #951
Conversation
I think we might break something if we just create a build without any fuzz target. It will definitely break regression testing due to these bad builds. Once you add fuzz target, you can just add the followup cl here and then we will merge them together. |
Fuzz by writing temp file and calling CmdExtract::DoExtract()
@inferno-chromium -- PTAL. |
projects/unrar/build.sh
Outdated
# remove the .so file so that the linker links unrar statically. | ||
rm -v $SRC/unrar/unrar/libunrar.so | ||
|
||
cat <<HERE > $SRC/unrar/unrar_fuzzer.cc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to put this in a file and use the COPY
docker command to copy it into the container instead of catting it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
projects/unrar/build.sh
Outdated
$CXX $CXXFLAGS -v -g -ggdb -std=c++11 -I. \ | ||
$SRC/unrar/unrar_fuzzer.cc -o $OUT/unrar_fuzzer \ | ||
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DRAR_SMP -DRARDLL \ | ||
-lFuzzingEngine -L$SRC/unrar/unrar -lunrar -lstdc++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious: why is -lstdc++ needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed. Removed.
projects/unrar/build.sh
Outdated
|
||
set -eu | ||
|
||
tar xf $SRC/unrarsrc-5.5.8.tar.gz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be done in the Dockerfile.
e.g.
RUN wget https://www.rarlab.com/rar/unrarsrc-5.5.8.tar.gz && tar xf unrarsrc-5.5.8.tar.gz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
projects/unrar/build.sh
Outdated
# | ||
################################################################################ | ||
|
||
set -eu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this still necessary since we're doing "#!/bin/bash -eu" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
Incorporate review feedback
Not sure if it sent out the email so explicitly adding a comment for that: PTAL |
projects/unrar/build.sh
Outdated
rm -v $UNRAR_SRC_DIR/libunrar.so | ||
|
||
# build fuzzer | ||
$CXX $CXXFLAGS -v -g -ggdb -I. \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that -v -g -ggdb
is necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Removed.
#include "rar.hpp" | ||
|
||
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) { | ||
char filename[] = "mytemp.XXXXXX"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's make this to static const
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: Actually it can't be a const since the mkstemp
updates it to store the random file name.
Was: Done.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) { | ||
char filename[] = "mytemp.XXXXXX"; | ||
int fd = mkstemp(filename); | ||
write(fd, data, size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anyway to avoid writing data into file and reading its content back? Otherwise fuzzing would be much slower than it could be if we did everything in memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, at the moment the unrar SDK does not provide an API for providing the contents of the file as an input. I can check with the maintainer if they'd be willing to support that in the future.
Incorporate review feedback
Looks like all review feedback is incorporated, merging. |
I observe lots of cases like this:
oss-fuzz sets allocator_may_return_null=1 so this doesn't lead to a crash, |
@kcc I'll follow-up with the maintainer. I think elsewhere he has suggested specifying an option to limit the allocation size. |
|
||
try { | ||
CmdExtract extractor(cmd_data.get()); | ||
extractor.DoExtract(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way for to prevent files from being written to disk? We're seeing some issues on our VMs due to junk files being written after each run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment the library does not provide a way, but I can ask them to add it.
Is there a good interim solution for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We encourage developers to store fuzz target next to the project source code. That also simplifies usage of "internal" APIs, e.g. if DoExtract()
reads the file and then calls something else (let's name it "DoExtractOnDataBuffer") to do the actual unpacking, we should call that method directly without extra steps like file creation.
After a quick look at the extraction code (https://github.com/aawc/unrar/blob/2a079823c708a637bc36e888180ebb96fdfba526/extract.cpp), it seems a bit more complicated. In that case, another approach can be to have a mock Archive
class that actually keeps the data in memory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Dor1s -- I'm discussing this with the maintainer. A mock Archive
is also a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use mytemp-PID
-- this will create one file per process, but multiple processes won't conflict.
he does not plan to implement an in-memory
Sad. the file IO probably costs us 10x in CPU time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kcc thanks. If we are reusing files, might as well use the simplest approach and have the exact same filename. If it runs into any issues, I'll definitely try your suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-using filename fixed via #994
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(cc: @kcc)
The maintainer provided me a patch to use in-memory archives instead of doing file IO.
The patch is here: https://github.com/aawc/unrar/compare/merge_5.6.1.4
I ran the fuzzer locally with and without the patch and on my beefy machine the numbers look like this:
root@7961b333a0f1:/out# unrar_fuzzer_inmem -runs=100000 2>&1 | grep second
Done 100000 runs in 49 second(s)
root@7961b333a0f1:/out# unrar_fuzzer_file -runs=100000 2>&1 | grep second
Done 100000 runs in 56 second(s)
@Dor1s thinks that the difference on VMs would be much more significant since they use HDDs instead of SSDs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
root@7961b333a0f1:/out# unrar_fuzzer_inmem -runs=332254 2>&1 | grep second
Done 332254 runs in 341 second(s)
root@7961b333a0f1:/out# unrar_fuzzer_file -runs=332254 2>&1 | grep second
Done 332254 runs in 295 second(s)
So about a 15% increase.
Here's the diff:
diff --git a/projects/unrar/Dockerfile b/projects/unrar/Dockerfile
index bbdd722..d25c44f 100644
--- a/projects/unrar/Dockerfile
+++ b/projects/unrar/Dockerfile
@@ -18,7 +18,7 @@ FROM gcr.io/oss-fuzz-base/base-builder
MAINTAINER vakh@chromium.org
RUN apt-get update && apt-get install -y make build-essential
-RUN git clone --depth 1 https://github.com/aawc/unrar.git --branch merge_5.6.1.3 --single-branch
+RUN git clone --depth 1 https://github.com/aawc/unrar.git --branch merge_5.6.1.4 --single-branch
WORKDIR unrar
COPY build.sh $SRC/
diff --git a/projects/unrar/unrar_fuzzer.cc b/projects/unrar/unrar_fuzzer.cc
index 084aa6a..8089be4 100644
--- a/projects/unrar/unrar_fuzzer.cc
+++ b/projects/unrar/unrar_fuzzer.cc
@@ -9,19 +9,20 @@ extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
std::stringstream ss;
ss << "temp-" << getpid() << ".rar";
static const std::string filename = ss.str();
- std::ofstream file(filename,
- std::ios::binary | std::ios::out | std::ios::trunc);
- if (!file.is_open()) {
- return 0;
- }
- file.write(reinterpret_cast<const char *>(data), size);
- file.close();
+ //std::ofstream file(filename,
+ // std::ios::binary | std::ios::out | std::ios::trunc);
+ //if (!file.is_open()) {
+ // return 0;
+ //}
+ //file.write(reinterpret_cast<const char *>(data), size);
+ //file.close();
std::unique_ptr<CommandData> cmd_data(new CommandData);
cmd_data->ParseArg(const_cast<wchar_t *>(L"-p"));
cmd_data->ParseArg(const_cast<wchar_t *>(L"x"));
cmd_data->ParseDone();
std::wstring wide_filename(filename.begin(), filename.end());
+ cmd_data->SetArcInMem(const_cast<unsigned char *>(data), size);
cmd_data->AddArcName(wide_filename.c_str());
try {
@@ -30,7 +31,7 @@ extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
} catch (...) {
}
- unlink(filename.c_str());
+ //unlink(filename.c_str());
return 0;
}
It appears that doing in-memory fuzzing is not really providing any meaningful gains in fuzzer performance. I enabled in-memory fuzzing via #1090 It is surprising that the average executions per second reduced because at the very least, doing it in-memory avoids file IO so it should be faster or much faster. Is my interpretation incorrect? |
W/o actually looking at the profile my hypothesis is that there are other reasons of slowness that make i/o slowdown less important. 110 exec/s is not great (but not too bad either). |
when i see |
* Get the shared library to build for unrar * Fuzz by writing temp file and calling CmdExtract::DoExtract() * Incorporate review feedback * Incorporate review feedback
Get the shared library to build for unrar. No fuzzing yet.
Edit (2017-11-09): Has simple fuzzing now.
Followed steps at:
https://github.com/google/oss-fuzz/blob/master/docs/new_project_guide.md#overview