
Setup simple fuzzing for unrar. #951

Merged: 7 commits, Nov 13, 2017
Conversation

@aawc (Contributor) commented Nov 3, 2017

Get the shared library to build for unrar. No fuzzing yet.

Edit (2017-11-09): Has simple fuzzing now.

Followed steps at:
https://github.com/google/oss-fuzz/blob/master/docs/new_project_guide.md#overview

@inferno-chromium (Collaborator)

I think we might break something if we just create a build without any fuzz target. It will definitely break regression testing due to these bad builds. Once you add a fuzz target, you can just add the follow-up CL here and we will merge them together.

Varun Khaneja and others added 2 commits November 9, 2017 18:42
@aawc aawc changed the title Get the shared library to build for unrar. No fuzzing yet. Setup simple fuzzing for unrar. Nov 10, 2017
@aawc (Contributor, Author) commented Nov 10, 2017

@inferno-chromium -- PTAL.
I have added a simple fuzzer. It ran for at least 5 minutes on my local machine without failing.

# remove the .so file so that the linker links unrar statically.
rm -v $SRC/unrar/unrar/libunrar.so

cat <<HERE > $SRC/unrar/unrar_fuzzer.cc
Collaborator

I think it's better to put this in a file and use the COPY docker command to copy it into the container instead of catting it here.

Contributor Author

Done

$CXX $CXXFLAGS -v -g -ggdb -std=c++11 -I. \
$SRC/unrar/unrar_fuzzer.cc -o $OUT/unrar_fuzzer \
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DRAR_SMP -DRARDLL \
-lFuzzingEngine -L$SRC/unrar/unrar -lunrar -lstdc++
Collaborator

just curious: why is -lstdc++ needed?

Contributor Author

Not needed. Removed.


set -eu

tar xf $SRC/unrarsrc-5.5.8.tar.gz
@oliverchang (Collaborator) Nov 10, 2017

this can be done in the Dockerfile.

e.g.

RUN wget https://www.rarlab.com/rar/unrarsrc-5.5.8.tar.gz && tar xf unrarsrc-5.5.8.tar.gz

Contributor Author

Done

#
################################################################################

set -eu
Collaborator

is this still necessary since we're doing "#!/bin/bash -eu" ?

@aawc (Contributor, Author) Nov 10, 2017

Removed

Varun Khaneja and others added 2 commits November 10, 2017 11:11
@aawc (Contributor, Author) commented Nov 10, 2017

Not sure whether GitHub sent out the email, so adding an explicit comment: PTAL.
@inferno-chromium @oliverchang

rm -v $UNRAR_SRC_DIR/libunrar.so

# build fuzzer
$CXX $CXXFLAGS -v -g -ggdb -I. \
Contributor

I don't think that -v -g -ggdb is necessary

Contributor Author

Agree. Removed.

#include "rar.hpp"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
char filename[] = "mytemp.XXXXXX";
Contributor

let's make this static const

@aawc (Contributor, Author) Nov 13, 2017

Update: It actually can't be const, since mkstemp overwrites the buffer with the generated random file name.

Was: Done.

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
char filename[] = "mytemp.XXXXXX";
int fd = mkstemp(filename);
write(fd, data, size);
Contributor

Is there any way to avoid writing the data to a file and reading it back? Otherwise fuzzing will be much slower than it could be if everything stayed in memory.

Contributor Author

Unfortunately, the unrar SDK does not currently provide an API that accepts the file contents directly as input. I can check with the maintainer whether they'd be willing to support that in the future.

Varun Khaneja and others added 2 commits November 13, 2017 10:43
@inferno-chromium (Collaborator)

Looks like all review feedback is incorporated, merging.

@inferno-chromium inferno-chromium merged commit 44ac124 into google:master Nov 13, 2017
@kcc (Contributor) commented Nov 15, 2017

I observe lots of cases like this:

==16947==WARNING: AddressSanitizer failed to allocate 0xfefdfbf7efdfbf7d bytes
==16947==AddressSanitizer's allocator is terminating the process instead of returning 0
==16947==If you don't like this behavior set allocator_may_return_null=1
==16947==AddressSanitizer CHECK failed: /src/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cc:218 "((0)) != (0)" (0x0, 0x0)
    #0 0x4e8abf in __asan::AsanCheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /src/llvm/projects/compiler-rt/lib/asan/asan_rtl.cc:69
    #1 0x5054c5 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /src/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_termination.cc:79
    #2 0x4ee426 in __sanitizer::ReportAllocatorCannotReturnNull() /src/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cc:218
    #3 0x4ee463 in __sanitizer::ReturnNullOrDieOnFailure::OnBadRequest() /src/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cc:234
    #4 0x427137 in __asan::asan_realloc(void*, unsigned long, __sanitizer::BufferedStackTrace*) /src/llvm/projects/compiler-rt/lib/asan/asan_allocator.cc:865
    #5 0x4dfb50 in realloc /src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:108
    #6 0x5a48c7 in Array<unsigned char>::Add(unsigned long) /src/unrar/./array.hpp:129:22
    #7 0x5f3d24 in Archive::ProcessExtra50(RawRead*, unsigned long, BaseBlock*) /src/unrar/arcread.cpp:1148:25
    #8 0x5f0caf in Archive::ReadHeader50() /src/unrar/arcread.cpp:827:11
    #9 0x5ea8c8 in Archive::ReadHeader() /src/unrar/arcread.cpp:25:16
    #10 0x5e9088 in Archive::IsArchive(bool) /src/unrar/archive.cpp:196:10
    #11 0x59c309 in CmdExtract::ExtractArchive() /src/unrar/extract.cpp:105:12
    #12 0x59bc7f in CmdExtract::DoExtract() /src/unrar/extract.cpp:45:29
    #13 0x51b086 in LLVMFuzzerTestOneInput /src/unrar/unrar_fuzzer.cc:22:15

oss-fuzz sets allocator_may_return_null=1 so this doesn't lead to a crash,
but I wonder if this behavior is expected.

@aawc (Contributor, Author) commented Nov 15, 2017

@kcc I'll follow up with the maintainer. I think he has suggested elsewhere specifying an option to limit the allocation size.


try {
CmdExtract extractor(cmd_data.get());
extractor.DoExtract();
Collaborator

Is there a way to prevent files from being written to disk? We're seeing some issues on our VMs due to junk files left behind after each run.

Contributor Author

At the moment the library does not provide a way, but I can ask them to add it.
Is there a good interim solution for this?

@Dor1s (Contributor) Nov 16, 2017

We encourage developers to store fuzz targets next to the project source code. That also simplifies the use of "internal" APIs: e.g., if DoExtract() reads the file and then calls something else (let's name it "DoExtractOnDataBuffer") to do the actual unpacking, we should call that method directly, without extra steps like file creation.

After a quick look at the extraction code (https://github.com/aawc/unrar/blob/2a079823c708a637bc36e888180ebb96fdfba526/extract.cpp), it seems a bit more complicated. In that case, another approach could be to have a mock Archive class that keeps the data in memory.

Contributor Author

@Dor1s -- I'm discussing this with the maintainer. A mock Archive is also a good idea.

Contributor

you can use mytemp-PID -- this will create one file per process, so multiple processes won't conflict.

"he does not plan to implement an in-memory"

Sad. The file IO probably costs us 10x in CPU time.

Contributor Author

@kcc thanks. If we are reusing files, might as well use the simplest approach and have the exact same filename. If it runs into any issues, I'll definitely try your suggestion.

Contributor Author

Re-using the filename was fixed via #994

@aawc (Contributor, Author) Nov 29, 2017

(cc: @kcc)
The maintainer provided me with a patch to use in-memory archives instead of doing file IO.

The patch is here: https://github.com/aawc/unrar/compare/merge_5.6.1.4

I ran the fuzzer locally with and without the patch and on my beefy machine the numbers look like this:

root@7961b333a0f1:/out# unrar_fuzzer_inmem -runs=100000 2>&1 | grep second
Done 100000 runs in 49 second(s)

root@7961b333a0f1:/out# unrar_fuzzer_file -runs=100000 2>&1 | grep second
Done 100000 runs in 56 second(s)

@Dor1s thinks that the difference on VMs would be much more significant since they use HDDs instead of SSDs.

@aawc (Contributor, Author) Nov 30, 2017

root@7961b333a0f1:/out# unrar_fuzzer_inmem -runs=332254 2>&1 | grep second
Done 332254 runs in 341 second(s)

root@7961b333a0f1:/out# unrar_fuzzer_file -runs=332254 2>&1 | grep second
Done 332254 runs in 295 second(s)

So about a 15% increase in runtime for the in-memory version.

Here's the diff:

diff --git a/projects/unrar/Dockerfile b/projects/unrar/Dockerfile
index bbdd722..d25c44f 100644
--- a/projects/unrar/Dockerfile
+++ b/projects/unrar/Dockerfile
@@ -18,7 +18,7 @@ FROM gcr.io/oss-fuzz-base/base-builder
 MAINTAINER vakh@chromium.org
 RUN apt-get update && apt-get install -y make build-essential
 
-RUN git clone --depth 1 https://github.com/aawc/unrar.git --branch merge_5.6.1.3 --single-branch
+RUN git clone --depth 1 https://github.com/aawc/unrar.git --branch merge_5.6.1.4 --single-branch
 WORKDIR unrar
 
 COPY build.sh $SRC/
diff --git a/projects/unrar/unrar_fuzzer.cc b/projects/unrar/unrar_fuzzer.cc
index 084aa6a..8089be4 100644
--- a/projects/unrar/unrar_fuzzer.cc
+++ b/projects/unrar/unrar_fuzzer.cc
@@ -9,19 +9,20 @@ extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
   std::stringstream ss;
   ss << "temp-" << getpid() << ".rar";
   static const std::string filename = ss.str();
-  std::ofstream file(filename,
-                     std::ios::binary | std::ios::out | std::ios::trunc);
-  if (!file.is_open()) {
-    return 0;
-  }
-  file.write(reinterpret_cast<const char *>(data), size);
-  file.close();
+  //std::ofstream file(filename,
+  //                   std::ios::binary | std::ios::out | std::ios::trunc);
+  //if (!file.is_open()) {
+  //  return 0;
+  //}
+  //file.write(reinterpret_cast<const char *>(data), size);
+  //file.close();
 
   std::unique_ptr<CommandData> cmd_data(new CommandData);
   cmd_data->ParseArg(const_cast<wchar_t *>(L"-p"));
   cmd_data->ParseArg(const_cast<wchar_t *>(L"x"));
   cmd_data->ParseDone();
   std::wstring wide_filename(filename.begin(), filename.end());
+  cmd_data->SetArcInMem(const_cast<unsigned char *>(data), size);
   cmd_data->AddArcName(wide_filename.c_str());
 
   try {
@@ -30,7 +31,7 @@ extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
   } catch (...) {
   }
 
-  unlink(filename.c_str());
+  //unlink(filename.c_str());
 
   return 0;
 }

@aawc (Contributor, Author) commented Jan 24, 2018

It appears that doing in-memory fuzzing is not really providing any meaningful gains in fuzzer performance.

I enabled in-memory fuzzing via #1090
Here are the stats for the fuzzer:
Before: https://oss-fuzz.com/v2/fuzzer-stats/by-fuzzer/2018-01-12/2018-01-16/fuzzer/libFuzzer_unrar_fuzzer (avg_exec_per_sec: 110.539)
After: https://oss-fuzz.com/v2/fuzzer-stats/by-fuzzer/2018-01-19/2018-01-23/fuzzer/libFuzzer_unrar_fuzzer (avg_exec_per_sec: 109.907)

It is surprising that the average executions per second went down: at the very least, working in memory avoids file IO, so it should be somewhat faster, if not much faster. Is my interpretation incorrect?

CC: @Dor1s @oliverchang @inferno-chromium

@kcc (Contributor) commented Jan 24, 2018

Without actually looking at the profile, my hypothesis is that there are other sources of slowness that make the I/O overhead less important. 110 exec/s is not great (but not too bad either).

@inferno-chromium (Collaborator)

Looking at https://oss-fuzz.com/v2/performance-report/libFuzzer_unrar_fuzzer/libfuzzer_asan_unrar/2018-01-22, OOMs and timeouts account for 70% of run failures; these are likely causing the slowdown and need to be fixed first.
https://oss-fuzz.com/v2/testcase-detail/6476783588212736
https://oss-fuzz.com/v2/testcase-detail/5247511359913984

tmatth pushed a commit to tmatth/oss-fuzz that referenced this pull request Oct 22, 2018
* Get the shared library to build for unrar

* Fuzz by writing temp file and calling CmdExtract::DoExtract()

* Incorporate review feedback

* Incorporate review feedback