Setup simple fuzzing for unrar. #951

aawc · 2017-11-03T23:02:07Z

Get the shared library to build for unrar. No fuzzing yet.

Edit (2017-11-09): Has simple fuzzing now.

Followed steps at:
https://github.com/google/oss-fuzz/blob/master/docs/new_project_guide.md#overview

inferno-chromium · 2017-11-06T15:19:06Z

I think we might break something if we just create a build without any fuzz target. It will definitely break regression testing because of these bad builds. Once you add a fuzz target, you can add the follow-up CL here and we will merge them together.

Fuzz by writing temp file and calling CmdExtract::DoExtract()

aawc · 2017-11-10T02:48:07Z

@inferno-chromium -- PTAL.
I have added a simple fuzzer. It seems to not fail for at least 5 minutes on my local machine.

oliverchang · 2017-11-10T03:00:35Z

projects/unrar/build.sh

+# remove the .so file so that the linker links unrar statically.
+rm -v $SRC/unrar/unrar/libunrar.so
+
+cat <<HERE > $SRC/unrar/unrar_fuzzer.cc


I think it's better to put this in a file and use the COPY docker command to copy it into the container instead of catting it here.

oliverchang · 2017-11-10T03:03:39Z

projects/unrar/build.sh

+$CXX $CXXFLAGS -v -g -ggdb -std=c++11 -I. \
+     $SRC/unrar/unrar_fuzzer.cc -o $OUT/unrar_fuzzer \
+     -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DRAR_SMP -DRARDLL \
+     -lFuzzingEngine -L$SRC/unrar/unrar -lunrar -lstdc++


just curious: why is -lstdc++ needed?

Not needed. Removed.

oliverchang · 2017-11-10T03:04:41Z

projects/unrar/build.sh

+
+set -eu
+
+tar xf $SRC/unrarsrc-5.5.8.tar.gz


this can be done in the Dockerfile.

e.g.

RUN wget https://www.rarlab.com/rar/unrarsrc-5.5.8.tar.gz && tar xf unrarsrc-5.5.8.tar.gz

oliverchang · 2017-11-10T03:05:47Z

projects/unrar/build.sh

+#
+################################################################################
+
+set -eu


is this still necessary since we're doing "#!/bin/bash -eu" ?

Incorporate review feedback

aawc · 2017-11-10T23:06:25Z

Not sure if it sent out the email so explicitly adding a comment for that: PTAL
@inferno-chromium @oliverchang

Dor1s · 2017-11-12T03:05:09Z

projects/unrar/build.sh

+rm -v $UNRAR_SRC_DIR/libunrar.so
+
+# build fuzzer
+$CXX $CXXFLAGS -v -g -ggdb -I. \


I don't think that -v -g -ggdb is necessary

Agree. Removed.

Dor1s · 2017-11-12T03:06:09Z

projects/unrar/unrar_fuzzer.cc

+#include "rar.hpp"
+
+extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
+  char filename[] = "mytemp.XXXXXX";


Let's make this static const.

Update: Actually it can't be a const since the mkstemp updates it to store the random file name.

Was: Done.

Dor1s · 2017-11-12T03:09:00Z

projects/unrar/unrar_fuzzer.cc

+extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
+  char filename[] = "mytemp.XXXXXX";
+  int fd = mkstemp(filename);
+  write(fd, data, size);


Is there any way to avoid writing the data to a file and reading its contents back? Otherwise fuzzing will be much slower than it would be if we did everything in memory.

Unfortunately, at the moment the unrar SDK does not provide an API for providing the contents of the file as an input. I can check with the maintainer if they'd be willing to support that in the future.
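For reference, the temp-file approach discussed above could be sketched roughly as follows. This is a minimal sketch, not the code that was merged: the helper name `WriteInputToTempFile` and the `/tmp/fuzz-input.XXXXXX` template are assumptions made for illustration. It checks the results of `mkstemp` and `write`, which the reviewed snippet does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <unistd.h>

// Hypothetical helper: writes `size` bytes from `data` to a fresh temporary
// file and returns its path, or an empty string on failure.
// The caller is responsible for unlink()ing the returned path.
std::string WriteInputToTempFile(const uint8_t* data, size_t size) {
  assert(data != nullptr || size == 0);

  // mkstemp rewrites the trailing X's in place, so the buffer cannot be const.
  char filename[] = "/tmp/fuzz-input.XXXXXX";
  int fd = mkstemp(filename);
  if (fd < 0) return std::string();

  size_t written = 0;
  while (written < size) {
    ssize_t n = write(fd, data + written, size - written);
    if (n <= 0) {  // Treat errors and zero-progress writes as failure.
      close(fd);
      unlink(filename);
      return std::string();
    }
    written += static_cast<size_t>(n);
  }
  close(fd);
  return filename;
}
```

A fuzz target would call this at the top of `LLVMFuzzerTestOneInput`, pass the returned path to the extractor, and `unlink` it before returning so that files do not accumulate between runs.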

Incorporate review feedback

inferno-chromium · 2017-11-13T18:48:28Z

Looks like all review feedback is incorporated, merging.

kcc · 2017-11-15T16:38:36Z

I observe lots of cases like this:

==16947==WARNING: AddressSanitizer failed to allocate 0xfefdfbf7efdfbf7d bytes
==16947==AddressSanitizer's allocator is terminating the process instead of returning 0
==16947==If you don't like this behavior set allocator_may_return_null=1
==16947==AddressSanitizer CHECK failed: /src/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cc:218 "((0)) != (0)" (0x0, 0x0)
    #0 0x4e8abf in __asan::AsanCheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /src/llvm/projects/compiler-rt/lib/asan/asan_rtl.cc:69
    #1 0x5054c5 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /src/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_termination.cc:79
    #2 0x4ee426 in __sanitizer::ReportAllocatorCannotReturnNull() /src/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cc:218
    #3 0x4ee463 in __sanitizer::ReturnNullOrDieOnFailure::OnBadRequest() /src/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cc:234
    #4 0x427137 in __asan::asan_realloc(void*, unsigned long, __sanitizer::BufferedStackTrace*) /src/llvm/projects/compiler-rt/lib/asan/asan_allocator.cc:865
    #5 0x4dfb50 in realloc /src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:108
    #6 0x5a48c7 in Array<unsigned char>::Add(unsigned long) /src/unrar/./array.hpp:129:22
    #7 0x5f3d24 in Archive::ProcessExtra50(RawRead*, unsigned long, BaseBlock*) /src/unrar/arcread.cpp:1148:25
    #8 0x5f0caf in Archive::ReadHeader50() /src/unrar/arcread.cpp:827:11
    #9 0x5ea8c8 in Archive::ReadHeader() /src/unrar/arcread.cpp:25:16
    #10 0x5e9088 in Archive::IsArchive(bool) /src/unrar/archive.cpp:196:10
    #11 0x59c309 in CmdExtract::ExtractArchive() /src/unrar/extract.cpp:105:12
    #12 0x59bc7f in CmdExtract::DoExtract() /src/unrar/extract.cpp:45:29
    #13 0x51b086 in LLVMFuzzerTestOneInput /src/unrar/unrar_fuzzer.cc:22:15

oss-fuzz sets allocator_may_return_null=1 so this doesn't lead to a crash,
but I wonder if this behavior is expected.

aawc · 2017-11-15T17:30:30Z

@kcc I'll follow up with the maintainer. I think elsewhere he has suggested specifying an option to limit the allocation size.

oliverchang · 2017-11-16T00:36:19Z

projects/unrar/unrar_fuzzer.cc

+
+  try {
+    CmdExtract extractor(cmd_data.get());
+    extractor.DoExtract();


Is there a way to prevent files from being written to disk? We're seeing some issues on our VMs due to junk files being written after each run.

At the moment the library does not provide a way, but I can ask them to add it.
Is there a good interim solution for this?

We encourage developers to store the fuzz target next to the project source code. That also simplifies the use of "internal" APIs: e.g. if DoExtract() reads the file and then calls something else (let's name it "DoExtractOnDataBuffer") to do the actual unpacking, we should call that method directly without extra steps like file creation.

After a quick look at the extraction code (https://github.com/aawc/unrar/blob/2a079823c708a637bc36e888180ebb96fdfba526/extract.cpp), it seems a bit more complicated. In that case, another approach would be to have a mock Archive class that keeps the data in memory.
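To illustrate the mock idea, here is a minimal sketch of a read-only, file-like object backed by a memory buffer. The class name `MemoryFile` and its `Read`/`Seek`/`Tell` methods are hypothetical and are not part of the unrar SDK; a real mock would have to match whatever interface unrar's own file and archive classes expose.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical read-only "file" backed by a caller-owned buffer.
// The buffer must outlive this object; no data is copied on construction.
class MemoryFile {
 public:
  MemoryFile(const uint8_t* data, size_t size) : data_(data), size_(size) {
    assert(data != nullptr || size == 0);
  }

  // Copies up to `count` bytes into `out` and returns the number copied.
  size_t Read(void* out, size_t count) {
    const size_t n = std::min(count, size_ - pos_);
    if (n > 0) std::memcpy(out, data_ + pos_, n);
    pos_ += n;
    return n;
  }

  // Moves the read position, clamping it to the end of the buffer.
  void Seek(size_t offset) { pos_ = std::min(offset, size_); }

  // Returns the current read position.
  size_t Tell() const { return pos_; }

 private:
  const uint8_t* data_;
  size_t size_;
  size_t pos_ = 0;
};
```

Because reads past the end simply return fewer bytes, truncated fuzz inputs behave like short files rather than causing out-of-bounds access in the mock itself.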

@Dor1s -- I'm discussing this with the maintainer. A mock Archive is also a good idea.

you can use mytemp-PID -- this will create one file per process, but multiple processes won't conflict.

he does not plan to implement an in-memory

Sad. The file I/O probably costs us 10x in CPU time.

@kcc thanks. If we are reusing files, might as well use the simplest approach and have the exact same filename. If it runs into any issues, I'll definitely try your suggestion.

Re-using filename fixed via #994

(cc: @kcc)
The maintainer provided me a patch to use in-memory archives instead of doing file IO.

The patch is here: https://github.com/aawc/unrar/compare/merge_5.6.1.4

I ran the fuzzer locally with and without the patch and on my beefy machine the numbers look like this:

root@7961b333a0f1:/out# unrar_fuzzer_inmem -runs=100000 2>&1 | grep second
Done 100000 runs in 49 second(s)
root@7961b333a0f1:/out# unrar_fuzzer_file -runs=100000 2>&1 | grep second
Done 100000 runs in 56 second(s)

@Dor1s thinks that the difference on VMs would be much more significant since they use HDDs instead of SSDs.

root@7961b333a0f1:/out# unrar_fuzzer_inmem -runs=332254 2>&1 | grep second
Done 332254 runs in 341 second(s)
root@7961b333a0f1:/out# unrar_fuzzer_file -runs=332254 2>&1 | grep second
Done 332254 runs in 295 second(s)

So about a 15% increase.

Here's the diff:

diff --git a/projects/unrar/Dockerfile b/projects/unrar/Dockerfile
index bbdd722..d25c44f 100644
--- a/projects/unrar/Dockerfile
+++ b/projects/unrar/Dockerfile
@@ -18,7 +18,7 @@
 FROM gcr.io/oss-fuzz-base/base-builder
 MAINTAINER vakh@chromium.org
 RUN apt-get update && apt-get install -y make build-essential
-RUN git clone --depth 1 https://github.com/aawc/unrar.git --branch merge_5.6.1.3 --single-branch
+RUN git clone --depth 1 https://github.com/aawc/unrar.git --branch merge_5.6.1.4 --single-branch
 WORKDIR unrar
 COPY build.sh $SRC/
diff --git a/projects/unrar/unrar_fuzzer.cc b/projects/unrar/unrar_fuzzer.cc
index 084aa6a..8089be4 100644
--- a/projects/unrar/unrar_fuzzer.cc
+++ b/projects/unrar/unrar_fuzzer.cc
@@ -9,19 +9,20 @@ extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
   std::stringstream ss;
   ss << "temp-" << getpid() << ".rar";
   static const std::string filename = ss.str();
-  std::ofstream file(filename,
-                     std::ios::binary | std::ios::out | std::ios::trunc);
-  if (!file.is_open()) {
-    return 0;
-  }
-  file.write(reinterpret_cast<const char *>(data), size);
-  file.close();
+  //std::ofstream file(filename,
+  //                   std::ios::binary | std::ios::out | std::ios::trunc);
+  //if (!file.is_open()) {
+  //  return 0;
+  //}
+  //file.write(reinterpret_cast<const char *>(data), size);
+  //file.close();
   std::unique_ptr<CommandData> cmd_data(new CommandData);
   cmd_data->ParseArg(const_cast<wchar_t *>(L"-p"));
   cmd_data->ParseArg(const_cast<wchar_t *>(L"x"));
   cmd_data->ParseDone();
   std::wstring wide_filename(filename.begin(), filename.end());
+  cmd_data->SetArcInMem(const_cast<unsigned char *>(data), size);
   cmd_data->AddArcName(wide_filename.c_str());
   try {
@@ -30,7 +31,7 @@ extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
   } catch (...) {
   }
-  unlink(filename.c_str());
+  //unlink(filename.c_str());
   return 0;
 }

aawc · 2018-01-24T23:26:22Z

It appears that doing in-memory fuzzing is not really providing any meaningful gains in fuzzer performance.

I enabled in-memory fuzzing via #1090
Here are the stats for the fuzzer:
Before: https://oss-fuzz.com/v2/fuzzer-stats/by-fuzzer/2018-01-12/2018-01-16/fuzzer/libFuzzer_unrar_fuzzer (avg_exec_per_sec: 110.539)
After: https://oss-fuzz.com/v2/fuzzer-stats/by-fuzzer/2018-01-19/2018-01-23/fuzzer/libFuzzer_unrar_fuzzer (avg_exec_per_sec: 109.907)

It is surprising that the average executions per second decreased: at the very least, the in-memory approach avoids file I/O, so it should be faster, possibly much faster. Is my interpretation incorrect?

CC: @Dor1s @oliverchang @inferno-chromium
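To put the two averages in proportion, the relative change can be computed directly. A small sketch (the helper name `PercentChange` is made up for this illustration):

```cpp
#include <cassert>

// Relative change from `before` to `after`, in percent.
// A negative result means the value decreased.
double PercentChange(double before, double after) {
  assert(before != 0.0);
  return (after - before) / before * 100.0;
}
```

For the figures above, `PercentChange(110.539, 109.907)` is roughly -0.57, i.e. a drop of well under 1%, which is small enough that it may simply be run-to-run variation rather than a real regression.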

kcc · 2018-01-24T23:35:51Z

Without actually looking at the profile, my hypothesis is that there are other causes of slowness that make the I/O slowdown less important. 110 exec/s is not great (but not too bad either).

inferno-chromium · 2018-01-25T05:23:23Z

When I look at
https://oss-fuzz.com/v2/performance-report/libFuzzer_unrar_fuzzer/libfuzzer_asan_unrar/2018-01-22
OOMs and timeouts account for 70% of run failures. These are likely causing the slowdown and need to be fixed first.
https://oss-fuzz.com/v2/testcase-detail/6476783588212736
https://oss-fuzz.com/v2/testcase-detail/5247511359913984

* Get the shared library to build for unrar
* Fuzz by writing temp file and calling CmdExtract::DoExtract()
* Incorporate review feedback
* Incorporate review feedback

Get the shared library to build for unrar

3fe239b

inferno-chromium requested a review from oliverchang November 6, 2017 15:18

Varun Khaneja and others added 2 commits November 9, 2017 18:42

Fuzz by writing temp file and calling CmdExtract::DoExtract()

f5b7576

Merge pull request #1 from aawc/02_unrar_build_lib

e71f8b2

Fuzz by writing temp file and calling CmdExtract::DoExtract()

aawc changed the title ~~Get the shared library to build for unrar. No fuzzing yet.~~ Setup simple fuzzing for unrar. Nov 10, 2017

oliverchang reviewed Nov 10, 2017

Varun Khaneja and others added 2 commits November 10, 2017 11:11

Incorporate review feedback

af7b90d

Merge pull request #2 from aawc/02_unrar_build_lib

aadd68a

Incorporate review feedback

Dor1s reviewed Nov 12, 2017

Varun Khaneja and others added 2 commits November 13, 2017 10:43

Incorporate review feedback

172c4e9

Merge pull request #3 from aawc/02_unrar_build_lib

8515482

Incorporate review feedback

inferno-chromium merged commit 44ac124 into google:master Nov 13, 2017

oliverchang reviewed Nov 16, 2017

aawc mentioned this pull request Nov 17, 2017

Use temp-PID.rar as the name of the input file created on disk #994

Merged

aawc mentioned this pull request Jan 18, 2018

Interpret a blob of memory as a rar file for fuzzing. #1090

Merged

aawc deleted the unrar_build_lib branch January 18, 2018 23:44
