
Large control file does not seem to get deleted on error during merging #445

Closed
inferno-chromium opened this issue Mar 10, 2017 · 11 comments

inferno-chromium commented Mar 10, 2017

We are seeing out-of-disk-space errors in one case, which may result from the large control file (~4 GB) not being cleared after the merge with the global corpus fails. Kostya, can you check the libFuzzer code to confirm that the control file is cleared in all cases, even when an OOM occurs and the process exits? If not, can you add the cleanup call?

{
insertId: "r7ea82f3a9kh6"
jsonPayload: {
error: {…}
created: "2017-03-09T21:24:11.138876Z"
extras: {…}
message: "Corpus pruning failed to merge global corpus: INFO: Seed: 3099641220
INFO: Loaded 1 modules (42932 guards): [0xe82e10, 0xeacce0),
INFO: -max_len is not provided, using 1048576
MERGE-OUTER: 320733 files, 1662 in the initial corpus
MERGE-OUTER: attempt 1
INFO: Seed: 3100510032
INFO: Loaded 1 modules (42932 guards): [0xe82e10, 0xeacce0),
INFO: -max_len is not provided, using 1048576
MERGE-INNER: using the control file '/tmp/libFuzzerTemp.1.txt'
MERGE-INNER: 320733 total files; 0 processed earlier; will process 320733 files now
#1 pulse cov: 2765 exec/s: 0 rss: 148Mb
#2 pulse cov: 2929 exec/s: 0 rss: 150Mb
#4 pulse cov: 3259 exec/s: 0 rss: 151Mb
#8 pulse cov: 4310 exec/s: 0 rss: 153Mb
#16 pulse cov: 4720 exec/s: 16 rss: 156Mb
#32 pulse cov: 5319 exec/s: 32 rss: 161Mb
#64 pulse cov: 5457 exec/s: 64 rss: 163Mb
#128 pulse cov: 5771 exec/s: 128 rss: 164Mb
#256 pulse cov: 5951 exec/s: 128 rss: 165Mb
#512 pulse cov: 6122 exec/s: 128 rss: 166Mb
#1024 pulse cov: 6257 exec/s: 128 rss: 166Mb
#2048 pulse cov: 6366 exec/s: 128 rss: 168Mb
#4096 pulse cov: 6366 exec/s: 99 rss: 168Mb
#8192 pulse cov: 6366 exec/s: 91 rss: 168Mb
#16384 pulse cov: 6366 exec/s: 82 rss: 170Mb
#32768 pulse cov: 6366 exec/s: 88 rss: 170Mb
#65536 pulse cov: 6366 exec/s: 84 rss: 176Mb
#131072 pulse cov: 6366 exec/s: 73 rss: 176Mb
#262144 pulse cov: 6366 exec/s: 71 rss: 176Mb
MERGE-OUTER: succesfull in 1 attempt(s)
MERGE-OUTER: the control file has 3798560768 bytes
==1== ERROR: libFuzzer: out-of-memory (used: 2052Mb; limit: 2048Mb)
To change the out-of-memory limit use -rss_limit_mb=

Live Heap Allocations: 1516586034 bytes in 806806 chunks; quarantined: 8889892 bytes in 5365 chunks; 211257 other chunks; total chunks: 1023428; showing top 95%
1376145888 byte(s) (90%) in 162011 allocation(s)
#0 0x5112c0 in operator new(unsigned long) asan_rtl
#1 0x55defc in __allocate /usr/local/include/c++/v1/new:227:10
#2 0x55defc in allocate /usr/local/include/c++/v1/memory:1771
#3 0x55defc in allocate /usr/local/include/c++/v1/memory:1526
#4 0x55defc in __split_buffer /usr/local/include/c++/v1/__split_buffer:311
#5 0x55defc in void std::__1::vector<unsigned int, std::__1::allocator >::__push_back_slow_path(unsigned int&&) /usr/local/include/c++/v1/vector:1570
#6 0x548aab in push_back /usr/local/include/c++/v1/vector:1611:9
#7 0x548aab in fuzzer::Merger::Parse(std::__1::basic_istream<char, std::__1::char_traits >&, bool) /src/libfuzzer/FuzzerMerge.cpp:100
#8 0x54efcd in ParseOrExit /src/libfuzzer/FuzzerMerge.cpp:30:8
#9 0x54efcd in fuzzer::Fuzzer::CrashResistantMerge(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > > > const&) /src/libfuzzer/FuzzerMerge.cpp:263
#10 0x51d783 in fuzzer::FuzzerDriver(int*, char***, int ()(unsigned char const, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:537:12
#11 0x516088 in main /src/libfuzzer/FuzzerMain.cpp:20:10
#12 0x7fd57d60582f in __libc_start_main

61085856 byte(s) (4%) in 320734 allocation(s)
#0 0x5112c0 in operator new(unsigned long) asan_rtl
#1 0x9dd79a in std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::__grow_by(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long)

42390320 byte(s) (2%) in 320735 allocation(s)
#0 0x5112c0 in operator new(unsigned long) asan_rtl
#1 0x9d801b in std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::basic_string(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&)

MS: 0 ; base unit: 0000000000000000000000000000000000000000

artifact_prefix='/bad_units/'; Test unit written to /bad_units/oom-da39a3ee5e6b4b0d3255bfef95601890afd80709
Base64:
SUMMARY: libFuzzer: out-of-memory
."
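The cleanup question above can be illustrated with a scope guard. This is a hedged sketch, not libFuzzer's actual code (the real control-file handling lives in FuzzerMerge.cpp), and it shows the limitation too: an RAII guard only covers normal scope exit and exceptions, not a hard abort like the OOM kill in the log, which is why the deletion ultimately has to happen in a surviving process.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical sketch: an RAII guard that removes a file when the scope
// exits. This covers early returns and exceptions, but NOT a hard abort
// such as the OOM kill shown above -- after an abort, only a surviving
// (parent) process can do the cleanup.
class FileRemover {
public:
  explicit FileRemover(std::string Path) : Path(std::move(Path)) {}
  ~FileRemover() { std::remove(Path.c_str()); }  // best effort, ignores errors
  FileRemover(const FileRemover &) = delete;
  FileRemover &operator=(const FileRemover &) = delete;

private:
  std::string Path;
};
```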


kcc commented Mar 10, 2017

ACK, this is clearly mine. No promises for the coming week :(


Dor1s commented Mar 10, 2017

I've made an attempt to fix it by calling the final merge step in a separate process (similar to CrashResistantMergeInternalStep): https://reviews.llvm.org/D30823

Not sure if you'd be happy with this solution :) but I don't think we can implement it in a single process anyway.
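The separate-process idea (run the risky step in a child so the parent can always clean up) looks roughly like the sketch below. The flag name and command construction are placeholders for illustration, not the actual D30823 patch or libFuzzer's real interface.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Sketch only: run the memory-hungry final merge in a child process via
// std::system, so the parent can delete the control file even if the
// child hits its RSS limit and dies. "-merge_inner=1" is a placeholder
// flag, not libFuzzer's real interface.
int RunFinalMergeAndCleanup(const std::string &Binary,
                            const std::string &ControlFile) {
  std::string Cmd = Binary + " -merge_inner=1 " + ControlFile;
  int ExitStatus = std::system(Cmd.c_str());  // child may OOM; we survive
  std::remove(ControlFile.c_str());           // cleanup happens either way
  return ExitStatus;
}
```

Whether the child succeeds or OOMs, the control file is gone afterwards, which would address the disk-space leak even before the RAM usage itself is fixed.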

inferno-chromium (Collaborator, issue author) commented:

@kcc - no rush on this; it only happens on 1-2 fuzzers, nothing blocking here. This is just a TODO :)


kcc commented Mar 10, 2017

So there are actually two problems here:

  1. The outer merge process takes too much RAM and causes failures.
  2. After such a failure, the control file is not removed.

I should fix both, but I will probably attack 1 first.
I don't know a good way to implement 2 in portable C++ (and I don't want to have 3 versions for Linux/Mac/Windows).


kcc commented Mar 11, 2017

@inferno-chromium may I ask you to give me one of those huge control files?
(no rush if you've killed all of them, just want to verify my assumptions)

inferno-chromium (Collaborator, issue author) commented:

Sorry, they get recycled after every job; I haven't saved them. I can give you SSH instructions for the bot that has the global corpus, or do you need just the control file?


kcc commented Mar 11, 2017

Just the control file, if you ever get a new one like this.

jyknight pushed a commit to jyknight/llvm-monorepo-old1 that referenced this issue Mar 11, 2017

kcc commented Mar 11, 2017

I've made a small improvement and added some logging.
I am reluctant to make bigger changes now (want to avoid extra complexity) -- let's see what crashes we get. If possible, send me the control file the next time this happens.

One more potential change is to not enforce max_rss for the outer merge process and let it consume as much RAM as the VM has.
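For reference, lifting the RSS limit for a single merge invocation is already possible from the command line: `-rss_limit_mb=0` disables the limit entirely (the target binary and corpus paths below are placeholders):

```shell
# -rss_limit_mb=0 turns off libFuzzer's RSS limit, letting the outer
# merge use as much RAM as the machine has (paths are placeholders).
./fuzz_target -merge=1 -rss_limit_mb=0 ./corpus_minimized ./corpus
```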

We can get a much bigger gain by changing the algorithm.
Right now it is 'read all the data, then merge'.
Instead, we could 'merge while reading the data', but that's a bit harder to reason about.
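A hedged sketch of the 'merge while reading' idea: keep one running set of covered features and decide per record whether a unit adds coverage, instead of first loading every record's feature vector into RAM (the ~1.4 GB of `std::vector<unsigned>` allocations in the trace above). The control-file format here is a simplification ("FILE <path>" followed by "FT <feature ids>"), not libFuzzer's exact format.

```cpp
#include <cassert>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Greedy single-pass merge sketch: memory is bounded by the number of
// distinct features, not by the total size of the control file.
std::vector<std::string> StreamingMerge(std::istream &Control) {
  std::set<unsigned> Covered;     // features seen so far
  std::vector<std::string> Kept;  // units that added new coverage
  std::string Line, CurFile;
  while (std::getline(Control, Line)) {
    std::istringstream SS(Line);
    std::string Tag;
    SS >> Tag;
    if (Tag == "FILE") {
      SS >> CurFile;
    } else if (Tag == "FT") {
      bool AddedCoverage = false;
      unsigned Feature;
      while (SS >> Feature)
        if (Covered.insert(Feature).second)
          AddedCoverage = true;
      if (AddedCoverage)
        Kept.push_back(CurFile);
    }
  }
  return Kept;
}
```

As the comment above notes, this is harder to reason about: the result depends on file order, and a crash mid-pass leaves no record of partially consumed features, so crash resistance would need extra bookkeeping.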

earl pushed a commit to earl/llvm-mirror that referenced this issue Mar 11, 2017
inferno-chromium (Collaborator, issue author) commented:

Not happening anymore on the job where it originally happened - "corpus_pruning libFuzzer_openssl_server@201703082142 libfuzzer_asan_openssl". But we can check the logs for control file size growth (using Stackdriver).


kcc commented Mar 22, 2017

Has anyone seen any more crashes like this?
The implementation is still not optimal in its RAM usage, but if it's good enough, I'd suggest closing the bug for now.
If there are crashes, please give me the logs, and if possible, the control files.

inferno-chromium (Collaborator, issue author) commented:

I searched for the string "the control file has " in the last 2 days of logs and don't see it anymore. Also, I don't see any global corpus merge failures today.

On an unrelated note, there are a few fuzzers stuck on the initial corpus pruning step due to slow execution:
Prune corpus for libFuzzer_libxml2_xml_regexp_compile_fuzzer@201703211614 libfuzzer_asan_libxml2 oss-fuzz-linux-zone1-0114 2017-03-21 19:31:03 2017-03-22 19:31:03
Prune corpus for libFuzzer_llvm_libcxxabi_cxa_demangle_fuzzer@201703211633 libfuzzer_asan_llvm_libcxxabi oss-fuzz-linux-zone1-0053 2017-03-21 19:12:36 2017-03-22 19:12:36
Prune corpus for libFuzzer_nss_mpi-add@201703211634 libfuzzer_asan_nss oss-fuzz-linux-zone1-0280 2017-03-21 19:05:18 2017-03-22 19:05:18
Prune corpus for libFuzzer_nss_mpi-addmod@201703211634 libfuzzer_asan_nss oss-fuzz-linux-zone1-0444 2017-03-21 19:29:11 2017-03-22 19:29:11
Prune corpus for libFuzzer_nss_mpi-invmod@201703211634 libfuzzer_asan_nss oss-fuzz-linux-zone1-0165 2017-03-21 19:13:09 2017-03-22 19:13:09
