Trufflehog hangs while scanning archived files #884

Closed
ankushgoel27 opened this issue Oct 30, 2022 · 13 comments

@ankushgoel27
Contributor

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

TruffleHog Version

3.16.1

Trace Output

hangs with
TRAC[0002] Remaining buffer capacity: 43586
TRAC[0002] Handling extracted file. filename=wipefs
TRAC[0002] Remaining buffer capacity: 8130
DEBU[0002] Max archive size reached.
DEBU[0002] Error unarchiving chunk. error="archive/tar: invalid tar header"

Remaining

Expected Behavior

Should cleanly exit

Actual Behavior

Hangs forever

Steps to Reproduce

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Environment

Kali Linux

Additional Context

References

  • #0000
@vbrinza

vbrinza commented Oct 31, 2022

trufflehog 3.16.1

Looks like I have the same issue:

TRAC[0003] scanning file file_path=/Users/some_path/charts/kube-prometheus-stack-39.11.0.tgz
TRAC[0003] Handling extracted file. filename=Chart.yaml
TRAC[0003] Remaining buffer capacity: 20971520
...
TRAC[0003] Handling extracted file. filename=port-values.yaml
TRAC[0003] Remaining buffer capacity: 16883085

and here it hangs.

@timmessing-vivun

I am experiencing this issue as well

@hlein

hlein commented Nov 6, 2022

Same thing happened here using trufflehog-3.16.1.

I downgraded to trufflehog-3.10.3 (which I had handy) and it ran through the same set of (unfortunately proprietary) images just fine.

I'll start narrowing down the version and reproducibility...

@hlein

hlein commented Nov 6, 2022

I'll start narrowing down the version and reproducibility...

Confirmed that the first version I see this with is 3.16.0, and it persists through 3.16.3.

Nothing in the changelog description between 3.15.1 and 3.16.0 jumps out at me, but I haven't looked too closely yet. @dustin-decker @mac2000 any guesses?

Will try to bisect the changes next.

@mac2000
Contributor

mac2000 commented Nov 8, 2022

I was also able to reproduce this one, but I was not able to figure out what's wrong.

I don't believe one more check should cause this, but you are correct that my changes were in the last release, so I tried commenting out the SQL Server detector. Nothing changed, so I'm not sure it is the root cause. What's more, I can still reproduce this locally with absolutely all detectors commented out.

From what I understand, the problem is somewhere in the channels: the archive module seems to be working, but the extracted file is not being passed further down the pipeline.

For now, a workaround might be to add archive files to the exclusions (but that works only for the git check).

@mac2000
Contributor

mac2000 commented Nov 8, 2022

BTW, bisecting is a very good idea 👍 (I don't remember how to do it, so I did it semi-manually).

From my experiments, the last commit where everything was fine is:

034ca4f Add bytes counter to scans (#876)

and after the next commit the problem occurs:

ab71b93 Add context to handler (#877)

@bill-rich if I understand everything right, you have the most context around #877, and it seems the issue appears after that change (at least in my experiments, but I might be wrong).

It would be great if you could have a look at the problem.

@mac2000
Contributor

mac2000 commented Nov 8, 2022

I can confirm that if I change handlers.go back to the original version with range, everything starts working.

But I have no idea what needs to be done to make this select work 🤷‍♂️

We need the help of experienced Go engineers here.

@hlein

hlein commented Nov 8, 2022

Thanks for digging in! I got distracted.

I know nothing about debugging Golang code, so by the time-honored method of sprinkling printfs and running strace, I can see that for me, it consistently hangs inside the for...select loop in pkg/handlers/handlers.go:

diff -urP trufflehog-3.16.3.orig/pkg/handlers/handlers.go trufflehog-3.16.3/pkg/handlers/handlers.go
--- trufflehog-3.16.3.orig/pkg/handlers/handlers.go     2022-11-01 18:27:24.000000000 -0600
+++ trufflehog-3.16.3/pkg/handlers/handlers.go  2022-11-08 14:56:37.180789461 -0700
@@ -1,6 +1,8 @@
 package handlers
 
 import (
+       "fmt"
+       "os"
        "context"
        "io"
 
@@ -31,13 +33,17 @@
                for {
                        select {
                        case data := <-handlerChan:
+                               fmt.Fprintf(os.Stderr, "handlers.go::HandleFile::for::select data ready\n")
                                chunk := *chunkSkel
                                chunk.Data = data
                                chunksChan <- &chunk
+                               fmt.Fprintf(os.Stderr, "handlers.go::HandleFile::for::select data chunk read\n")
                        case <-ctx.Done():
+                               fmt.Fprintf(os.Stderr, "handlers.go::HandleFile::for::select reached ctx.Done\n")
                                return false
                        }
                        if handlerChan == nil {
+                               fmt.Fprintf(os.Stderr, "handlers.go::HandleFile::for reached handlerChan == nil\n")
                                break
                        }
                }

It'll get stuck with a bunch of threads endlessly reporting ready, read; ready, read; ready, read forever. Neither the ctx.Done -> return false nor the nil -> break are ever reached.

According to strace, basically nothing is happening between those loop iterations except some futex and nanosleep calls; I suspect those are just synchronizations between threads.

According to /proc/$pid/fdinfo/, when this happens it has gotten to the last byte of a .gz file (happens to be a manpage - nothing exciting). It never moves on.
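
One explanation consistent with the "ready, read" output above is that the channel has been closed: in Go, a receive from a closed channel returns the zero value immediately, and closing a channel does not make the channel variable nil. This is an assumption about the cause, not something confirmed in this thread. The sketch below is standalone; countSpins and its limit parameter are hypothetical, and the limit exists only so the demo terminates.

```go
package main

import "fmt"

// countSpins mimics the shape of the loop in the diff above: it receives
// without the comma-ok form and relies on a nil check to break out.
// limit is a safety cap so that this demonstration terminates.
func countSpins(ch chan []byte, limit int) (spins int, sawNil bool) {
	for spins < limit {
		data := <-ch // on a closed, drained channel this returns nil at once
		_ = data
		spins++
		if ch == nil { // close(ch) does not set the variable ch to nil
			return spins, true
		}
	}
	return spins, false
}

func main() {
	ch := make(chan []byte, 1)
	ch <- []byte("last chunk")
	close(ch)

	spins, sawNil := countSpins(ch, 1000)
	fmt.Println(spins, sawNil) // prints 1000 false
}
```

The loop only stops because of the cap: the nil check is never satisfied, which matches the observation that neither exit path in the patched loop was reached.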

@hlein

hlein commented Nov 8, 2022

While digging in I checked the latest commits and I think the refactoring that occurred in ab54ec4 had a side effect of fixing the problem.

When I try using the tip of master (02ed33d) I can no longer reproduce the problem. Please try that @mac2000 @timmessing-vivun @vbrinza @ankushgoel27

@mac2000
Contributor

mac2000 commented Nov 9, 2022

confirming (I'm on ecd2578) everything works 🎉

@mcastorina 💪 thanks

@ankushgoel27
Contributor Author

yes, it works

@mcastorina
Collaborator

Awesome investigative work in this thread, thanks everyone!

@dustin-decker
Contributor

Thanks everyone! Looks like that fix made it into v3.16.4


7 participants