s3.Exists returns 404 even if module is present in S3 #1801

Closed
ngshiheng opened this issue Nov 18, 2022 · 1 comment · Fixed by #1802


ngshiheng commented Nov 18, 2022

Describe the bug

The current s3.Exists method in checker.go may return a 404 even when the module is cached in S3:

lsParams := &s3.ListObjectsInput{
	Bucket: aws.String(s.bucket),
	Prefix: aws.String(fmt.Sprintf("%s/@v", module)),
}
loo, err := s.s3API.ListObjectsWithContext(ctx, lsParams)

Under the hood, ListObjectsWithContext (like ListObjects) returns at most 1000 objects per call, so the Contents array is truncated whenever more objects match the prefix.

The examples below use github.com/aws/aws-sdk-go/@v/v1.40.59.mod.

If S3 held most (if not all) of the released versions of this module, the listing under its @v prefix would easily exceed 1000 objects. If the objects for the requested version do not fall within the first 1000 returned, Athens returns 404.
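The truncation can be reproduced without AWS. The following is a minimal sketch, not Athens code: listFirstPage is a hypothetical stand-in for a single ListObjects call, the bucket contents are synthetic, and the three files per version (.info, .mod, .zip) are an assumption based on the "3 objects" figure quoted under Additional context.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// maxKeys mirrors the 1000-object cap on a single S3 list call.
const maxKeys = 1000

// listFirstPage is a hypothetical stand-in for one ListObjects call: it returns
// the keys matching prefix in lexicographic order, cut off at maxKeys.
func listFirstPage(keys []string, prefix string) (page []string, truncated bool) {
	var matched []string
	for _, k := range keys {
		if strings.HasPrefix(k, prefix) {
			matched = append(matched, k)
		}
	}
	sort.Strings(matched)
	if len(matched) > maxKeys {
		return matched[:maxKeys], true
	}
	return matched, false
}

// fakeBucket builds three objects per version for n synthetic versions.
// Patch numbers are zero-padded so lexicographic order matches numeric order.
func fakeBucket(module string, n int) []string {
	var keys []string
	for i := 0; i < n; i++ {
		for _, ext := range []string{"info", "mod", "zip"} {
			keys = append(keys, fmt.Sprintf("%s/@v/v1.0.%04d.%s", module, i, ext))
		}
	}
	return keys
}

// contains reports whether key appears in page.
func contains(page []string, key string) bool {
	for _, k := range page {
		if k == key {
			return true
		}
	}
	return false
}

func main() {
	const module = "github.com/aws/aws-sdk-go"
	bucket := fakeBucket(module, 500) // 1500 objects in total
	page, truncated := listFirstPage(bucket, module+"/@v")
	fmt.Println(len(page), truncated)                       // 1000 true
	fmt.Println(contains(page, module+"/@v/v1.0.0400.mod")) // false: stored, but past the first page
}
```

The key for the synthetic version v1.0.0400 is in the bucket, yet a check that only inspects the first page never sees it, which is the 404 described here.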

Error Message

These are the only two messages emitted in our logs:

{"http-method":"GET","http-path":"/github.com/aws/aws-sdk-go/@v/v1.40.59.mod","kind":"Not Found","level":"info","module":"","msg":"Not Found","operation":"download.VersionModuleHandler","ops":["download.VersionModuleHandler","pool.GoMod","protocol.GoMod","s3.GoMod"],"request-id":"3270447c-39f8-47af-b74c-14dda9e4cc6f","time":"2022-11-18T06:57:32Z","version":""}

{"http-method":"GET","http-path":"/github.com/aws/aws-sdk-go/@v/v1.40.59.mod","http-status":"404","level":"info","msg":"incoming request","request-id":"3270447c-39f8-47af-b74c-14dda9e4cc6f","time":"2022-11-18T06:57:32Z"}

To Reproduce
Steps to reproduce the behavior:

  1. Populate the Athens S3 storage with all releases of github.com/aws/aws-sdk-go
  2. curl https://goproxy.company.com/github.com/aws/aws-sdk-go/@v/v1.40.59.mod
  3. Athens returns 404, and the logs show the error messages above

Expected behavior

Athens should return response code 200 with the following body:

module github.com/aws/aws-sdk-go

require (
	github.com/jmespath/go-jmespath v0.4.0
	github.com/pkg/errors v0.9.1
	golang.org/x/net v0.0.0-20210614182718-04defd469f4e
)

go 1.11

Environment (please complete the following information):

  • OS: Linux (Athens' Dockerfile)
  • Go version: 1.18
  • Proxy version : gomods/athens-dev:cc496af
  • Storage (fs/mongodb/s3 etc.) : S3

Additional context

Why hasn't this come up before?

The previous implementation built a more specific prefix that included the version, so ListObjectsWithContext almost always returned far fewer than 1000 objects:

# Previous prefix: 
{
  Bucket: "athens-cache",
  Prefix: "github.com/aws/aws-sdk-go/@v/v1.40.59."
}
# 3 objects


# Current prefix with this issue:
{
  Bucket: "athens-cache",
  Prefix: "github.com/aws/aws-sdk-go/@v"
}
# 1000 objects, redacted for brevity
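
To make the difference between the two prefixes concrete, here is a small sketch that counts matches over a synthetic key set. The version range and the three file extensions per version are assumptions made for illustration; countWithPrefix and syntheticKeys are hypothetical helpers, not part of Athens or the AWS SDK.

```go
package main

import (
	"fmt"
	"strings"
)

// countWithPrefix returns how many keys start with prefix, which is what an
// unpaginated-or-not S3 listing would have to enumerate for that prefix.
func countWithPrefix(keys []string, prefix string) int {
	n := 0
	for _, k := range keys {
		if strings.HasPrefix(k, prefix) {
			n++
		}
	}
	return n
}

// syntheticKeys builds .info, .mod and .zip keys for versions v1.0.0 through
// v1.44.59 (45 minor releases x 60 patch releases, purely synthetic).
func syntheticKeys(module string) []string {
	var keys []string
	for minor := 0; minor < 45; minor++ {
		for patch := 0; patch < 60; patch++ {
			for _, ext := range []string{"info", "mod", "zip"} {
				keys = append(keys, fmt.Sprintf("%s/@v/v1.%d.%d.%s", module, minor, patch, ext))
			}
		}
	}
	return keys
}

func main() {
	const module = "github.com/aws/aws-sdk-go"
	keys := syntheticKeys(module)
	narrow := countWithPrefix(keys, module+"/@v/v1.40.59.") // version-specific prefix
	wide := countWithPrefix(keys, module+"/@v")             // directory prefix
	fmt.Println(narrow, wide)                               // 3 8100
}
```

The trailing dot in the version-specific prefix matters: it keeps v1.40.59 from also matching a hypothetical v1.40.590.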

ngshiheng commented Nov 18, 2022

From what I can deduce, we can either:

  1. Roll back "Using directory as prefix for S3" (#1720)
  2. Update the current method to page through the results with ListObjectsPagesWithContext (https://docs.aws.amazon.com/sdk-for-go/api/service/s3/#S3.ListObjectsPagesWithContext) so that every object matching the prefix is checked.

However, for option 2, I currently see 4000+ objects under the prefix github.com/aws/aws-sdk-go/@v/ in my S3 bucket. Hence, I am not sure that iterating through all the available pages (objects) is a good idea here because:

  • It would be slow
  • It would issue multiple list requests to S3 for a single lookup
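
As a rough way to weigh option 2, the sketch below simulates paging with an early stop. In aws-sdk-go v1, ListObjectsPagesWithContext takes a callback that receives each page and returns false to stop paging; forEachPage here is a hypothetical stdlib-only stand-in for that behaviour so the example runs without AWS. It also assumes "exists" means all three files of a version (.info, .mod, .zip) are present, and the key layout is synthetic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const pageSize = 1000

// forEachPage is a hypothetical stand-in for ListObjectsPagesWithContext: it
// hands fn one page of up to pageSize matching keys at a time and stops as
// soon as fn returns false. It returns the number of pages delivered.
func forEachPage(keys []string, prefix string, fn func(page []string, lastPage bool) bool) int {
	var matched []string
	for _, k := range keys {
		if strings.HasPrefix(k, prefix) {
			matched = append(matched, k)
		}
	}
	sort.Strings(matched)
	pages := 0
	for start := 0; start < len(matched); start += pageSize {
		end := start + pageSize
		if end > len(matched) {
			end = len(matched)
		}
		pages++
		if !fn(matched[start:end], end == len(matched)) {
			break
		}
	}
	return pages
}

// existsPaged reports whether every wanted key was found, and how many pages
// (that is, list requests) were needed to decide.
func existsPaged(keys []string, prefix string, wanted []string) (bool, int) {
	missing := make(map[string]bool, len(wanted))
	for _, w := range wanted {
		missing[w] = true
	}
	pages := forEachPage(keys, prefix, func(page []string, _ bool) bool {
		for _, k := range page {
			delete(missing, k)
		}
		return len(missing) > 0 // keep paging only while something is still missing
	})
	return len(missing) == 0, pages
}

// versionKeys returns the three synthetic keys for patch version i.
func versionKeys(module string, i int) []string {
	var keys []string
	for _, ext := range []string{"info", "mod", "zip"} {
		keys = append(keys, fmt.Sprintf("%s/@v/v1.0.%04d.%s", module, i, ext))
	}
	return keys
}

func main() {
	const module = "github.com/aws/aws-sdk-go"
	var bucket []string
	for i := 0; i < 1500; i++ { // 4500 objects, in line with the 4000+ mentioned above
		bucket = append(bucket, versionKeys(module, i)...)
	}
	fmt.Println(existsPaged(bucket, module+"/@v", versionKeys(module, 100)))  // true 1
	fmt.Println(existsPaged(bucket, module+"/@v", versionKeys(module, 1400))) // true 5
}
```

Stopping early helps when the version sorts near the front of the listing, but a late or missing version still costs one request per 1000 objects, which is the slowness concern above. Under these assumptions, the version-specific prefix from option 1 needs a single request.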

What do you guys think?

I'm happy to submit a PR (I also have access to AWS for testing). Let me know :)
