Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #1198: handle github ratelimit #1266

Merged
merged 8 commits into from
Oct 11, 2023

Conversation

ramonpetgrave64
Copy link
Contributor

Github has a requests-per-hour ratelimit that resets per hour.

Changes

  • when the remaining is lower than a threshold of 500, delays github graphql requests until the reset time.
  • changes the retry mode from constant to exponential backoff
  • change the page size of the repos query to 50 from 100. The github API would very frequently timeout, but is much better now.

Testing

  • added unit tests
  • manual testing

@achantavy achantavy changed the title github: handle ratelimit Fix #1198: handle github ratelimit Oct 11, 2023
@achantavy achantavy linked an issue Oct 11, 2023 that may be closed by this pull request
Check the remaining rate limit and sleep if remaining is below threshold
:param token: The Github API token as string.
'''
response = requests.get('https://api.github.com/rate_limit', headers={'Authorization': f"token {token}"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than add another call, would it be possible to use the rateLimit object in the graphql response object itself?

rateLimit {
limit
cost
remaining
resetAt
}

This would let us save an API call for each page.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I tried originally, but if the graphql ratelimit is already fully consumed, then the query would not return a response. Calling the rest endpoint consumes a different ratelimit, core I believe. Test data shows that the limit for core is 60, but it's actually >5k for authenticated requests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From conversation we had offline, if we start the sync without knowing that the remaining quota is already 0, then the graphql response won't even return and the code will crash.

Co-authored-by: Alex Chantavy <achantavy@lyft.com>
@ramonpetgrave64 ramonpetgrave64 merged commit 5afa5a9 into master Oct 11, 2023
5 checks passed
@ramonpetgrave64 ramonpetgrave64 deleted the ramonpetgrave64-githubratelimit branch October 11, 2023 19:50
chandanchowdhury pushed a commit to juju4/cartography that referenced this pull request Jun 26, 2024
Github has a requests-per-hour ratelimit that resets per hour.

### Changes

- when the `remaining` is lower than a threshold of `500`, delays github
graphql requests until the `reset` time.
- changes the retry mode from constant to exponential backoff
- change the page size of the repos query to `50` from `100`. The github
API would very frequently timeout, but is much better now.

### Testing

- added unit tests
- manual testing

---------

Co-authored-by: Alex Chantavy <achantavy@lyft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Github: ratelimit handling
3 participants