
Reduce number of commits in upload large folder #2546

Merged — 3 commits merged into main from 2518-reduce-number-of-commits-in-upload-large-folder, Sep 17, 2024

Conversation

@Wauplin (Contributor) commented Sep 16, 2024

Should fix #2518 (or at least make it more reasonable).

This PR reduces the number of commits produced by the upload-large-folder command. The main change is a larger commit size: from 25 files per commit to either 75 regular files or 150 LFS files. The rules have also been slightly adjusted to avoid commits containing only 1 or 2 files at the beginning of the process.

This should greatly reduce the number of rate-limit errors when using this tool, while staying robust.
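The batching rule described above can be sketched as follows. This is a minimal illustration of the thresholds in this PR, not the actual huggingface_hub implementation; the function and constant names are hypothetical:

```python
# Hypothetical sketch of the new commit-batching thresholds.
# Before this PR every commit held at most 25 files; now regular
# and LFS files are batched separately, with larger limits.
MAX_REGULAR_FILES_PER_COMMIT = 75
MAX_LFS_FILES_PER_COMMIT = 150

def plan_commits(regular_files, lfs_files):
    """Group queued files into commit-sized batches."""
    commits = []
    for queue, limit in (
        (regular_files, MAX_REGULAR_FILES_PER_COMMIT),
        (lfs_files, MAX_LFS_FILES_PER_COMMIT),
    ):
        for start in range(0, len(queue), limit):
            commits.append(queue[start : start + limit])
    return commits
```

Under these assumptions, 20,400 LFS files would be planned as 136 commits instead of the 816 implied by the old 25-file limit.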

@LysandreJik (Member) left a comment

The change makes sense to me! Thanks @Wauplin

@hanouticelina (Contributor) left a comment

I still need to dig more into this part of the code, but it all looks good to me :)

@Wauplin (Contributor, Author) commented Sep 17, 2024

Thanks for the reviews! 🤗

@Wauplin merged commit a06e246 into main, Sep 17, 2024 — 19 checks passed
@Wauplin deleted the 2518-reduce-number-of-commits-in-upload-large-folder branch September 17, 2024 07:42
@FurkanGozukara commented
@Wauplin thank you so much!

I just ran a test uploading 20,400 files.

It went very smoothly until 5,700 files were submitted; then errors started to pop up, even though I hadn't used any API quota today.

@Wauplin (Contributor, Author) commented Sep 19, 2024

Has the 504 Gateway Timeout started to be a blocker? When such an error happens, the commit is usually retried. I suspect that with the git repository getting bigger and bigger (5,700 files to handle), the commit size starts to be too big to process within the X seconds before the timeout is triggered. What we could do is a dynamic threshold that gets lower when a 504 is triggered.
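A dynamic threshold of that kind could look roughly like this. It is purely illustrative of the idea floated above; `CommitSizer` is a hypothetical name, not part of huggingface_hub:

```python
class CommitSizer:
    """Shrink the per-commit file limit whenever the server times out."""

    def __init__(self, initial=150, floor=10):
        self.size = initial
        self.floor = floor

    def on_gateway_timeout(self):
        # Halve the batch size after a 504 Gateway Timeout, but keep
        # a floor so the upload still makes progress.
        self.size = max(self.floor, self.size // 2)
```

The uploader would then call `on_gateway_timeout()` on each 504 and use `sizer.size` as the current batch limit when planning the next commit.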

But at least it's a completely different problem than the rate-limit one!

@FurkanGozukara commented
> Has the 504 gateway timeout started to be a blocker? When such an error happens, the commit is usually retried. I suspect that with the git repository getting bigger and bigger (5700 files to handle), the commit size starts to be too big to process under the X second before the timeout is triggered. What we could do is a dynamic threshold that gets lower when a 504 is triggered.
>
> But at least it's a completely different problem than the rate-limit one!

Yes, it was a huge blocker; it really made things way, way slower.

@Wauplin (Contributor, Author) commented Sep 19, 2024

Ok, thanks for the feedback 👍 We'll think about how to update this.

Successfully merging this pull request may close these issues.

api.upload_large_folder consumes api request limit even when verifying which files were previously uploaded
4 participants