Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add directory paging for rendering code trees #16432

Closed

Conversation

zeripath
Copy link
Contributor

This PR simply adds the ability to make directory rendering be paged.

Coupled with the previous improvements to last commit info generation
this should speed up directory rendering for large repositories.

Signed-off-by: Andrew Thornton art27@cantab.net

This PR simply adds the ability to make directory rendering be paged.

Coupled with the previous improvements to last commit info generation
this should speed up directory rendering for large repositories.

Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath zeripath added type/feature Completely new functionality. Can only be merged if feature freeze is not active. performance/bigrepo Performance Issues affecting Big Repositories labels Jul 14, 2021
@zeripath zeripath mentioned this pull request Jul 14, 2021
@codecov-commenter

This comment has been minimized.

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Jul 14, 2021
@lafriks lafriks added this to the 1.16.0 milestone Jul 15, 2021
@lafriks
Copy link
Member

lafriks commented Jul 26, 2021

Imho we should reconsider take on this, we shoud make entry list commit info as async instead of paging, or at least make paging larger like at least 100 + async commit info

@zeripath
Copy link
Contributor Author

zeripath commented Jul 26, 2021

Imho we should reconsider take on this, we shoud make entry list commit info as async instead of paging, or at least make paging larger like at least 100 + async commit info

You mean #16467?


The directories need to paged anyway. Consider this tree:

https://cgit.freebsd.org/ports/tree/devel

It's insane and even #16467 is slightly pathological for it because the javascript looking for the missing commits is quite slow due to the insane number of items it needs to generate commit information for.

50 will make paging even for gitea repo root directory listing...

Is that a bad idea? 63 items is kinda long to scroll past to get to the README.


I chose 50 because there is a genuine performance improvement below 70 items and the UI doesn't feel too restrictive. Generating last commit info for more than 70 means walking and generating all the last commit information (until you get below 70 items) because git's internal algorithm for shortcutting commits starts to fail at that point. That means that on large repositories every page would get the same initial delay.

(The 70 number comes from:

if len(paths) < 70 {

Feel free to play with that number there and a copy of git.git and ports.git. )

If you really want this to be 100 instead of 50 then to get a performance improvement from it we're going to have to change the algorithm to cache everything it sees up to the point it can start shortcutting. Otherwise this would actually represent an increase in load and likely a worsening of performance.


Changing the algorithm to cache everything it is seeing up to the point that it can start shortcutting whilst possible would require passing in two sets of information in to WalkGitLog.

func WalkGitLog(ctx context.Context, repo *Repository, head *Commit, treepath string, paths ...string) (map[string]string, error) {

The most performant solution would be to:

  • As per Defer Last Commit Info #16467 we'd need to pass in the cache into this function.
  • IFF we have a lastcommitcache & the number of interested paths > 70
  • Either: pass in all the paths in the treepath and a []bool slice marking them as interest with a total count OR: Generate these in WalkGitLog. (Clearly the second is quite inefficient but it's keeping the interface a bit cleaner.) Generating the data itself would be done around:
    if len(paths) == 0 {
  • Create allPath2idx, a allMaxpathlen and a allIdx2idx []int:
    path2idx := map[string]int{}
    maxpathlen := len(treepath)
  • Create allChanged []bool:
    changed := make([]bool, len(paths))
  • As we walk the commits we'd have to pass in the full path2idx and full changed - then map those on the initial map - likely we'd have to pass allPath2idx allIdx2Idx and allChanged
    current, err := g.Next(treepath, path2idx, changed, maxpathlen)

    caching results as we find them (as per Defer Last Commit Info #16467) and add additional cleaning check for original path2idx etc here:
    idx, ok := paths2ids[string(fnameBuf)]
    if !ok {
    fnameBuf = fnameBuf[:cap(fnameBuf)]
    continue diffloop
    }
    if ret.Paths == nil {
    ret.Paths = changed
    }
    changed[idx] = true
  • Then once we pass the check in
    if remaining <= nextRestart {
    and can restart the log walker we can just replace the allPath2Idx set with the remaining of interest set and stop the double checking.

routers/web/repo/view.go Outdated Show resolved Hide resolved
@6543
Copy link
Member

6543 commented Jul 26, 2021

I like the idear but it does not speed up the view - somehow each page does refill commit cache on first hit

Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath
Copy link
Contributor Author

I like the idear but it does not speed up the view - somehow each page does refill commit cache on first hit

The circumstances in which this will definitely improve last commit cache performance would be dependent on a few factors.

The best improvement would be on pages that do not contain the pathological cases for the caching algorithm, that is long unchanged tree entries. If these are spread throughout the pages this PR won't help and if anything will make it worse.

somehow each page does refill commit cache on first hit

? I don't understand what you mean. If it's what I think you're implying that kind of thing would need the comments in #16432 (comment) implemented.


#16467 is the real significant performance improvement - not this.

@lafriks
Copy link
Member

lafriks commented Jul 26, 2021

Yes I meant that PR but had no time to look at it in details yet. 50 is not that much imho and while it helps performance it hurts usability. I don't think we need to sacrifice usability for large number of repositories to gain performance on very small number of extreme cases. We need to balance it carefully.

@6543
Copy link
Member

6543 commented Aug 12, 2021

I like the idear but it does not speed up the view - somehow each page does refill commit cache on first hit

solved?

@delvh
Copy link
Member

delvh commented Oct 16, 2021

Yes I meant that PR but had no time to look at it in details yet. 50 is not that much imho and while it helps performance it hurts usability. I don't think we need to sacrifice usability for large number of repositories to gain performance on very small number of extreme cases. We need to balance it carefully.

50 will make paging even for gitea repo root directory listing...

Well, let me put it this way: Could it be that Gitea is a large repository?
On top of that, the files inside gitea that would not be shown would be below the README and under normal circumstances, those are the really unimportant files:
Currently, these would be main.go, package-lock.json, package.json, and webpack.config.js. Those are not the files you typically look at using the web UI.

In the end, it comes down to how "aesthetic" we want to be:
Of course, we could use the 50 because it is a round number.
But we could just as well use 60, 65 or even 69 to still get the improved performance.
Given that there is no count and users would have to count it themselves, I would almost prefer 69 as it is the maximum number to still get the performance boost. And with 69, even Gitea, which definitely counts as a large repo, would still have lots of slots to fill before pagination will be needed.
I have to agree with @zeripath: If you have to scroll above 70 items, it is also not user-friendly.

There is only one potential problem I can foresee: What do you do if the README is below that limit? Will it simply not be displayed on the initial page, or what will happen?

Copy link
Member

@delvh delvh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding my prior comment:
My fear towards the README not being displayed seems not to be a problem.

@@ -1011,6 +1011,9 @@ PATH =
;; Number of issues that are displayed on one page
;ISSUE_PAGING_NUM = 10
;;
;; Number of entries are displayed on code rendering page
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
;; Number of entries are displayed on code rendering page
;; Number of entries displayed on file tree pages
;; Big enough repositories will load significantly slower if the value is >= 70

@@ -169,6 +169,7 @@ The following configuration set `Content-Type: application/vnd.android.package-a
- `EXPLORE_PAGING_NUM`: **20**: Number of repositories that are shown in one explore page.
- `ISSUE_PAGING_NUM`: **10**: Number of issues that are shown in one page (for all pages that list issues).
- `MEMBERS_PAGING_NUM`: **20**: Number of members that are shown in organization members.
- `DIRECTORY_PAGING_NUM`: **50**: Number of directory entries that are shown in one page.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `DIRECTORY_PAGING_NUM`: **50**: Number of directory entries that are shown in one page.
- `DIRECTORY_PAGING_NUM`: **50**: Number of directory entries that are shown in one page. Setting the value >= 70 will slow down loading big enough repositories significantly.

pageSize := setting.UI.DirectoryPagingNum

if pageSize > 1 {
pager := context.NewPagination(len(entries), pageSize, page, 5)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the 5?
What does that do?
Is that perhaps the number of page literals to show?

@delvh
Copy link
Member

delvh commented Oct 16, 2021

Oh, one thing is unclear to me:
You said that values >= 70 affect performance negatively.
Does that only apply to big enough repos, or does it apply to every repo?

@6543
Copy link
Member

6543 commented Oct 16, 2021

@zeripath can you resolve conflict :)

@lunny lunny modified the milestones: 1.16.0, 1.17.0 Nov 15, 2021
@stale
Copy link

stale bot commented May 2, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 months. Thank you for your contributions.

@stale stale bot added the issue/stale label May 2, 2022
@lunny lunny added the issue/confirmed Issue has been reviewed and confirmed to be present or accepted to be implemented label May 2, 2022
@stale stale bot removed the issue/stale label May 2, 2022
@lunny lunny modified the milestones: 1.17.0, 1.18.0 Jun 4, 2022
@lunny lunny modified the milestones: 1.18.0, 1.19.0 Oct 20, 2022
@jolheiser jolheiser modified the milestones: 1.19.0, 1.20.0 Jan 31, 2023
@zeripath zeripath closed this May 7, 2023
@GiteaBot GiteaBot removed this from the 1.20.0 milestone May 7, 2023
@go-gitea go-gitea locked as resolved and limited conversation to collaborators Aug 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
issue/confirmed Issue has been reviewed and confirmed to be present or accepted to be implemented lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. performance/bigrepo Performance Issues affecting Big Repositories type/feature Completely new functionality. Can only be merged if feature freeze is not active.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants