Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up extraction of global props, headlines, and title #1061

Merged
merged 4 commits into from
Aug 24, 2020

Conversation

kisaragi-hiu
Copy link
Contributor

Motivation for this change

Org-roam takes a long time to process files. In addition, big files (10000+ lines) can take several seconds to process after it is modified. This becomes unbearable for eg. habit (org-habit) tracking files.

This PR introduces some optimizations:

  • Replace unnecessary uses of org-element-parse-buffer
    • org-roam--extract-headlines: Use org-map-region and org-entry-get instead of org-element-parse-buffer.
    • org-roam--extract-titles-headline: Find the first headline with org-outline-regexp-bol instead of using org-element-parse-buffer.
  • If it's available, use org-collect-keywords which is a lot faster as it also avoids org-element-parse-buffer. (As it is a relatively new function in Org, added in April 2020, I have kept the previous implementation if it's not present.)

Performance comparison

vocab.org is an Org file with 14795 lines (1052 headings).

I'm running this on a Ryzen 1400 CPU.

(with-current-buffer "vocab.org"
  (list
   (list 'total 'gc-count 'gc-time)
   (benchmark-run 1
     (org-roam--extract-headlines))
   (benchmark-run 1
     (org-roam--extract-titles-title))
   (benchmark-run 1
     (org-roam--extract-titles-headline))
   (benchmark-run 1
     (org-roam--extract-titles-alias))))

Before:

total gc-count gc-time
1.771265479 1 0.7655653740000048
0.932756371 0 0.0
1.645399532 1 0.7050031620000254
1.701283592 1 0.742148186999998

After:

total gc-count gc-time
0.149613758 0 0.0
0.00223786 0 0.0
8.2539e-05 0 0.0
0.001961503 0 0.0

From 5 seconds down to 0.15 seconds.

- Utilize org-collect-keywords if it's available.
- Avoid org-element-parse-buffer if we can.
@jethrokuan
Copy link
Member

The changes LGTM.

It looks from the tests like the headline extraction now returns the candidates in reverse order, so the tests also need to be fixed here.

@kisaragi-hiu
Copy link
Contributor Author

It looks from the tests like the headline extraction now returns the candidates in reverse order, so the tests also need to be fixed here.

Ah, yes, because push adds to the front. I've made it so that the test uses :to-have-same-items-as to ignore the ordering of its items.

Copy link
Member

@jethrokuan jethrokuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, very nice 😄

@jethrokuan jethrokuan merged commit 82ac6b6 into org-roam:master Aug 24, 2020
@jethrokuan
Copy link
Member

Related to #834 #940

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants