Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the speed of parsing markdown input five times #482

Merged
merged 6 commits into from
Aug 19, 2024

Conversation

Witiko
Copy link
Owner

@Witiko Witiko commented Aug 15, 2024

This PR makes the following changes:

  • Construct faster parsers.punctuation using prefix trees.

This change will close #474.

Following a successful CI run, this PR should repeat the experiment from #474 to see if this has sufficiently improved speed the speed of the Markdown package. If so, then we should also make the following changes:

  • Test that the markdown-cli command finishes under 1 second in the CI.

If not, this PR should continue by making the following changes:

  • Precompile parsers.punctuation to a separate Lua file.
  • Remove dependency on UnicodeData.txt.

These changes will close #458 and continue #402.

@Witiko Witiko added lua Related to the Lua interface and implementation speed Related to speed improvements labels Aug 15, 2024
@Witiko Witiko added this to the 3.7.0 milestone Aug 15, 2024
@Witiko Witiko self-assigned this Aug 15, 2024
@Yggdrasil128
Copy link

I took a closer look at the depth_first_search function, and while I can't pinpoint exactly where it goes wrong, I wrote a simpler, recursive alternative that seems to work for me.

local function depth_first_search(node, path, visit, leave)
  visit(node, path)
  for label, child in pairs(node) do
    if type(child) == "table" then
      depth_first_search(child, path .. label, visit, leave)
    else
      visit(child, path)
      leave(child, path)
    end
  end
  leave(node, path)
end

It needs to be called with an empty string as its second parameter:
depth_first_search(prefix_tree, "", function ...

Since I can't edit this PR directly, feel free to commit this change, and let's see what the CI tests have to say about it.

@Witiko
Copy link
Owner Author

Witiko commented Aug 16, 2024

Thanks, simpler is better. I will test out your code on Monday at the latest. Perhaps we can get rid of the leave() call for leaf nodes as well, since we don't need it.

@Witiko
Copy link
Owner Author

Witiko commented Aug 16, 2024

Since I can't edit this PR directly, feel free to commit this change, and let's see what the CI tests have to say about it.

As far as I know, you can create your own fork of this repository and open a second-order PR that would merge into the branch feat/improve-speed from this PR.

However, you don't need to: With your permission, I will use your code from #482 (comment) and commit it with a by-line that states that it is your contribution. This way, GitHub will register the commit as your contribution as well.

Witiko added a commit that referenced this pull request Aug 19, 2024
As discussed in
<#482 (comment)>
and below.

Co-authored-by: Tim Taubitz <taubitztim@gmail.com>
markdown.dtx Outdated Show resolved Hide resolved
As discussed in
<#482 (comment)>
and below.

Co-authored-by: Tim Taubitz <taubitztim@gmail.com>
@Witiko Witiko marked this pull request as ready for review August 19, 2024 12:56
@Witiko
Copy link
Owner Author

Witiko commented Aug 19, 2024

Following a successful CI run, this PR should repeat the experiment from #474 to see if this has sufficiently improved speed the speed of the Markdown package. If so, then we should also make the following changes:

  • Test that the markdown-cli command finishes under 1 second in the CI.

I verified on my Dell G5 15 notebook that after commit 6eff522 from this PR, the markdown-cli command finishes under 1 second:

$ docker run --rm -i ghcr.io/witiko/markdown:3.7.0-dev.1-7-g6eff5224-latest-minimal-no_docs bash -c 'time markdown-cli <<< foo'
\markdownRendererDocumentBegin
foo\markdownRendererDocumentEnd

real	0m0.258s
user	0m0.204s
sys	0m0.053s

In commit 3e8fe8a, I added a test that the markdown-cli command finishes under 1 second to the CI, so that we can catch similar speed regressions in the future.

If not, this PR should continue by making the following changes:

  • Precompile parsers.punctuation to a separate Lua file.
  • Remove dependency on UnicodeData.txt.

These changes will close #458 and continue #402.

Since updating the parser has been sufficient for fixing the speed issue, I postponed the removal of the dependency on UnicodeData.txt to the next release, see also ticket #486.

@Witiko Witiko changed the title Improve the speed of the Markdown package Improve the speed of parsing markdown input 5 times Aug 19, 2024
@Witiko Witiko merged commit 7df42d2 into main Aug 19, 2024
12 checks passed
@Witiko Witiko deleted the feat/improve-speed branch August 19, 2024 13:59
@Witiko Witiko changed the title Improve the speed of parsing markdown input 5 times Improve the speed of parsing markdown input five times Aug 19, 2024
Witiko added a commit to istqborg/istqb_product_base that referenced this pull request Aug 19, 2024
This PR applies the significant speed improvements from
Witiko/markdown#482.
Witiko added a commit to istqborg/istqb_product_base that referenced this pull request Aug 19, 2024
This PR applies the speed improvements from
<Witiko/markdown#482>.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lua Related to the Lua interface and implementation speed Related to speed improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Compile error with fresh MiKTeX installation Improve the speed of the Markdown package
2 participants