Improve the speed of the Markdown package #474

Closed
Witiko opened this issue Aug 5, 2024 · 5 comments · Fixed by #482

Witiko commented Aug 5, 2024

The current version of the Markdown package for TeX takes multiple seconds to initialize and process a markdown text:

$ docker run --rm -i witiko/markdown bash -c 'time markdown-cli <<< foo'
\markdownRendererDocumentBegin
foo\markdownRendererDocumentEnd

real	0m1.645s
user	0m1.430s
sys	0m0.215s

In a recent experiment, I processed a short text with historical versions of the Markdown package and compared their speed with that of the current version. The results show a more than 5× slow-down in version 3.4.3 of the Markdown package:

image

A PR that closes this ticket should take the following steps:

  1. Determine which of the eight PRs merged in version 3.4.3 caused the slow-down.
  2. Determine the exact cause of the slow-down and eliminate the slow-down.
  3. Test that processing a short text in the CI takes less than 1 second.
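
Step 3 could be checked with a small timing script. The following is only a sketch, not the check used in the project's CI: the helper names `time_command` and `check_speed` are hypothetical, and the text is passed on standard input as in the `markdown-cli <<< foo` call above.

```python
import subprocess
import time


def time_command(command, stdin_text=""):
    """Run the command once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(command, input=stdin_text, text=True,
                   capture_output=True, check=True)
    return time.perf_counter() - start


def check_speed(command, stdin_text="", limit_seconds=1.0, repeats=3):
    """Return (passed, best_time) for the fastest of several runs.

    Taking the best of several runs is an assumption made here to reduce
    noise on shared CI machines.
    """
    best = min(time_command(command, stdin_text) for _ in range(repeats))
    return best < limit_seconds, best
```

In CI, a call such as `check_speed(["markdown-cli"], "foo\n")` would return `(False, ...)` whenever the fastest run takes one second or longer, and the job could fail on that result.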

Witiko added this to the 3.7.1 milestone Aug 5, 2024
Witiko added the lua (Related to the Lua interface and implementation), speed (Related to speed improvements), and help wanted labels Aug 5, 2024

Witiko commented Aug 5, 2024

In version 3.4.3, we upgraded to TeX Live 2024, which may be related to the slow-down. I will now run the same experiment using TeX Live 2022 for all versions of the Markdown package to control for this effect. With TeX Live, the potential sources of the slow-down would be the KPathSea library, which we use to locate external resources, and other Lua libraries that we use, which may have become slower in TeX Live 2024.

In commit efeaecb, I repeated the experiment using TeX Live 2022 for all versions of the Markdown package to control for this effect. The results show that the version of TeX Live is not a major factor (for markdown.lua) and version 3.4.3 still seems more than 5× slower than version 3.4.2.

Witiko commented Aug 13, 2024

  1. Determine which of the eight PRs merged in version 3.4.3 caused the slow-down.

As discussed in #458 (comment), the issue is likely (also) with PRs #416 and #432, which started loading UnicodeData.txt and constructing a parser that recognizes all Unicode punctuation.

If this is the case, which is still to be determined, then pre-reading the file UnicodeData.txt in the CI and distributing a pre-compiled parser together with the rest of the Markdown package as a separate Lua file markdown-punctuation.lua would likely improve the speed and also make us independent of UnicodeData.txt. Furthermore, using a prefix tree to optimize the parser would further improve the speed and might close #458.
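
To make the pre-reading step concrete, here is a rough sketch written in Python for illustration, not the package's Lua code. It relies on the semicolon-separated format of UnicodeData.txt, where the first field is the hexadecimal code point and the third is the general category. Treating only categories that start with `P` as punctuation is an assumption (the package's definition may be broader), and both the function names and the shape of the generated Lua table are hypothetical.

```python
def punctuation_code_points(lines, category_prefixes=("P",)):
    """Collect code points whose general category starts with a given prefix."""
    code_points = []
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        if fields[2].startswith(category_prefixes):
            code_points.append(int(fields[0], 16))
    return code_points


def render_lua_table(code_points):
    """Serialize the code points as a Lua chunk that returns a table.

    The output format is made up for this sketch; a real
    markdown-punctuation.lua could look quite different.
    """
    items = ", ".join(f"0x{cp:04X}" for cp in sorted(code_points))
    return "return {" + items + "}\n"


# A few lines in the UnicodeData.txt format, used here as sample input.
SAMPLE = [
    "0021;EXCLAMATION MARK;Po;0;ON;;;;;N;;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "2014;EM DASH;Pd;0;ON;;;;;N;;;;;",
]
```

Running this once in the CI and shipping the result would move the cost of reading UnicodeData.txt from every run to build time.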

However, we may still wish to check if there is a more up-to-date version of UnicodeData.txt at runtime and, if there is, create a file markdown-punctuation.lua in the current working directory at runtime to override the outdated distribution file markdown-punctuation.lua.

Witiko commented Aug 15, 2024

I continued the experiment to determine which of the eight PRs merged in version 3.4.3 caused the slow-down:

image

As assumed in #474 (comment), the more than 5× slow-down is caused by PR #416, which started loading UnicodeData.txt and constructing a parser that recognizes all Unicode punctuation.

The solution is to use a prefix tree to optimize the parser, as described in #474 (comment). Precompiling the parser may bring a further improvement and help us close ticket #458, but the gain will likely be smaller.
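
For readers unfamiliar with the idea, below is a minimal prefix tree sketch in Python, assuming that punctuation is matched as UTF-8 byte sequences. It is not the package's Lua implementation, and the names `build_trie` and `match_length` are hypothetical. Instead of trying every punctuation character in turn, the matcher follows one branch per input byte.

```python
def build_trie(code_points):
    """Build a nested-dict prefix tree over the UTF-8 bytes of each code point."""
    root = {}
    for cp in code_points:
        node = root
        for byte in chr(cp).encode("utf-8"):
            node = node.setdefault(byte, {})
        node[None] = True  # marks the end of a complete character
    return root


def match_length(trie, data, start=0):
    """Return the number of bytes matched at data[start:], or 0 if none."""
    node = trie
    position = start
    while position < len(data) and data[position] in node:
        node = node[data[position]]
        position += 1
        # UTF-8 is prefix-free, so the first complete character is the match.
        if None in node:
            return position - start
    return 0
```

With a trie built from U+0021 and U+2014, the input bytes of U+2013 share the first two bytes with U+2014 but fail on the third, so the matcher gives up after three comparisons regardless of how many characters the trie holds.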

Witiko commented Aug 21, 2024

In PR #482, the speed of the Markdown package has been significantly improved:

image

The speed improvement was achieved by using a prefix tree to construct a more efficient PEG parser of Unicode punctuation.
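
The effect on the grammar can be illustrated by factoring shared leading bytes out of an ordered choice. The Python sketch below is illustrative only and is not the code from PR #482: the function `render_choice` is hypothetical, and the output uses an ad hoc notation in which `xE2` stands for one byte, juxtaposition for sequence, and `/` for ordered choice.

```python
def render_choice(sequences):
    """Render byte sequences as a PEG-style choice with shared prefixes factored out.

    Assumes that no sequence is a proper prefix of another, which holds
    for UTF-8 encodings of single characters.
    """
    groups = {}
    for seq in sequences:
        groups.setdefault(seq[0], []).append(seq[1:])
    alternatives = []
    for head, tails in groups.items():
        term = f"x{head:02X}"
        tails = [tail for tail in tails if tail]
        if tails:
            term += " " + render_choice(tails)
        alternatives.append(term)
    if len(alternatives) == 1:
        return alternatives[0]
    return "(" + " / ".join(alternatives) + ")"


# U+0021, U+2013, and U+2014 encoded as UTF-8.
EXAMPLE = [chr(cp).encode("utf-8") for cp in (0x21, 0x2013, 0x2014)]
print(render_choice(EXAMPLE))  # prints "(x21 / xE2 x80 (x93 / x94))"
```

A flat choice would test the bytes `xE2 x80` once per alternative; in the factored form they are tested once, and only the final byte is chosen among alternatives.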

Many thanks to the contributor @Yggdrasil128 for their help with the fix!

Witiko commented Aug 23, 2024

@Yggdrasil128: If you'd like, we have a Discord server and a space at Matrix.org. It can be faster to discuss the development of the Markdown package there than on GitHub.
