
[native] Add a rust implementation of whitespace_parser #452

Closed
wants to merge 1 commit

Conversation

@bgw (Contributor) commented Jan 24, 2021

Summary

This adds an experimental new native extension, with the goal of improving performance, currently just reimplementing whitespace_parser.

The implementation is in Rust with PyO3.

This appears to make parsing marginally faster overall, but the performance gains are pretty limited because whitespace parsing is a small portion of the overall parsing process, and there's some added overhead of converting types (mostly unicode strings) between Rust and Python. I haven't done much benchmarking or any profiling yet.

Instead, this helps show what's needed to port part of the codebase over to Rust in an example that's small enough to be digestible, but big enough to be more than a trivial toy.

I originally started by trying to port the tokenizer to Rust, since it takes a larger portion of time and it's infrequently modified (making it a good candidate), but found issues with the implementation direction I was taking, so I scrapped that for now.

Test Plan

python -m unittest  # unit test without libcst_native
cd libcst_native
cargo test --no-default-features
maturin develop
cd ..
python -m unittest  # unit test with libcst_native

Performance Numbers

Here are some very rough performance numbers (Debian 10 with Python 3.7). I've run this a few times, and libcst_native seems to be consistently faster by a very small margin.

With libcst_native (compiled with --release)

$ sudo -E perf stat ../libcst-env/bin/python -m libcst.tool print libcst/_nodes/expression.py > /dev/null

 Performance counter stats for '../libcst-env/bin/python -m libcst.tool print libcst/_nodes/expression.py':

          1,482.96 msec task-clock                #    1.000 CPUs utilized
                 5      context-switches          #    0.003 K/sec
                 0      cpu-migrations            #    0.000 K/sec
            14,336      page-faults               #    0.010 M/sec
     5,300,759,928      cycles                    #    3.574 GHz
     8,510,025,585      instructions              #    1.61  insn per cycle
     1,858,762,891      branches                  # 1253.410 M/sec
        33,023,490      branch-misses             #    1.78% of all branches

       1.483411214 seconds time elapsed

       1.443375000 seconds user
       0.039982000 seconds sys

Without libcst_native

$ sudo -E perf stat ../libcst-env/bin/python -m libcst.tool print libcst/_nodes/expression.py > /dev/null

 Performance counter stats for '../libcst-env/bin/python -m libcst.tool print libcst/_nodes/expression.py':

          1,511.82 msec task-clock                #    1.000 CPUs utilized
                 2      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
            14,602      page-faults               #    0.010 M/sec
     5,395,093,161      cycles                    #    3.569 GHz
     8,766,729,446      instructions              #    1.62  insn per cycle
     1,909,995,536      branches                  # 1263.378 M/sec
        34,058,541      branch-misses             #    1.78% of all branches

       1.512193387 seconds time elapsed

       1.492202000 seconds user
       0.020002000 seconds sys

Stuff I've Learned

  • Passing around Python objects in Rust (e.g. PyObject) is cheap, but converting PyStrings to Rust Strings (or vice versa) can easily add up and destroy performance. I added a PyCached struct to reduce the number of conversions I was doing, but it'd be best to choose what to port to libcst_native based on how much data needs to be converted between the two languages.

  • Python subclasses and Rust are an odd couple. PyO3 doesn't allow Rust classes to extend non-native Python classes. PyO3 uses __new__ for construction instead of __init__, and when subclassing a Rust class in Python, this may mean you have to use __new__ instead of __init__ too. There are ways around these problems, but they require some cleverness.

  • This isn't much of a surprise, but there's no dataclasses module in Rust, so things that exposed dataclasses will no longer expose dataclasses. This may result in some small but technically breaking API changes.

    • If we wanted to port dataclass-heavy code like CSTNode to Rust, we could define some Rust macros to reimplement most of the dataclass ergonomics and behavior in Rust.
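The string-conversion point above can be illustrated in plain Rust (no PyO3). This is only a sketch of the caching idea; the actual PyCached struct may look quite different, and `Cached` here is an invented name:

```rust
// Hypothetical sketch of the caching idea behind `PyCached` (invented names,
// plain Rust): pay the conversion cost once, then reuse the converted value.
use std::cell::OnceCell;

struct Cached<'a> {
    raw: &'a [u8],               // stand-in for a PyString's UTF-8 buffer
    converted: OnceCell<String>, // filled on first access, reused afterwards
}

impl<'a> Cached<'a> {
    fn new(raw: &'a [u8]) -> Self {
        Cached { raw, converted: OnceCell::new() }
    }

    // Allocating the Rust String happens at most once, no matter how many
    // times callers ask for the converted form.
    fn as_str(&self) -> &str {
        self.converted
            .get_or_init(|| String::from_utf8_lossy(self.raw).into_owned())
    }
}

fn main() {
    let c = Cached::new(b"def f(): pass");
    assert_eq!(c.as_str(), "def f(): pass");
    // Second call hits the cache; no second allocation.
    assert_eq!(c.as_str(), "def f(): pass");
}
```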
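The __new__-vs-__init__ constraint can be mimicked in pure Python. `NativeBase` below is a hypothetical stand-in for a PyO3-backed class, not anything from LibCST:

```python
# Hypothetical illustration (pure Python, no PyO3): a "native-style" base
# class that does its setup in __new__, the way PyO3 classes do, and
# ignores __init__ entirely.
class NativeBase:
    def __new__(cls, value):
        self = super().__new__(cls)
        self.value = value
        return self


# A Python subclass has to cooperate via __new__ too; overriding __init__
# alone wouldn't let it pass arguments through to the base class.
class Child(NativeBase):
    def __new__(cls, value):
        self = super().__new__(cls, value)
        self.doubled = value * 2
        return self


c = Child(21)
assert c.value == 21
assert c.doubled == 42
```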
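As a rough sketch of that macro idea (invented names; std derives plus a declarative macro standing in for what a real proc-macro might generate): derives already cover equality, debug repr, and copying, and a macro can add a `dataclasses.replace`-style updater.

```rust
// Std derives give us the dataclass-like basics: equality, repr, cloning.
#[derive(Debug, Clone, PartialEq)]
struct Comment {
    value: String,
}

// Invented macro: generate a `with_<field>` updater that clones the struct
// with one field changed, similar in spirit to `dataclasses.replace`.
macro_rules! with_field {
    ($ty:ident, $field:ident, $field_ty:ty, $method:ident) => {
        impl $ty {
            fn $method(&self, $field: $field_ty) -> Self {
                let mut new = self.clone();
                new.$field = $field;
                new
            }
        }
    };
}

with_field!(Comment, value, String, with_value);

fn main() {
    let a = Comment { value: "# old".to_string() };
    let b = a.with_value("# new".to_string());
    assert_eq!(a.value, "# old"); // original untouched, like dataclasses.replace
    assert_eq!(b.value, "# new");
}
```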

Meta-Commentary

  • It's totally reasonable if people don't feel like this should be merged yet. It doesn't currently provide any useful performance advantage, and I can't guarantee that I'll ever get libcst_native to a state where it provides a useful performance advantage. I'm just playing with this in my spare time.

  • If we actually started shipping libcst_native as anything more than an experimental feature, I expect that we'd want to get rid of the Python versions of the modules that libcst_native provides implementations for since maintaining two implementations seems insane.

    • This would make installation more difficult for users who can't use pre-built binary Python wheels, since those users would have to install a Rust toolchain. However, most users would probably just use the pre-built binaries and not notice the difference.
  • If we start taking this seriously, I can find somebody in the internal "Rust at Facebook" group with the expertise to help review this.

@facebook-github-bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Jan 24, 2021
@zsol (Member) commented Jan 27, 2021

Interesting! I wonder how this compares to a mypyc-compiled version of the parsing code. I might experiment with that, I had really good results when playing with mypyc in black.

> I haven't done much benchmarking or any profiling yet.

Would be really curious about the bottlenecks and if they're easy to fix.

> It doesn't currently provide any useful performance advantage, and I can't guarantee that I'll ever get libcst_native to a state where it provides a useful performance advantage.

Yep, agreed. We can keep the experiment in this PR for now until it provides a significant enough perf advantage to outweigh the maintenance and convenience (to users) costs.

@bgw (Contributor, author) commented Mar 14, 2021

Status update!

I've got a working Rust-based tokenizer here: bgw@ad0d900

It's based on CPython's C-based tokenizer (not the regex-heavy pure-Python parso tokenizer), so it's a little more complicated, but it can also (at least in theory) generate better error messages. It should also better handle a few very weird edge cases around formfeed characters (#446) and mixed tab/space indentation.

Benchmark on master:

$ sudo -E perf stat ../libcst-env/bin/python -m libcst.tool print libcst/_nodes/expression.py > /dev/null

 Performance counter stats for '../libcst-env/bin/python -m libcst.tool print libcst/_nodes/expression.py':

          1,919.38 msec task-clock                #    0.971 CPUs utilized
             1,074      context-switches          #    0.560 K/sec
                 1      cpu-migrations            #    0.001 K/sec
            15,501      page-faults               #    0.008 M/sec
     6,664,692,678      cycles                    #    3.472 GHz
     8,909,801,642      instructions              #    1.34  insn per cycle
     1,942,674,148      branches                  # 1012.135 M/sec
        35,986,805      branch-misses             #    1.85% of all branches

       1.977175312 seconds time elapsed

       1.837757000 seconds user
       0.082640000 seconds sys

Benchmark on top of bgw@ad0d900:

$ sudo -E perf stat ../libcst-env/bin/python -m libcst.tool print libcst/_nodes/expression.py > /dev/null

 Performance counter stats for '../libcst-env/bin/python -m libcst.tool print libcst/_nodes/expression.py':

          1,634.40 msec task-clock                #    0.984 CPUs utilized
               343      context-switches          #    0.210 K/sec
                 0      cpu-migrations            #    0.000 K/sec
            15,426      page-faults               #    0.009 M/sec
     5,695,460,558      cycles                    #    3.485 GHz
     7,868,764,292      instructions              #    1.38  insn per cycle
     1,721,549,089      branches                  # 1053.325 M/sec
        27,874,190      branch-misses             #    1.62% of all branches

       1.661251179 seconds time elapsed

       1.580016000 seconds user
       0.055578000 seconds sys

We're getting to the point where there's actually some useful performance gain!

However, I think most of the remaining performance gains are in porting the CSTNode types, so that's likely my next target. The CST type definitions involve a lot more code than the tokenizer or whitespace parser, though, so it'll take even more time. We currently copy and traverse CSTNodes frequently; I think we could do a lot better with codegen and a native implementation.
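To make the copy cost concrete, here's a hypothetical illustration with frozen dataclasses (invented classes, not LibCST's actual node types): editing one leaf rebuilds every ancestor on the path back to the root.

```python
# Hypothetical sketch of why node traversal with immutable trees is costly:
# with frozen dataclasses, "changing" a deeply nested node means cloning
# every ancestor above it.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Name:
    value: str


@dataclass(frozen=True)
class BinOp:
    left: Name
    right: Name


tree = BinOp(left=Name("a"), right=Name("b"))
# Renaming `a` rebuilds both the Name and the enclosing BinOp.
new_tree = replace(tree, left=replace(tree.left, value="x"))
assert new_tree.left.value == "x"
assert tree.left.value == "a"  # the old tree is untouched
```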

@bgw (Contributor, author) commented Mar 14, 2021

> Interesting! I wonder how this compares to a mypyc-compiled version of the parsing code. I might experiment with that, I had really good results when playing with mypyc in black.

Porting all of LibCST to mypyc would be a difficult project. The tokenizer and whitespace parser have type annotations, but the LL(1) parser is nearly impossible to type annotate as it's currently designed. I suppose you could just use mypyc on the parts that are already typechecked.

The CST node types could be compiled with mypyc, but the biggest problem with them (unnecessary object clones during traversal) is actually going to need some sort of codegen (or macro) system to solve, so it's not really a compiler- or language-specific problem.

My real (selfish) motivation behind this PR is that I've fallen in love with Rust, and I want to write more Rust code. If I was still working on LibCST as part of my day job, I'd probably aim for much lower-hanging fruit than rewriting everything in Rust.

> Would be really curious about the bottlenecks and if they're easy to fix.

When we looked at this before, CST node traversal was disproportionately awful. As stated above, we'd need some sort of codegen (or macro system, cough Rust proc-macros cough) to solve this in any sort of sane way.

@stroxler (Contributor) commented

Should we close this?

I believe #566 built off of a version of this work, so it's effectively been merged already?

@bgw closed this Dec 22, 2021