Using std::simd to speed-up unfilter for Paeth for bpp=3 and bpp=6 #414

Merged (5 commits) on Nov 2, 2023

Commits on Sep 25, 2023

  1. Using std::simd to speed-up unfilter for Paeth / Three bpp.

    Results of running microbenchmarks on the author's machine:
    
    ```
    $ bench --bench=unfilter --features=benchmarks,unstable -- --baseline=my_baseline filter=Paeth/bpp=3
    ...
    unfilter/filter=Paeth/bpp=3
                            time:   [21.337 µs 21.379 µs 21.429 µs]
                            thrpt:  [546.86 MiB/s 548.14 MiB/s 549.22 MiB/s]
                     change:
                            time:   [-42.023% -41.825% -41.619%] (p = 0.00 < 0.05)
                            thrpt:  [+71.288% +71.895% +72.482%]
                            Performance has improved.
    ```
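The time and throughput changes reported above are two views of the same measurement: throughput is inversely proportional to running time, so a relative time change `t` corresponds to a throughput change of `1 / (1 + t) - 1`. A minimal sketch of that arithmetic follows; the helper name `throughput_change` is hypothetical and is not part of the benchmark tooling.

```rust
/// Converts a relative change in running time (for example -0.41825 for
/// -41.825%) into the corresponding relative change in throughput.
/// Hypothetical helper, for illustration only.
fn throughput_change(time_change: f64) -> f64 {
    // Throughput is bytes / time, so scaling time by (1 + t)
    // scales throughput by 1 / (1 + t).
    1.0 / (1.0 + time_change) - 1.0
}
```

Calling `throughput_change(-0.41825)` with the point estimate from the output above gives roughly 0.719, which agrees with the reported throughput improvement of about +71.9%.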
    anforowicz committed Sep 25, 2023
    2c764e5

Commits on Oct 9, 2023

  1. Extending std::simd coverage to Paeth / Six bpp.

    Results of running microbenchmarks on the author's machine:
    
    ```
    $ bench --bench=unfilter --features=unstable,benchmarks -- --baseline=my_baseline Paeth/bpp=6
    ...
    unfilter/filter=Paeth/bpp=6
                            time:   [22.346 µs 22.356 µs 22.367 µs]
                            thrpt:  [1.0233 GiB/s 1.0238 GiB/s 1.0242 GiB/s]
                     change:
                            time:   [-24.033% -23.941% -23.852%] (p = 0.00 < 0.05)
                            thrpt:  [+31.323% +31.476% +31.637%]
                            Performance has improved.
    ```
    anforowicz committed Oct 9, 2023
    2fca2aa
  2. Extract a separate struct PaethState and fn paeth_step.

    This refactoring is desirable because:
    
    * It removes a little bit of duplication between `unfilter_paeth3` and
      `unfilter_paeth6`
    * It helps in a follow-up CL, where we need to use `paeth_step` from
      more places.
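The shape of this refactoring can be sketched in scalar, stable Rust. This is not the code from the PR, which uses `std::simd`; here plain `[u8; N]` arrays stand in for SIMD vectors. The names `PaethState`, `paeth_step`, `unfilter_paeth3` and `unfilter_paeth6` come from the commit message, while the field names, signatures and the `paeth_predictor` helper are assumptions made for illustration.

```rust
/// State carried from one pixel to the next (field names are assumptions):
/// `a` is the reconstructed pixel to the left, `c` the pixel above-left.
struct PaethState<const N: usize> {
    a: [u8; N],
    c: [u8; N],
}

impl<const N: usize> PaethState<N> {
    fn new() -> Self {
        PaethState { a: [0; N], c: [0; N] }
    }
}

/// Paeth predictor as defined by the PNG specification.
fn paeth_predictor(a: u8, b: u8, c: u8) -> u8 {
    let (ia, ib, ic) = (a as i16, b as i16, c as i16);
    let p = ia + ib - ic;
    let (pa, pb, pc) = ((p - ia).abs(), (p - ib).abs(), (p - ic).abs());
    if pa <= pb && pa <= pc {
        a
    } else if pb <= pc {
        b
    } else {
        c
    }
}

/// Unfilters one pixel. `b` is the pixel above; `x` holds the filtered
/// bytes on entry and the reconstructed bytes on exit.
fn paeth_step<const N: usize>(state: &mut PaethState<N>, b: [u8; N], x: &mut [u8; N]) {
    for i in 0..N {
        x[i] = x[i].wrapping_add(paeth_predictor(state.a[i], b[i], state.c[i]));
    }
    state.a = *x; // this pixel becomes "left" for the next step
    state.c = b; // the pixel above becomes "above-left" for the next step
}

/// Shared row loop; only the pixel width differs between the callers below.
fn unfilter_paeth_n<const N: usize>(prev_row: &[u8], curr_row: &mut [u8]) {
    let mut state = PaethState::<N>::new();
    for (b, x) in prev_row.chunks_exact(N).zip(curr_row.chunks_exact_mut(N)) {
        let mut b_px = [0u8; N];
        b_px.copy_from_slice(b);
        let mut x_px = [0u8; N];
        x_px.copy_from_slice(&x[..]);
        paeth_step(&mut state, b_px, &mut x_px);
        x.copy_from_slice(&x_px);
    }
}

fn unfilter_paeth3(prev_row: &[u8], curr_row: &mut [u8]) {
    unfilter_paeth_n::<3>(prev_row, curr_row)
}

fn unfilter_paeth6(prev_row: &[u8], curr_row: &mut [u8]) {
    unfilter_paeth_n::<6>(prev_row, curr_row)
}
```

With the per-pixel logic isolated in `paeth_step`, the two row functions differ only in the pixel width, and other callers can reuse the same step.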
    anforowicz committed Oct 9, 2023
    63222f6
  3. simd::unfilter_paethN: Load 4 (or 8) bytes at a time (faster than 3 or 6).
    
    This CL loads RGB data using 4-byte-wide loads (and RRGGBB data using
    8-byte-wide loads), because:
    
    * This is faster, as measured by the microbenchmarks below.
    * It doesn't change the behavior: both before and after this change, the
      4th SIMD lane is ignored when processing RGB data (after this change
      the 4th lane contains data from the next pixel; before this change it
      contained a 0 value).
    * This is safe as long as at least 4 bytes of input data remain (we have
      to fall back to a 3-byte-wide load for the last pixel).
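The loading strategy described in the list above can be sketched in scalar, stable Rust, with a `[u8; 4]` array standing in for a 4-lane SIMD vector. This is an illustration under stated assumptions, not the PR's `std::simd` code: the names `load4`, `load3`, `paeth_step4` and `unfilter_paeth3_wide` are hypothetical.

```rust
/// Paeth predictor as defined by the PNG specification.
fn paeth_predictor(a: u8, b: u8, c: u8) -> u8 {
    let (ia, ib, ic) = (a as i16, b as i16, c as i16);
    let p = ia + ib - ic;
    let (pa, pb, pc) = ((p - ia).abs(), (p - ib).abs(), (p - ic).abs());
    if pa <= pb && pa <= pc {
        a
    } else if pb <= pc {
        b
    } else {
        c
    }
}

/// Wide load: reads 4 bytes, so lane 3 holds the first byte of the next pixel.
fn load4(s: &[u8]) -> [u8; 4] {
    [s[0], s[1], s[2], s[3]]
}

/// Narrow load for the last pixel: lane 3 is zero-filled.
fn load3(s: &[u8]) -> [u8; 4] {
    [s[0], s[1], s[2], 0]
}

/// One Paeth step over all four lanes. Lane 3 is computed but never stored.
fn paeth_step4(a: &mut [u8; 4], c: &mut [u8; 4], b: [u8; 4], x: &mut [u8; 4]) {
    for i in 0..4 {
        x[i] = x[i].wrapping_add(paeth_predictor(a[i], b[i], c[i]));
    }
    *a = *x;
    *c = b;
}

fn unfilter_paeth3_wide(prev_row: &[u8], curr_row: &mut [u8]) {
    assert_eq!(prev_row.len(), curr_row.len());
    assert_eq!(prev_row.len() % 3, 0);
    let (mut a, mut c) = ([0u8; 4], [0u8; 4]);
    let mut i = 0;
    while i < prev_row.len() {
        // A 4-byte load is only in bounds while at least 4 bytes remain;
        // the last pixel has exactly 3 bytes left and needs the narrow load.
        let (b, mut x) = if prev_row.len() - i >= 4 {
            (load4(&prev_row[i..]), load4(&curr_row[i..]))
        } else {
            (load3(&prev_row[i..]), load3(&curr_row[i..]))
        };
        paeth_step4(&mut a, &mut c, b, &mut x);
        // Store only 3 bytes: the next pixel's first byte in `curr_row`
        // is still filtered and must not be clobbered.
        curr_row[i..i + 3].copy_from_slice(&x[..3]);
        i += 3;
    }
}
```

Lane 3 of the state only ever feeds lane 3 of later steps, so whatever it contains (a zero or a byte from the next pixel) cannot affect the three stored bytes. That is why the wide load leaves the output unchanged.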
    
    Results of running microbenchmarks on the author's machine:
    
    ```
    $ bench --bench=unfilter --features=unstable,benchmarks -- --baseline=simd1 Paeth/bpp=[36]
    ...
    unfilter/filter=Paeth/bpp=3
                            time:   [18.755 µs 18.761 µs 18.767 µs]
                            thrpt:  [624.44 MiB/s 624.65 MiB/s 624.83 MiB/s]
                     change:
                            time:   [-16.148% -15.964% -15.751%] (p = 0.00 < 0.05)
                            thrpt:  [+18.696% +18.997% +19.258%]
                            Performance has improved.
    ...
    unfilter/filter=Paeth/bpp=6
                            time:   [18.991 µs 19.000 µs 19.009 µs]
                            thrpt:  [1.2041 GiB/s 1.2047 GiB/s 1.2052 GiB/s]
                     change:
                            time:   [-15.161% -15.074% -14.987%] (p = 0.00 < 0.05)
                            thrpt:  [+17.629% +17.750% +17.871%]
                            Performance has improved.
    ```
    anforowicz committed Oct 9, 2023
    22295a5

Commits on Nov 1, 2023

  1. c9a7327