Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/md5: improve ARM64 MD5 performance by optimizing ROUND3 function #69302

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

kdjdbbfk
Copy link

@kdjdbbfk kdjdbbfk commented Sep 6, 2024

This commit enhances the performance of the MD5 functionality on ARM64 architecture by optimizing the ROUND3 function in the md5block_arm64.s assembly file.

  • Refactored the ROUND3 macro to improve the computation order, introducing a new ROUND3FIRST macro to handle the initial calculation more efficiently.
  • Optimized the XOR operations in the ROUND3 macro to reduce unnecessary instructions and improve parallelism within the ARM64 architecture.

Performance testing was conducted on an ARM64 Linux machine using Go's benchmark tool. The benchmarks were run 10 times each to ensure statistical significance. The following results were observed:

Benchmark Old Time (sec/op) New Time (sec/op) Change
Hash8Bytes-8 175.0ns ± 2% 175.0ns ± 1% ~
Hash1K-8 2.065µs ± 0% 2.060µs ± 0% -0.22%
Hash8K-8 15.31µs ± 0% 15.29µs ± 0% -0.11%
Hash8BytesUnaligned-8 174.0ns ± 1% 174.0ns ± 1% ~
Hash1KUnaligned-8 2.067µs ± 0% 2.059µs ± 0% -0.41%
Hash8KUnaligned-8 15.44µs ± 0% 15.45µs ± 0% ~
In terms of throughput:
Benchmark Old Throughput (B/s) New Throughput (B/s) Change
Hash8Bytes-8 43.58MiB/s ± 2% 43.69MiB/s ± 0% +0.24%
Hash1K-8 473.1MiB/s ± 0% 474.0MiB/s ± 0% +0.20%
Hash8K-8 510.4MiB/s ± 0% 511.0MiB/s ± 0% +0.11%
Hash8BytesUnaligned-8 43.80MiB/s ± 0% 43.82MiB/s ± 0% ~
Hash1KUnaligned-8 472.5MiB/s ± 0% 474.3MiB/s ± 0% +0.38%
Hash8KUnaligned-8 506.1MiB/s ± 0% 505.8MiB/s ± 0% ~

When testing with large files (e.g., a 3GB file), the runtime was reduced from 8.65 seconds to 7.39 seconds, resulting in an approximate 9% reduction in execution time. This demonstrates a more significant performance gain when handling larger datasets.

Overall, these optimizations provide modest improvements for small input sizes and more noticeable performance benefits when processing larger files, especially in memory-intensive workloads like file hashing.

@gopherbot
Copy link
Contributor

This PR (HEAD: 67f8686) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/go/+/611299.

Important tips:

  • Don't comment on this PR. All discussion takes place in Gerrit.
  • You need a Gmail or other Google account to log in to Gerrit.
  • To change your code in response to feedback:
    • Push a new commit to the branch used by your GitHub PR.
    • A new "patch set" will then appear in Gerrit.
    • Respond to each comment by marking as Done in Gerrit if implemented as suggested. You can alternatively write a reply.
    • Critical: you must click the blue Reply button near the top to publish your Gerrit responses.
    • Multiple commits in the PR will be squashed by GerritBot.
  • The title and description of the GitHub PR are used to construct the final commit message.
    • Edit these as needed via the GitHub web interface (not via Gerrit or git).
    • You should word wrap the PR description at ~76 characters unless you need longer lines (e.g., for tables or URLs).
  • See the Sending a change via GitHub and Reviews sections of the Contribution Guide as well as the FAQ for details.

@gopherbot
Copy link
Contributor

Message from Gopher Robot:

Patch Set 1:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/611299.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

This PR (HEAD: 85ec85f) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/go/+/611299.

Important tips:

  • Don't comment on this PR. All discussion takes place in Gerrit.
  • You need a Gmail or other Google account to log in to Gerrit.
  • To change your code in response to feedback:
    • Push a new commit to the branch used by your GitHub PR.
    • A new "patch set" will then appear in Gerrit.
    • Respond to each comment by marking as Done in Gerrit if implemented as suggested. You can alternatively write a reply.
    • Critical: you must click the blue Reply button near the top to publish your Gerrit responses.
    • Multiple commits in the PR will be squashed by GerritBot.
  • The title and description of the GitHub PR are used to construct the final commit message.
    • Edit these as needed via the GitHub web interface (not via Gerrit or git).
    • You should word wrap the PR description at ~76 characters unless you need longer lines (e.g., for tables or URLs).
  • See the Sending a change via GitHub and Reviews sections of the Contribution Guide as well as the FAQ for details.

This commit enhances the performance of the MD5 functionality on ARM64 architecture by optimizing the ROUND3 function in the `md5block_arm64.s` assembly file.

1.Refactored the `ROUND3` macro to improve the computation order, introducing a new `ROUND3FIRST` macro to handle the initial calculation more efficiently.
2.Optimized the XOR operations in the `ROUND3` macro to reduce unnecessary instructions and improve parallelism within the ARM64 architecture.

Performance testing was conducted on an ARM64 Linux machine using Go's benchmark tool. The benchmarks were run 10 times each to ensure statistical significance. The following results were observed:

| Benchmark             | Old Time (sec/op) | New Time (sec/op) | Change |
|-----------------------|-------------------|-------------------|--------|
| Hash8Bytes-8          | 175.0ns  2%      | 175.0ns  1%      | ~      |
| Hash1K-8              | 2.065µs  0%      | 2.060µs  0%      | -0.22% |
| Hash8K-8              | 15.31µs  0%      | 15.29µs  0%      | -0.11% |
| Hash8BytesUnaligned-8 | 174.0ns  1%      | 174.0ns  1%      | ~      |
| Hash1KUnaligned-8     | 2.067µs  0%      | 2.059µs  0%      | -0.41% |
| Hash8KUnaligned-8     | 15.44µs  0%      | 15.45µs  0%      | ~      |

    In terms of throughput:

| Benchmark             | Old Throughput (B/s) | New Throughput (B/s) | Change |
|-----------------------|----------------------|----------------------|--------|
| Hash8Bytes-8          | 43.58MiB/s  2%      | 43.69MiB/s  0%      | +0.24% |
| Hash1K-8              | 473.1MiB/s  0%      | 474.0MiB/s  0%      | +0.20% |
| Hash8K-8              | 510.4MiB/s  0%      | 511.0MiB/s  0%      | +0.11% |
| Hash8BytesUnaligned-8 | 43.80MiB/s  0%      | 43.82MiB/s  0%      | ~      |
| Hash1KUnaligned-8     | 472.5MiB/s  0%      | 474.3MiB/s  0%      | +0.38% |
| Hash8KUnaligned-8     | 506.1MiB/s  0%      | 505.8MiB/s  0%      | ~      |

When testing with large files (e.g., a 3GB file), the runtime was reduced from 8.65 seconds to 7.39 seconds, resulting in an approximate 9% reduction in execution time. This demonstrates a more significant performance gain when handling larger datasets.

Overall, these optimizations provide modest improvements for small input sizes and more noticeable performance benefits when processing larger files, especially in memory-intensive workloads like file hashing.
@kdjdbbfk kdjdbbfk changed the title crypto/md5: Improve ARM64 MD5 performance by optimizing ROUND3 function crypto/md5: improve ARM64 MD5 performance by optimizing ROUND3 function Sep 6, 2024
@gopherbot
Copy link
Contributor

Message from 赵静玉:

Patch Set 1:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/611299.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

This PR (HEAD: 3149567) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/go/+/611299.

Important tips:

  • Don't comment on this PR. All discussion takes place in Gerrit.
  • You need a Gmail or other Google account to log in to Gerrit.
  • To change your code in response to feedback:
    • Push a new commit to the branch used by your GitHub PR.
    • A new "patch set" will then appear in Gerrit.
    • Respond to each comment by marking as Done in Gerrit if implemented as suggested. You can alternatively write a reply.
    • Critical: you must click the blue Reply button near the top to publish your Gerrit responses.
    • Multiple commits in the PR will be squashed by GerritBot.
  • The title and description of the GitHub PR are used to construct the final commit message.
    • Edit these as needed via the GitHub web interface (not via Gerrit or git).
    • You should word wrap the PR description at ~76 characters unless you need longer lines (e.g., for tables or URLs).
  • See the Sending a change via GitHub and Reviews sections of the Contribution Guide as well as the FAQ for details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants