-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8359419: AArch64: Relax min vector length to 32-bit for short vectors #26057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
👋 Welcome back xgong! A progress list of the required criteria for merging this PR into |
❗ This change is not yet ready to be integrated. |
@XiaohongGong The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
int size; | ||
switch(bt) { | ||
case T_BOOLEAN: | ||
// It needs to load/store a vector mask with only 2 elements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// It needs to load/store a vector mask with only 2 elements | |
// Load/store a vector mask with only 2 elements |
Same with the other cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for your comment. I will fix them soon.
size = 2; | ||
break; | ||
default: | ||
// Limit the min vector length to 64-bit normally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Limit the min vector length to 64-bit normally. | |
// Limit the min vector length to 64-bit. |
case Op_MinReductionV: | ||
case Op_MaxReductionV: | ||
// Reductions with less than 8 bytes vector length are | ||
// not supported for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// not supported for now. | |
// not supported. |
Background
On AArch64, the minimum vector length supported is 64-bit for basic types, except for
byte
andboolean
(32-bit and 16-bit respectively to match special Vector API features). This limitation prevents intrinsification of vector type conversions betweenshort
and wider types (e.g.long/double
) in Vector API when the entire vector length is within 128 bits, resulting in degraded performance for such conversions.For example, type conversions between
ShortVector.SPECIES_128
andLongVector.SPECIES_128
are not supported on AArch64 NEON and SVE architectures with 128-bit max vector size. This occurs because the compiler would need to generate a vector with 2 short elements, resulting in a 32-bit vector size.To intrinsify such type conversion APIs, we need to relax the min vector length constraint from 64-bit to 32-bit for short vectors.
Impact Analysis
1. Vector types
Vectors only with
short
element types will be affected, as we just supported 32-bitshort
vectors in this change.2. Vector API
No impact on Vector API or the vector-specific nodes. The minimum vector shape at API level remains 64-bit. It's not possible to generate a final vector IR with 32-bit vector size. Type conversions may generate intermediate 32-bit vectors, but they will be resized or cast to vectors with at least 64-bit length.
3. Auto-vectorization
Enables vectorization of cases containing only 2
short
lanes, with significant performance improvements. Since we have supported 32-bit vectors forbyte
type for a long time, extending this toshort
did not introduce additional risks.4. Codegen of vector nodes
NEON doesn't support 32-bit SIMD instructions, so we use 64-bit instructions instead. For lanewise operations, this is safe because the higher half bits can be ignored.
Details:
min/max/mul/and
reductions. The min vector size for such operations should remain 64-bit. We've added assertions in match rules. Since it's currently not possible to generate such reductions (Vector API minimum is 64-bit, and SLP doesn't support subword type reductions), we maintain the status quo. However, adding an explicit vector size check inmatch_rule_supported_vector()
would be beneficial.Main changes:
T_BOOLEAN
: 2T_BYTE
/T_CHAR
: 4T_SHORT
: 2 (new supported)T_INT
/T_FLOAT
/T_LONG
/T_DOUBLE
: 2Vector[U]Cast
with 32-bit input or output vector size.VectorReinterpret
has already considered the 32-bit vector size cases.Test
Tested hotspot/jdk/langtools - all tests passed.
Performance
Following shows the performance improvement of relative VectorAPI JMHs on a NVIDIA Grace (128-bit SVE2) machine:
And here is the performance improvement of the added JMH on Grace:
We can observe the similar uplift on an AArch64 N1 (NEON) machine.
Progress
Integration blocker
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26057/head:pull/26057
$ git checkout pull/26057
Update a local copy of the PR:
$ git checkout pull/26057
$ git pull https://git.openjdk.org/jdk.git pull/26057/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 26057
View PR using the GUI difftool:
$ git pr show -t 26057
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26057.diff
Using Webrev
Link to Webrev Comment