Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JIT] Splitting instrDesc for EVEX encoded instruction #87016

Closed
Ruihan-Yin opened this issue Jun 1, 2023 · 3 comments
Closed

[JIT] Splitting instrDesc for EVEX encoded instruction #87016

Ruihan-Yin opened this issue Jun 1, 2023 · 3 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@Ruihan-Yin
Copy link
Contributor

EVEX has introduced a group of extra bits in its encoding format:

  • EVEX.aaa: Embedded opmask register specifier
  • EVEX.Z: Zeroing/Merging
  • EVEX.b: Broadcast/RC/SAE Context
  • EVEX.L'L: Vector length/RC

dotnet#84821 has implemented the EVEX.b in the instrDecs data structure as an extra "small" constants. This implementation will introduce throughput regression due to the increased size for every instrDesc instance.

Given the fact that EVEX encoded instructions are rarely used at current stage, please consider splitting a new instrDecs for EVEX instructions alone, such that the non-EVEX instructions won't be affected by the introduction of these new bits in the instrDesc.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 1, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jun 1, 2023
@ghost
Copy link

ghost commented Jun 1, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

EVEX has introduced a group of extra bits in its encoding format:

  • EVEX.aaa: Embedded opmask register specifier
  • EVEX.Z: Zeroing/Merging
  • EVEX.b: Broadcast/RC/SAE Context
  • EVEX.L'L: Vector length/RC

dotnet#84821 has implemented the EVEX.b in the instrDecs data structure as an extra "small" constants. This implementation will introduce throughput regression due to the increased size for every instrDesc instance.

Given the fact that EVEX encoded instructions are rarely used at current stage, please consider splitting a new instrDecs for EVEX instructions alone, such that the non-EVEX instructions won't be affected by the introduction of these new bits in the instrDesc.

Author: Ruihan-Yin
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@JulieLeeMSFT JulieLeeMSFT added this to the 8.0.0 milestone Jun 2, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Jun 2, 2023
@tannergooding
Copy link
Member

Looking into this as part of #87097.

It looks like we may not need to split and that instead we may be able to use some already in use bits and tricks to make things more pay for play and avoid a throughput hit.

Notably, I've put up #87373 which optimizes all 8-bit constants (signed or unsigned) to fit into idSmallCns, this reduces the size of many SIMD nodes as they no longer need to track a "large constant" for [-128, -1].

We then have 4 existing defined bits that are only used for JMP and CALL descriptors:

    unsigned _idBound : 1;      // jump target / frame offset bound
    unsigned _idCallRegPtr : 1; // IL indirect calls: addr in reg
    unsigned _idCallAddr : 1;   // IL indirect calls: can make a direct call to iiaAddr
    unsigned _idNoGC : 1;       // Some helpers don't get recorded in GC tables

_idBound is only set/consumed for IF_LABEL/IF_*_LABEL which is already assumed to be instrDescJmp
_idCallRegPtr is only set/consumed for INS_call, INS_l_jmp, or INS_tail_i_jmp
_idCallAddr looks to be unused entirely
_idNoGC is only set/consumed for IF_METHOD/IF_METHPTR

Given this, we could move _idBound to instrDescJmp and we could remove the unused _idCallAddr giving us slightly more range for _idSmallCns, but it's not significantly beneficial to do so. For x64, this would take us from 11 to 13 bits, giving us [-2048, +2047] but that broadened range only covers less than 1% of encountered constants. Additionally, this then incurs a throughput hit since it requires doing additional checks to determine if we want a small instrDesc, instrDesc, or instrDescEvex; and correspondingly for instrDescDsp, instrDescAmd.

Instead, we can reuse these 4 bits to track additional EVEX state such as EVEX.b, EVEX.z, EVEX.L'L, and EVEX.aaa. The relevant methods that read those bits can add asserts to ensure that idIsBound is only called for the relevant instrDescJmp formats for example.

@tannergooding
Copy link
Member

Closing. We determined we don't need to split anything here and the current handling in #87373 is sufficient.

We now also can easily get metrics showing how constants are distributed in practice so future considerations are easier.

@ghost ghost locked as resolved and limited conversation to collaborators Aug 28, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

No branches or pull requests

3 participants