VM: Better strategy for TryReserveInitialMemory on arm64 (jump stubs) #70707

EgorBo · 2022-06-14T01:41:31Z

In #63842 we agreed that 128 MB is too small for the initial memory reservation and might not fit typical real-world apps. With this attempt I managed to get a good success rate locally (~50% of test runs) when reserving 500-700 MB close to coreclr.

Also, I introduced a Release-internal flag to disable that behavior in order to get more stable results in microbenchmarks.

@jkotas @janvorli

jkotas · 2022-06-14T04:16:01Z

src/coreclr/pal/src/map/virtual.cpp

-    const int32_t MemoryProbingIncrement = 128 * 1024 * 1024;
+
+    // If we manage to reserve the initial memory close to coreclr we might get a better performance
+    // but it's better to turn it off when we run benchmarks for more stable results (always reserve far from coreclr)


It does not sound right for the benchmarks to measure something other than what customers see.

The problem is that on ARM we currently have quite shaky results, and it's difficult to measure improvements from various changes; e.g. note this "ampere" line compared to x64 Windows and Linux.

I'll experiment on crank to see what my flags do to a series of measurements, and how big RPS is when we're lucky enough to reserve a piece next to coreclr (and how often that happens). According to my previous measurements, RPS is 5% higher in that case.

In fact, in about 50% of cases apps won't be able to reserve such a big chunk next to coreclr, so that flag probably simulates the most common case.

EgorBo · 2022-06-14T12:49:52Z

So here is what happens on our ampere machine (linux-arm64) we use for TE benchmarks:

app is started, we know the module base address:

coreclrLoadAddress = (UINT_PTR)PAL_GetSymbolModuleBase((void*)VirtualAlloc);

then we add the "guessed" library size, CoreClrLibrarySize (100 MB), to it, assuming we try to allocate above the location of libcoreclr:

preferredStartAddress = coreclrLoadAddress + CoreClrLibrarySize;

then in a loop we try to reserve 2 GB of memory at preferredStartAddress, adding 128 MB to the address on every iteration (and decreasing the "desired" allocation size by the same amount):

do
{
    m_startAddress = ReserveVirtualMemory(pthrCurrent, (void*)preferredStartAddress, sizeOfAllocation);
    if (m_startAddress != nullptr)
    {
        break;
    }

    // Try to allocate a smaller region (and further away)
    sizeOfAllocation -= 128 * 1024 * 1024;
    preferredStartAddress += 128 * 1024 * 1024;
} while (sizeOfAllocation >= MemoryProbingIncrement);

// Fallback:
if (m_startAddress == nullptr)
{
    // We were not able to reserve any memory near libcoreclr. Try to reserve approximately 2 GB of address space somewhere
    // anyway...

This means that after just one such iteration we are already out of RELOC's reach. Furthermore, 100 MB for the "guessed" coreclr size is too much; 16 MB works just fine for me locally.

The next problem is that it seems to be very difficult to find a spare 2 GB of memory nearby. I was never able to do so on my Apple M1, but it seems to be possible on our Ampere linux-arm64 machine, although only after ~3 iterations of adding 128 MB to the base address. On my Apple M1 I was able to reserve 1 GB nearby with some low probability, and 500-700 MB with a decent probability.
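To make the "reach" argument concrete, here is a minimal sketch of a range check. It assumes that the relocations in question are ARM64 direct branches (B/BL), which encode a signed 26-bit word offset and therefore reach roughly ±128 MB. The helper name and constant are hypothetical and do not come from the PR.

```cpp
#include <cstdint>

// Assumption: ARM64 B/BL carry a signed 26-bit immediate scaled by 4,
// so the reachable displacement is [-128 MB, +128 MB).
constexpr int64_t kBranch26Reach = 128LL * 1024 * 1024;

// Hypothetical helper (not part of the PR): returns true if a direct branch
// located at `from` can target `to` without going through a jump stub.
constexpr bool IsWithinBranch26Reach(uint64_t from, uint64_t to)
{
    // Unsigned subtraction wraps; the cast yields the signed displacement.
    return static_cast<int64_t>(to - from) >= -kBranch26Reach &&
           static_cast<int64_t>(to - from) < kBranch26Reach;
}
```

Under that assumption, a region starting 16 MB above the module base is reachable, while one starting at 100 MB plus a single 128 MB probing step is not, which matches the observation above.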

EgorBo · 2022-06-14T17:41:08Z

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Program
{
    public static void Main(string[] args) =>
        BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

    [Benchmark]
    [Arguments(0.4f)]
    public float MathTest(float x) => 
        MathF.Cos(x) + MathF.Sin(x) + MathF.Tan(x) + MathF.Sin(x);
}

Method	Job	Toolchain	x	Mean
MathTest	Job-FIVHQM	/Core_Root/corerun	0.4	23.797 ns
MathTest	Job-VDOWXN	/Core_Root_PR/corerun	0.4	10.667 ns

In 70% of cases the results look like this; in the remaining 30% the results are the same for both (24 ns). The baseline never managed to reserve 2 GB close to coreclr.

janvorli · 2022-06-16T15:22:43Z

src/coreclr/pal/src/map/virtual.cpp

+    // Smaller steps on ARM because we try hard finding a spare memory in a 128Mb
+    // distance from coreclr so e.g. all calls from corelib to coreclr could use relocs
+    const int32_t AddressProbingIncrement = 8 * 1024 * 1024;
+    const int32_t AllocSizeProbingDecrement = 64 * 1024 * 1024;


Why do the increment and decrement differ for ARM?

Unified (and increased the initial size to 1 GB). It works pretty well on my Apple M1 and even better on linux-ampere (more successful reservations).

In the worst case it does 8 probes (decreasing the 1 GB size by 128 MB and increasing startAddress by 8 MB on each iteration) and then bails out.
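The probing schedule described above can be sketched as follows. This is an illustration only: the loop shape is modeled on the loop quoted earlier in this thread, the constant and type names are hypothetical, and `tryReserve` is a stand-in for the real reservation call (ReserveVirtualMemory in the PAL), whose actual signature differs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Values taken from the discussion; the names are illustrative only.
constexpr size_t    kInitialSize      = 1024u * 1024 * 1024; // 1 GB
constexpr size_t    kSizeDecrement    = 128u * 1024 * 1024;  // 128 MB
constexpr uintptr_t kAddressIncrement = 8u * 1024 * 1024;    // 8 MB

struct ProbeResult
{
    void*  address; // nullptr if every probe failed
    size_t size;    // size of the successful reservation, 0 on failure
    int    probes;  // number of reservation attempts made
};

// Hypothetical sketch: try to reserve a shrinking region at a slowly
// advancing address until one attempt succeeds or the size runs out.
ProbeResult ProbeNear(uintptr_t preferredStart,
                      const std::function<void*(uintptr_t, size_t)>& tryReserve)
{
    size_t size = kInitialSize;
    int probes = 0;
    do
    {
        ++probes;
        if (void* p = tryReserve(preferredStart, size))
        {
            return { p, size, probes };
        }
        // Try a smaller region, slightly further from the module.
        size -= kSizeDecrement;
        preferredStart += kAddressIncrement;
    } while (size >= kSizeDecrement);

    return { nullptr, 0, probes };
}

// Usage sketch: when every reservation fails, the loop gives up after 8 probes.
inline void DemoWorstCase()
{
    ProbeResult r = ProbeNear(0x10000000, [](uintptr_t, size_t) -> void* { return nullptr; });
    assert(r.address == nullptr && r.probes == 8);
}
```

With these values the start address moves by at most 56 MB over the whole search, so every probe stays well inside a 128 MB window around the module.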

EgorBo · 2022-06-17T23:29:36Z

@janvorli @jkotas so does it look good? It now tries hard to allocate a 1 GB heap nearby and, according to the logs, it manages to do so on our TE ampere linux machine with a good probability. It probes by increasing startAddress by 8 MB on each iteration and decreasing the desired memory size by 128 MB (up to 8 iterations). On my M1 it usually manages to allocate a heap of around 500-700 MB and keeps relocs working.

I can remove the env var I added if you don't think it brings value; I just wanted to experiment with it on dotnet/performance to reduce possible noise.

janvorli · 2022-06-20T21:54:13Z

I'd suggest removing the env var. Even though the benchmark results are going to be noisier, it will allow us to see what users really get.

EgorBo · 2022-06-21T11:27:35Z

> I'd suggest removing the env var. Even though the benchmark results are going to be noisier, it will allow us to see what users really get.

Removed

janvorli · 2022-06-21T11:36:57Z

src/coreclr/pal/src/include/pal/virtual.h


+#ifdef TARGET_XARCH


This ifdef is incorrect; we don't define TARGET_XARCH outside of the JIT. Can you please use
#if defined(TARGET_ARM) || defined(TARGET_ARM64) here?

Ah, yes, again I am using the JIT's defines in the VM 😞 The incorrect define slightly regressed x64/x86 by decreasing the preferred size from 2 GB to 1 GB. Thanks, fixed.
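A minimal sketch of why the wrong macro matters, assuming the header selects the preferred reservation size with a preprocessor branch. The constant name is hypothetical; only the macro names and the 1 GB / 2 GB sizes come from this thread. A check on a macro that is never defined in this part of the build silently falls through to the other branch on every architecture, whereas testing the ARM macros leaves the non-ARM default untouched.

```cpp
#include <cstdint>

// Hypothetical constant name; sizes follow the discussion above.
#if defined(TARGET_ARM) || defined(TARGET_ARM64)
// ARM targets: a smaller region is easier to place near libcoreclr.
constexpr uint64_t kPreferredInitialReservation = 1024ull * 1024 * 1024; // 1 GB
#else
// Other targets (e.g. x64/x86) keep the previous, larger default.
constexpr uint64_t kPreferredInitialReservation = 2048ull * 1024 * 1024; // 2 GB
#endif
```

When neither ARM macro is defined, the constant evaluates to 2 GB, which is the behavior the fix restores for x64/x86.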

EgorBo · 2022-06-22T16:44:26Z

@janvorli does it look good now?

janvorli

LGTM, thank you!

Better strategy for ReserveInitialMemory on arm64

704b749

ghost assigned EgorBo Jun 14, 2022

dotnet-issue-labeler bot added the area-PAL-coreclr label Jun 14, 2022

jkotas reviewed Jun 14, 2022

EgorBo marked this pull request as draft June 14, 2022 12:28

janvorli reviewed Jun 14, 2022

EgorBo added 2 commits June 14, 2022 19:28

Merge branch 'main' of github.com:dotnet/runtime into yet-another-fix…

335e3cc

…-for-jump-stubs

Clean up

d252971

EgorBo marked this pull request as ready for review June 14, 2022 17:38

EgorBo added 3 commits June 15, 2022 14:40

Merge branch 'main' of github.com:dotnet/runtime into yet-another-fix…

f9fb6ac

…-for-jump-stubs

Address feedback

5897d6d

fix "below" case

6d0cdf1

janvorli reviewed Jun 15, 2022

EgorBo added 3 commits June 16, 2022 12:59

Merge branch 'main' of github.com:dotnet/runtime into yet-another-fix…

1fc5970

…-for-jump-stubs

Address feedback

909cf7c

Update virtual.cpp

db57294

janvorli reviewed Jun 16, 2022

EgorBo and others added 3 commits June 16, 2022 21:53

Apply suggestions from code review

a5c76a3

Co-authored-by: Jan Vorlicek <jan.vorlicek@volny.cz>

Address feedback

c9e36ed

Fix defines

eab1661

EgorBo added 3 commits June 21, 2022 13:25

Remove env var

42b0ed6

clean up

5ddfd68

Clean up

168fe20

janvorli reviewed Jun 21, 2022

Update virtual.h

94146e2

janvorli approved these changes Jun 22, 2022

janvorli merged commit 4388cc5 into dotnet:main Jun 22, 2022

janvorli mentioned this pull request Jun 23, 2022

Initial executable memory allocation on ARM64 Linux is not right #65900

Closed

EgorBo mentioned this pull request Jul 10, 2022

Optimize jump stubs on arm64 #62302

Open

ghost locked as resolved and limited conversation to collaborators Jul 23, 2022
