[Perf] Linux/arm64: 2 Regressions on 12/5/2023 8:53:15 AM #96499

Closed
performanceautofiler bot opened this issue Dec 14, 2023 · 19 comments
Labels: arch-arm64, area-CodeGen-coreclr, os-linux, PGO, Priority:2, runtime-coreclr

performanceautofiler bot commented Dec 14, 2023

Run Information

Name Value
Architecture arm64
OS ubuntu 20.04
Queue AmpereUbuntu
Baseline 88b5e3d4b77dd8238331ade1b31ac8ddc62f22f7
Compare 6798f84d1b50fa1aa76f6405f04cb3f2f77da641
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Collections.IterateFor<String>

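| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ImmutableArray(Size: 512) | 197.11 ns | 219.99 ns | 1.12 | 0.20 | False | | | |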

graph
Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Collections.IterateFor<String>*'

Payloads

Baseline
Compare

System.Collections.IterateFor<String>.ImmutableArray(Size: 512)

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository


Regressions in System.Collections.IterateForEach<String>

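| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Span(Size: 512) | 187.75 ns | 215.96 ns | 1.15 | 0.27 | False | | | |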
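
The Test/Base column in the two regression rows appears to be the compare ("Test") time divided by the baseline time, rounded to two decimals. A minimal sketch of that arithmetic follows; the class and method names are hypothetical and are not part of the autofiler tooling.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class RegressionMath
{
    // Compare ("Test") time divided by baseline time; values above 1.0 mean slower.
    public static double Ratio(double baselineNs, double testNs) => testNs / baselineNs;

    // The same slowdown expressed as a percentage of the baseline.
    public static double SlowdownPercent(double baselineNs, double testNs) =>
        (testNs - baselineNs) / baselineNs * 100.0;
}
```

Calling `RegressionMath.Ratio(197.11, 219.99)` and `RegressionMath.Ratio(187.75, 215.96)` with the reported times gives values that round to the 1.12 and 1.15 shown in the tables.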

graph
Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Collections.IterateForEach<String>*'

Payloads

Baseline
Compare

System.Collections.IterateForEach<String>.Span(Size: 512)

ETL Files

Histogram

JIT Disasms


performanceautofiler bot added the arch-arm64, os-linux, PGO, runtime-coreclr, and untriaged labels Dec 14, 2023
EgorBo (Member) commented Dec 14, 2023

#95379 cc @AndyAyersMS

EgorBo transferred this issue from dotnet/perf-autofiling-issues Jan 4, 2024
dotnet-issue-labeler bot added the needs-area-label label Jan 4, 2024
EgorBo added the area-CodeGen-coreclr label and removed the needs-area-label label Jan 4, 2024

ghost commented Jan 4, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

JulieLeeMSFT added this to the 9.0.0 milestone Jan 4, 2024
ghost removed the untriaged label Jan 4, 2024
AndyAyersMS added the Priority:2 label May 8, 2024
EgorBot commented Jul 16, 2024

Benchmark results on Arm64
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-UZZAGX : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-GFJCJE : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
| Method | Toolchain | Size | Mean | Error | Ratio | Code Size |
| --- | --- | --- | --- | --- | --- | --- |
| Span | Main | 512 | 187.8 ns | 0.05 ns | 1.00 | - |
| Span | PR | 512 | 219.7 ns | 0.38 ns | 1.17 | 312 B |

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

EgorBo (Member) commented Jul 16, 2024

Hm, weird. Looks like the regression is real and is caused by #95379, but my bot failed to collect a native profile and BDN failed to collect any useful info with its [DisassemblyDiagnoser] 😕

EgorBo (Member) commented Jul 16, 2024

@EgorBot -arm64 -profiler -commit 0c513d9 vs 829524b --disasm --envvars DOTNET_JitDisasm:MySpan

using System;
using System.Collections.Generic;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

[GenericTypeArguments(typeof(string))] // reference type
public class IterateForEach<T>
{
    [Params(512)]
    public int Size;

    private T[] _array;

    [GlobalSetup(Targets = new[] { nameof(MySpan) })]
    public void SetupArray() => _array = ValuesGenerator.ArrayOfUniqueValues<T>(Size);

    [Benchmark]
    public T MySpan()
    {
        T result = default;
        var collection = new System.Span<T>(_array);
        foreach (var item in collection)
            result = item;
        return result;
    }
}

public static class ValuesGenerator
{
    private const int Seed = 12345;

    public static T[] ArrayOfUniqueValues<T>(int count)
    {
        if (count > 2 && typeof(T) == typeof(bool))
            throw new ArgumentOutOfRangeException("count", "Cannot exceed 2 for bool values");
        if (count > 255 && (typeof(T) == typeof(byte) || typeof(T) == typeof(sbyte)))
            throw new ArgumentOutOfRangeException("count", "Cannot exceed 255 for byte or sbyte values");
        T[] result = new T[count];
        var random = new Random(Seed);
        var uniqueValues = new HashSet<T>();
        while (uniqueValues.Count != count)
        {
            T value = GenerateValue<T>(random);
            if (!uniqueValues.Contains(value))
                uniqueValues.Add(value);
        }
        uniqueValues.CopyTo(result);
        return result;
    }

    private static T GenerateValue<T>(Random random)
    {
        if (typeof(T) == typeof(string))
            return (T)(object)GenerateRandomString(random, 1, 50);  // note: all strings have only the characters 'a'..'z', 'A'..'Z', or '0'..'9'
        throw new NotImplementedException($"{typeof(T).Name} is not implemented");
    }

    private static string GenerateRandomString(Random random, int minLength, int maxLength)
    {
        var length = random.Next(minLength, maxLength);
        var builder = new StringBuilder(length);
        for (int i = 0; i < length; i++)
        {
            var rangeSelector = random.Next(0, 3);
            if (rangeSelector == 0)
                builder.Append((char)random.Next('a', 'z'));
            else if (rangeSelector == 1)
                builder.Append((char)random.Next('A', 'Z'));
            else
                builder.Append((char)random.Next('0', '9'));
        }
        return builder.ToString();
    }
}
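
The `--envvars DOTNET_JitDisasm:MySpan` option in the command above ends up as `EnvironmentVariables=DOTNET_JitDisasm=MySpan` in the bot's report, which asks the JIT to print the disassembly of methods named `MySpan`. Outside the bot, the same variable can be set on a child process. The following is only a sketch under assumptions: the helper name and the benchmark DLL path passed to it are hypothetical, and the process is described but not started.

```csharp
using System.Diagnostics;

// Hypothetical helper, for illustration only; not part of EgorBot or BenchmarkDotNet.
public static class JitDisasmLauncher
{
    // Builds (but does not start) a description of a `dotnet <dll>` process
    // that has DOTNET_JitDisasm set to the given method name.
    public static ProcessStartInfo Create(string benchmarkDll, string methodName)
    {
        var psi = new ProcessStartInfo("dotnet")
        {
            UseShellExecute = false,       // must be false for per-process environment variables
            RedirectStandardOutput = true, // lets the caller capture whatever the child prints
        };
        psi.ArgumentList.Add(benchmarkDll);
        psi.Environment["DOTNET_JitDisasm"] = methodName;
        return psi;
    }
}
```

A caller would pass the result to `Process.Start` and read the child's standard output to look for the listing.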

dotnet deleted a comment from EgorBot Jul 16, 2024

EgorBot commented Jul 16, 2024

Benchmark results on Arm64
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-VPJYMP : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-WEDDUS : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
EnvironmentVariables=DOTNET_JitDisasm=MySpan
| Method | Toolchain | Size | Mean | Error | Ratio | Code Size |
| --- | --- | --- | --- | --- | --- | --- |
| MySpan | Main | 512 | 187.9 ns | 0.03 ns | 1.00 | - |
| MySpan | PR | 512 | 217.9 ns | 0.11 ns | 1.16 | 312 B |

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

EgorBo (Member) commented Jul 17, 2024

Codegen diff for hot path: https://www.diffchecker.com/mqSBe7ft/ cc @AndyAyersMS 🤷‍♂️ (extracted from BDN_Artifacts.zip)

AndyAyersMS (Member) commented

8.0 vs 9.0p6 on M1
BenchmarkDotNet v0.13.13-nightly.20240311.145, macOS Sonoma 14.5 (23F79) [Darwin 23.5.0]
Apple M1 Max, 1 CPU, 10 logical and 10 physical cores
.NET SDK 9.0.100-preview.6.24328.19
[Host] : .NET 8.0.7 (8.0.724.31311), Arm64 RyuJIT AdvSIMD
Job-DOVUON : .NET 8.0.7 (8.0.724.31311), Arm64 RyuJIT AdvSIMD
Job-LZVINR : .NET 9.0.0 (9.0.24.32707), Arm64 RyuJIT AdvSIMD

PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1

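| Method | Runtime | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | Allocated | Alloc Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Span | .NET 8.0 | 512 | 172.1 ns | 0.82 ns | 0.73 ns | 171.9 ns | 171.4 ns | 173.6 ns | 1.00 | - | NA |
| Span | .NET 9.0 | 512 | 168.8 ns | 0.80 ns | 0.75 ns | 168.4 ns | 168.0 ns | 170.0 ns | 0.98 | - | NA |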

8.0 branched in August, so it could be that I need builds closer to the above.

AndyAyersMS (Member) commented

image

Looks like there was a big improvement on 18-Oct-23 from #93371 and the regression here on 5-Dec-23.

EgorBo (Member) commented Jul 17, 2024

@EgorBot -arm64 -profiler -commit 0c513d9 vs 829524b

using System;
using System.Collections.Generic;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

[GenericTypeArguments(typeof(string))] // reference type
public class IterateForEach<T>
{
    [Params(512)]
    public int Size;

    private T[] _array;

    [GlobalSetup(Targets = new[] { nameof(MySpan) })]
    public void SetupArray() => _array = ValuesGenerator.ArrayOfUniqueValues<T>(Size);

    [Benchmark]
    public T MySpan()
    {
        T result = default;
        var collection = new System.Span<T>(_array);
        foreach (var item in collection)
            result = item;
        return result;
    }
}

public static class ValuesGenerator
{
    private const int Seed = 12345;

    public static T[] ArrayOfUniqueValues<T>(int count)
    {
        if (count > 2 && typeof(T) == typeof(bool))
            throw new ArgumentOutOfRangeException("count", "Cannot exceed 2 for bool values");
        if (count > 255 && (typeof(T) == typeof(byte) || typeof(T) == typeof(sbyte)))
            throw new ArgumentOutOfRangeException("count", "Cannot exceed 255 for byte or sbyte values");
        T[] result = new T[count];
        var random = new Random(Seed);
        var uniqueValues = new HashSet<T>();
        while (uniqueValues.Count != count)
        {
            T value = GenerateValue<T>(random);
            if (!uniqueValues.Contains(value))
                uniqueValues.Add(value);
        }
        uniqueValues.CopyTo(result);
        return result;
    }

    private static T GenerateValue<T>(Random random)
    {
        if (typeof(T) == typeof(string))
            return (T)(object)GenerateRandomString(random, 1, 50);  // note: all strings have only the characters 'a'..'z', 'A'..'Z', or '0'..'9'
        throw new NotImplementedException($"{typeof(T).Name} is not implemented");
    }

    private static string GenerateRandomString(Random random, int minLength, int maxLength)
    {
        var length = random.Next(minLength, maxLength);
        var builder = new StringBuilder(length);
        for (int i = 0; i < length; i++)
        {
            var rangeSelector = random.Next(0, 3);
            if (rangeSelector == 0)
                builder.Append((char)random.Next('a', 'z'));
            else if (rangeSelector == 1)
                builder.Append((char)random.Next('A', 'Z'));
            else
                builder.Append((char)random.Next('0', '9'));
        }
        return builder.ToString();
    }
}

EgorBot commented Jul 17, 2024

Benchmark results on Arm64
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-RJNSQN : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-YMBCTQ : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
| Method | Toolchain | Size | Mean | Error | Ratio |
| --- | --- | --- | --- | --- | --- |
| MySpan | Main | 512 | 188.2 ns | 0.12 ns | 1.00 |
| MySpan | PR | 512 | 217.5 ns | 0.08 ns | 1.16 |

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

AndyAyersMS (Member) commented

Looks like this was fixed by #105131:
[Attached plots: newplot - 2024-08-12T180623 693, newplot - 2024-08-12T180725 749]

github-actions bot locked and limited conversation to collaborators Sep 12, 2024