DSP Performance benchmark #22

@JayFoxRox JayFoxRox commented Dec 26, 2018

This can be used to benchmark the (GP) DSP.

Example output:

1 frames: 6287446 ns elapsed (47 ns dryrun, 6287446 ns per frame [3.976193% realtime]), -6037445 ns of budget left

Settings

         unsigned int frames = 1; // Target should be ~15
         unsigned int cycles = 40000; // Target should be ~106666, minimum 40000

frames:

We only run every 15th APU frame, so if you want to batch all frames, you must set frames = 15. If frames is any lower, we can't produce stable audio output.
For measuring performance, running only every 15th frame is fine.

cycles:

Running 40000 cycles should be enough for some games, but the real DSP probably runs around 106k cycles per frame. The more we can do, the better.
22000 cycles should be enough to finish a single APU frame in DirectSound.
1000 cycles is what XQEMU master revision uses.

If you keep increasing this value, you might see a performance improvement at some point.
That is because the DSP will end up stuck on a simple loop instruction, which is computationally less expensive than some others (= lower CPU usage / faster). If you pick a lower value, the DSP might still be busy with more complicated tasks when it hits the limit (= higher CPU usage / slower).
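
To put the cycle counts above into time units, here is a minimal sketch that converts a cycle count into the wall-clock time a real DSP would need for it. It assumes the 160 MHz clock mentioned in this description; `DSP_CLOCK_HZ` and `dsp_cycles_to_ns` are hypothetical names for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: DSP clock of 160 MHz, as stated in this PR description. */
#define DSP_CLOCK_HZ 160000000ULL

/* Time in nanoseconds that `cycles` DSP cycles take at DSP_CLOCK_HZ.
 * The result is truncated towards zero. */
uint64_t dsp_cycles_to_ns(uint64_t cycles)
{
    /* Guard against overflow in the multiplication below. */
    assert(cycles <= UINT64_MAX / 1000000000ULL);
    return cycles * 1000000000ULL / DSP_CLOCK_HZ;
}
```

Under that assumption, 40000 cycles correspond to 250000 ns, 22000 cycles to 137500 ns, and 1000 cycles to 6250 ns.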


There are certain combinations where the Xbox will refuse to boot, either because the host CPU is busy and won't find time to update the UI, or because something in the DSP went wrong as it was spammed with frames or starved of frames.
It's also possible that there are bugs in our DSP which affect this test.

How to interpret the output

{A} frames: {B} ns elapsed ({C} ns dryrun, {D} ns per frame [{E} % realtime]), {F} ns of budget left

(ns = nanoseconds)

  • {A} frames: The number of frames that we measured
  • {B} ns elapsed: The time it took to emulate those APU frames using the DSP
  • {C} ns dryrun: How long an empty measurement takes (this can be used to see the timer resolution; if this is large, or about as large as {B}, then your system timer resolution is poor)
  • {D} ns per frame: elapsed time ({B}) divided by number of frames ({A}), so you know how long each frame took; this must be <= 666666 ns for realtime audio output.
  • {E} % realtime: We have 0.666 ms per frame (160 MHz ~> 106666 cycles). If we take twice as long to compute 106666 cycles, we only run at 50%, and so on; this must be >= 100% for realtime audio output.
  • {F} ns of budget left: You have a budget of 0.666 ms for each of the {A} frames. If we run faster than realtime, we have budget left (time we didn't need). If we run slower than realtime, the budget is negative (the amount of time we would have to make up to catch up with realtime); this must be >= 0 for realtime audio output.
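
The relationship between these fields can be sketched as follows. This is not the code from the patch: `BenchResult` and `bench_summarize` are hypothetical names, and the sketch assumes that the per-frame budget scales with the configured `cycles` at 160 MHz. That assumption is inferred from the example output, where 6287446 ns elapsed and -6037445 ns of budget left imply a budget of about 250 µs for 40000 cycles.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: DSP clock of 160 MHz, as stated in this PR description. */
#define DSP_CLOCK_HZ 160000000ULL

typedef struct {
    uint64_t ns_per_frame;     /* {D}: elapsed time divided by frames */
    double   realtime_percent; /* {E}: budget relative to elapsed time */
    int64_t  budget_left_ns;   /* {F}: budget minus elapsed time */
} BenchResult;

/* Derive {D}, {E} and {F} from the measured time {B} and the settings. */
BenchResult bench_summarize(uint64_t elapsed_ns, unsigned int frames,
                            unsigned int cycles)
{
    assert(frames > 0 && elapsed_ns > 0);

    /* Time a real DSP would need for `cycles` cycles, for all frames. */
    uint64_t budget_per_frame_ns =
        (uint64_t)cycles * 1000000000ULL / DSP_CLOCK_HZ;
    uint64_t budget_ns = budget_per_frame_ns * frames;

    BenchResult r;
    r.ns_per_frame = elapsed_ns / frames;
    r.realtime_percent = 100.0 * (double)budget_ns / (double)elapsed_ns;
    r.budget_left_ns = (int64_t)budget_ns - (int64_t)elapsed_ns;
    return r;
}
```

Fed with the example values (6287446 ns elapsed, 1 frame, 40000 cycles), this yields roughly 3.976% realtime and a budget of -6037446 ns, which is within 1 ns of the example output; the exact rounding used by the benchmark is not shown here.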

tl;dr: Make sure {C} is tiny to verify that timing works, then optimize until {E} reaches 100%.
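
To check the timer independently of the benchmark, a dry run can be approximated by reading the clock twice in a row. This is a minimal sketch that assumes a POSIX host with `clock_gettime` and `CLOCK_MONOTONIC`; the timer actually used by the patch is not shown in this description, and `now_ns` and `measure_dryrun_ns` are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds (POSIX clock_gettime). */
uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc; /* avoid an unused-variable warning when NDEBUG is set */
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Time an empty measurement: two back-to-back timer reads with no
 * work in between. The result is the timer overhead and granularity. */
uint64_t measure_dryrun_ns(void)
{
    uint64_t start = now_ns();
    uint64_t end = now_ns();
    return end - start;
}
```

If the value returned by `measure_dryrun_ns()` is in the same range as the elapsed time for a frame, the measurements are dominated by the timer rather than by the DSP.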
