Skip to content
Jannik Vogel edited this page Dec 26, 2017 · 3 revisions

(The features in this article will only work with the default Unicorn-Engine backend. KVM is not supported)

Tracing

From GDB you can enable tracing at almost any time by calling by calling SetTracing(1). Tracing will print parts of the CPU state after each instruction.

Example: Debug a faulty memory access

From previous runs, we know that there is an issue shortly after a call to IA3dSource__7. Therefore, we set a breakpoint on that API hook before running OpenSWE1R:

(gdb) break Hook_IA3dSource__7
(gdb) r

Now OpenSWE1R runs until the breakpoint is reached. Once the breakpoint is reached we enable tracing:

Thread 1 "openswe1r" hit Breakpoint 1, Hook_IA3dSource__7 (uc=0x5555557b97e0, _address=224853248, _size=0, _user_data=0x555556ab0470) at /home/fox/Data/Projects/OpenSWE1R/main.c:3246
3246	HACKY_COM_BEGIN(IA3dSource, 7)
(gdb) call SetTracing(1)
(gdb) c
Continuing.

The program continues and outputs the trace before crashing:

SetWaveFormat
p 0xD66E800
a 0xD68BB74
Stack at 0xC07FFB6C; returning EAX: 0x00000000
  46847 Emulation at 423494 ('IA3dSource__7') from 423494

      0 TRACE Emulation at 0x423494 (ESP: 0xC07FFB78); eax = 0x00000000 esi = 0x0002B110 (TS: 465105553)
      1 TRACE Emulation at 0x423496 (ESP: 0xC07FFB78); eax = 0x00000000 esi = 0x0002B110 (TS: 465105553)
      2 TRACE Emulation at 0x4234AD (ESP: 0xC07FFB78); eax = 0x00000000 esi = 0x0002B110 (TS: 465105553)
      3 TRACE Emulation at 0x4234AF (ESP: 0xC07FFB78); eax = 0x00000000 esi = 0x0002B110 (TS: 465105554)
      4 TRACE Emulation at 0x4234B0 (ESP: 0xC07FFB7C); eax = 0x00000000 esi = 0x0002B110 (TS: 465105554)
      5 TRACE Emulation at 0x4234B2 (ESP: 0xC07FFB7C); eax = 0x00000000 esi = 0x0002B110 (TS: 465105554)
      6 TRACE Emulation at 0x4234B5 (ESP: 0xC07FFB7C); eax = 0x00000000 esi = 0x0002B110 (TS: 465105554)
      7 TRACE Emulation at 0x4234B6 (ESP: 0xC07FFB7C); eax = 0x00000000 esi = 0x0002B110 (TS: 465105554)
      8 TRACE Emulation at 0x4234B7 (ESP: 0xC07FFB80); eax = 0x00000000 esi = 0x00000000 (TS: 465105554)
      9 TRACE Emulation at 0x4234B9 (ESP: 0xC07FFB80); eax = 0x00000000 esi = 0x00000000 (TS: 465105554)
     10 TRACE Emulation at 0x4234BA (ESP: 0xC07FFB84); eax = 0x00000000 esi = 0x00000000 (TS: 465105554)
     11 TRACE Emulation at 0x4234BC (ESP: 0xC07FFB84); eax = 0x0002B110 esi = 0x00000000 (TS: 465105554)
     12 TRACE Emulation at 0x4234BD (ESP: 0xC07FFB88); eax = 0x0002B110 esi = 0x00000000 (TS: 465105554)
     13 TRACE Emulation at 0xC07FFBB8 (ESP: 0xC07FFB8C); eax = 0x0002B110 esi = 0x00000000 (TS: 465105554)
     14 TRACE Emulation at 0xC07FFBBE (ESP: 0xC07FFB8C); eax = 0x0002B110 esi = 0x00000000 (TS: 465105554)
Unicorn-Engine error of type 19 at 0x2b110, size = 0x1
EIP: 0xC07FFBBE
ESP: 0xC07FFB8C
EBP: 0x00000000
EAX: 0x0002B110
EBX: 0x0002B110
ECX: 0x0D68BB74
EDX: 0x0002B110
ESI: 0x00000000
EDI: 0x004D4658
EFLAGS: 0x00000084
FPSW: 0x4000
FPCW: 0x0200
FPTAG: 0xFFFF
FP0: 0x80000000000000000000
FP1: 0x40079380000000000000
FP2: 0x4005E800000000000000
FP3: 0x4005C800000000000000
FP4: 0x4005C800000000000000
FP5: 0x4005C800000000000000
FP6: 0x4005C800000000000000
FP7: 0x00000000000000000000
Emulation returned C07FFBBE
Stack [0] = C07FFBB4
Stack [1] = 0
Stack [2] = 0
Stack [3] = 0
...

In this case, the trace helped to identify the instruction which set the "esi" register to a NULL-pointer.

Profiling

The profiler in OpenSWE1R can be useful to debug performance issues, but also to collect a variety information about the original game. Using the Tracing and Profiling features at the same time is currently untested and should be avoided.

When profiling is enabled, the number of executions and the duration per instruction will be stored. Each measured execution is referred to as a "sample". This accumulated sample information will be referred to as the instructions "heat".

To enable profiling, or clear the running profilers heat information, you can call SetProfiling(1). This can be done from GDB almost anytime. The recorded heat can be output to stdout by calling DumpProfilingHeat(0). Alternatively, you can output to a file by calling with a path: DumpProfilingHeat("./output.heat")

Creating a heat file

  • Start OpenSWE1R through GDB: gdb --args ./openswe1r.
  • Now I click through the menus to go ingame, as that is the scene I want to analyze.
  • Once the race has started, I press Ctrl + C to break execution of the game:
    ^C
    Thread 1 "openswe1r" received signal SIGINT, Interrupt.
    0x00007ffff73e2b54 in ?? () from /usr/lib/libunicorn.so.1
    (gdb)
    
  • Now I start the profiler withcall SetProfiling(1). Executing c will continue execution with profling enabled.
  • After letting the game run for a couple of seconds, I hit Ctrl + C again.
  • Using call DumpProfilingHeat("./3d-scene.heat") I generate a file with the profiling results.

Example: Identifying performance bottlenecks

This examples shows how to use "profile_functions.py". You'll need python3 to run this script.

In this example, I'll run OpenSWE1R with the demo version and try to find out the slowest functions used while being in a race. I follow the instructions above to create a heat file called "3d-scene.heat" after the race has started.

I now do post-processing on this file to identify the per-function statistics. I do this by running the following from the "scripts/" folder:

./profile_functions.py ../build/3d-scene.heat

The bottom of the output looks like this:

...
sub_4AC2D0 36 calls, 32 instructions, 1368 executions, duration 3042665
sub_4AC520 252 calls, 17 instructions, 4284 executions, duration 1953057
sub_4AC590 252 calls, 15 instructions, 3564 executions, duration 2166044
sub_4AEE20 36 calls, 49 instructions, 1798 executions, duration 4234593
sub_4B06A0 70 calls, 2 instructions, 106 executions, duration 1084042
sub_D01A000 36 calls, 4 instructions, 144 executions, duration 431773
sub_D035000 36 calls, 4 instructions, 144 executions, duration 1982903
sub_D056000 36 calls, 2 instructions, 144 executions, duration 5044761
total 32931 instructions, 105929904 executions, duration 20498903786

We can also sort this output to identify the slowest functions. On Unix this is possible with the sort utility:

./profile_functions.py ../build/3d-scene.heat | sort -h -k 9

The bottom of the output will look like this:

...
sub_4942D0 3796 calls, 235 instructions, 4289215 executions, duration 570076535
sub_4947C0 54307 calls, 142 instructions, 5589736 executions, duration 630567417
sub_498330 3796 calls, 64 instructions, 7413780 executions, duration 816905702
sub_495610 2042 calls, 115 instructions, 10471770 executions, duration 1146322314
sub_4A1AD0 42931 calls, 253 instructions, 15415683 executions, duration 1608369193
sub_492FF0 70 calls, 407 instructions, 22475620 executions, duration 5655155602

These are the functions which took up most of the time. As such, these should be rewritten in C soon.

Example: Finding a codepath which is used for a certain feature

This examples shows how to use "scripts/profile_encountered.py". You'll need python3 to run this script.

FIXME