zero out sample struct between each SPE record processing. Rationale for instructions over branches inside.

desktop:~/sort-inject.txt:

0000000000000998 <sort_array>:
 998:   d112c3ff        sub     sp, sp, #0x4b0
 99c:   90000000        adrp    x0, 0 <_init-0x6d0>
 9a0:   912ae000        add     x0, x0, #0xab8
 9a4:   52802581        mov     w1, #0x12c                      // #300
 9a8:   a9bd7bfd        stp     x29, x30, [sp, #-48]!
 9ac:   910003fd        mov     x29, sp
 9b0:   f90013f5        str     x21, [sp, #32]
 9b4:   9100c3b5        add     x21, x29, #0x30
 9b8:   a90153f3        stp     x19, x20, [sp, #16]
 9bc:   911383b4        add     x20, x29, #0x4e0
 9c0:   aa1503f3        mov     x19, x21
 9c4:   97ffff6b        bl      770 <printf@plt>
 9c8:   97ffff5e        bl      740 <rand@plt>
 9cc:   b8004660        str     w0, [x19], #4
 9d0:   eb13029f        cmp     x20, x19
 9d4:   54ffffa1        b.ne    9c8 <sort_array+0x30>  // b.any
 9d8:   9112b2a3        add     x3, x21, #0x4ac
 9dc:   aa1503e0        mov     x0, x21
 9e0:   52800004        mov     w4, #0x0                        // #0
 9e4:   29400402        ldp     w2, w1, [x0]					5.33%       <-----\
 9e8:   6b02003f        cmp     w1, w2						5.08%             |
 9ec:   5400006a        b.ge    9f8 <sort_array+0x60>  // b.tcont		5.16%       >-\   |
 9f0:   52800024        mov     w4, #0x1                        // #1		1.56% (swap)  |   |
 9f4:   29000801        stp     w1, w2, [x0]					1.37% (swap)  |   |
 9f8:   91001000        add     x0, x0, #0x4					5.35%       <-/   |
 9fc:   eb00007f        cmp     x3, x0						5.11%             |
 a00:   54ffff21        b.ne    9e4 <sort_array+0x4c>  // b.any			5.21%       >-----/
 a04:   35fffec4        cbnz    w4, 9dc <sort_array+0x44>
 a08:   a94153f3        ldp     x19, x20, [sp, #16]
 a0c:   f94013f5        ldr     x21, [sp, #32]
 a10:   a8c37bfd        ldp     x29, x30, [sp], #48
 a14:   9112c3ff        add     sp, sp, #0x4b0
 a18:   d65f03c0        ret
 a1c:   00000000        .inst   0x00000000 ; undefined

 Are we reporting instructions as branches?  What does Intel PT do?
                       vvvvvvvv  - above aren't all branches!!
Samples: 12K of event 'branches', Event count (approx.): 12560
  Children      Self  Command  Shared Object     Symbol
     5.35%     5.35%  :-1      [unknown]         [.] 0x0000aaaaaaaaa9f8                                                ◆
     5.33%     5.33%  :-1      [unknown]         [.] 0x0000aaaaaaaaa9e4                                                ▒
     5.21%     5.21%  :-1      [unknown]         [.] 0x0000aaaaaaaaaa00                                                ▒
     5.16%     5.16%  :-1      [unknown]         [.] 0x0000aaaaaaaaa9ec                                                ▒
     5.11%     5.11%  :-1      [unknown]         [.] 0x0000aaaaaaaaa9fc                                                ▒
     5.08%     5.08%  :-1      [unknown]         [.] 0x0000aaaaaaaaa9e8                                                ▒
     1.56%     1.56%  :-1      [unknown]         [.] 0x0000aaaaaaaaa9f0                                                ▒
     1.37%     1.37%  :-1      [unknown]         [.] 0x0000aaaaaaaaa9f4                                                ▒
     0.40%     0.40%  :-1      [unknown]         [k] 0x00ff0000081b2b0c                                                ▒
     0.33%     0.33%  :-1      [unknown]         [k] 0x00ff0000081b2b08                                                ▒
     0.33%     0.33%  :-1      [unknown]         [k] 0x00ff0000081b2b20                                                ▒
     0.30%     0.30%  :-1      [unknown]         [k] 0x00ff0000081b2b14                                                ▒
     0.21%     0.21%  :-1      [unknown]         [.] 0x0000ffffbf68568c                                                ▒
(The remaining entries are not 0x0000aaaa... addresses.)

Intel-PT on sort:

Available samples
0 intel_pt//                                                                                                           ◆
0 dummy:u                                                                                                              ▒
0 dummy:u                                                                                                              ▒
1K instructions                                                                                                        ▒
0 transactions                                                                                                         ▒
0 ptwrite                                                                                                              ▒
1 cbr                                                                                                                  ▒

Samples: 1K of event 'instructions', Event count (approx.): 85205326
  Children      Self  Command  Shared Object      Symbol
+   99.56%     0.00%  sort     libc-2.27.so       [.] __libc_start_main
+   99.44%     0.00%  sort     sort               [.] _start
+   99.36%     0.00%  sort     sort               [.] main
+   99.36%    99.07%  sort     sort               [.] sort_array
+    0.58%     0.00%  sort     [kernel.kallsyms]  [k] __indirect_thunk_start
     0.32%     0.00%  sort     ld-2.27.so         [.] _dl_start_user
     0.29%     0.00%  sort     [kernel.kallsyms]  [k] page_fault
     0.29%     0.00%  sort     [kernel.kallsyms]  [k] do_page_fault

Clicking on sort_array goes to annotate, which appears to show skid(?) onto the
instruction following 'jge 90' (the start of the swap code?):

       │     bubble_sort():                                                                                            ▒
       │             swap_flag = 0;                                                                                    ▒
       │       xor    %edx,%edx                                                                                        ▒
       │             for (i = 1; i < n; i++) {                                                                         ▒
       │       mov    $0x1,%ecx                                                                                        ▒
       │       nop                                                                                                     ▒
 10.28 │ 90:   cmp    %ecx,%ebx                                                                                        ▒
  3.43 │     ↓ jle    148                                                                                              ▒
 26.48 │ 98:   movslq %ecx,%rax                                                                                        ▒
  5.46 │       lea    0x0(%rbp,%rax,4),%rax                                                                            ▒
       │                 if (a[i] < a[i - 1]) {                                                                        ▒
 12.69 │ a0:   mov    (%rax),%esi                                                                                      ▒
  6.48 │       mov    -0x4(%rax),%edi                                                                                  ▒
  6.39 │       add    $0x1,%ecx                                                                                        ▒
  5.56 │       cmp    %edi,%esi                                                                                        ▒
  7.31 │     ↑ jge    90                                                                                               ▒
       │                     a[i] = a[i - 1];                                                                          ◆
  7.31 │       mov    %edi,(%rax)                                                                                      ▒
       │                     a[i - 1] = temp;                                                                          ▒
  1.39 │       mov    %esi,-0x4(%rax)                                                                                  ▒
  1.48 │       add    $0x4,%rax                                                                                        ▒
       │             for (i = 1; i < n; i++) {                                                                         ▒
  2.04 │       cmp    %ecx,%ebx                                                                                        ▒
       │                     swap_flag = 1;                                                                            ▒
  2.13 │       mov    $0x1,%edx                                                                                        ▒
       │             for (i = 1; i < n; i++) {                                                                         ▒
  1.57 │     ↑ jg     a0                                                                                               ▒
       │       xor    %edx,%edx                                                                                        ▒
       │       cmp    $0x1,%ebx                                                                                        ▒
       │       mov    $0x1,%ecx                                                                                        ▒
       │     ↑ jg     98                                                                                               ▒
       │     stop():                                                                                                   ▒

Anyway, branches aren't being reported: even with 'perf record -b -e intel_pt'
we get 'no samples in perf.data file', and 'perf report --branch-stack' doesn't
run if it sees that record wasn't run with -b.

So the code is wrong.  Those samples are instructions, not branches, even though
some of the sampled instructions are branches.

wrt symbols, vmlinux is being loaded and read, but no addresses are being
reported as kernel addresses, so no symbols get used.

In the aem-built-perf-report-vvvvv.out case, sort didn't match build-id-wise
because I didn't carry the archive... I should start using --symfs, which sounds
like it's more embedded-friendly (e.g., Android).
kim-phillips-arm committed Aug 16, 2018
1 parent 0da1447 commit 1d672e1
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion tools/perf/util/arm-spe.c
@@ -479,6 +479,7 @@ static int arm_spe_process_buffer(struct arm_spe_queue *speq,
 int ret = 0, pkt_len;
 void *buf;

+memset(&sample, 0, sizeof(sample));

 if (buffer->use_data) {
 sz = buffer->use_size;
@@ -512,7 +513,8 @@ static int arm_spe_process_buffer(struct arm_spe_queue *speq,
 continue;

 if (ret == ARM_SPE_BAD_PACKET || ret != 0) {
-pr_debug("%s: error processing SPE packet data\n",__func__);
+pr_debug("%s: error processing SPE packet data. Continuing anyway\n",__func__);
+memset(&sample, 0, sizeof(sample));
 continue;
 }

@@ -565,6 +567,7 @@ static int arm_spe_process_buffer(struct arm_spe_queue *speq,
 __func__, __LINE__);
 continue;
 }
+memset(&sample, 0, sizeof(sample));
 }

 #if 0 /* not sure if we can do this: @thread_stack: feed branches to the thread_stack */