Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Android NDK r27 rc2 miscompiles code with indirect gotos #2040

Open
SanjaLV opened this issue Jul 11, 2024 · 18 comments
Open

[Bug]: Android NDK r27 rc2 miscompiles code with indirect gotos #2040

SanjaLV opened this issue Jul 11, 2024 · 18 comments
Assignees
Labels

Comments

@SanjaLV
Copy link

SanjaLV commented Jul 11, 2024

Description

Android NDK r27 rc 2 produces invalid code for any target architecture when compiling with any nonzero optimization level.

Bellow is attached minimized/striped sample that shows the problem (when targeting x86_64 with -O1)

extern int printf(const char *fmt, ...);

int main() {
  void* bytecode[2];
  bytecode[0] = &&VM__OP_1;
  bytecode[1] = &&VM__TERMINATE;

  int state = 0;
  int index = 0;

  while (1) {
    switch (state) {
    case 0:
      goto *bytecode[index];
    case 1:
      // NOTE: THIS IS ONLY REACHABLE VIA INDIRECT GOTOS
      VM__OP_1:
      state = 2;
      break;
    case 2:
      printf("OP_1:(instruction=%d)\n", index);
      index++;
      goto *bytecode[index];
    }
  }

VM__TERMINATE:
  printf("TERMINATE:(instruction=%d)\n", index);
  return 0;
}

Link to github project: https://github.com/SanjaLV/ndk-bug-reports/tree/main/r27_rc2

Prerequisites:

  1. Linux/macOS machine
  2. ANDROID_HOME env variable that will point to Android SDK root.
  3. ndk;26.3.11579264 / ndk;27.0.11902837 installed with SDK manager.

How to reproduce (invalid code):

  1. Run make local and observe correct behavior with system compiler
  2. Run make r26 and observe correct behavior when compiling with NDK r26d
  3. Run make r27 and observe incorrect program behavior.
  4. Run optnone and observe correct behavior with O0 optimization level.

Correct execution should yield the following output:

OP_1:(instruction=0)
TERMINATE:(instruction=1)

Incorrect NDK r27 execution results in the following output:

TERMINATE:(instruction=0)

Context:

Originally discovered that upgrading NDK from r26d to r27 r1/rc2 broke state-machine like bytecode interpreter. After some investigation, we found out that bug appears if and only if we enable INDIRECT GOTO optimizations.

Feel free to ask for more information.

Many thanks,
Aleksandrs

Upstream bug

No response

Commit to cherry-pick

No response

Affected versions

r27

Canary version

No response

Host OS

Linux

Host OS version

Ubuntu 22.04

Affected ABIs

armeabi-v7a, arm64-v8a, x86, x86_64

@SanjaLV SanjaLV added the bug label Jul 11, 2024
@SanjaLV
Copy link
Author

SanjaLV commented Jul 11, 2024

miscompilation is obvious if you inspect generated llvm bitcode:

27.0.11902837/toolchains/llvm/prebuilt/linux-x86_64/bin/clang -O1 test.c -S -emit-llvm & cat test.ll

...

; Function Attrs: nofree nounwind uwtable
define dso_local i32 @main() local_unnamed_addr #0 {
  %1 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.1, i32 noundef 0)
  ret i32 0
}

...

@appujee
Copy link
Collaborator

appujee commented Jul 11, 2024

Can't repro on clang upstream https://godbolt.org/z/d9bGnhhv6 so the fix is likely to be in upstream.

@appujee
Copy link
Collaborator

appujee commented Aug 1, 2024

The bug is in SimplifyCFGPass

Before

define dso_local noundef i32 @main() #0 {
  %1 = alloca [2 x ptr], align 16
  store ptr blockaddress(@main, %7), ptr %1, align 16, !tbaa !5
  %2 = getelementptr inbounds [2 x ptr], ptr %1, i64 0, i64 1
  store ptr blockaddress(@main, %15), ptr %2, align 8, !tbaa !5
  br label %3

3:                                                ; preds = %0, %12
  %4 = phi i32 [ 0, %0 ], [ %13, %12 ]
  %5 = phi i32 [ 0, %0 ], [ %14, %12 ]
  switch i32 %4, label %12 [
    i32 0, label %6
    i32 1, label %7
    i32 2, label %9
  ]

6:                                                ; preds = %3
  br label %17

7:                                                ; preds = %3, %17
  %8 = phi i32 [ %18, %17 ], [ %5, %3 ]
  %8 = phi i32 [ %18, %17 ], [ %5, %3 ]
  br label %12

9:                                                ; preds = %3
  %10 = call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str, i32 noundef %5)
  %11 = add nsw i32 %5, 1
  br label %17

12:                                               ; preds = %3, %7
  %13 = phi i32 [ %4, %3 ], [ 2, %7 ]
  %14 = phi i32 [ %5, %3 ], [ %8, %7 ]
  br label %3, !llvm.loop !9

15:                                               ; preds = %17
  %16 = call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.1, i32 noundef %
  ret i32 0

17:                                               ; preds = %9, %6
  %18 = phi i32 [ %11, %9 ], [ %5, %6 ]
  %19 = sext i32 %18 to i64
  %20 = getelementptr inbounds [2 x ptr], ptr %1, i64 0, i64 %19
  %21 = load ptr, ptr %20, align 8, !tbaa !5
  indirectbr ptr %21, [label %7, label %15]
}

After

; *** IR Dump After SimplifyCFGPass on main ***
; Function Attrs: mustprogress norecurse uwtable
define dso_local noundef i32 @main() #0 {
  %1 = alloca [2 x ptr], align 16
  store ptr blockaddress(@main, %6), ptr %1, align 16, !tbaa !5
  %2 = getelementptr inbounds [2 x ptr], ptr %1, i64 0, i64 1
  store ptr blockaddress(@main, %12), ptr %2, align 8, !tbaa !5
  br label %3

3:                                                ; preds = %0, %10
  %4 = phi i32 [ 0, %0 ], [ %11, %10 ]
  %5 = phi i32 [ 0, %0 ], [ %5, %10 ]
  switch i32 %4, label %10 [
    i32 0, label %12
    i32 1, label %6
    i32 2, label %7
  ]

6:                                                ; preds = %3
  br label %10

7:                                                ; preds = %3
  %8 = call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str, i32 noundef %5)
  %9 = add nsw i32 %5, 1
  br label %12

10:                                               ; preds = %3, %6
  %11 = phi i32 [ %4, %3 ], [ 2, %6 ]
  br label %3, !llvm.loop !9
12:                                               ; preds = %7, %3
  %13 = phi i32 [ %9, %7 ], [ %5, %3 ]
  %14 = call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.1, i32 noundef %
  ret i32 0
}

@appujee
Copy link
Collaborator

appujee commented Aug 7, 2024

@DanAlbert
Copy link
Member

Do we need a separate fix for r28? That's currently on clang-r530567, and I think the plan was to keep on that one.

@pirama-arumuga-nainar
Copy link
Collaborator

The fix is already included in r28 (revision number for the fix is r523414).

@DanAlbert
Copy link
Member

Great, thanks for confirming.

@appujee
Copy link
Collaborator

appujee commented Aug 12, 2024

by the way, the patch i have uploaded 'hides' the issue. the real issue is still in llvm trunk. It could take a while to land a fix but I'm preparing a patch for review.

@DanAlbert
Copy link
Member

I see. Hidden well enough that it's not something a user would encounter any more, or just way less common?

@appujee
Copy link
Collaborator

appujee commented Aug 12, 2024

So the change in https://r.android.com/3211575 'updates the CFG' in a way that the later (buggy) pass can't see the problematic pattern anymore. Most likely the issue can't be reproduced in the new compiler but the current situation isn't ideal.

@DanAlbert
Copy link
Member

Right. So, fixed as far as any NDK developer would be able to tell, but there's room for improvement upstream. I'll mark it fixed here once we merge the fix into the NDK then.

@appujee
Copy link
Collaborator

appujee commented Aug 14, 2024

I believe llvm/llvm-project@fc6bdb8 is the patch causing the bug. I'm preparing a fix but reverting this should also work.

@appujee
Copy link
Collaborator

appujee commented Aug 14, 2024

https://r.android.com/3218272 reverts the above mentioned patch.

@SanjaLV
Copy link
Author

SanjaLV commented Sep 10, 2024

Hey, just wanted to update you that r27b indeed fixes minimal repro that was reported here, but sadly it still miscompiles the non-reduced code.

This is not a blocker for us, since we have disabled indirect goto optimizations for any r27 minor ndks builds, and any future NDK major release will have the proper fix.

I don't think it is worth re-opening the issue, because nobody else reported the same problem.

@pirama-arumuga-nainar
Copy link
Collaborator

Aditya has llvm/llvm-project#103688 to fix this more generally in upstream.

@DanAlbert
Copy link
Member

I don't think it is worth re-opening the issue, because nobody else reported the same problem.

We'll reopen in case we're able to opportunistically pick up the fix, but ack, we won't make a big deal out of it :)

@DanAlbert DanAlbert reopened this Sep 10, 2024
@appujee
Copy link
Collaborator

appujee commented Sep 11, 2024

Landed in upstream.

@appujee appujee changed the title [Bug]: Android NDK r27 rc2 miscompiles code with inderect gotos [Bug]: Android NDK r27 rc2 miscompiles code with indirect gotos Sep 11, 2024
@pirama-arumuga-nainar
Copy link
Collaborator

@appujee Can you backport to llvm-toolchain, llvm-r530567 and llvm-r522817 branches?

bvlgah pushed a commit to bvlgah/llvm_android_mirror that referenced this issue Sep 16, 2024
… code with inderect gotos

3c9022c965b Bail out jump threading on indirect branches (#103688)

This change is generated automatically by the script:
  cherrypick_cl.py --bug android/ndk#2040 --sha 3c9022c965b85951f30af140da591f819acef8a0 --start-version 522817

Bug: android/ndk#2040

Test: N/A
Change-Id: I467e15ba0b8795e4c8869fb304646b30415f1476
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Awaiting fix
Status: Needs prebuilt update
Status: Merged
Development

No branches or pull requests

4 participants