Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WAVM WASM runtime failing just after startup #20134

Closed
ghost opened this issue Feb 28, 2022 · 9 comments
Closed

WAVM WASM runtime failing just after startup #20134

ghost opened this issue Feb 28, 2022 · 9 comments
Labels
area/wasm bug stale stalebot believes this issue/PR has not been touched recently

Comments

@ghost
Copy link

ghost commented Feb 28, 2022

I've built envoy with the WAVM WASM runtime enabled to allow me to compare performance of code running in the v8 runtime and the WAVM runtime.

(I have a question open on the envoy-dev Google group for how to do this properly. For now I've adjusted the bazel.release.server_only build to include "--define wasm=wavm".)

I see the following crash when starting to execute WASM code in an extension. This code works fine on the v8 runtime.

as-ext-proxy-1 | [2022-02-28 15:00:28.319][7][debug][wasm] [source/extensions/common/wasm/wasm.cc:88] Base Wasm created 1 now active
as-ext-proxy-1 | [2022-02-28 15:00:28.792][7][debug][wasm] [source/extensions/common/wasm/wasm.cc:106] Thread-Local Wasm created 2 now active
as-ext-proxy-1 | [2022-02-28 15:00:28.794][7][debug][wasm] [source/extensions/common/wasm/context.cc:1167] wasm log ZTWorker zt_worker_vm: ensureRootContext(root_context_id: 1)
as-ext-proxy-1 | [2022-02-28 15:00:28.796][7][debug][wasm] [source/extensions/common/wasm/context.cc:1167] wasm log ZTWorker zt_worker_vm: Registering new root context for ZTWorker with id: 1
as-ext-proxy-1 | [2022-02-28 15:00:28.797][7][error][wasm] [source/extensions/common/wasm/wasm_vm.cc:38] Function: proxy_on_context_create failed: wavm.outOfBoundsMemoryAccess
as-ext-proxy-1 | Proxy-Wasm plugin in-VM backtrace:
as-ext-proxy-1 | 1: wasm!zt_worker_vm!<function #62>+278
as-ext-proxy-1 | 2: wasm!zt_worker_vm!<function #65>+7
as-ext-proxy-1 |
as-ext-proxy-1 | [2022-02-28 15:00:28.799][7][error][wasm] [source/extensions/common/wasm/wasm_vm.cc:38] Function: proxy_on_vm_start failed: wavm.outOfBoundsMemoryAccess
as-ext-proxy-1 | Proxy-Wasm plugin in-VM backtrace:
as-ext-proxy-1 | 1: wasm!zt_worker_vm!<function #40>+3
as-ext-proxy-1 | 2: wasm!zt_worker_vm!<function #41>+2
as-ext-proxy-1 |
as-ext-proxy-1 | [2022-02-28 15:00:28.799][7][error][wasm] [source/extensions/common/wasm/wasm.cc:109] Wasm VM failed Failed to start base Wasm
...
as-ext-proxy-1 | [2022-02-28 15:00:28.800][7][critical][main] [source/server/server.cc:117] error initializing configuration '/etc/envoy-wavm.yaml': Unable to create Wasm HTTP filter ZTWorker

What is the status of the WAVM runtime? Is this expected to work?

If so, how do I go about collecting debug information to help determine the problem?

Thanks,

Frank.

@ghost ghost added bug triage Issue requires triage labels Feb 28, 2022
@rojkov rojkov added area/wasm and removed triage Issue requires triage labels Mar 1, 2022
@rojkov
Copy link
Member

rojkov commented Mar 1, 2022

/cc @PiotrSikora @mathetake @lizan

@ghost
Copy link
Author

ghost commented Mar 1, 2022

Note: this conversation on Google Groups is related to this question: https://groups.google.com/g/envoy-dev/c/pRWpAfix8K8

@daixiang0
Copy link
Member

ping @envoyproxy/wasm-dev

@PiotrSikora
Copy link
Contributor

wavm.outOfBoundsMemoryAccess suggests it's reading outside of the allocated memory space.

Are you using precompiled modules? There might be a version mismatch between the compiler and WAVM in Envoy, since the precompiled section generated by WAVM isn't versioned (...and if you don't, then I wouldn't recommend using WAVM in production, since the compilation at load-time takes prohibitively high amount of time).

It's also possible that you're running out of heap or stack, but I'd expect a different message then.

Do you have a minimal reproducer (config + Proxy-Wasm plugin sources) that you could share, by any chance?

As for the debugging, you can start Envoy with logging at the trace level (-l trace), which will print calls across the Wasm boundary. Unfortunately, there is no good story for debugging code running inside WasmVM, other than via application logs.

@ghost
Copy link
Author

ghost commented Mar 2, 2022

Thanks for the update.

I'll work at getting a minimal reproducer and check if the WASM I'm supplying is pre-compiled.

The trace level logging is not showing anything new. Here is the first mention of wasm to the crash. The only trace log has a blank message.

as-ext-proxy-1 | [2022-03-02 11:37:48.085][10][debug][wasm] [external/envoy/source/extensions/common/wasm/wasm.cc:88] Base Wasm created 1 now active
as-ext-proxy-1 | [2022-03-02 11:37:48.085][10][trace][wasm] [external/envoy/source/extensions/common/wasm/wasm_vm.cc:39]
as-ext-proxy-1 | [2022-03-02 11:37:55.123][10][debug][wasm] [external/envoy/source/extensions/common/wasm/wasm.cc:106] Thread-Local Wasm created 2 now active
as-ext-proxy-1 | [2022-03-02 11:37:55.139][10][debug][wasm] [external/envoy/source/extensions/common/wasm/context.cc:1167] wasm log ZTWorker zt_worker_vm: ensureRootContext(root_context_id: 1)
as-ext-proxy-1 | [2022-03-02 11:37:55.139][10][debug][wasm] [external/envoy/source/extensions/common/wasm/context.cc:1167] wasm log ZTWorker zt_worker_vm: Current context_map:
as-ext-proxy-1 | [2022-03-02 11:37:55.155][10][debug][wasm] [external/envoy/source/extensions/common/wasm/context.cc:1167] wasm log ZTWorker zt_worker_vm: Registering new root context for ZTWorker with id: 1
as-ext-proxy-1 | [2022-03-02 11:37:55.171][10][error][wasm] [external/envoy/source/extensions/common/wasm/wasm_vm.cc:38] Function: proxy_on_context_create failed: wavm.outOfBoundsMemoryAccess
as-ext-proxy-1 | Proxy-Wasm plugin in-VM backtrace:
as-ext-proxy-1 | 1: wasm!zt_worker_vm!<function #62>+278
as-ext-proxy-1 | 2: wasm!zt_worker_vm!<function #65>+7

How do we interpret the stack trace? What are the function #s? and the +278 elements after the function? Are they referring to the WASM code or the WAVM runtime?

(BTW, I see a 7 second delay between the blank trace line and the next log... I assume this is WAVM compiling the WASM code. Correct? Is this needed for the startup of each WAVM VM or do they share compiled code?)

@PiotrSikora
Copy link
Contributor

How do we interpret the stack trace? What are the function #s? and the +278 elements after the function? Are they referring to the WASM code or the WAVM runtime?

It looks like the Wasm module that you're using has debug symbols stripped, which is why you see function numbers instead of the function names.

(BTW, I see a 7 second delay between the blank trace line and the next log... I assume this is WAVM compiling the WASM code. Correct? Is this needed for the startup of each WAVM VM or do they share compiled code?)

OK, so you're not using precompiled modules.

Wasm module is compiled once for each WasmVM instance, and It's shared across worker threads via cloning of compiled cone, but I don't know if there is any built-in cache in WAVM to share compiled code across various WasmVM instances (again, using WAVM without precompiled modules isn't practical).

@ghost
Copy link
Author

ghost commented Mar 2, 2022

Here is a minimal repro: https://github.com/frankt-vmware/20134-repro (@162168d)

This shows approximately the same crash when loading a service extension:

20134-repro-proxy-1 | [2022-03-02 17:00:47.228][8][debug][wasm] [source/extensions/common/wasm/context.cc:1167] wasm log SharedService shared_service_vm: Registering new root context for SharedService with id: 1
20134-repro-proxy-1 | [2022-03-02 17:00:47.229][8][error][wasm] [source/extensions/common/wasm/wasm_vm.cc:38] Function: proxy_on_context_create failed: wavm.outOfBoundsMemoryAccess
20134-repro-proxy-1 | Proxy-Wasm plugin in-VM backtrace:
20134-repro-proxy-1 | 1: wasm!shared_service_vm!<function #62>+276
20134-repro-proxy-1 | 2: wasm!shared_service_vm!<function #64>+7

We can see it crashing in the proxy-runtime library:
service-min/node_modules/@solo-io/proxy-runtime/assembly/runtime.ts
ensureRootContext() (from line 994)

So we are executing WASM code in the extension.

Interestingly, I also updated to more recent dependent libraries for
assemblyscript(/loader) and I get a different error:

as-ext-proxy-1 | [2022-03-02 16:23:02.010][6][debug][wasm] [source/extensions/common/wasm/wasm.cc:88] Base Wasm created 1 now active
as-ext-proxy-1 | [2022-03-02 16:23:02.011][6][trace][wasm] [source/extensions/common/wasm/wasm_vm.cc:39]
as-ext-proxy-1 | [2022-03-02 16:23:02.368][6][debug][wasm] [source/extensions/common/wasm/wasm.cc:106] Thread-Local Wasm created 2 now active
as-ext-proxy-1 | unhandled SIGSEGV
as-ext-proxy-1 | Call stack:
as-ext-proxy-1 | /usr/local/bin/envoy-wavm4+30069222
as-ext-proxy-1 | /usr/local/bin/envoy-wavm4+31036397
as-ext-proxy-1 | /lib/x86_64-linux-gnu/libpthread.so.0+76159
as-ext-proxy-1 |

This looks like we don't even get to running the extension code at all. I
can provide the updated repro code for this if you want.

As I mentioned on the Google Groups thread. If v8 is already JIT compiled and
WAVM is unlikely to result in better performance then I am happy to drop
this... alternatively I can continue to help track this issue down if wanted.

@github-actions
Copy link

github-actions bot commented Apr 1, 2022

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Apr 1, 2022
@github-actions
Copy link

github-actions bot commented Apr 8, 2022

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

@github-actions github-actions bot closed this as completed Apr 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/wasm bug stale stalebot believes this issue/PR has not been touched recently
Projects
None yet
Development

No branches or pull requests

3 participants