
Project Blog Post #539


Open · wants to merge 1 commit into base: 2025sp

Conversation

@ananyagoenka ananyagoenka (Contributor) commented May 13, 2025

Closes #499

@sampsyo sampsyo (Owner) left a comment:

Looking good overall! The design and implementation both sound solid here. I have a few questions about your eval results that would be great to address before we publish!

title = "Sailing Bril’s IR into Concurrent Waters"
[extra]
bio = """
Ananya Goenka is an undergraduate studying CS at Cornell. When she isn’t nerding out over programming languages or weird ISA quirks, she’s writing for Creme de Cornell or getting way too invested in obscure books.
@sampsyo (Owner) commented:

The magazine looks cool! Would you be interested in adding a hyperlink? 😃


We introduced two new opcodes:

* { "op": "spawn", "dest": "t", "type": "thread", "funcs": \["worker"\], "args": \["x", "y"\] }
@sampsyo (Owner) commented:

This would probably be a little easier to read if it were in Markdown code backticks.


* **join**: An effect operation that takes a single thread handle argument and blocks until the corresponding thread completes.

To prevent clients from forging thread IDs, we defined a new primitive type thread in our TypeScript definitions (bril.ts). This opaque type ensures only genuine spawn instructions can produce valid thread handles.
@sampsyo (Owner) commented:

More code backticks are in order:

  • thread -> `thread`
  • bril.ts -> `bril.ts`

…and consider doing this for filenames and TypeScript symbol names throughout.
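For illustration, here is a minimal sketch of how the opaque `thread` type and the two opcodes might be declared in `bril.ts`; the type and field names below are assumptions for exposition, not the actual definitions:

```ts
// Hypothetical sketch only: names here are illustrative, not the real bril.ts.

// The primitive type set gains an opaque "thread" member, so only the
// interpreter can mint valid handles.
type PrimType = "int" | "bool" | "float" | "thread";

// spawn is a value operation: it names the function to run, forwards
// arguments, and produces a thread handle.
interface SpawnOperation {
  op: "spawn";
  dest: string;      // variable that receives the thread handle
  type: "thread";
  funcs: string[];   // the Bril function to run concurrently
  args: string[];    // arguments forwarded to that function
}

// join is an effect operation: one thread-handle argument, no result.
interface JoinOperation {
  op: "join";
  args: string[];    // exactly one thread handle
}
```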

2. **Shared heap**: All heap allocations (alloc, load, store) target a single global Heap instance, exposing potential data races—a faithful reflection of real-world concurrency.


### 2.3 Interpreter Implementation
@sampsyo (Owner) commented:

Maybe add a quick overview here to note that this is about an implementation in the reference interpreter, which is written in TS and runs on Deno.


#### Stubbed Concurrency (Option A)

Our first pass implemented spawn/join synchronously in-process: spawn would directly call evalFunc(...) and immediately resolve, making join a no-op. This stub served as a correctness check and allowed us to validate the grammar and TypeScript types without introducing asynchrony.
@sampsyo (Owner) commented:

Instead of referring to internal interpreter implementation details (the evalFunc function), it might be a little clearer to just say abstractly how this works: we recursively call the interpreter to run the function directly, just like a function call.
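For concreteness, the Option A stub behaves roughly like the following sketch; the names are illustrative and the interpreter's actual internals are more involved:

```ts
// Simplified sketch of the synchronous Option A stub; `run` stands in for
// whatever machinery the interpreter uses to evaluate a Bril function.

type Value = bigint | boolean | number | { threadId: number };
type RunFunction = (name: string, args: Value[]) => void;

// spawn: interpret the target function to completion right away, exactly like
// an ordinary call, then hand back a dummy handle.
function evalSpawnStub(run: RunFunction, funcName: string, args: Value[]): Value {
  run(funcName, args);     // recursive interpretation, no asynchrony
  return { threadId: 0 };  // nothing is actually pending
}

// join: the "thread" already finished inside spawn, so there is nothing to
// wait for.
function evalJoinStub(_handle: Value): void {}
```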


3. The main isolate is busy servicing heap requests from _both_ workers. The event loop context‐switching and message queue flooding create contention, so neither the “workers” nor the main thread run at full core capacity.

I'd expect that this implementation of concurrency would help, on moderately coarse workloads** (e.g. 100 k or splitting a 100 × 100 matrix) still see _some_ parallelism, because the computation per RPC is nontrivial (simple arithmetic plus pointer arithmetic inside V8). In our tests, the sequential run was ~4 s, the concurrent ~45 s—still slower, but less horrific than the 10 M sum case.
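To make the setup above concrete, here is a rough sketch of what spawning one Bril thread could look like on the main isolate, assuming the "workers" mentioned here are Deno web workers (the module name is hypothetical):

```ts
// Assumption: spawned Bril threads run as Deno web workers, as the "workers"
// and message traffic described above suggest. The module name is hypothetical.

const worker = new Worker(new URL("./brili_worker.ts", import.meta.url).href, {
  type: "module",
});

// Tell the worker which Bril function to run and with what arguments; all of
// its heap traffic then flows back to the main isolate as messages.
worker.postMessage({ func: "work", args: [1n, 2n] });

// The completion message is what a later `join` would wait on.
worker.onmessage = (e) => {
  console.log("thread finished:", e.data);
};
```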
@sampsyo (Owner) commented:

Stray **.


@sampsyo (Owner) commented:

Maybe this is because of the parenthetical, but I lost track of which benchmark you're talking about at which point. Also, this bit is confusing:

because the computation per RPC is nontrivial (simple arithmetic plus pointer arithmetic inside V8)

Is the computation per RPC simple or complex?
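For context on what one "RPC" costs in this design, here is an illustrative sketch of the worker-side proxy heap (names are made up, not the actual interpreter code): every load becomes a full message round trip to the main isolate, while the useful work per element is only an addition.

```ts
// Illustrative worker-side proxy heap: each load is one postMessage to the
// main isolate plus one awaited reply.

type Reply = { id: number; value: unknown };
type Port = {
  postMessage(msg: unknown): void;
  onmessage: ((e: { data: Reply }) => void) | null;
};

function makeRemoteHeap(port: Port) {
  let nextId = 0;
  const pending = new Map<number, (v: unknown) => void>();

  port.onmessage = (e) => {
    // A reply from the main isolate resolves the matching pending request.
    pending.get(e.data.id)?.(e.data.value);
    pending.delete(e.data.id);
  };

  return {
    load(ptr: unknown): Promise<unknown> {
      const id = nextId++;
      return new Promise((resolve) => {
        pending.set(id, resolve);
        port.postMessage({ kind: "load", id, ptr }); // one round trip per access
      });
    },
  };
}
```

The round trip itself is the expensive part; the arithmetic done per element is trivial, which is presumably why the 10 M-element sum loses so badly.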



Some lessons/thoughts.  To actually _win_ with real parallelism under this design, we must batch memory operations. For example, transform long loops into single RPC calls that process entire slices (e.g. “sum these 1 000 elements” in one go), amortizing the message‐passing cost.SharedArrayBuffers could eliminate RPC entirely by mapping our Bril heap into a typed array visible in all workers. Then each load/store is a direct memory access, and you’d see true multicore speedups on large-N benchmarks. For an intermediate step, we could group every 1 000 loads/stores into one batched message, cutting messaging overhead by two orders of magnitude, which should already push the breakeven point down toward the 10 M-element range.
@sampsyo (Owner) commented:

Missing a space before SharedArrayBuffers.
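To illustrate the batching idea, here is a small sketch under the post's assumptions (the request shape and handler name are hypothetical): instead of one message per load, the worker asks the main isolate to process a whole slice in a single RPC.

```ts
// Sketch of the "sum these 1,000 elements in one go" idea: one message, one
// loop on the main isolate, one reply, instead of one round trip per element.

type SliceRequest = { kind: "sum_slice"; base: number; length: number };

// Main-isolate handler: the cost of a single postMessage round trip is
// amortized over `length` loads.
function handleSliceRequest(heap: number[], req: SliceRequest): number {
  let total = 0;
  for (let i = 0; i < req.length; i++) {
    total += heap[req.base + i];
  }
  return total;
}
```

A SharedArrayBuffer-backed heap would go further, eliminating the message entirely and turning each load or store into a direct typed-array access, as the paragraph above suggests.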


```
[escape] main: 0/1 allocs are thread-local # spawn causes escape
[escape] mixed: 1/2 allocs are thread-local # one local, one escaping
```
@sampsyo (Owner) commented:

Instead of just pasting the output of your tool, can you explain what this is saying? What are "main" and "mixed"? Why are the numbers so small (1 and 2)?


* **Interprocedural Escape Analysis**: Extend escape.py to track pointers across function boundaries and calls, increasing precision and enabling stack-based allocation for truly local objects.

* **Robust Testing Harness**: Integrate with continuous integration (CI) to run our concurrency and escape-analysis suites on every commit, ensuring regressions are caught early.
@sampsyo (Owner) commented:

Hmm, aren’t your interpreter tests enabled by default already? It also seems pretty easy to add your escape analysis tests to the main test suite, assuming they already use Turnt.

@sampsyo sampsyo added the 2025sp label May 16, 2025
@sampsyo sampsyo (Owner) commented May 28, 2025

Hi, @ananyagoenka! I would love to publish your blog post. Can you please wrap up the revisions discussed above so I can hit the green button?

Successfully merging this pull request may close these issues.

Project Proposal: Concurrency Extension for Bril