New GC: in-place compaction #2522
Conversation
Native uses of alloc_array are duplicated in the native test files. Those will be removed when the tests are ported to Rust.
This also changes how we store the next free index a little. The constant `FULL` is gone: the table is now known to be full exactly when the next free location equals the array length. Also, `FREE_SLOT` no longer holds the next free location shifted left; it holds the next free location directly.
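A minimal stand-in for the new encoding (hypothetical names, not the RTS code) might look like this: a free slot stores the next free index directly, and the table is full exactly when the free-list head equals the table length, so no `FULL` constant is needed.

```rust
// Sketch of a free-list table where FREE_SLOT-style entries hold the
// next free index directly (unshifted). Names are illustrative.
struct FreeListTable {
    // Free slots hold the index of the next free slot; used slots hold
    // the stored payload.
    slots: Vec<usize>,
    // Head of the free list. When this equals slots.len(), the table
    // is full -- no separate FULL constant required.
    free: usize,
}

impl FreeListTable {
    fn new(len: usize) -> Self {
        // Chain each slot to its successor; the last points past the end.
        FreeListTable { slots: (1..=len).collect(), free: 0 }
    }

    fn is_full(&self) -> bool {
        self.free == self.slots.len()
    }

    fn alloc(&mut self, payload: usize) -> Option<usize> {
        if self.is_full() {
            return None;
        }
        let idx = self.free;
        self.free = self.slots[idx]; // next free location, stored directly
        self.slots[idx] = payload;
        Some(idx)
    }

    fn release(&mut self, idx: usize) {
        self.slots[idx] = self.free;
        self.free = idx;
    }
}
```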
- Introduced `as_blah` methods to convert a `SkewedPtr` or `*Obj` to other object types. These methods check the type in debug mode.
- Improved error reporting: we now show details in assertion failures.
- Temporarily link with the debug runtime to enable sanity checking by default for now.
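The conversion-method pattern could look roughly like the following sketch (the tag value, names, and layout are illustrative assumptions, not the actual RTS definitions): the cast is free in release builds, while debug builds assert the tag first.

```rust
const TAG_ARRAY: usize = 3; // assumed tag value, for illustration only

#[repr(C)]
struct Obj {
    tag: usize,
}

#[repr(C)]
struct Array {
    header: Obj,
    len: usize,
}

// A hypothetical `as_array` analogue: a plain cast in release mode,
// with the object's tag checked via debug_assert_eq! in debug mode.
unsafe fn obj_as_array(obj: *mut Obj) -> *mut Array {
    debug_assert_eq!((*obj).tag, TAG_ARRAY);
    obj as *mut Array
}
```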
Very nice! Only minor comments
// NOTE: For this to work heap addresses need to be greater than the largest value for object
// headers. Currently this holds. TODO: Document this better.
let mut header = (*obj).tag;
while header > TAG_NULL {
    // TODO: is `header > TAG_NULL` the best way to distinguish a tag from a pointer?
This is a bit hairy. Maybe leave a note in the code where we define header tags, saying that these tags must be smaller than any possible dynamic pointer?
Also, this would limit our wiggle room if we want to put variant tags or array lengths etc into the header… (#916)
I think the fact that you can distinguish headers from pointers is unfortunately fundamental to the threading algorithm, unless there's some other technique that can be used.
Right. It’ll take away roughly ¼ of the possible header values. Still plenty of room for other things.
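The invariant this thread relies on can be stated as a one-line predicate (tag values here are assumptions for illustration, not the RTS constants): as long as every header tag is smaller than any possible heap address, a word in the header slot can be classified with a single comparison, which is what `header > TAG_NULL` does in the threading loop.

```rust
// Assumed, illustrative tag values; the real RTS defines its own.
const TAG_OBJECT: usize = 1;
const TAG_ARRAY: usize = 2;
const TAG_NULL: usize = 9; // assumed to be the largest tag

// A word in an object's header slot is either a tag or a threaded
// (reversed) pointer. This classification only works if heap addresses
// are strictly greater than the largest tag.
fn is_threaded_pointer(word: usize) -> bool {
    word > TAG_NULL
}
```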
@@ -0,0 +1,242 @@
use crate::alloc;
I assume the code in here has just moved and isn't substantially changed, right?
Yes, no need to review this file.
This file is now changed with 30d8b8f.
STACK_LEN += Words(1);
}

pub unsafe fn pop_mark_stack() -> Option<usize> {
I should try to get rid of this `Option` as well; I think it could also improve perf. (#2522 (comment))
Cool again
I don't understand why, but if I apply the same technique here the benchmark starts using more instructions. The new `pop_mark_stack` is actually smaller in generated Wasm. This is the new version I've tried:
pub unsafe fn pop_mark_stack() -> usize {
if STACK_LEN.0 == 0 {
0
} else {
STACK_LEN -= Words(1);
*(STACK_PTR.payload_addr() as *mut usize).add(STACK_LEN.0 as usize)
}
}
Here's the patch for this change:
diff --git a/rts/motoko-rts-tests/src/mark_stack.rs b/rts/motoko-rts-tests/src/mark_stack.rs
index 0499a31bf..4938f252a 100644
--- a/rts/motoko-rts-tests/src/mark_stack.rs
+++ b/rts/motoko-rts-tests/src/mark_stack.rs
@@ -24,7 +24,7 @@ fn test_(objs: Vec<usize>) -> TestResult {
for obj in objs.iter().rev() {
let popped = pop_mark_stack();
- if popped != Some(*obj) {
+ if popped != *obj {
free_mark_stack(); // TODO: Does not really free
return TestResult::error(format!(
"Unexpected object popped, expected={:?}, popped={:?}",
diff --git a/rts/motoko-rts/src/gc/mark_compact.rs b/rts/motoko-rts/src/gc/mark_compact.rs
index ef8a814ee..5890d67db 100644
--- a/rts/motoko-rts/src/gc/mark_compact.rs
+++ b/rts/motoko-rts/src/gc/mark_compact.rs
@@ -80,7 +80,11 @@ unsafe fn push_mark_stack(obj: SkewedPtr, heap_base: u32) {
}
unsafe fn mark_stack(heap_base: u32) {
- while let Some(obj) = pop_mark_stack() {
+ loop {
+ let obj = pop_mark_stack();
+ if obj == 0 {
+ break;
+ }
mark_fields(obj as *mut Obj, heap_base);
}
}
diff --git a/rts/motoko-rts/src/mark_stack.rs b/rts/motoko-rts/src/mark_stack.rs
index be87acc6f..1bf8145b9 100644
--- a/rts/motoko-rts/src/mark_stack.rs
+++ b/rts/motoko-rts/src/mark_stack.rs
@@ -50,12 +50,11 @@ pub unsafe fn push_mark_stack(obj: usize) {
STACK_LEN += Words(1);
}
-pub unsafe fn pop_mark_stack() -> Option<usize> {
+pub unsafe fn pop_mark_stack() -> usize {
if STACK_LEN.0 == 0 {
- None
+ 0
} else {
STACK_LEN -= Words(1);
- let p = *(STACK_PTR.payload_addr() as *mut usize).add(STACK_LEN.0 as usize);
- Some(p)
+ *(STACK_PTR.payload_addr() as *mut usize).add(STACK_LEN.0 as usize)
}
}
Any idea why the cost went up?
Here's the diff of before vs. after this patch (mo-rts.wasm):
--- mo-rts.wat 2021-06-14 15:44:34.892004845 +0300
+++ ../../motoko/rts/mo-rts.wat 2021-06-14 15:44:34.180013747 +0300
@@ -19343,7 +19343,7 @@
local.get 5
i32.const 1
i32.add
- local.set 10
+ local.set 8
block ;; label = @1
local.get 5
i32.const 5
@@ -19352,7 +19352,7 @@
local.tee 2
i32.eqz
br_if 0 (;@1;)
- local.get 10
+ local.get 8
i32.const 8
i32.add
local.set 1
@@ -19402,7 +19402,6 @@
local.tee 2
i32.const 15796
i32.add
- local.tee 8
local.get 1
i32.const -1
i32.add
@@ -19419,21 +19418,27 @@
i32.const 8
i32.add
i32.load
+ local.tee 1
+ i32.eqz
+ br_if 1 (;@1;)
+ local.get 1
local.get 4
call $motoko_rts::gc::mark_compact::mark_fields::hc1bca295efbdeb41
- local.get 8
+ global.get 1
+ i32.const 15796
+ i32.add
i32.load
local.tee 1
br_if 0 (;@2;)
end
end
block ;; label = @1
- local.get 10
+ local.get 8
i32.load offset=4 align=1
local.tee 2
i32.eqz
br_if 0 (;@1;)
- local.get 10
+ local.get 8
i32.const 8
i32.add
local.set 1
So the new version, after a lot of inlining, ends up being larger. No idea why, though. This code is for `compacting_gc`, so `mark_stack` is inlined in `mark_compact`, which is inlined in `compacting_gc`. Somehow the version with this change has more stack traffic.
Maybe it would be shorter if you just stored the sentinel value 0 at the bottom of the stack and avoided doing the conditional testing on pop?
And maybe just maintain pointer to top of stack rather than computing from length on push and pop
> And maybe just maintain pointer to top of stack rather than computing from length on push and pop

This made a benchmark use 2.3% fewer instructions. Pushed as 836992c.
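The pointer-based scheme might look like the following sketch (simplified and hypothetical: fixed capacity, no growth, not the actual 836992c code): push and pop become single pointer bumps instead of base + length address computations.

```rust
// Sketch of keeping a top-of-stack pointer instead of a length.
struct MarkStack {
    base: *mut usize,
    top: *mut usize,  // one past the last pushed word
    _buf: Vec<usize>, // owns the allocation so `base` stays valid
}

impl MarkStack {
    fn new(cap: usize) -> Self {
        let mut buf = vec![0usize; cap];
        let base = buf.as_mut_ptr();
        MarkStack { base, top: base, _buf: buf }
    }

    unsafe fn push(&mut self, obj: usize) {
        // Single write at the cached top, then bump the pointer.
        *self.top = obj;
        self.top = self.top.add(1);
    }

    unsafe fn pop(&mut self) -> usize {
        if self.top == self.base {
            0 // empty, matching the 0-for-None convention in the patch above
        } else {
            self.top = self.top.sub(1);
            *self.top
        }
    }
}
```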
> Maybe it would be shorter if you just stored the sentinel value 0 at the bottom of the stack and avoided doing the conditional testing on pop?

We would need a conditional to avoid popping the sentinel value, right? Otherwise a push after popping the sentinel would break the stack. So I think we can't remove the conditional in `pop_mark_stack`.

I guess one thing we could consider is using stable memory for the mark stack and bits, but maybe that's too slow. Certainly not for this PR.
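The trade-off can be illustrated with a small index-based sketch (hypothetical, not the RTS code): even with a sentinel 0 stored permanently at the bottom, pop still needs a check so that it never moves below the sentinel, since a later push would otherwise overwrite it.

```rust
// Mark stack with a permanent 0 sentinel in slot 0. Pop returns 0 when
// the stack is empty, but must not move `top` below the sentinel:
// without the check, a push after an empty pop would clobber slot 0
// and corrupt the stack.
struct SentinelStack {
    buf: Vec<usize>,
    top: usize, // index one past the last pushed element
}

impl SentinelStack {
    fn new(cap: usize) -> Self {
        SentinelStack { buf: vec![0; cap], top: 1 } // slot 0 is the sentinel
    }

    fn push(&mut self, obj: usize) {
        self.buf[self.top] = obj;
        self.top += 1;
    }

    fn pop(&mut self) -> usize {
        let v = self.buf[self.top - 1];
        if v != 0 {
            // The conditional survives: it keeps `top` above the sentinel.
            self.top -= 1;
        }
        v
    }
}
```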
Perf changes in CanCan backend:
- Copying GC: -0.81%
- Compacting GC: +3.06%
LGTM.
Co-authored-by: Claudio Russo <claudio@dfinity.org>
) {
    // The array and the objects pointed to by the array are all static so we don't evacuate them.
    // We only evacuate fields of objects in the array.
    for i in 0..roots.len() {
Do we actually know that all the values in roots are pointers to heap objects and never unboxed values or even static objects? Otherwise this code might crash, right?
My reasoning is that env_static_roots eventually calls visit_pointer_fields, but there's no check that the obj is a proper object.
Yeah, so the static heap (other than static data/constants) has an array of `MutBox`es (`roots` in this function). These `MutBox`es also live in the static heap, but they can point to the dynamic heap. This code evacuates those `MutBox` fields.
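That scheme can be sketched like this (hypothetical names, with a closure standing in for the real evacuation routine): the `MutBox`es themselves stay put, and only fields pointing into the dynamic heap are updated.

```rust
// Hypothetical sketch of evacuating static-root fields: the root array
// and the MutBoxes it points to are static and never move; only each
// MutBox field that points into the dynamic heap is rewritten.
struct MutBox {
    field: usize, // may point into the dynamic heap
}

fn evac_static_roots(roots: &mut [MutBox], heap_base: usize, evac: impl Fn(usize) -> usize) {
    for mutbox in roots.iter_mut() {
        // Values below heap_base are static (or unboxed) and left alone.
        if mutbox.field >= heap_base {
            mutbox.field = evac(mutbox.field);
        }
    }
}
```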
Right, just remembered that after (re)seeing the explanation to a previous comment. Sorry about the gaslighting
Co-authored-by: Claudio Russo <claudio@dfinity.org>
Makes a CanCan backend benchmark use 2.3% fewer instructions
This merges #2223 and master and enables the new collector with a new moc flag `--compacting-gc`. The goal is to (1) avoid bitrot in #2223, and (2) allow easy testing and benchmarking of different collectors.

We include both collectors in the Wasms. Binary sizes grow between 0.8% (CanCan backend) and 1.8% (in simple tests in `run-drun`). Some benchmark results here: #2033 (comment)

An improvement to the compacting GC is currently being implemented in branch `osa1/compacting_gc_3`.