Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New GC: in-place compaction #2522

Merged
merged 177 commits into from
Jun 15, 2021
Merged

New GC: in-place compaction #2522

merged 177 commits into from
Jun 15, 2021

Conversation

osa1
Copy link
Contributor

@osa1 osa1 commented May 6, 2021

This merges #2223 and master and enables the new collector with a new moc flag
--compacting-gc. The goal is to (1) avoid bitrot in #2223 (2) allow easy
testing and benchmarking of different collectors.

We include both collectors in the Wasms. Binary sizes grow between 0.8% (CanCan
backend) to 1.8% (in simple tests in run-drun).

Some benchmark results here:
#2033 (comment)

An improvement to the compacting GC is currently being implemented in branch
osa1/compacting_gc_3.

Native uses of alloc_array are duplicated in the native test files.
Those will be removed when the tests are ported to Rust.
This also changes how we store the next free index a little bit. The
constant `FULL` is gone, when the next free location is equal to the
array length that's how we know the table is full now. Also, `FREE_SLOT`
no longer next free location shifted left, it holds the next free
location directly (not shifted).
- Introduced `as_blah` methods to convert a SkewedPtr or *Obj to other
  object types. These methods check the type in debug mode.

- Improve error reporting, we know show details in assertion failures

- Temporarily link with debug runtime to enable sanity checking by
  default for now
Copy link
Collaborator

@nomeata nomeata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Only minor comments

rts/motoko-rts/src/gc/mark_compact.rs Outdated Show resolved Hide resolved
rts/motoko-rts/src/gc/mark_compact.rs Outdated Show resolved Hide resolved
rts/motoko-rts/src/gc/mark_compact.rs Outdated Show resolved Hide resolved
Comment on lines +300 to +304
// NOTE: For this to work heap addresses need to be greater than the largest value for object
// headers. Currently this holds. TODO: Document this better.
let mut header = (*obj).tag;
while header > TAG_NULL {
// TODO: is `header > TAG_NULL` the best way to distinguish a tag from a pointer?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit hairy. Maybe leave a note in the cod where we define header tags that these tags must be smaller than any possible dynamic pointer?

Also, this would limit our wiggle room if we want to put variant tags or array lengths etc into the header… (#916)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the fact that you can distinguish headers from pointers is unfortunately fundamental to the threading algorithm, unless there's some other technique that can be used.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. It’ll take away roughly ¼ of the possible header values. Still plenty of room for other things.

@@ -0,0 +1,242 @@
use crate::alloc;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the code in here has just moved, and isn't substantially changed right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, no need to review this file.

Copy link
Contributor Author

@osa1 osa1 Jun 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is now changed with 30d8b8f.

STACK_LEN += Words(1);
}

pub unsafe fn pop_mark_stack() -> Option<usize> {
Copy link
Contributor Author

@osa1 osa1 Jun 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should try to get rid of this Option as well, I think it could also improve perf. (#2522 (comment))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool again

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why but if I apply the same technique here the benchmark starts using more instructions. No idea why. The new pop_mark_stack is actually smaller in generated Wasm. This is the new version I've tried:

pub unsafe fn pop_mark_stack() -> usize {
    if STACK_LEN.0 == 0 {
        0
    } else {
        STACK_LEN -= Words(1);
        *(STACK_PTR.payload_addr() as *mut usize).add(STACK_LEN.0 as usize)
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the patch for this change:

diff --git a/rts/motoko-rts-tests/src/mark_stack.rs b/rts/motoko-rts-tests/src/mark_stack.rs
index 0499a31bf..4938f252a 100644
--- a/rts/motoko-rts-tests/src/mark_stack.rs
+++ b/rts/motoko-rts-tests/src/mark_stack.rs
@@ -24,7 +24,7 @@ fn test_(objs: Vec<usize>) -> TestResult {
 
         for obj in objs.iter().rev() {
             let popped = pop_mark_stack();
-            if popped != Some(*obj) {
+            if popped != *obj {
                 free_mark_stack(); // TODO: Does not really free
                 return TestResult::error(format!(
                     "Unexpected object popped, expected={:?}, popped={:?}",
diff --git a/rts/motoko-rts/src/gc/mark_compact.rs b/rts/motoko-rts/src/gc/mark_compact.rs
index ef8a814ee..5890d67db 100644
--- a/rts/motoko-rts/src/gc/mark_compact.rs
+++ b/rts/motoko-rts/src/gc/mark_compact.rs
@@ -80,7 +80,11 @@ unsafe fn push_mark_stack(obj: SkewedPtr, heap_base: u32) {
 }
 
 unsafe fn mark_stack(heap_base: u32) {
-    while let Some(obj) = pop_mark_stack() {
+    loop {
+        let obj = pop_mark_stack();
+        if obj == 0 {
+            break;
+        }
         mark_fields(obj as *mut Obj, heap_base);
     }
 }
diff --git a/rts/motoko-rts/src/mark_stack.rs b/rts/motoko-rts/src/mark_stack.rs
index be87acc6f..1bf8145b9 100644
--- a/rts/motoko-rts/src/mark_stack.rs
+++ b/rts/motoko-rts/src/mark_stack.rs
@@ -50,12 +50,11 @@ pub unsafe fn push_mark_stack(obj: usize) {
     STACK_LEN += Words(1);
 }
 
-pub unsafe fn pop_mark_stack() -> Option<usize> {
+pub unsafe fn pop_mark_stack() -> usize {
     if STACK_LEN.0 == 0 {
-        None
+        0
     } else {
         STACK_LEN -= Words(1);
-        let p = *(STACK_PTR.payload_addr() as *mut usize).add(STACK_LEN.0 as usize);
-        Some(p)
+        *(STACK_PTR.payload_addr() as *mut usize).add(STACK_LEN.0 as usize)
     }
 }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea why the cost went up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the diff of before vs. after this patch (mo-rts.wasm):

--- mo-rts.wat  2021-06-14 15:44:34.892004845 +0300
+++ ../../motoko/rts/mo-rts.wat 2021-06-14 15:44:34.180013747 +0300
@@ -19343,7 +19343,7 @@
     local.get 5
     i32.const 1
     i32.add
-    local.set 10
+    local.set 8
     block  ;; label = @1
       local.get 5
       i32.const 5
@@ -19352,7 +19352,7 @@
       local.tee 2
       i32.eqz
       br_if 0 (;@1;)
-      local.get 10
+      local.get 8
       i32.const 8
       i32.add
       local.set 1
@@ -19402,7 +19402,6 @@
         local.tee 2
         i32.const 15796
         i32.add
-        local.tee 8
         local.get 1
         i32.const -1
         i32.add
@@ -19419,21 +19418,27 @@
         i32.const 8
         i32.add
         i32.load
+        local.tee 1
+        i32.eqz
+        br_if 1 (;@1;)
+        local.get 1
         local.get 4
         call $motoko_rts::gc::mark_compact::mark_fields::hc1bca295efbdeb41
-        local.get 8
+        global.get 1
+        i32.const 15796
+        i32.add
         i32.load
         local.tee 1
         br_if 0 (;@2;)
       end
     end
     block  ;; label = @1
-      local.get 10
+      local.get 8
       i32.load offset=4 align=1
       local.tee 2
       i32.eqz
       br_if 0 (;@1;)
-      local.get 10
+      local.get 8
       i32.const 8
       i32.add
       local.set 1

So the new version, after a lot of inlining, ends up being larger. No idea why though. This code is for compacting_gc, so mark_stack is inlined in mark_compact, which is inlined in compacting_gc. Somehow the version with this change has more stack traffic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be shorter if you just stored the sentinel value 0 at the bottom of the stack and avoided doing the conditional testing on pop?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe just maintain pointer to top of stack rather than computing from length on push and pop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe just maintain pointer to top of stack rather than computing from length on push and pop

This made a benchmark use 2.3% less instructions. Pushed as 836992c.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be shorter if you just stored the sentinel value 0 at the bottom of the stack and avoided doing the conditional testing on pop?

We would need a conditional to avoid popping the sentinel value, right? Otherwise a push after popping the sentinel value would break the stack. So I think we can't remove the conditional in pop_mark_stack.

@crusso
Copy link
Contributor

crusso commented Jun 10, 2021

I guess one thing we could consider is using stable memory for the mark stack and bits, but maybe that's too slow. Certainly not for this PR.

@crusso
Copy link
Contributor

crusso commented Jun 14, 2021

LGTM.

Co-authored-by: Claudio Russo <claudio@dfinity.org>
) {
// The array and the objects pointed by the array are all static so we don't evacuate them. We
// only evacuate fields of objects in the array.
for i in 0..roots.len() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually know that all the values in roots are pointers to heap objects and never unboxed value or even static objects? Otherwise this code might crash, right?

Copy link
Contributor

@crusso crusso Jun 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning is that env_static_roots eventually calls visit_pointer_fields, but there's no check that the obj is a proper object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so the static heap (other than static data/constants) have a an array of MutBoxes (roots in this function). These MutBoxes also live in static heap but they can point to dynamic heap. This code evacuates those MutBox fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, just remembered that after (re)seeing the explanation to a previous comment. Sorry about the gaslighting

@osa1 osa1 added the automerge-squash When ready, merge (using squash) label Jun 15, 2021
@mergify mergify bot merged commit 2aa5073 into master Jun 15, 2021
@mergify mergify bot removed the automerge-squash When ready, merge (using squash) label Jun 15, 2021
@osa1 osa1 deleted the osa1/gc_flags branch June 15, 2021 10:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants