New GC: in-place compaction #2522
Conversation
Native uses of alloc_array are duplicated in the native test files. Those will be removed when the tests are ported to Rust.
This also changes how we store the next free index a little. The constant `FULL` is gone: the table is now known to be full exactly when the next free location equals the array length. Also, `FREE_SLOT` no longer holds the next free location shifted left; it holds the next free location directly.
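A minimal stand-in for the new encoding (hypothetical names, not the RTS code) might look like this: a free slot stores the next free index directly, and the table is full exactly when the free-list head equals the table length, so no `FULL` constant is needed.

```rust
// Sketch of a free-list table where FREE_SLOT-style entries hold the
// next free index directly (unshifted). Names are illustrative.
struct FreeListTable {
    // Free slots hold the index of the next free slot; used slots hold
    // the stored payload.
    slots: Vec<usize>,
    // Head of the free list. When this equals slots.len(), the table
    // is full -- no separate FULL constant required.
    free: usize,
}

impl FreeListTable {
    fn new(len: usize) -> Self {
        // Chain each slot to its successor; the last points past the end.
        FreeListTable { slots: (1..=len).collect(), free: 0 }
    }

    fn is_full(&self) -> bool {
        self.free == self.slots.len()
    }

    fn alloc(&mut self, payload: usize) -> Option<usize> {
        if self.is_full() {
            return None;
        }
        let idx = self.free;
        self.free = self.slots[idx]; // next free location, stored directly
        self.slots[idx] = payload;
        Some(idx)
    }

    fn release(&mut self, idx: usize) {
        self.slots[idx] = self.free;
        self.free = idx;
    }
}
```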
- Introduced `as_blah` methods to convert a `SkewedPtr` or `*Obj` to other object types. These methods check the type in debug mode.
- Improved error reporting: we now show details in assertion failures.
- Temporarily link with the debug runtime to enable sanity checking by default for now.
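The conversion-method pattern could look roughly like the following sketch (the tag value, names, and layout are illustrative assumptions, not the actual RTS definitions): the cast is free in release builds, while debug builds assert the tag first.

```rust
const TAG_ARRAY: usize = 3; // assumed tag value, for illustration only

#[repr(C)]
struct Obj {
    tag: usize,
}

#[repr(C)]
struct Array {
    header: Obj,
    len: usize,
}

// A hypothetical `as_array` analogue: a plain cast in release mode,
// with the object's tag checked via debug_assert_eq! in debug mode.
unsafe fn obj_as_array(obj: *mut Obj) -> *mut Array {
    debug_assert_eq!((*obj).tag, TAG_ARRAY);
    obj as *mut Array
}
```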
Very nice! Only minor comments
// NOTE: For this to work heap addresses need to be greater than the largest value for object
// headers. Currently this holds. TODO: Document this better.
let mut header = (*obj).tag;
while header > TAG_NULL {
    // TODO: is `header > TAG_NULL` the best way to distinguish a tag from a pointer?
This is a bit hairy. Maybe leave a note in the code where we define header tags, saying that these tags must be smaller than any possible dynamic pointer?
Also, this would limit our wiggle room if we want to put variant tags or array lengths etc into the header… (#916)
I think the fact that you can distinguish headers from pointers is unfortunately fundamental to the threading algorithm, unless there's some other technique that can be used.
Right. It’ll take away roughly ¼ of the possible header values. Still plenty of room for other things.
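The invariant this thread relies on can be stated as a one-line predicate (tag values here are assumptions for illustration, not the RTS constants): as long as every header tag is smaller than any possible heap address, a word in the header slot can be classified with a single comparison, which is what `header > TAG_NULL` does in the threading loop.

```rust
// Assumed, illustrative tag values; the real RTS defines its own.
const TAG_OBJECT: usize = 1;
const TAG_ARRAY: usize = 2;
const TAG_NULL: usize = 9; // assumed to be the largest tag

// A word in an object's header slot is either a tag or a threaded
// (reversed) pointer. This classification only works if heap addresses
// are strictly greater than the largest tag.
fn is_threaded_pointer(word: usize) -> bool {
    word > TAG_NULL
}
```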
@@ -0,0 +1,242 @@
use crate::alloc;
I assume the code in here has just moved and isn't substantially changed, right?
Yes, no need to review this file.
This file is now changed with 30d8b8f.
STACK_LEN += Words(1);
}

pub unsafe fn pop_mark_stack() -> Option<usize> {
I should try to get rid of this `Option` as well; I think it could also improve perf. (#2522 (comment))
Cool again
I don't understand why, but if I apply the same technique here the benchmark starts using more instructions. The new `pop_mark_stack` is actually smaller in generated Wasm. This is the new version I've tried:
pub unsafe fn pop_mark_stack() -> usize {
if STACK_LEN.0 == 0 {
0
} else {
STACK_LEN -= Words(1);
*(STACK_PTR.payload_addr() as *mut usize).add(STACK_LEN.0 as usize)
}
}
Here's the patch for this change:
diff --git a/rts/motoko-rts-tests/src/mark_stack.rs b/rts/motoko-rts-tests/src/mark_stack.rs
index 0499a31bf..4938f252a 100644
--- a/rts/motoko-rts-tests/src/mark_stack.rs
+++ b/rts/motoko-rts-tests/src/mark_stack.rs
@@ -24,7 +24,7 @@ fn test_(objs: Vec<usize>) -> TestResult {
for obj in objs.iter().rev() {
let popped = pop_mark_stack();
- if popped != Some(*obj) {
+ if popped != *obj {
free_mark_stack(); // TODO: Does not really free
return TestResult::error(format!(
"Unexpected object popped, expected={:?}, popped={:?}",
diff --git a/rts/motoko-rts/src/gc/mark_compact.rs b/rts/motoko-rts/src/gc/mark_compact.rs
index ef8a814ee..5890d67db 100644
--- a/rts/motoko-rts/src/gc/mark_compact.rs
+++ b/rts/motoko-rts/src/gc/mark_compact.rs
@@ -80,7 +80,11 @@ unsafe fn push_mark_stack(obj: SkewedPtr, heap_base: u32) {
}
unsafe fn mark_stack(heap_base: u32) {
- while let Some(obj) = pop_mark_stack() {
+ loop {
+ let obj = pop_mark_stack();
+ if obj == 0 {
+ break;
+ }
mark_fields(obj as *mut Obj, heap_base);
}
}
diff --git a/rts/motoko-rts/src/mark_stack.rs b/rts/motoko-rts/src/mark_stack.rs
index be87acc6f..1bf8145b9 100644
--- a/rts/motoko-rts/src/mark_stack.rs
+++ b/rts/motoko-rts/src/mark_stack.rs
@@ -50,12 +50,11 @@ pub unsafe fn push_mark_stack(obj: usize) {
STACK_LEN += Words(1);
}
-pub unsafe fn pop_mark_stack() -> Option<usize> {
+pub unsafe fn pop_mark_stack() -> usize {
if STACK_LEN.0 == 0 {
- None
+ 0
} else {
STACK_LEN -= Words(1);
- let p = *(STACK_PTR.payload_addr() as *mut usize).add(STACK_LEN.0 as usize);
- Some(p)
+ *(STACK_PTR.payload_addr() as *mut usize).add(STACK_LEN.0 as usize)
}
}
Any idea why the cost went up?
Here's the diff of before vs. after this patch (mo-rts.wasm):
--- mo-rts.wat 2021-06-14 15:44:34.892004845 +0300
+++ ../../motoko/rts/mo-rts.wat 2021-06-14 15:44:34.180013747 +0300
@@ -19343,7 +19343,7 @@
local.get 5
i32.const 1
i32.add
- local.set 10
+ local.set 8
block ;; label = @1
local.get 5
i32.const 5
@@ -19352,7 +19352,7 @@
local.tee 2
i32.eqz
br_if 0 (;@1;)
- local.get 10
+ local.get 8
i32.const 8
i32.add
local.set 1
@@ -19402,7 +19402,6 @@
local.tee 2
i32.const 15796
i32.add
- local.tee 8
local.get 1
i32.const -1
i32.add
@@ -19419,21 +19418,27 @@
i32.const 8
i32.add
i32.load
+ local.tee 1
+ i32.eqz
+ br_if 1 (;@1;)
+ local.get 1
local.get 4
call $motoko_rts::gc::mark_compact::mark_fields::hc1bca295efbdeb41
- local.get 8
+ global.get 1
+ i32.const 15796
+ i32.add
i32.load
local.tee 1
br_if 0 (;@2;)
end
end
block ;; label = @1
- local.get 10
+ local.get 8
i32.load offset=4 align=1
local.tee 2
i32.eqz
br_if 0 (;@1;)
- local.get 10
+ local.get 8
i32.const 8
i32.add
local.set 1
So the new version, after a lot of inlining, ends up being larger. No idea why, though. This code is for `compacting_gc`, so `mark_stack` is inlined in `mark_compact`, which is inlined in `compacting_gc`. Somehow the version with this change has more stack traffic.
Maybe it would be shorter if you just stored the sentinel value 0 at the bottom of the stack and avoided doing the conditional testing on pop?
And maybe just maintain pointer to top of stack rather than computing from length on push and pop
> And maybe just maintain pointer to top of stack rather than computing from length on push and pop

This made a benchmark use 2.3% fewer instructions. Pushed as 836992c.
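The pointer-based scheme might look like the following sketch (simplified and hypothetical: fixed capacity, no growth, not the actual 836992c code): push and pop become single pointer bumps instead of base + length address computations.

```rust
// Sketch of keeping a top-of-stack pointer instead of a length.
struct MarkStack {
    base: *mut usize,
    top: *mut usize,  // one past the last pushed word
    _buf: Vec<usize>, // owns the allocation so `base` stays valid
}

impl MarkStack {
    fn new(cap: usize) -> Self {
        let mut buf = vec![0usize; cap];
        let base = buf.as_mut_ptr();
        MarkStack { base, top: base, _buf: buf }
    }

    unsafe fn push(&mut self, obj: usize) {
        // Single write at the cached top, then bump the pointer.
        *self.top = obj;
        self.top = self.top.add(1);
    }

    unsafe fn pop(&mut self) -> usize {
        if self.top == self.base {
            0 // empty, matching the 0-for-None convention in the patch above
        } else {
            self.top = self.top.sub(1);
            *self.top
        }
    }
}
```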
> Maybe it would be shorter if you just stored the sentinel value 0 at the bottom of the stack and avoided doing the conditional testing on pop?

We would need a conditional to avoid popping the sentinel value, right? Otherwise a push after popping the sentinel would break the stack. So I think we can't remove the conditional in `pop_mark_stack`.

I guess one thing we could consider is using stable memory for the mark stack and bits, but maybe that's too slow. Certainly not for this PR.
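The trade-off can be illustrated with a small index-based sketch (hypothetical, not the RTS code): even with a sentinel 0 stored permanently at the bottom, pop still needs a check so that it never moves below the sentinel, since a later push would otherwise overwrite it.

```rust
// Mark stack with a permanent 0 sentinel in slot 0. Pop returns 0 when
// the stack is empty, but must not move `top` below the sentinel:
// without the check, a push after an empty pop would clobber slot 0
// and corrupt the stack.
struct SentinelStack {
    buf: Vec<usize>,
    top: usize, // index one past the last pushed element
}

impl SentinelStack {
    fn new(cap: usize) -> Self {
        SentinelStack { buf: vec![0; cap], top: 1 } // slot 0 is the sentinel
    }

    fn push(&mut self, obj: usize) {
        self.buf[self.top] = obj;
        self.top += 1;
    }

    fn pop(&mut self) -> usize {
        let v = self.buf[self.top - 1];
        if v != 0 {
            // The conditional survives: it keeps `top` above the sentinel.
            self.top -= 1;
        }
        v
    }
}
```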
Perf changes in CanCan backend:
- Copying GC: -0.81%
- Compacting GC: +3.06%
LGTM.
Co-authored-by: Claudio Russo <claudio@dfinity.org>
) {
    // The array and the objects pointed to by the array are all static so we don't evacuate them.
    // We only evacuate fields of objects in the array.
    for i in 0..roots.len() {
Do we actually know that all the values in roots are pointers to heap objects and never unboxed values or even static objects? Otherwise this code might crash, right?
My reasoning is that env_static_roots eventually calls visit_pointer_fields, but there's no check that the obj is a proper object.
Yeah, so the static heap (other than static data/constants) has an array of `MutBox`es (`roots` in this function). These `MutBox`es also live in the static heap, but they can point to the dynamic heap. This code evacuates those `MutBox` fields.
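That scheme can be sketched like this (hypothetical names, with a closure standing in for the real evacuation routine): the `MutBox`es themselves stay put, and only fields pointing into the dynamic heap are updated.

```rust
// Hypothetical sketch of evacuating static-root fields: the root array
// and the MutBoxes it points to are static and never move; only each
// MutBox field that points into the dynamic heap is rewritten.
struct MutBox {
    field: usize, // may point into the dynamic heap
}

fn evac_static_roots(roots: &mut [MutBox], heap_base: usize, evac: impl Fn(usize) -> usize) {
    for mutbox in roots.iter_mut() {
        // Values below heap_base are static (or unboxed) and left alone.
        if mutbox.field >= heap_base {
            mutbox.field = evac(mutbox.field);
        }
    }
}
```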
Right, just remembered that after (re)seeing the explanation to a previous comment. Sorry about the gaslighting
Co-authored-by: Claudio Russo <claudio@dfinity.org>
Makes a CanCan backend benchmark use 2.3% fewer instructions
This merges #2223 and master and enables the new collector with a new moc flag `--compacting-gc`. The goal is to (1) avoid bitrot in #2223, and (2) allow easy testing and benchmarking of different collectors.

We include both collectors in the Wasms. Binary sizes grow between 0.8% (CanCan backend) and 1.8% (in simple tests in `run-drun`). Some benchmark results here: #2033 (comment)

An improvement to the compacting GC is currently being implemented in branch `osa1/compacting_gc_3`.