This heap emulates the growth characteristics of the Ruby default GC's
heap. By default, the heap grows by 40%, requires at least 20% empty
after a GC, and allows at most 65% empty before it shrinks the heap. This
is all configurable via the same environment variables the default GC
uses (`RUBY_GC_HEAP_FREE_SLOTS_GOAL_RATIO`, `RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO`,
`RUBY_GC_HEAP_FREE_SLOTS_MAX_RATIO`, respectively).
The Ruby heap can be enabled via the `MMTK_HEAP_MODE=ruby` environment
variable.
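For example, a build of Ruby with MMTk could be launched like this (a
minimal sketch; the ratio values shown are just the defaults described
above):

```
require "rbconfig"

# These variables must be set before the interpreter boots, so we spawn
# a child Ruby with the MMTk Ruby heap enabled.
env = {
  "MMTK_HEAP_MODE"                     => "ruby",
  "RUBY_GC_HEAP_FREE_SLOTS_GOAL_RATIO" => "0.40", # grow by 40%
  "RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO"  => "0.20", # >= 20% empty after GC
  "RUBY_GC_HEAP_FREE_SLOTS_MAX_RATIO"  => "0.65", # shrink above 65% empty
}
system(env, RbConfig.ruby, "-e", "puts 'running with the MMTk Ruby heap'")
```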
Compared to the dynamic heap in MMTk (which uses the MemBalancer algorithm),
the Ruby heap grows more generously: it uses somewhat more memory, but
offers significant performance gains because it runs GC much less
frequently.
We can see in the benchmarks below that the Ruby heap is faster than the
dynamic heap on every benchmark, and over 2x faster on many of them.
Memory usage is often around 10-20% higher, with certain outliers such as
hexapdf and erubi-rails using significantly more memory. We can also see
that this brings MMTk's Ruby heap much closer in performance to the
default GC.
Ruby heap benchmark results:
--------------  --------------  ----------  ---------
bench           ruby heap (ms)  stddev (%)  RSS (MiB)
activerecord             233.6        10.7       85.9
chunky-png               457.1         1.1       79.3
erubi-rails             1148.0         3.8      133.3
hexapdf                 1570.5         2.4      403.0
liquid-c                  42.8         5.3       43.4
liquid-compile            41.3         7.6       52.6
liquid-render            102.8         3.8       55.3
lobsters                 651.9         8.0      426.3
mail                     106.4         1.8       67.2
psych-load              1552.1         0.8       43.4
railsbench              1707.2         6.0      145.6
rubocop                  127.2        15.3      148.8
ruby-lsp                 136.6        11.7      113.7
sequel                    47.2         5.9       44.4
shipit                  1197.5         3.6      301.0
--------------  --------------  ----------  ---------
Dynamic heap benchmark results:
--------------  -----------------  ----------  ---------
bench           dynamic heap (ms)  stddev (%)  RSS (MiB)
activerecord                845.3         3.1       76.7
chunky-png                  525.9         0.4       38.9
erubi-rails                2694.9         3.4      115.8
hexapdf                    2344.8         5.6      164.9
liquid-c                     73.7         5.0       40.5
liquid-compile              107.1         6.8       40.3
liquid-render               147.2         1.7       39.5
lobsters                    697.6         4.5      342.0
mail                        224.6         2.1       64.0
psych-load                 4326.7         0.6       37.4
railsbench                 3218.0         5.5      124.7
rubocop                     203.6         6.1      110.9
ruby-lsp                    350.7         3.2       79.0
sequel                      121.8         2.5       39.6
shipit                     1510.1         3.1      220.8
--------------  -----------------  ----------  ---------
Default GC benchmark results:
--------------  ---------------  ----------  ---------
bench           default GC (ms)  stddev (%)  RSS (MiB)
activerecord              148.4         0.6       67.9
chunky-png                440.2         0.7       57.0
erubi-rails               722.7         0.3       97.8
hexapdf                  1466.2         1.7      254.3
liquid-c                   32.5         3.6       42.3
liquid-compile             31.2         1.9       35.4
liquid-render              88.3         0.7       30.8
lobsters                  633.6         7.0      305.4
mail                       76.6         1.6       53.2
psych-load               1166.2         1.3       29.1
railsbench               1262.9         2.3      114.7
rubocop                   105.6         0.8       95.4
ruby-lsp                  101.6         1.4       75.4
sequel                     27.4         1.2       33.1
shipit                   1083.1         1.5      163.4
--------------  ---------------  ----------  ---------
https://github.com/ruby/mmtk/commit/c0ca29922d
Adding a fast path for the bump-pointer allocator can improve allocation
performance.
For the following microbenchmark with MMTK_HEAP_MIN=100MiB:
10_000_000.times { String.new }
Before:
810.7 ms ± 8.3 ms [User: 790.9 ms, System: 40.3 ms]
After:
777.9 ms ± 10.4 ms [User: 759.0 ms, System: 37.9 ms]
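For reference, the shape of such a fast path looks roughly like the
following Ruby sketch (illustrative only; the real implementation is in
C/Rust, and the slow path here just pretends to get a fresh buffer):

```
class BumpAllocator
  BUFFER_SIZE = 64 * 1024

  def initialize
    refill(0)
  end

  # Fast path: one bounds check and one addition in the common case.
  def alloc(size)
    new_cursor = @cursor + size
    return alloc_slow(size) if new_cursor > @limit # rare: buffer exhausted

    result  = @cursor
    @cursor = new_cursor
    result
  end

  private

  # Slow path: in the real binding this would request a fresh buffer from
  # MMTk (and possibly trigger a GC); here we just pretend.
  def alloc_slow(size)
    refill(@limit)
    alloc(size)
  end

  def refill(start)
    @cursor = start
    @limit  = start + BUFFER_SIZE
  end
end

allocator = BumpAllocator.new
p allocator.alloc(40) # => 0
p allocator.alloc(40) # => 40
```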
https://github.com/ruby/mmtk/commit/0ff5c9f579
This guard was removed in https://github.com/ruby/ruby/pull/13497
on the justification that some GC may need to be notified even for
immediates.
But the two currently available GCs don't, and plenty of code
elsewhere assumes GCs don't, notably in YJIT and ZJIT.
This optimization is also not so micro (though not huge either):
I routinely see 1-2% wasted there on micro-benchmarks.
So if we actually need this in the future, it might make sense to
introduce a way for GCs to declare it as an option, but in the
meantime it's extra overhead with little gain.
This reverts commit 228d13f6ed914d1e7f6bd2416e3f5be8283be865.
This commit makes default.c and mmtk.c depend on shape.h, which prevents
them from building independently.
Attempt to fix the following SEGV:
```
ruby(gc_mark) ../src/gc/default/default.c:4429
ruby(gc_mark_children+0x45) [0x560b380bf8b5] ../src/gc/default/default.c:4625
ruby(gc_mark_stacked_objects) ../src/gc/default/default.c:4647
ruby(gc_mark_stacked_objects_all) ../src/gc/default/default.c:4685
ruby(gc_marks_rest) ../src/gc/default/default.c:5707
ruby(gc_marks+0x4e7) [0x560b380c41c1] ../src/gc/default/default.c:5821
ruby(gc_start) ../src/gc/default/default.c:6502
ruby(heap_prepare+0xa4) [0x560b380c4efc] ../src/gc/default/default.c:2074
ruby(heap_next_free_page) ../src/gc/default/default.c:2289
ruby(newobj_cache_miss) ../src/gc/default/default.c:2396
ruby(RB_SPECIAL_CONST_P+0x0) [0x560b380c5df4] ../src/gc/default/default.c:2420
ruby(RB_BUILTIN_TYPE) ../src/include/ruby/internal/value_type.h:184
ruby(newobj_init) ../src/gc/default/default.c:2136
ruby(rb_gc_impl_new_obj) ../src/gc/default/default.c:2500
ruby(newobj_of) ../src/gc.c:996
ruby(rb_imemo_new+0x37) [0x560b380d8bed] ../src/imemo.c:46
ruby(imemo_fields_new) ../src/imemo.c:105
ruby(rb_imemo_fields_new) ../src/imemo.c:120
```
I have no reproduction, but my understanding based on the backtrace
and error is that a GC is triggered inside `newobj_init`, causing the
new object to be marked while in an incomplete state.
I believe the fix is to pass the `shape_id` down to `newobj_init`
so it can be set before the GC has a chance to trigger.
This should be less common than many of the other flags, so it should
not inflate the heap too much. This is desirable because reducing the
number of remembered objects improves minor GC speed.
Since we do not run a Ractor barrier before forking, it's possible that
another Ractor is halfway through allocating an object while we fork.
This may leave allocated_objects_count off by one.
For example, the following script reproduces the bug:
100.times do |i|
  Ractor.new(i) do |j|
    10000.times do |i|
      "#{j}-#{i}"
    end
    Ractor.receive
  end
  pid = fork { GC.verify_internal_consistency }
  _, status = Process.waitpid2 pid
  raise unless status.success?
end
We need to run with `taskset -c 1` to force it onto a single CPU core
to reproduce the bug more consistently:
heap_pages_final_slots: 1, total_freed_objects: 16628
test.rb:8: [BUG] inconsistent live slot number: expect 19589, but 19588.
ruby 4.0.0dev (2025-11-25T03:06:55Z master 55892f5994) +PRISM [x86_64-linux]
-- Control frame information -----------------------------------------------
c:0007 p:---- s:0029 e:000028 l:y b:---- CFUNC :verify_internal_consistency
c:0006 p:0004 s:0025 e:000024 l:n b:---- BLOCK test.rb:8 [FINISH]
c:0005 p:---- s:0022 e:000021 l:y b:---- CFUNC :fork
c:0004 p:0012 s:0018 E:0014c0 l:n b:---- BLOCK test.rb:8
c:0003 p:0024 s:0011 e:000010 l:y b:0001 METHOD <internal:numeric>:257
c:0002 p:0005 s:0006 E:001730 l:n b:---- EVAL test.rb:1 [FINISH]
c:0001 p:0000 s:0003 E:001d20 l:y b:---- DUMMY [FINISH]
-- Ruby level backtrace information ----------------------------------------
test.rb:1:in '<main>'
<internal:numeric>:257:in 'times'
test.rb:8:in 'block in <main>'
test.rb:8:in 'fork'
test.rb:8:in 'block (2 levels) in <main>'
test.rb:8:in 'verify_internal_consistency'
-- Threading information ---------------------------------------------------
Total ractor count: 1
Ruby thread count for this ractor: 1
-- C level backtrace information -------------------------------------------
ruby(rb_print_backtrace+0x14) [0x61b67ac48b60] vm_dump.c:1105
ruby(rb_vm_bugreport) vm_dump.c:1450
ruby(rb_bug_without_die_internal+0x5f) [0x61b67a818a28] error.c:1098
ruby(rb_bug) error.c:1116
ruby(gc_verify_internal_consistency_+0xbdd) [0x61b67a83d8ed] gc/default/default.c:5186
ruby(gc_verify_internal_consistency+0x2d) [0x61b67a83d960] gc/default/default.c:5241
ruby(rb_gc_verify_internal_consistency) gc/default/default.c:8950
ruby(gc_verify_internal_consistency_m) gc/default/default.c:8966
ruby(vm_call_cfunc_with_frame_+0x10d) [0x61b67a9e50fd] vm_insnhelper.c:3902
ruby(vm_sendish+0x111) [0x61b67a9eeaf1] vm_insnhelper.c:6124
ruby(vm_exec_core+0x84) [0x61b67aa07434] insns.def:903
ruby(vm_exec_loop+0xa) [0x61b67a9f8155] vm.c:2811
ruby(rb_vm_exec) vm.c:2787
ruby(vm_yield_with_cref+0x90) [0x61b67a9fd2ea] vm.c:1865
ruby(vm_yield) vm.c:1873
ruby(rb_yield) vm_eval.c:1362
ruby(rb_protect+0xef) [0x61b67a81fe6f] eval.c:1154
ruby(rb_f_fork+0x16) [0x61b67a8e98ab] process.c:4293
ruby(rb_f_fork) process.c:4284
In rb_gc_impl_before_fork, we lock the VM and barrier all the Ractors
before calling mmtk_before_fork. However, since rb_mmtk_block_for_gc is
a barrier point, one or more Ractors could be paused there, and
mmtk_before_fork is not compatible with that: it assumes that the MMTk
workers are idle, but they are not, because they are busy working on a
GC.
This commit essentially implements a trylock. It optimistically takes
the lock but releases it if it detects that any other Ractors are
waiting in rb_mmtk_block_for_gc.
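The pattern is roughly the following (a Ruby sketch of the idea only;
the real code is in C, and `ractors_waiting_for_gc?` is a hypothetical
stand-in for detecting Ractors parked in rb_mmtk_block_for_gc):

```
LOCK = Mutex.new

# Hypothetical stand-in: true while any Ractor is parked in
# rb_mmtk_block_for_gc waiting for a GC to finish.
def ractors_waiting_for_gc?
  false
end

def lock_for_fork
  loop do
    if LOCK.try_lock
      return unless ractors_waiting_for_gc? # safe: keep the lock and fork
      LOCK.unlock # back off so the in-flight GC can complete
    end
    Thread.pass # retry until no Ractor is blocked for GC
  end
end
```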
For example, the following script demonstrates the issue:
puts "Hello #{Process.pid}"
100.times do |i|
puts "i = #{i}"
Ractor.new(i) do |j|
puts "Ractor #{j} hello"
1000.times do |i|
s = "#{j}-#{i}"
end
Ractor.receive
puts "Ractor #{j} goodbye"
end
pid = fork { }
puts "Child pid is #{pid}"
_, status = Process.waitpid2 pid
puts status.success?
end
puts "Goodbye"
We can see the MMTk worker thread is waiting to start the GC:
#4 0x00007ffff66538b1 in rb_mmtk_stop_the_world () at gc/mmtk/mmtk.c:101
#5 0x00007ffff6d04caf in mmtk_ruby::collection::{impl#0}::stop_all_mutators<mmtk::scheduler::gc_work::{impl#14}::do_work::{closure_env#0}<mmtk::plan::immix::gc_work::ImmixGCWorkContext<mmtk_ruby::Ruby, 0>>> (_tls=..., mutator_visitor=...) at src/collection.rs:23
However, the mutator thread is stuck in mmtk_before_fork trying to stop
that worker thread:
#4 0x00007ffff6c0b621 in std::sys::thread::unix::Thread::join () at library/std/src/sys/thread/unix.rs:134
#5 0x00007ffff6658b6e in std::thread::JoinInner<()>::join<()> (self=...)
#6 0x00007ffff6658d4c in std::thread::JoinHandle<()>::join<()> (self=...)
#7 0x00007ffff665795e in mmtk_ruby::binding::RubyBinding::join_all_gc_threads (self=0x7ffff72462d0 <mmtk_ruby::BINDING+8>) at src/binding.rs:115
#8 0x00007ffff66561a8 in mmtk_ruby::api::mmtk_before_fork () at src/api.rs:309
#9 0x00007ffff66556ff in rb_gc_impl_before_fork (objspace_ptr=0x555555d17980) at gc/mmtk/mmtk.c:1054
#10 0x00005555556bbc3e in rb_gc_before_fork () at gc.c:5429
https://github.com/ruby/mmtk/commit/1a629504a7
We need the VM barrier in rb_gc_impl_before_fork to stop the other Ractors
because otherwise they could be allocating objects in the fast path, which
may call mmtk_add_obj_free_candidate. Since mmtk_add_obj_free_candidate
acquires a lock on obj_free_candidates in weak_proc.rs, this lock may never
be released in the child process after the Ractor dies.
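In Ruby terms, the hazard looks roughly like this (an illustrative
sketch with hypothetical names, not the actual binding code):

```
CANDIDATES_LOCK = Mutex.new # stands in for the obj_free_candidates lock
candidates = []

# Stands in for mmtk_add_obj_free_candidate, called from another
# Ractor's allocation fast path.
add_candidate = lambda do |obj|
  CANDIDATES_LOCK.synchronize { candidates << obj }
end

# If we fork while another Ractor is inside the synchronize block above,
# the child inherits a locked mutex whose owner no longer exists, so the
# next acquire in the child (e.g. during shutdown finalization) hangs.
```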
For example, the following script demonstrates the issue:
puts "Hello #{Process.pid}"
100.times do |i|
puts "i = #{i}"
Ractor.new(i) do |j|
puts "Ractor #{j} hello"
1000.times do |i|
s = "#{j}-#{i}"
end
Ractor.receive
puts "Ractor #{j} goodbye"
end
pid = fork { }
puts "Child pid is #{pid}"
_, status = Process.waitpid2 pid
puts status.success?
end
puts "Goodbye"
In the child process, we can see that it is stuck trying to acquire the
lock on obj_free_candidates:
#5 0x00007192bfb53f10 in mmtk_ruby::weak_proc::WeakProcessor::get_all_obj_free_candidates (self=0x7192c0657498 <mmtk_ruby::BINDING+72>) at src/weak_proc.rs:52
#6 0x00007192bfa634c3 in mmtk_ruby::api::mmtk_get_all_obj_free_candidates () at src/api.rs:295
#7 0x00007192bfa61d50 in rb_gc_impl_shutdown_call_finalizer (objspace_ptr=0x578c17abfc50) at gc/mmtk/mmtk.c:1032
#8 0x0000578c1601e48e in rb_ec_finalize (ec=0x578c17ac06d0) at eval.c:166
#9 rb_ec_cleanup (ec=<optimized out>, ex=<optimized out>) at eval.c:257
#10 0x0000578c1601ebf6 in ruby_cleanup (ex=<optimized out>) at eval.c:180
#11 ruby_stop (ex=<optimized out>) at eval.c:292
#12 0x0000578c16127124 in rb_f_fork (obj=<optimized out>) at process.c:4291
#13 rb_f_fork (obj=<optimized out>) at process.c:4281
https://github.com/ruby/mmtk/commit/eb4b229858
If we are using multiple Ractors, other Ractors may allocate objects after
rb_gc_impl_before_fork is run because it does not lock the VM. This can leave
the GC in a bad state, since rb_gc_impl_before_fork may have terminated the
GC threads, so a GC cannot run until rb_gc_impl_after_fork is run.
https://github.com/ruby/mmtk/commit/e4bea5676d
to adopt the strict shareable rule:
* (basically) shareable objects only refer to shareable objects
* (exception) shareable objects can refer to unshareable objects,
  but should not leak references to unshareable objects to the Ruby world
* `RB_OBJ_SET_SHAREABLE(obj)` makes `obj` shareable.
  All objects reachable from `obj` should be shareable.
* `RB_OBJ_SET_FROZEN_SHAREABLE(obj)` is the same as above,
  but freezes `obj` before making it shareable.
Also, `rb_gc_verify_shareable(obj)` is introduced to strictly check
that `obj` does not violate the shareable rule (a shareable object
only refers to shareable objects).
The rule has some exceptions: some shareable objects can refer to
unshareable objects, for example a Ractor object (which is shareable)
can refer to Ractor-local objects. To handle such cases, a
`check_shareable` flag is also introduced.
The `STRICT_VERIFY_SHAREABLE` macro is also introduced to verify the
strict shareable rule at `SET_SHAREABLE`.
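At the Ruby level, the basic rule matches what `Ractor.make_shareable`
already enforces; for example:

```
inner = ["not", "yet", "shareable"]
outer = [inner]

# Making outer shareable must make everything it references shareable
# too, otherwise "shareable objects only refer to shareable objects"
# would be violated.
Ractor.make_shareable(outer)

p Ractor.shareable?(outer) # => true
p Ractor.shareable?(inner) # => true (made shareable transitively)
p inner.frozen?            # => true
```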
This isn't (yet?) safe to do because it concurrently modifies GC
structures, and dfree functions are not necessarily safe to run without
stopping all Ractors.
If it were safe to do this, we should also do it for
gc_enter_event_continue. I do think sweeping could be done concurrently
with the mutator and in parallel, but that requires more work first.
Previously, our mark_and_move was calling rb_gc_mark, which isn't
safe to call at compaction time.
Co-authored-by: Luke Gruber <luke.gru@gmail.com>
When we mark a T_NONE, we crash with the object and parent object
information in the bug report. However, if the parent object is young,
it is Qfalse, so the report shows no parent.
For example, a bug report looks like:
[BUG] try to mark T_NONE object (obj: 0x00003990e42d7c70 T_NONE/, parent: (none))
This commit changes it to always set the parent object, and also adds a
new field, parent_object_old_p, to quickly determine whether the parent
object is old or not.
Setting v1, v2, and v3 when we allocate an object assumes that we always
allocate 40-byte objects. By removing v1, v2, and v3, we can make the
base slot size a different size.