This commit adds a field handle_weak_references to rb_data_type_struct for
the callback when handling weak references. This avoids TypedData objects
from needing to expose their rb_data_type_struct and weak references function.
The flags for `rb_data_type_t::flags` are public constants for
defining `rb_data_type_t`. The embedded data flag and mask are
internal implementation detail.
[Feature #21084]
# Summary
The current way of marking weak references uses `rb_gc_mark_weak(VALUE *ptr)`.
This presents challenges because Ruby's GC is incremental, meaning that if the
`ptr` changes (e.g. realloc'd or free'd), then we could have an invalid memory
access. This also overwrites `*ptr = Qundef` if `*ptr` is dead, which prevents
any cleanup to be run (e.g. freeing memory or deleting entries from hash
tables). This ticket proposes `rb_gc_declare_weak_references` which declares
that an object has weak references and calls a cleanup function after marking,
allowing the object to clean up any memory for dead objects.
# Introduction
In [[Feature #19783]](https://bugs.ruby-lang.org/issues/19783), I introduced an
API allowing objects to mark weak references, the function signature looks like
this:
```c
void rb_gc_mark_weak(VALUE *ptr);
```
`rb_gc_mark_weak` is called during the marking phase of the GC to specify that
the memory at `ptr` holds a pointer to a Ruby object that is weakly referenced.
`rb_gc_mark_weak` appends this pointer to a list that is processed after the
marking phase of the GC. If the object at `*ptr` is no longer alive, then it
overwrites the object reference with a special value (`*ptr = Qundef`).
However, this API resulted in two challenges:
1. Ruby's default GC is incremental, which means that the GC is not ran in one
phase, but rather split into chunks of work that interleaves with Ruby
execution. The `ptr` passed into `rb_gc_mark_weak` could be on the malloc
heap, and that memory could be realloc'd or even free'd. We had to use
workarounds such as `rb_gc_remove_weak` to ensure that there were no illegal
memory accesses. This made `rb_gc_mark_weak` difficult to use, impacted
runtime performance, and increased memory usage.
2. When an object dies, `rb_gc_mark_weak` only overwites the reference with
`Qundef`. This means that if we want to do any cleanup (e.g. free a piece of
memory or delete a hash table entry), we could not do that and had to defer
this process elsewhere (e.g. during marking or runtime).
In this ticket, I'm proposing a new API for weak references. Instead of an
object marking its weak references during the marking phase, the object declares
that it has weak references using the `rb_gc_declare_weak_references` function.
This declaration occurs during runtime (e.g. after the object has been created)
rather than during GC.
After an object declares that it has weak references, it will have its callback
function called after marking as long as that object is alive. This callback
function can then call a special function `rb_gc_handle_weak_references_alive_p`
to determine whether its references are alive. This will allow the callback
function to do whatever it wants on the object, allowing it to perform any
cleanup work it needs.
This significantly simplifies the code for `ObjectSpace::WeakMap` and
`ObjectSpace::WeakKeyMap` because it no longer needs to have the workarounds for
the limitations of `rb_gc_mark_weak`.
# Performance
The performance results below demonstrate that `ObjectSpace::WeakMap#[]=` is now
about 60% faster because the implementation has been simplified and the number
of allocations has been reduced. We can see that there is not a significant
impact on the performance of `ObjectSpace::WeakMap#[]`.
Base:
```
ObjectSpace::WeakMap#[]=
4.620M (± 6.4%) i/s (216.44 ns/i) - 23.342M in 5.072149s
ObjectSpace::WeakMap#[]
30.967M (± 1.9%) i/s (32.29 ns/i) - 154.998M in 5.007157s
```
Branch:
```
ObjectSpace::WeakMap#[]=
7.336M (± 2.8%) i/s (136.31 ns/i) - 36.755M in 5.013983s
ObjectSpace::WeakMap#[]
30.902M (± 5.4%) i/s (32.36 ns/i) - 155.901M in 5.064060s
```
Code:
```
require "bundler/inline"
gemfile do
source "https://rubygems.org"
gem "benchmark-ips"
end
wmap = ObjectSpace::WeakMap.new
key = Object.new
val = Object.new
wmap[key] = val
Benchmark.ips do |x|
x.report("ObjectSpace::WeakMap#[]=") do |times|
i = 0
while i < times
wmap[Object.new] = Object.new
i += 1
end
end
x.report("ObjectSpace::WeakMap#[]") do |times|
i = 0
while i < times
wmap[key]
wmap[val] # does not exist
i += 1
end
end
end
```
# Alternative designs
Currently, `rb_gc_declare_weak_references` is designed to be an internal-only
API. This allows us to assume the object types that call
`rb_gc_declare_weak_references`. In the future, if we want to open up this API
to third parties, we may want to change this function to something like:
```c
void rb_gc_add_cleaner(VALUE obj, void (*callback)(VALUE obj));
```
This will allow the third party to implement a custom `callback` that gets
called after the marking phase of GC to clean up any dead references. I chose
not to implement this design because it is less efficient as we would need to
store a mapping from `obj` to `callback`, which requires extra memory.
* `rb_intern_str`: the argument must be `T_STRING`, no conversion.
* `rb_intern_str`, `rb_check_id`, `rb_to_id`, `rb_check_symbol`: raise
`EncodingError` unless the "name" argument is a valid string in its
encoding.
This guard was removed in https://github.com/ruby/ruby/pull/13497
on the justification that some GC may need to be notified even for
immediate.
But the two currently available GCs don't, and there are plenty
of assumtions GCs don't everywhere, notably in YJIT and ZJIT.
This optimization is also not so micro (but not huge either).
I routinely see 1-2% wasted there on micro-benchmarks.
So perhaps if in the future we actually need this, it might make
sense to introduce a way for GCs to declare that as an option,
but in the meantime it's extra overhead with little gain.
The changes are to `io.c` and `thread.c`.
I changed the API of 2 exported thread functions from `internal/thread.h` that
didn't look like they had any use in C extensions:
* rb_thread_wait_for_single_fd
* rb_thread_io_wait
I didn't change the following exported internal function because it's
used in C extensions:
* rb_thread_fd_select
I added a comment to note that this function, although internal, is used
in C extensions.
Move these macros from include/ruby/backward.h to
include/ruby/internal/attr/deprecated.h, alongside the other similar
macros.
include/ruby/internal/intern/vm.h cannot currently use them because
include/ruby/backward.h is included too late.
`T_DATA` has a flag `RUBY_TYPED_FROZEN_SHAREABLE` which means
if the `T_DATA` object is frozen, it can be sharable.
On the `Ractor.make_sharable(obj)`, rechable objects from the
`T_DATA` object will be apply `Ractor.make_shareable` recursively.
`RUBY_TYPED_FROZEN_SHAREABLE_NO_REC` is similar to the
`RUBY_TYPED_FROZEN_SHAREABLE`, but doesn't apply `Ractor.make_sharable`
recursively for children.
If it refers to unshareable objects, it will simply raise an error.
I'm not sure this pattern is common or not, so it is not in public.
If we find more cases, we can discuss publication.
While profiling `Monitor#synchronize` and `Mutex#synchronize`
I noticed a fairly significant amount of time spent in
`rb_check_typeddata`.
By implementing a fast path that assumes the object is valid
and that can be inlined, it does make a significant difference:
Before:
```
Mutex 13.548M (± 3.6%) i/s (73.81 ns/i) - 68.566M in 5.067444
Monitor 10.497M (± 6.5%) i/s (95.27 ns/i) - 52.529M in 5.032698s
```
After:
```
Mutex 20.887M (± 0.3%) i/s (47.88 ns/i) - 106.021M in 5.075989s
Monitor 16.245M (±13.3%) i/s (61.56 ns/i) - 80.705M in 5.099680s
```
```ruby
require 'bundler/inline'
gemfile do
gem "benchmark-ips"
end
mutex = Mutex.new
require "monitor"
monitor = Monitor.new
Benchmark.ips do |x|
x.report("Mutex") { mutex.synchronize { } }
x.report("Monitor") { monitor.synchronize { } }
end
```