This rewrites the class allocator search to be faster. Instead of using
RCLASS_SUPER, which is now even slower due to Box, we can scan the
superclasses list to find a class where the allocator is defined.
This also disallows allocating from an ICLASS. Previously I believe that
was only done for FrozenCore, and that was changed in
e596cf6e93dbf121e197cccfec8a69902e00eda3.
This reverts commit 2998c8d6b99ec49925ebea42198b29c3e27b34a7.
We need to find a better way to fix this bug. Even with this refcount
change, errors were still being seen in CI. For now we need to remove
this failing test.
Calling the usual rb_iclass_classext_free() causes SEGV because
duplicating a newer classext of iclass had set the reference from superclass
to the newer classext, but calling rb_iclass_classext_free() deletes it.
to adopt strict shareable rule.
* (basically) shareable objects only refer shareable objects
* (exception) shareable objects can refere unshareable objects
but should not leak reference to unshareable objects to Ruby world
The st_insert in RCLASS_SET_NAMESPACE_CLASSEXT may overwrite an existing
rb_classext_t, causing it to leak memory. This commit changes it to use
st_update to free the existing one before overwriting it.
to fix inconsistent and wrong current namespace detections.
This includes:
* Moving load_path and related things from rb_vm_t to rb_namespace_t to simplify
accessing those values via namespace (instead of accessing either vm or ns)
* Initializing root_namespace earlier and consolidate builtin_namespace into root_namespace
* Adding VM_FRAME_FLAG_NS_REQUIRE for checkpoints to detect a namespace to load/require files
* Removing implicit refinements in the root namespace which was used to determine
the namespace to be loaded (replaced by VM_FRAME_FLAG_NS_REQUIRE)
* Removing namespaces from rb_proc_t because its namespace can be identified by lexical context
* Starting to use ep[VM_ENV_DATA_INDEX_SPECVAL] to store the current namespace when
the frame type is MAGIC_TOP or MAGIC_CLASS (block handlers don't exist in this case)
It is much more convenient than storing the klass, especially
when dealing with `object_id` as it allows to update the id2ref
table without having to dereference the owner, which may be
garbage at that point.
For now this doesn't change anything, but now that the table
is managed by GC, it opens the door to use RCU when in multi-ractor
mode, hence allow unsynchornized reads.
Previously we had an assertion that the method table was only set on
young objects, and a comment stating that was how it needed to be used.
I think that confused the complexity of the write barriers that may be
needed here.
* Setting an empty M_TBL never needs a write barrier
* T_CLASS and T_MODULE should always fire a write barrier to newly added
methods
* T_ICLASS only needs a write barrier to methods when
RCLASSEXT_ICLASS_IS_ORIGIN(x) && !RCLASSEXT_ICLASS_ORIGIN_SHARED_MTBL(x)
We shouldn't assume that the object being young is sufficient, because
we also need write barriers for incremental marking and it's unreliable.
Even when namespaces are enabled, only a few core classes created
during init will eventually be namespaced.
For these it's OK to allocate a 320B slot to hold the extra namespace
stuff.
But for any class created post init, we know we'll never need the
namespace and we can fit in a 160B slot.
Now that class fields have been deletated to a T_IMEMO/class_fields
when we're in multi-ractor mode, we can read and write class instance
variable in an atomic way using Read-Copy-Update (RCU).
Note when in multi-ractor mode, we always use RCU. In theory
we don't need to, instead if we ensured the field is written
before the shape is updated it would be safe.
Benchmark:
```ruby
Warning[:experimental] = false
class Foo
@foo = 1
@bar = 2
@baz = 3
@egg = 4
@spam = 5
class << self
attr_reader :foo, :bar, :baz, :egg, :spam
end
end
ractors = 8.times.map do
Ractor.new do
1_000_000.times do
Foo.bar + Foo.baz * Foo.egg - Foo.spam
end
end
end
if Ractor.method_defined?(:value)
ractors.each(&:value)
else
ractors.each(&:take)
end
```
This branch vs Ruby 3.4:
```bash
$ hyperfine -w 1 'ruby --disable-all ../test.rb' './miniruby ../test.rb'
Benchmark 1: ruby --disable-all ../test.rb
Time (mean ± σ): 3.162 s ± 0.071 s [User: 2.783 s, System: 10.809 s]
Range (min … max): 3.093 s … 3.337 s 10 runs
Benchmark 2: ./miniruby ../test.rb
Time (mean ± σ): 208.7 ms ± 4.6 ms [User: 889.7 ms, System: 6.9 ms]
Range (min … max): 202.8 ms … 222.0 ms 14 runs
Summary
./miniruby ../test.rb ran
15.15 ± 0.47 times faster than ruby --disable-all ../test.rb
```
This behave almost exactly as a T_OBJECT, the layout is entirely
compatible.
This aims to solve two problems.
First, it solves the problem of namspaced classes having
a single `shape_id`. Now each namespaced classext
has an object that can hold the namespace specific
shape.
Second, it open the door to later make class instance variable
writes atomics, hence be able to read class variables
without locking the VM.
In the future, in multi-ractor mode, we can do the write
on a copy of the `fields_obj` and then atomically swap it.
Considerations:
- Right now the `RClass` shape_id is always synchronized,
but with namespace we should likely mark classes that have
multiple namespace with a specific shape flag.
The type isn't opaque because Ruby isn't often compiled with LTO,
so for optimization purpose it's better to allow as much inlining
as possible.
However ideally only `shape.c` and `shape.h` should deal with
the actual struct, and everything else should just deal with opaque
`shape_id_t`.
Instead `shape_id_t` higher bits contain flags, and the first one
tells whether the shape is frozen.
This has multiple benefits:
- Can check if a shape is frozen with a single bit check instead of
dereferencing a pointer.
- Guarantees it is always possible to transition to frozen.
- This allow reclaiming `FL_FREEZE` (not done yet).
The downside is you have to be careful to preserve these flags
when transitioning.
MAX_IV_COUNT is a hint which determines the size of variable width
allocation we should use for a given class. We don't need to scope this
by namespace, if we end up with larger builtin objects on some
namespaces that isn't a user-visible problem, just extra memory use.
Similarly variation_count is used to track if a given object has had too
many branches in shapes it has used, and to use too_complex when that
happens. That's also just a hint, so we can use the same value across
namespaces without it being visible to users.
Previously variation_count was being incremented (written to) on the
RCLASS_EXT_READABLE ext, which seems incorrect if we wanted it to be
different across namespaces
Previously we used a flag to set whether a module was uninitialized.
When checked whether a class was initialized, we first had to check that
it had a non-zero superclass, as well as that it wasn't BasicObject.
With the advent of namespaces, RCLASS_SUPER is now an expensive
operation, and though we could just check for the prime superclass, we
might as well take this opportunity to use a flag so that we can perform
the initialized check with as few instructions as possible.
It's possible in the future that we could prevent uninitialized classes
from being available to the user, but currently there are a few ways to
do that.
This makes `RBobject` `4B` larger on 32 bit systems
but simplifies the implementation a lot.
[Feature #21353]
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
Superclasses can't be modified by user code, so do not need namespace
indirection. For example Object.superclass is always BasicObject, no
matter what modules are included onto it.
Given classes and modules have a different set of fields in every
namespace, we can't store the object_id in fields for them.
Given that some space was freed in `RClass` we can store it there
instead.
By making `super_classdepth` `uint16_t`, classes and modules can
now fit in 160B slots again.
The downside of course is that before `super_classdepth` was large
enough we never had to care about overflow, as you couldn't
realistically create enough classes to ever go over it.
With this change, while it is stupid, you could realistically
create an ancestor chain containing 65k classes and modules.
The macro RCLASS_EXT() accesses the prime classext directly, but it can be
valid only in a limited situation when namespace is enabled.
So, to prevent using RCLASS_EXT() in the wrong way, rename the macro and
let the developer check it is ok to access the prime classext or not.
To make RClass size smaller, move flags of prime classext readable/writable to:
readable - use ns_classext_tbl is NULL or not (if NULL, it's readable)
writable - use FL_USER2 of RBasic flags
Ivars will longer be the only thing stored inline
via shapes, so keeping the `iv_index` and `ivptr` names
would be confusing.
Instance variables won't be the only thing stored inline
via shapes, so keeping the `ivptr` name would be confusing.
`field` encompass anything that can be stored in a VALUE array.
Similarly, `gen_ivtbl` becomes `gen_fields_tbl`.