Related to [Bug #21842].
* rb_interned_str: document what decides whether the returned string is
in US-ASCII or BINARY encoding.
* rb_interned_str_cstr: include the same description as rb_interned_str
for the encoding. This one was still missing the update for US-ASCII
and erroneously said the returned string was always in BINARY encoding.
* rb_str_to_interned_str: document how the encoding of the result is
defined.
Co-authored-by: Herwin <herwinw@users.noreply.github.com>
`chompdirsep` searches from the start of the string each time, which
perhaps is necessary for certain encodings (not even sure?) but for
the common encodings it's very wasteful. Instead we can start from the
back of the string and only compare one or two characters in most cases.
Also replace `StringValueCStr` with the simpler `rb_str_null_check`:
we only care about whether the string contains null bytes, not
whether it is null-terminated.
We also only check the final string for null bytes.
```
compare-ruby: ruby 4.1.0dev (2026-01-17T14:40:03Z master 00a3b71eaf) +PRISM [arm64-darwin25]
built-ruby: ruby 4.1.0dev (2026-01-18T12:55:15Z spedup-file-join 5948e92e03) +PRISM [arm64-darwin25]
warming up....
| |compare-ruby|built-ruby|
|:-------------|-----------:|---------:|
|two_strings | 2.477M| 19.317M|
| | -| 7.80x|
|many_strings | 547.577k| 10.298M|
| | -| 18.81x|
|array | 515.280k| 523.291k|
| | -| 1.02x|
|mixed | 621.840k| 635.422k|
| | -| 1.02x|
```
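To illustrate the backward scan described above, here is a minimal Ruby sketch. It is not the C `chompdirsep`; the helper name `length_without_trailing_seps` is hypothetical, and it assumes an encoding such as UTF-8 where the byte `0x2F` can only be a standalone `/`.

```ruby
# Hypothetical helper, for illustration only (not the C implementation).
# Returns the byte length of `path` once trailing "/" separators are ignored,
# looking only at the end of the string instead of walking it from the start.
def length_without_trailing_seps(path)
  len = path.bytesize
  # Keep at least one byte so that a lone "/" (the root) is preserved.
  len -= 1 while len > 1 && path.getbyte(len - 1) == 0x2F # 0x2F is "/"
  len
end

path = "foo/bar///"
trimmed = path.byteslice(0, length_without_trailing_seps(path))
```

For a typical path this inspects only the last one or two bytes, which is where the savings come from.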
`File.join` is a hotspot for common libraries such as Zeitwerk
and Bootsnap. It has a fairly flexible signature, but 99% of
the time it's called with just two (or a small number of) UTF-8 strings.
If we optimistically optimize for that use case we can cut down a large
number of type and encoding checks, significantly speeding up the method.
The one remaining expensive check we could try to optimize is `str_null_check`.
Given it's common to use the same base string for joining, we could memoize it.
Also we could precompute it for literal strings.
```
compare-ruby: ruby 4.1.0dev (2026-01-17T14:40:03Z master 00a3b71eaf) +PRISM [arm64-darwin25]
built-ruby: ruby 4.1.0dev (2026-01-18T12:10:38Z spedup-file-join 069bab58d4) +PRISM [arm64-darwin25]
warming up....
| |compare-ruby|built-ruby|
|:-------------|-----------:|---------:|
|two_strings | 2.475M| 9.444M|
| | -| 3.82x|
|many_strings | 551.975k| 2.346M|
| | -| 4.25x|
|array | 514.946k| 522.034k|
| | -| 1.01x|
|mixed | 621.236k| 633.189k|
| | -| 1.02x|
```
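The optimistic dispatch described above can be sketched in Ruby as follows. This is an illustration under stated assumptions, not the C change itself: `fast_path?` and `sketch_join` are hypothetical names, and the fast path only approximates the separator rules of `File.join`.

```ruby
# Hypothetical sketch; neither method is part of Ruby.
SEP = "/"

# One cheap check up front: every argument is a plain UTF-8 String.
def fast_path?(parts)
  parts.size >= 2 &&
    parts.all? { |part| part.instance_of?(String) && part.encoding == Encoding::UTF_8 }
end

def sketch_join(*parts)
  # General path: arrays, path-like objects, other encodings.
  return File.join(*parts) unless fast_path?(parts)

  result = +""
  parts.each_with_index do |part, i|
    if i > 0
      if part.start_with?(SEP)
        result.sub!(%r{/+\z}, "") # drop trailing separators of the prefix
      elsif !result.end_with?(SEP)
        result << SEP
      end
    end
    result << part
  end
  # Check the final string once instead of checking every argument.
  raise ArgumentError, "string contains null byte" if result.include?("\0")
  result
end
```

The design choice is to pay for a single type and encoding test and then skip the per-argument checks that the general path needs.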
InvokeProc and HIR effects landed without an intermediate rebase, so we
got a conflict in the form of a type checker error (a new opcode not
handled in a new function).
**Progress**
I've added a new directory, `zjit/src/hir_effect`. It follows the same structure as `zjit/src/hir_type` and includes:
- a ruby script to generate a rust file containing a bitset of effects we want to track
- a modified `hir.rs` to include an `effects_of` function that catalogs effects for each HIR instruction, similar to `infer_type`. Right now these effects are not specialized: all instructions currently return the top of the lattice (any effect)
- a module file for effects at `zjit/src/hir_effect/mod.rs` that, again, mirrors `zjit/src/hir_type/mod.rs`. This contains a lot of helper functions and lattice operations like union and intersection
**Design Idea**
The effect system is bitset-based rather than range-based. This is the first kind of effect system described in [Max's blog post](https://bernsteinbear.com/blog/compiler-effects/).
Practically, having effects defined for each HIR instruction should allow us to generalize better than the implicit effect system we have for C functions that we annotate as elidable, leaf, etc. Additionally, this could allow us to reason about the effects of multiple HIR instructions unioned together, something I don't believe currently exists.
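As a rough illustration of a bitset-based effect lattice, here is a small Ruby sketch. The effect names and the `elidable?` rule are assumptions made for the example; the real bitset is generated into Rust by the script in `zjit/src/hir_effect`, and its contents are not shown here.

```ruby
# Hypothetical effect bits, for illustration only.
module Effect
  NONE         = 0b000 # bottom of the lattice: no effect
  ALLOCATE     = 0b001
  READ_MEMORY  = 0b010
  WRITE_MEMORY = 0b100
  ANY          = 0b111 # top of the lattice: any effect

  def self.union(a, b)
    a | b
  end

  def self.intersect(a, b)
    a & b
  end

  # True when every effect in `a` is also present in `b`.
  def self.subset?(a, b)
    (a | b) == b
  end

  # Assumed rule for this sketch: something is elidable if it never writes.
  def self.elidable?(effects)
    intersect(effects, WRITE_MEMORY) == NONE
  end
end

# Effects of a sequence of instructions are the union of their effects.
block_effects = [Effect::ALLOCATE, Effect::READ_MEMORY]
  .reduce(Effect::NONE) { |acc, e| Effect.union(acc, e) }
```

With every instruction currently reporting the top of the lattice, a check like `elidable?` would always be false until the effects are specialized.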
**Practical Goals**
This PR replaces `has_effects` with a new effects-based `is_elidable` function. This causes no behavior change in the JIT, but the new design will make it easier to reason about the effects of basic blocks and CCalls. We may be able to accomplish other quality-of-life improvements, such as consolidating `nogc`, `leaf`, and other annotations.
This is everything that `irb` uses. It works in their test suite, but there are 20 failures when using the shim that I haven't looked into at all.
`parse` is not used by `irb`. `scan` is, and it's basically `parse` but also includes errors. `irb` doesn't seem to care about the errors, so I didn't implement that.
https://github.com/ruby/prism/commit/2c5826b39f