mirror of
https://github.com/ruby/ruby.git
synced 2026-01-27 12:34:21 +00:00
558 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
4a21b83693
|
ZJIT: Optimize common invokesuper cases (#15816)
* ZJIT: Profile `invokesuper` instructions * ZJIT: Introduce the `InvokeSuperDirect` HIR instruction The new instruction is an optimized version of `InvokeSuper` when we know the `super` target is an ISEQ. * ZJIT: Expand definition of unspecializable to more complex cases * ZJIT: Ensure `invokesuper` optimization works when the inheritance hierarchy is modified * ZJIT: Simplify `invokesuper` specialization to most common case Looking at ruby-bench, most `super` calls don't pass a block, which means we can use the already optimized `SendWithoutBlockDirect`. * ZJIT: Track `super` method entries directly to avoid GC issues Because the method entry isn't typed as a `VALUE`, we set up barriers on its `VALUE` fields. But, that was insufficient as the method entry itself could be collected in certain cases, resulting in dangling objects. Now we track the method entry as a `VALUE` and can more naturally mark it and its children. * ZJIT: Optimize `super` calls with simple argument forms * ZJIT: Report the reason why we can't optimize an `invokesuper` instance * ZJIT: Revise send fallback reasons for `super` calls * ZJIT: Assert `super` calls are `FCALL` and don't need visibily checks |
||
|
|
0eb53053f0
|
ZJIT: Specialize setinstancevariable when ivar is already in shape (#15290)
Don't support shape transitions for now. |
||
|
|
f52edf172d
|
ZJIT: Specialize monomorphic DefinedIvar (#15281)
This lets us constant-fold common monomorphic cases. |
||
|
|
d2a587c791 | renaming internal data structures and functions from namespace to box | ||
|
|
02267417da
|
ZJIT: Profile specific objects for invokeblock (#15051)
I made a special kind of `ProfiledType` that looks at specific objects, not just their classes/shapes (https://github.com/ruby/ruby/pull/15051). Then I profiled some of our benchmarks.
For lobsters:
```
Top-6 invokeblock handler (100.0% of total 1,064,155):
megamorphic: 494,931 (46.5%)
monomorphic_iseq: 337,171 (31.7%)
polymorphic: 113,381 (10.7%)
monomorphic_ifunc: 52,260 ( 4.9%)
monomorphic_other: 38,970 ( 3.7%)
no_profiles: 27,442 ( 2.6%)
```
For railsbench:
```
Top-6 invokeblock handler (100.0% of total 2,529,104):
monomorphic_iseq: 834,452 (33.0%)
megamorphic: 818,347 (32.4%)
polymorphic: 632,273 (25.0%)
monomorphic_ifunc: 224,243 ( 8.9%)
monomorphic_other: 19,595 ( 0.8%)
no_profiles: 194 ( 0.0%)
```
For shipit:
```
Top-6 invokeblock handler (100.0% of total 2,104,148):
megamorphic: 1,269,889 (60.4%)
polymorphic: 411,475 (19.6%)
no_profiles: 173,367 ( 8.2%)
monomorphic_other: 118,619 ( 5.6%)
monomorphic_iseq: 84,891 ( 4.0%)
monomorphic_ifunc: 45,907 ( 2.2%)
```
Seems like a monomorphic case for a specific ISEQ actually isn't a bad way of going about this, at least to start...
|
||
|
|
e047cea280
|
ZJIT: Optimize send with block into CCallWithFrame (#14863)
Since `Send` has a block iseq, I updated `CCallWithFrame` to take an optional `blockiseq` as well, and then generate `CCallWithFrame` for `Send` when the condition is right.
## Stats
`liquid-render` Benchmark
| Metric | Before | After | Change |
|----------------------|--------------------|--------------------|--------------------- |
| send_no_profiles | 3,209,418 (34.1%) | 4,119 (0.1%) | -3,205,299 (-99.9%) |
| dynamic_send_count | 9,410,758 (23.1%) | 6,459,678 (15.9%) | -2,951,080 (-31.4%) |
| optimized_send_count | 31,269,388 (76.9%) | 34,220,474 (84.1%) | +2,951,086 (+9.4%) |
`lobsters` Benchmark
| Metric | Before | After | Change |
|----------------------|------------|------------|---------------------|
| send_no_profiles | 10,769,052 | 2,902,865 | -7,866,187 (-73.0%) |
| dynamic_send_count | 45,673,185 | 42,880,160 | -2,793,025 (-6.1%) |
| optimized_send_count | 75,142,407 | 78,378,514 | +3,236,107 (+4.3%) |
### `liquid-render` Before
<details>
```
Average of last 22, non-warmup iters: 262ms
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (96.9% of total 10,370,809):
Kernel#respond_to?: 5,069,204 (48.9%)
Hash#key?: 2,394,488 (23.1%)
Set#include?: 778,429 ( 7.5%)
String#===: 326,134 ( 3.1%)
String#<<: 203,231 ( 2.0%)
Integer#<<: 166,768 ( 1.6%)
Kernel#is_a?: 164,272 ( 1.6%)
Kernel#format: 124,262 ( 1.2%)
Integer#/: 124,262 ( 1.2%)
Array#<<: 115,325 ( 1.1%)
Regexp.last_match: 94,862 ( 0.9%)
Hash#[]=: 88,485 ( 0.9%)
String#start_with?: 55,933 ( 0.5%)
CGI::EscapeExt#escapeHTML: 55,471 ( 0.5%)
Array#shift: 55,298 ( 0.5%)
Regexp#===: 48,928 ( 0.5%)
String#=~: 48,477 ( 0.5%)
Array#unshift: 47,331 ( 0.5%)
String#empty?: 42,870 ( 0.4%)
Array#push: 41,215 ( 0.4%)
Top-20 not annotated C methods (97.1% of total 10,394,421):
Kernel#respond_to?: 5,069,204 (48.8%)
Hash#key?: 2,394,488 (23.0%)
Set#include?: 778,429 ( 7.5%)
String#===: 326,134 ( 3.1%)
Kernel#is_a?: 208,664 ( 2.0%)
String#<<: 203,231 ( 2.0%)
Integer#<<: 166,768 ( 1.6%)
Integer#/: 124,262 ( 1.2%)
Kernel#format: 124,262 ( 1.2%)
Array#<<: 115,325 ( 1.1%)
Regexp.last_match: 94,862 ( 0.9%)
Hash#[]=: 88,485 ( 0.9%)
String#start_with?: 55,933 ( 0.5%)
CGI::EscapeExt#escapeHTML: 55,471 ( 0.5%)
Array#shift: 55,298 ( 0.5%)
Regexp#===: 48,928 ( 0.5%)
String#=~: 48,477 ( 0.5%)
Array#unshift: 47,331 ( 0.5%)
String#empty?: 42,870 ( 0.4%)
Array#push: 41,215 ( 0.4%)
Top-2 not optimized method types for send (100.0% of total 2,382):
cfunc: 1,196 (50.2%)
iseq: 1,186 (49.8%)
Top-4 not optimized method types for send_without_block (100.0% of total 2,561,006):
iseq: 2,442,091 (95.4%)
optimized: 118,882 ( 4.6%)
alias: 20 ( 0.0%)
null: 13 ( 0.0%)
Top-9 not optimized instructions (100.0% of total 685,128):
invokeblock: 227,376 (33.2%)
opt_neq: 166,471 (24.3%)
opt_and: 166,471 (24.3%)
opt_eq: 66,721 ( 9.7%)
invokesuper: 39,363 ( 5.7%)
opt_le: 16,278 ( 2.4%)
opt_minus: 1,574 ( 0.2%)
opt_send_without_block: 772 ( 0.1%)
opt_or: 102 ( 0.0%)
Top-8 send fallback reasons (100.0% of total 9,410,758):
send_no_profiles: 3,209,418 (34.1%)
send_without_block_polymorphic: 2,858,558 (30.4%)
send_without_block_not_optimized_method_type: 2,561,006 (27.2%)
not_optimized_instruction: 685,128 ( 7.3%)
send_without_block_no_profiles: 91,913 ( 1.0%)
send_not_optimized_method_type: 2,382 ( 0.0%)
obj_to_string_not_string: 2,352 ( 0.0%)
send_without_block_cfunc_array_variadic: 1 ( 0.0%)
Top-3 unhandled YARV insns (100.0% of total 83,682):
getclassvariable: 83,431 (99.7%)
once: 137 ( 0.2%)
getconstant: 114 ( 0.1%)
Top-3 compile error reasons (100.0% of total 5,431,910):
register_spill_on_alloc: 4,665,393 (85.9%)
exception_handler: 766,347 (14.1%)
register_spill_on_ccall: 170 ( 0.0%)
Top-11 side exit reasons (100.0% of total 14,635,508):
compile_error: 5,431,910 (37.1%)
guard_shape_failure: 3,436,341 (23.5%)
guard_type_failure: 2,545,791 (17.4%)
unhandled_splat: 2,162,907 (14.8%)
unhandled_kwarg: 952,568 ( 6.5%)
unhandled_yarv_insn: 83,682 ( 0.6%)
unhandled_hir_insn: 19,112 ( 0.1%)
patchpoint_stable_constant_names: 1,608 ( 0.0%)
obj_to_string_fallback: 902 ( 0.0%)
patchpoint_method_redefined: 599 ( 0.0%)
block_param_proxy_not_iseq_or_ifunc: 88 ( 0.0%)
send_count: 40,680,153
dynamic_send_count: 9,410,758 (23.1%)
optimized_send_count: 31,269,395 (76.9%)
iseq_optimized_send_count: 13,886,902 (34.1%)
inline_cfunc_optimized_send_count: 7,011,684 (17.2%)
non_variadic_cfunc_optimized_send_count: 4,670,333 (11.5%)
variadic_cfunc_optimized_send_count: 5,700,476 (14.0%)
dynamic_getivar_count: 1,144,613
dynamic_setivar_count: 950,830
compiled_iseq_count: 402
failed_iseq_count: 48
compile_time: 976ms
profile_time: 3,223ms
gc_time: 22ms
invalidation_time: 0ms
vm_write_pc_count: 37,744,491
vm_write_sp_count: 37,511,865
vm_write_locals_count: 37,511,865
vm_write_stack_count: 37,511,865
vm_write_to_parent_iseq_local_count: 558,177
vm_read_from_parent_iseq_local_count: 14,317,032
code_region_bytes: 2,211,840
side_exit_count: 14,635,508
total_insn_count: 476,097,972
vm_insn_count: 253,795,154
zjit_insn_count: 222,302,818
ratio_in_zjit: 46.7%
```
</details>
### `liquid-render` After
<details>
```
Average of last 21, non-warmup iters: 272ms
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (96.8% of total 10,093,966):
Kernel#respond_to?: 4,932,224 (48.9%)
Hash#key?: 2,329,928 (23.1%)
Set#include?: 757,389 ( 7.5%)
String#===: 317,494 ( 3.1%)
String#<<: 197,831 ( 2.0%)
Integer#<<: 162,268 ( 1.6%)
Kernel#is_a?: 159,892 ( 1.6%)
Kernel#format: 120,902 ( 1.2%)
Integer#/: 120,902 ( 1.2%)
Array#<<: 112,225 ( 1.1%)
Regexp.last_match: 92,382 ( 0.9%)
Hash#[]=: 86,145 ( 0.9%)
String#start_with?: 54,953 ( 0.5%)
Array#shift: 54,038 ( 0.5%)
CGI::EscapeExt#escapeHTML: 53,971 ( 0.5%)
Regexp#===: 47,848 ( 0.5%)
String#=~: 47,237 ( 0.5%)
Array#unshift: 46,051 ( 0.5%)
String#empty?: 41,750 ( 0.4%)
Array#push: 40,115 ( 0.4%)
Top-20 not annotated C methods (97.1% of total 10,116,938):
Kernel#respond_to?: 4,932,224 (48.8%)
Hash#key?: 2,329,928 (23.0%)
Set#include?: 757,389 ( 7.5%)
String#===: 317,494 ( 3.1%)
Kernel#is_a?: 203,084 ( 2.0%)
String#<<: 197,831 ( 2.0%)
Integer#<<: 162,268 ( 1.6%)
Kernel#format: 120,902 ( 1.2%)
Integer#/: 120,902 ( 1.2%)
Array#<<: 112,225 ( 1.1%)
Regexp.last_match: 92,382 ( 0.9%)
Hash#[]=: 86,145 ( 0.9%)
String#start_with?: 54,953 ( 0.5%)
Array#shift: 54,038 ( 0.5%)
CGI::EscapeExt#escapeHTML: 53,971 ( 0.5%)
Regexp#===: 47,848 ( 0.5%)
String#=~: 47,237 ( 0.5%)
Array#unshift: 46,051 ( 0.5%)
String#empty?: 41,750 ( 0.4%)
Array#push: 40,115 ( 0.4%)
Top-2 not optimized method types for send (100.0% of total 182,938):
iseq: 178,414 (97.5%)
cfunc: 4,524 ( 2.5%)
Top-4 not optimized method types for send_without_block (100.0% of total 2,492,246):
iseq: 2,376,511 (95.4%)
optimized: 115,702 ( 4.6%)
alias: 20 ( 0.0%)
null: 13 ( 0.0%)
Top-9 not optimized instructions (100.0% of total 667,727):
invokeblock: 221,375 (33.2%)
opt_neq: 161,971 (24.3%)
opt_and: 161,971 (24.3%)
opt_eq: 64,921 ( 9.7%)
invokesuper: 39,243 ( 5.9%)
opt_le: 15,838 ( 2.4%)
opt_minus: 1,534 ( 0.2%)
opt_send_without_block: 772 ( 0.1%)
opt_or: 102 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 6,287,956):
send_without_block_polymorphic: 2,782,058 (44.2%)
send_without_block_not_optimized_method_type: 2,492,246 (39.6%)
not_optimized_instruction: 667,727 (10.6%)
send_not_optimized_method_type: 182,938 ( 2.9%)
send_without_block_no_profiles: 89,613 ( 1.4%)
send_polymorphic: 66,962 ( 1.1%)
send_no_profiles: 4,059 ( 0.1%)
obj_to_string_not_string: 2,352 ( 0.0%)
send_without_block_cfunc_array_variadic: 1 ( 0.0%)
Top-3 unhandled YARV insns (100.0% of total 81,482):
getclassvariable: 81,231 (99.7%)
once: 137 ( 0.2%)
getconstant: 114 ( 0.1%)
Top-3 compile error reasons (100.0% of total 5,286,310):
register_spill_on_alloc: 4,540,413 (85.9%)
exception_handler: 745,727 (14.1%)
register_spill_on_ccall: 170 ( 0.0%)
Top-12 side exit reasons (100.0% of total 14,244,881):
compile_error: 5,286,310 (37.1%)
guard_shape_failure: 3,346,873 (23.5%)
guard_type_failure: 2,477,071 (17.4%)
unhandled_splat: 2,104,447 (14.8%)
unhandled_kwarg: 926,828 ( 6.5%)
unhandled_yarv_insn: 81,482 ( 0.6%)
unhandled_hir_insn: 18,672 ( 0.1%)
patchpoint_stable_constant_names: 1,608 ( 0.0%)
obj_to_string_fallback: 902 ( 0.0%)
patchpoint_method_redefined: 599 ( 0.0%)
block_param_proxy_not_iseq_or_ifunc: 88 ( 0.0%)
interrupt: 1 ( 0.0%)
send_count: 39,591,410
dynamic_send_count: 6,287,956 (15.9%)
optimized_send_count: 33,303,454 (84.1%)
iseq_optimized_send_count: 13,514,283 (34.1%)
inline_cfunc_optimized_send_count: 6,823,745 (17.2%)
non_variadic_cfunc_optimized_send_count: 7,417,432 (18.7%)
variadic_cfunc_optimized_send_count: 5,547,994 (14.0%)
dynamic_getivar_count: 1,110,647
dynamic_setivar_count: 927,309
compiled_iseq_count: 403
failed_iseq_count: 48
compile_time: 968ms
profile_time: 3,547ms
gc_time: 22ms
invalidation_time: 0ms
vm_write_pc_count: 36,735,108
vm_write_sp_count: 36,508,262
vm_write_locals_count: 36,508,262
vm_write_stack_count: 36,508,262
vm_write_to_parent_iseq_local_count: 543,097
vm_read_from_parent_iseq_local_count: 13,930,672
code_region_bytes: 2,228,224
side_exit_count: 14,244,881
total_insn_count: 463,357,969
vm_insn_count: 247,003,727
zjit_insn_count: 216,354,242
ratio_in_zjit: 46.7%
```
</details>
### `lobsters` Before
<details>
```
Average of last 10, non-warmup iters: 898ms
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (61.3% of total 19,495,906):
String#<<: 1,764,437 ( 9.1%)
Kernel#is_a?: 1,615,120 ( 8.3%)
Hash#[]=: 1,159,455 ( 5.9%)
Regexp#match?: 777,496 ( 4.0%)
String#empty?: 722,953 ( 3.7%)
Hash#key?: 685,258 ( 3.5%)
Kernel#respond_to?: 602,017 ( 3.1%)
TrueClass#===: 447,671 ( 2.3%)
FalseClass#===: 439,276 ( 2.3%)
Array#include?: 426,758 ( 2.2%)
Kernel#block_given?: 405,271 ( 2.1%)
Hash#fetch: 382,302 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 356,654 ( 1.8%)
String#start_with?: 353,793 ( 1.8%)
Kernel#kind_of?: 340,341 ( 1.7%)
Kernel#dup: 328,162 ( 1.7%)
String.new: 306,667 ( 1.6%)
String#==: 287,549 ( 1.5%)
BasicObject#!=: 284,642 ( 1.5%)
String#length: 256,070 ( 1.3%)
Top-20 not annotated C methods (62.4% of total 19,796,172):
Kernel#is_a?: 1,993,676 (10.1%)
String#<<: 1,764,437 ( 8.9%)
Hash#[]=: 1,159,634 ( 5.9%)
Regexp#match?: 777,496 ( 3.9%)
String#empty?: 738,030 ( 3.7%)
Hash#key?: 685,258 ( 3.5%)
Kernel#respond_to?: 602,017 ( 3.0%)
TrueClass#===: 447,671 ( 2.3%)
FalseClass#===: 439,276 ( 2.2%)
Array#include?: 426,758 ( 2.2%)
Kernel#block_given?: 425,813 ( 2.2%)
Hash#fetch: 382,302 ( 1.9%)
ObjectSpace::WeakKeyMap#[]: 356,654 ( 1.8%)
String#start_with?: 353,793 ( 1.8%)
Kernel#kind_of?: 340,375 ( 1.7%)
Kernel#dup: 328,169 ( 1.7%)
String.new: 306,667 ( 1.5%)
String#==: 293,520 ( 1.5%)
BasicObject#!=: 284,825 ( 1.4%)
String#length: 256,070 ( 1.3%)
Top-2 not optimized method types for send (100.0% of total 115,007):
cfunc: 76,172 (66.2%)
iseq: 38,835 (33.8%)
Top-6 not optimized method types for send_without_block (100.0% of total 8,003,641):
iseq: 3,999,211 (50.0%)
bmethod: 1,750,271 (21.9%)
optimized: 1,653,426 (20.7%)
alias: 591,342 ( 7.4%)
null: 8,174 ( 0.1%)
cfunc: 1,217 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 7,590,826):
invokesuper: 4,335,446 (57.1%)
invokeblock: 1,329,215 (17.5%)
sendforward: 841,463 (11.1%)
opt_eq: 810,614 (10.7%)
opt_plus: 141,773 ( 1.9%)
opt_minus: 52,270 ( 0.7%)
opt_send_without_block: 43,248 ( 0.6%)
opt_neq: 15,047 ( 0.2%)
opt_mult: 13,824 ( 0.2%)
opt_or: 7,451 ( 0.1%)
opt_lt: 348 ( 0.0%)
opt_ge: 91 ( 0.0%)
opt_gt: 36 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 45,673,212):
send_without_block_polymorphic: 17,390,335 (38.1%)
send_no_profiles: 10,769,053 (23.6%)
send_without_block_not_optimized_method_type: 8,003,641 (17.5%)
not_optimized_instruction: 7,590,826 (16.6%)
send_without_block_no_profiles: 1,757,109 ( 3.8%)
send_not_optimized_method_type: 115,007 ( 0.3%)
send_without_block_cfunc_array_variadic: 31,149 ( 0.1%)
obj_to_string_not_string: 15,518 ( 0.0%)
send_without_block_direct_too_many_args: 574 ( 0.0%)
Top-9 unhandled YARV insns (100.0% of total 1,242,228):
expandarray: 622,203 (50.1%)
checkkeyword: 316,111 (25.4%)
getclassvariable: 120,540 ( 9.7%)
getblockparam: 88,480 ( 7.1%)
invokesuperforward: 78,842 ( 6.3%)
opt_duparray_send: 14,149 ( 1.1%)
getconstant: 1,588 ( 0.1%)
checkmatch: 288 ( 0.0%)
once: 27 ( 0.0%)
Top-3 compile error reasons (100.0% of total 6,769,693):
register_spill_on_alloc: 6,188,305 (91.4%)
register_spill_on_ccall: 347,108 ( 5.1%)
exception_handler: 234,280 ( 3.5%)
Top-17 side exit reasons (100.0% of total 20,142,827):
compile_error: 6,769,693 (33.6%)
guard_type_failure: 5,169,050 (25.7%)
guard_shape_failure: 3,726,362 (18.5%)
unhandled_yarv_insn: 1,242,228 ( 6.2%)
block_param_proxy_not_iseq_or_ifunc: 984,480 ( 4.9%)
unhandled_kwarg: 800,154 ( 4.0%)
unknown_newarray_send: 539,317 ( 2.7%)
patchpoint_stable_constant_names: 340,283 ( 1.7%)
unhandled_splat: 229,440 ( 1.1%)
unhandled_hir_insn: 147,351 ( 0.7%)
patchpoint_no_singleton_class: 128,856 ( 0.6%)
patchpoint_method_redefined: 32,718 ( 0.2%)
block_param_proxy_modified: 25,274 ( 0.1%)
patchpoint_no_ep_escape: 7,559 ( 0.0%)
obj_to_string_fallback: 24 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
interrupt: 16 ( 0.0%)
send_count: 120,815,640
dynamic_send_count: 45,673,212 (37.8%)
optimized_send_count: 75,142,428 (62.2%)
iseq_optimized_send_count: 32,188,039 (26.6%)
inline_cfunc_optimized_send_count: 23,458,483 (19.4%)
non_variadic_cfunc_optimized_send_count: 14,809,797 (12.3%)
variadic_cfunc_optimized_send_count: 4,686,109 ( 3.9%)
dynamic_getivar_count: 13,023,437
dynamic_setivar_count: 12,311,158
compiled_iseq_count: 4,806
failed_iseq_count: 466
compile_time: 8,943ms
profile_time: 99ms
gc_time: 45ms
invalidation_time: 239ms
vm_write_pc_count: 113,652,291
vm_write_sp_count: 111,209,623
vm_write_locals_count: 111,209,623
vm_write_stack_count: 111,209,623
vm_write_to_parent_iseq_local_count: 516,800
vm_read_from_parent_iseq_local_count: 11,225,587
code_region_bytes: 22,609,920
side_exit_count: 20,142,827
total_insn_count: 926,088,942
vm_insn_count: 297,636,255
zjit_insn_count: 628,452,687
ratio_in_zjit: 67.9%
```
</details>
### `lobsters` After
<details>
```
Average of last 10, non-warmup iters: 919ms
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (61.3% of total 19,495,868):
String#<<: 1,764,437 ( 9.1%)
Kernel#is_a?: 1,615,110 ( 8.3%)
Hash#[]=: 1,159,455 ( 5.9%)
Regexp#match?: 777,496 ( 4.0%)
String#empty?: 722,953 ( 3.7%)
Hash#key?: 685,258 ( 3.5%)
Kernel#respond_to?: 602,016 ( 3.1%)
TrueClass#===: 447,671 ( 2.3%)
FalseClass#===: 439,276 ( 2.3%)
Array#include?: 426,758 ( 2.2%)
Kernel#block_given?: 405,271 ( 2.1%)
Hash#fetch: 382,302 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 356,654 ( 1.8%)
String#start_with?: 353,793 ( 1.8%)
Kernel#kind_of?: 340,341 ( 1.7%)
Kernel#dup: 328,162 ( 1.7%)
String.new: 306,667 ( 1.6%)
String#==: 287,545 ( 1.5%)
BasicObject#!=: 284,642 ( 1.5%)
String#length: 256,070 ( 1.3%)
Top-20 not annotated C methods (62.4% of total 19,796,134):
Kernel#is_a?: 1,993,666 (10.1%)
String#<<: 1,764,437 ( 8.9%)
Hash#[]=: 1,159,634 ( 5.9%)
Regexp#match?: 777,496 ( 3.9%)
String#empty?: 738,030 ( 3.7%)
Hash#key?: 685,258 ( 3.5%)
Kernel#respond_to?: 602,016 ( 3.0%)
TrueClass#===: 447,671 ( 2.3%)
FalseClass#===: 439,276 ( 2.2%)
Array#include?: 426,758 ( 2.2%)
Kernel#block_given?: 425,813 ( 2.2%)
Hash#fetch: 382,302 ( 1.9%)
ObjectSpace::WeakKeyMap#[]: 356,654 ( 1.8%)
String#start_with?: 353,793 ( 1.8%)
Kernel#kind_of?: 340,375 ( 1.7%)
Kernel#dup: 328,169 ( 1.7%)
String.new: 306,667 ( 1.5%)
String#==: 293,516 ( 1.5%)
BasicObject#!=: 284,825 ( 1.4%)
String#length: 256,070 ( 1.3%)
Top-4 not optimized method types for send (100.0% of total 4,749,678):
iseq: 2,563,391 (54.0%)
cfunc: 2,064,888 (43.5%)
alias: 118,577 ( 2.5%)
null: 2,822 ( 0.1%)
Top-6 not optimized method types for send_without_block (100.0% of total 8,003,641):
iseq: 3,999,211 (50.0%)
bmethod: 1,750,271 (21.9%)
optimized: 1,653,426 (20.7%)
alias: 591,342 ( 7.4%)
null: 8,174 ( 0.1%)
cfunc: 1,217 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 7,590,818):
invokesuper: 4,335,442 (57.1%)
invokeblock: 1,329,215 (17.5%)
sendforward: 841,463 (11.1%)
opt_eq: 810,610 (10.7%)
opt_plus: 141,773 ( 1.9%)
opt_minus: 52,270 ( 0.7%)
opt_send_without_block: 43,248 ( 0.6%)
opt_neq: 15,047 ( 0.2%)
opt_mult: 13,824 ( 0.2%)
opt_or: 7,451 ( 0.1%)
opt_lt: 348 ( 0.0%)
opt_ge: 91 ( 0.0%)
opt_gt: 36 ( 0.0%)
Top-10 send fallback reasons (100.0% of total 43,152,037):
send_without_block_polymorphic: 17,390,322 (40.3%)
send_without_block_not_optimized_method_type: 8,003,641 (18.5%)
not_optimized_instruction: 7,590,818 (17.6%)
send_not_optimized_method_type: 4,749,678 (11.0%)
send_no_profiles: 2,893,666 ( 6.7%)
send_without_block_no_profiles: 1,757,109 ( 4.1%)
send_polymorphic: 719,562 ( 1.7%)
send_without_block_cfunc_array_variadic: 31,149 ( 0.1%)
obj_to_string_not_string: 15,518 ( 0.0%)
send_without_block_direct_too_many_args: 574 ( 0.0%)
Top-9 unhandled YARV insns (100.0% of total 1,242,215):
expandarray: 622,203 (50.1%)
checkkeyword: 316,111 (25.4%)
getclassvariable: 120,540 ( 9.7%)
getblockparam: 88,467 ( 7.1%)
invokesuperforward: 78,842 ( 6.3%)
opt_duparray_send: 14,149 ( 1.1%)
getconstant: 1,588 ( 0.1%)
checkmatch: 288 ( 0.0%)
once: 27 ( 0.0%)
Top-3 compile error reasons (100.0% of total 6,769,688):
register_spill_on_alloc: 6,188,305 (91.4%)
register_spill_on_ccall: 347,108 ( 5.1%)
exception_handler: 234,275 ( 3.5%)
Top-17 side exit reasons (100.0% of total 20,144,372):
compile_error: 6,769,688 (33.6%)
guard_type_failure: 5,169,204 (25.7%)
guard_shape_failure: 3,726,374 (18.5%)
unhandled_yarv_insn: 1,242,215 ( 6.2%)
block_param_proxy_not_iseq_or_ifunc: 984,480 ( 4.9%)
unhandled_kwarg: 800,154 ( 4.0%)
unknown_newarray_send: 539,317 ( 2.7%)
patchpoint_stable_constant_names: 340,283 ( 1.7%)
unhandled_splat: 229,440 ( 1.1%)
unhandled_hir_insn: 147,351 ( 0.7%)
patchpoint_no_singleton_class: 130,252 ( 0.6%)
patchpoint_method_redefined: 32,716 ( 0.2%)
block_param_proxy_modified: 25,274 ( 0.1%)
patchpoint_no_ep_escape: 7,559 ( 0.0%)
obj_to_string_fallback: 24 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
interrupt: 19 ( 0.0%)
send_count: 120,812,030
dynamic_send_count: 43,152,037 (35.7%)
optimized_send_count: 77,659,993 (64.3%)
iseq_optimized_send_count: 32,187,900 (26.6%)
inline_cfunc_optimized_send_count: 23,458,491 (19.4%)
non_variadic_cfunc_optimized_send_count: 17,327,499 (14.3%)
variadic_cfunc_optimized_send_count: 4,686,103 ( 3.9%)
dynamic_getivar_count: 13,023,424
dynamic_setivar_count: 12,310,991
compiled_iseq_count: 4,806
failed_iseq_count: 466
compile_time: 9,012ms
profile_time: 104ms
gc_time: 44ms
invalidation_time: 239ms
vm_write_pc_count: 113,648,665
vm_write_sp_count: 111,205,997
vm_write_locals_count: 111,205,997
vm_write_stack_count: 111,205,997
vm_write_to_parent_iseq_local_count: 516,800
vm_read_from_parent_iseq_local_count: 11,225,587
code_region_bytes: 23,052,288
side_exit_count: 20,144,372
total_insn_count: 926,090,214
vm_insn_count: 297,647,811
zjit_insn_count: 628,442,403
ratio_in_zjit: 67.9%
```
</details>
|
||
|
|
1d95d75c3f
|
ZJIT: Profile opt_succ and inline Integer#succ for Fixnum (#14846)
This is only really called a lot in the benchmark harness, as far as I can tell. |
||
|
|
de9298635d
|
ZJIT: Profile opt_size, opt_length, opt_regexpmatch2 (#14837)
These bring `send_without_block_no_profiles` numbers down more.
On lobsters:
Before: send_without_block_no_profiles: 1,293,375
After: send_without_block_no_profiles: 998,724
all stats before:
```
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (71.1% of total 15,575,335):
Hash#[]: 4,519,774 (29.0%)
Kernel#is_a?: 1,030,758 ( 6.6%)
String#<<: 851,929 ( 5.5%)
Hash#[]=: 742,941 ( 4.8%)
Regexp#match?: 399,889 ( 2.6%)
String#empty?: 353,775 ( 2.3%)
Hash#key?: 349,129 ( 2.2%)
String#start_with?: 334,961 ( 2.2%)
Kernel#respond_to?: 316,527 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%)
TrueClass#===: 235,771 ( 1.5%)
FalseClass#===: 231,144 ( 1.5%)
Array#include?: 211,381 ( 1.4%)
Hash#fetch: 204,702 ( 1.3%)
Kernel#block_given?: 181,792 ( 1.2%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.2%)
Kernel#dup: 179,340 ( 1.2%)
BasicObject#!=: 175,997 ( 1.1%)
Class#new: 168,078 ( 1.1%)
Kernel#kind_of?: 165,600 ( 1.1%)
Top-20 not annotated C methods (71.6% of total 15,737,478):
Hash#[]: 4,519,784 (28.7%)
Kernel#is_a?: 1,212,649 ( 7.7%)
String#<<: 851,929 ( 5.4%)
Hash#[]=: 743,120 ( 4.7%)
Regexp#match?: 399,889 ( 2.5%)
String#empty?: 361,013 ( 2.3%)
Hash#key?: 349,129 ( 2.2%)
String#start_with?: 334,961 ( 2.1%)
Kernel#respond_to?: 316,527 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%)
TrueClass#===: 235,771 ( 1.5%)
FalseClass#===: 231,144 ( 1.5%)
Array#include?: 211,381 ( 1.3%)
Hash#fetch: 204,702 ( 1.3%)
Kernel#block_given?: 191,661 ( 1.2%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.2%)
Kernel#dup: 179,347 ( 1.1%)
BasicObject#!=: 176,181 ( 1.1%)
Class#new: 168,078 ( 1.1%)
Kernel#kind_of?: 165,634 ( 1.1%)
Top-2 not optimized method types for send (100.0% of total 72,318):
cfunc: 48,055 (66.4%)
iseq: 24,263 (33.6%)
Top-6 not optimized method types for send_without_block (100.0% of total 4,523,648):
iseq: 2,271,904 (50.2%)
bmethod: 985,636 (21.8%)
optimized: 949,702 (21.0%)
alias: 310,746 ( 6.9%)
null: 5,106 ( 0.1%)
cfunc: 554 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 4,293,096):
invokesuper: 2,373,391 (55.3%)
invokeblock: 811,872 (18.9%)
sendforward: 505,448 (11.8%)
opt_eq: 451,754 (10.5%)
opt_plus: 74,403 ( 1.7%)
opt_minus: 36,225 ( 0.8%)
opt_send_without_block: 21,792 ( 0.5%)
opt_neq: 7,231 ( 0.2%)
opt_mult: 6,752 ( 0.2%)
opt_or: 3,753 ( 0.1%)
opt_lt: 348 ( 0.0%)
opt_ge: 91 ( 0.0%)
opt_gt: 36 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 25,824,463):
send_without_block_polymorphic: 9,721,727 (37.6%)
send_no_profiles: 5,894,760 (22.8%)
send_without_block_not_optimized_method_type: 4,523,648 (17.5%)
not_optimized_instruction: 4,293,096 (16.6%)
send_without_block_no_profiles: 1,293,386 ( 5.0%)
send_not_optimized_method_type: 72,318 ( 0.3%)
send_without_block_cfunc_array_variadic: 15,134 ( 0.1%)
obj_to_string_not_string: 9,765 ( 0.0%)
send_without_block_direct_too_many_args: 629 ( 0.0%)
Top-9 unhandled YARV insns (100.0% of total 690,482):
expandarray: 328,490 (47.6%)
checkkeyword: 190,694 (27.6%)
getclassvariable: 59,901 ( 8.7%)
invokesuperforward: 49,503 ( 7.2%)
getblockparam: 48,651 ( 7.0%)
opt_duparray_send: 11,978 ( 1.7%)
getconstant: 952 ( 0.1%)
checkmatch: 290 ( 0.0%)
once: 23 ( 0.0%)
Top-3 compile error reasons (100.0% of total 3,752,502):
register_spill_on_alloc: 3,457,791 (92.1%)
register_spill_on_ccall: 176,348 ( 4.7%)
exception_handler: 118,363 ( 3.2%)
Top-14 side exit reasons (100.0% of total 10,860,787):
compile_error: 3,752,502 (34.6%)
guard_type_failure: 2,638,903 (24.3%)
guard_shape_failure: 1,917,195 (17.7%)
unhandled_yarv_insn: 690,482 ( 6.4%)
block_param_proxy_not_iseq_or_ifunc: 535,787 ( 4.9%)
unhandled_kwarg: 421,943 ( 3.9%)
patchpoint: 370,449 ( 3.4%)
unknown_newarray_send: 314,785 ( 2.9%)
unhandled_splat: 122,060 ( 1.1%)
unhandled_hir_insn: 76,396 ( 0.7%)
block_param_proxy_modified: 19,193 ( 0.2%)
obj_to_string_fallback: 566 ( 0.0%)
interrupt: 504 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
send_count: 66,945,801
dynamic_send_count: 25,824,463 (38.6%)
optimized_send_count: 41,121,338 (61.4%)
iseq_optimized_send_count: 18,587,368 (27.8%)
inline_cfunc_optimized_send_count: 6,958,635 (10.4%)
non_variadic_cfunc_optimized_send_count: 12,911,155 (19.3%)
variadic_cfunc_optimized_send_count: 2,664,180 ( 4.0%)
dynamic_getivar_count: 7,365,975
dynamic_setivar_count: 7,245,897
compiled_iseq_count: 4,794
failed_iseq_count: 450
compile_time: 760ms
profile_time: 9ms
gc_time: 8ms
invalidation_time: 55ms
vm_write_pc_count: 64,284,053
vm_write_sp_count: 62,940,297
vm_write_locals_count: 62,940,297
vm_write_stack_count: 62,940,297
vm_write_to_parent_iseq_local_count: 292,446
vm_read_from_parent_iseq_local_count: 6,470,923
code_region_bytes: 23,019,520
side_exit_count: 10,860,787
total_insn_count: 517,576,320
vm_insn_count: 163,188,910
zjit_insn_count: 354,387,410
ratio_in_zjit: 68.5%
```
all stats after:
```
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (70.4% of total 15,740,856):
Hash#[]: 4,519,792 (28.7%)
Kernel#is_a?: 1,030,776 ( 6.5%)
String#<<: 851,940 ( 5.4%)
Hash#[]=: 742,914 ( 4.7%)
Regexp#match?: 399,887 ( 2.5%)
String#empty?: 353,775 ( 2.2%)
Hash#key?: 349,139 ( 2.2%)
String#start_with?: 334,961 ( 2.1%)
Kernel#respond_to?: 316,529 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%)
TrueClass#===: 235,771 ( 1.5%)
FalseClass#===: 231,144 ( 1.5%)
Array#include?: 211,381 ( 1.3%)
Hash#fetch: 204,702 ( 1.3%)
Kernel#block_given?: 181,788 ( 1.2%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.2%)
Kernel#dup: 179,341 ( 1.1%)
BasicObject#!=: 175,996 ( 1.1%)
Class#new: 168,079 ( 1.1%)
Kernel#kind_of?: 165,600 ( 1.1%)
Top-20 not annotated C methods (70.9% of total 15,902,999):
Hash#[]: 4,519,802 (28.4%)
Kernel#is_a?: 1,212,667 ( 7.6%)
String#<<: 851,940 ( 5.4%)
Hash#[]=: 743,093 ( 4.7%)
Regexp#match?: 399,887 ( 2.5%)
String#empty?: 361,013 ( 2.3%)
Hash#key?: 349,139 ( 2.2%)
String#start_with?: 334,961 ( 2.1%)
Kernel#respond_to?: 316,529 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%)
TrueClass#===: 235,771 ( 1.5%)
FalseClass#===: 231,144 ( 1.5%)
Array#include?: 211,381 ( 1.3%)
Hash#fetch: 204,702 ( 1.3%)
Kernel#block_given?: 191,657 ( 1.2%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.1%)
Kernel#dup: 179,348 ( 1.1%)
BasicObject#!=: 176,180 ( 1.1%)
Class#new: 168,079 ( 1.1%)
Kernel#kind_of?: 165,634 ( 1.0%)
Top-2 not optimized method types for send (100.0% of total 72,318):
cfunc: 48,055 (66.4%)
iseq: 24,263 (33.6%)
Top-6 not optimized method types for send_without_block (100.0% of total 4,523,637):
iseq: 2,271,900 (50.2%)
bmethod: 985,636 (21.8%)
optimized: 949,695 (21.0%)
alias: 310,746 ( 6.9%)
null: 5,106 ( 0.1%)
cfunc: 554 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 4,293,128):
invokesuper: 2,373,401 (55.3%)
invokeblock: 811,890 (18.9%)
sendforward: 505,449 (11.8%)
opt_eq: 451,754 (10.5%)
opt_plus: 74,403 ( 1.7%)
opt_minus: 36,228 ( 0.8%)
opt_send_without_block: 21,792 ( 0.5%)
opt_neq: 7,231 ( 0.2%)
opt_mult: 6,752 ( 0.2%)
opt_or: 3,753 ( 0.1%)
opt_lt: 348 ( 0.0%)
opt_ge: 91 ( 0.0%)
opt_gt: 36 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 25,530,605):
send_without_block_polymorphic: 9,722,499 (38.1%)
send_no_profiles: 5,894,763 (23.1%)
send_without_block_not_optimized_method_type: 4,523,637 (17.7%)
not_optimized_instruction: 4,293,128 (16.8%)
send_without_block_no_profiles: 998,732 ( 3.9%)
send_not_optimized_method_type: 72,318 ( 0.3%)
send_without_block_cfunc_array_variadic: 15,134 ( 0.1%)
obj_to_string_not_string: 9,765 ( 0.0%)
send_without_block_direct_too_many_args: 629 ( 0.0%)
Top-9 unhandled YARV insns (100.0% of total 690,482):
expandarray: 328,490 (47.6%)
checkkeyword: 190,694 (27.6%)
getclassvariable: 59,901 ( 8.7%)
invokesuperforward: 49,503 ( 7.2%)
getblockparam: 48,651 ( 7.0%)
opt_duparray_send: 11,978 ( 1.7%)
getconstant: 952 ( 0.1%)
checkmatch: 290 ( 0.0%)
once: 23 ( 0.0%)
Top-3 compile error reasons (100.0% of total 3,752,500):
register_spill_on_alloc: 3,457,792 (92.1%)
register_spill_on_ccall: 176,348 ( 4.7%)
exception_handler: 118,360 ( 3.2%)
Top-14 side exit reasons (100.0% of total 10,860,797):
compile_error: 3,752,500 (34.6%)
guard_type_failure: 2,638,909 (24.3%)
guard_shape_failure: 1,917,203 (17.7%)
unhandled_yarv_insn: 690,482 ( 6.4%)
block_param_proxy_not_iseq_or_ifunc: 535,784 ( 4.9%)
unhandled_kwarg: 421,947 ( 3.9%)
patchpoint: 370,474 ( 3.4%)
unknown_newarray_send: 314,786 ( 2.9%)
unhandled_splat: 122,067 ( 1.1%)
unhandled_hir_insn: 76,395 ( 0.7%)
block_param_proxy_modified: 19,193 ( 0.2%)
obj_to_string_fallback: 566 ( 0.0%)
interrupt: 469 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
send_count: 66,945,326
dynamic_send_count: 25,530,605 (38.1%)
optimized_send_count: 41,414,721 (61.9%)
iseq_optimized_send_count: 18,587,439 (27.8%)
inline_cfunc_optimized_send_count: 7,086,426 (10.6%)
non_variadic_cfunc_optimized_send_count: 13,076,682 (19.5%)
variadic_cfunc_optimized_send_count: 2,664,174 ( 4.0%)
dynamic_getivar_count: 7,365,985
dynamic_setivar_count: 7,245,954
compiled_iseq_count: 4,794
failed_iseq_count: 450
compile_time: 748ms
profile_time: 9ms
gc_time: 8ms
invalidation_time: 58ms
vm_write_pc_count: 64,155,801
vm_write_sp_count: 62,812,041
vm_write_locals_count: 62,812,041
vm_write_stack_count: 62,812,041
vm_write_to_parent_iseq_local_count: 292,448
vm_read_from_parent_iseq_local_count: 6,470,939
code_region_bytes: 23,052,288
side_exit_count: 10,860,797
total_insn_count: 517,576,915
vm_insn_count: 163,192,099
zjit_insn_count: 354,384,816
ratio_in_zjit: 68.5%
```
|
||
|
|
d75207d004
|
ZJIT: Profile opt_ltlt and opt_aset (#14834)
These bring `send_without_block_no_profiles` numbers down dramatically.
On lobsters:
Before: send_without_block_no_profiles: 3,466,375
After: send_without_block_no_profiles: 1,293,375
all stats before:
```
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (70.4% of total 14,174,061):
Hash#[]: 4,519,771 (31.9%)
Kernel#is_a?: 1,030,757 ( 7.3%)
Regexp#match?: 399,885 ( 2.8%)
String#empty?: 353,775 ( 2.5%)
Hash#key?: 349,125 ( 2.5%)
Hash#[]=: 344,348 ( 2.4%)
String#start_with?: 334,961 ( 2.4%)
Kernel#respond_to?: 316,527 ( 2.2%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.7%)
TrueClass#===: 235,770 ( 1.7%)
FalseClass#===: 231,143 ( 1.6%)
Array#include?: 211,383 ( 1.5%)
Hash#fetch: 204,702 ( 1.4%)
Kernel#block_given?: 181,793 ( 1.3%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.3%)
Kernel#dup: 179,341 ( 1.3%)
BasicObject#!=: 175,996 ( 1.2%)
Class#new: 168,079 ( 1.2%)
Kernel#kind_of?: 165,600 ( 1.2%)
String#==: 157,734 ( 1.1%)
Top-20 not annotated C methods (71.1% of total 14,336,035):
Hash#[]: 4,519,781 (31.5%)
Kernel#is_a?: 1,212,647 ( 8.5%)
Regexp#match?: 399,885 ( 2.8%)
String#empty?: 361,013 ( 2.5%)
Hash#key?: 349,125 ( 2.4%)
Hash#[]=: 344,348 ( 2.4%)
String#start_with?: 334,961 ( 2.3%)
Kernel#respond_to?: 316,527 ( 2.2%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.7%)
TrueClass#===: 235,770 ( 1.6%)
FalseClass#===: 231,143 ( 1.6%)
Array#include?: 211,383 ( 1.5%)
Hash#fetch: 204,702 ( 1.4%)
Kernel#block_given?: 191,662 ( 1.3%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.3%)
Kernel#dup: 179,348 ( 1.3%)
BasicObject#!=: 176,180 ( 1.2%)
Class#new: 168,079 ( 1.2%)
Kernel#kind_of?: 165,634 ( 1.2%)
String#==: 163,666 ( 1.1%)
Top-2 not optimized method types for send (100.0% of total 72,318):
cfunc: 48,055 (66.4%)
iseq: 24,263 (33.6%)
Top-6 not optimized method types for send_without_block (100.0% of total 4,536,895):
iseq: 2,281,897 (50.3%)
bmethod: 985,679 (21.7%)
optimized: 952,914 (21.0%)
alias: 310,745 ( 6.8%)
null: 5,106 ( 0.1%)
cfunc: 554 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 4,293,123):
invokesuper: 2,373,396 (55.3%)
invokeblock: 811,891 (18.9%)
sendforward: 505,449 (11.8%)
opt_eq: 451,754 (10.5%)
opt_plus: 74,403 ( 1.7%)
opt_minus: 36,227 ( 0.8%)
opt_send_without_block: 21,792 ( 0.5%)
opt_neq: 7,231 ( 0.2%)
opt_mult: 6,752 ( 0.2%)
opt_or: 3,753 ( 0.1%)
opt_lt: 348 ( 0.0%)
opt_ge: 91 ( 0.0%)
opt_gt: 36 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 27,795,022):
send_without_block_polymorphic: 9,505,835 (34.2%)
send_no_profiles: 5,894,763 (21.2%)
send_without_block_not_optimized_method_type: 4,536,895 (16.3%)
not_optimized_instruction: 4,293,123 (15.4%)
send_without_block_no_profiles: 3,466,407 (12.5%)
send_not_optimized_method_type: 72,318 ( 0.3%)
send_without_block_cfunc_array_variadic: 15,134 ( 0.1%)
obj_to_string_not_string: 9,918 ( 0.0%)
send_without_block_direct_too_many_args: 629 ( 0.0%)
Top-9 unhandled YARV insns (100.0% of total 690,482):
expandarray: 328,490 (47.6%)
checkkeyword: 190,694 (27.6%)
getclassvariable: 59,901 ( 8.7%)
invokesuperforward: 49,503 ( 7.2%)
getblockparam: 48,651 ( 7.0%)
opt_duparray_send: 11,978 ( 1.7%)
getconstant: 952 ( 0.1%)
checkmatch: 290 ( 0.0%)
once: 23 ( 0.0%)
Top-3 compile error reasons (100.0% of total 3,752,391):
register_spill_on_alloc: 3,457,680 (92.1%)
register_spill_on_ccall: 176,348 ( 4.7%)
exception_handler: 118,363 ( 3.2%)
Top-14 side exit reasons (100.0% of total 10,852,021):
compile_error: 3,752,391 (34.6%)
guard_type_failure: 2,630,877 (24.2%)
guard_shape_failure: 1,917,208 (17.7%)
unhandled_yarv_insn: 690,482 ( 6.4%)
block_param_proxy_not_iseq_or_ifunc: 535,784 ( 4.9%)
unhandled_kwarg: 421,989 ( 3.9%)
patchpoint: 369,799 ( 3.4%)
unknown_newarray_send: 314,786 ( 2.9%)
unhandled_splat: 122,062 ( 1.1%)
unhandled_hir_insn: 76,394 ( 0.7%)
block_param_proxy_modified: 19,193 ( 0.2%)
obj_to_string_fallback: 566 ( 0.0%)
interrupt: 468 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
send_count: 66,989,407
dynamic_send_count: 27,795,022 (41.5%)
optimized_send_count: 39,194,385 (58.5%)
iseq_optimized_send_count: 18,060,194 (27.0%)
inline_cfunc_optimized_send_count: 6,960,130 (10.4%)
non_variadic_cfunc_optimized_send_count: 11,523,682 (17.2%)
variadic_cfunc_optimized_send_count: 2,650,379 ( 4.0%)
dynamic_getivar_count: 7,365,982
dynamic_setivar_count: 7,245,929
compiled_iseq_count: 4,795
failed_iseq_count: 449
compile_time: 846ms
profile_time: 12ms
gc_time: 9ms
invalidation_time: 61ms
vm_write_pc_count: 64,326,442
vm_write_sp_count: 62,982,524
vm_write_locals_count: 62,982,524
vm_write_stack_count: 62,982,524
vm_write_to_parent_iseq_local_count: 292,448
vm_read_from_parent_iseq_local_count: 6,471,353
code_region_bytes: 22,708,224
side_exit_count: 10,852,021
total_insn_count: 517,550,288
vm_insn_count: 162,946,459
zjit_insn_count: 354,603,829
ratio_in_zjit: 68.5%
```
all stats after:
```
***ZJIT: Printing ZJIT statistics on exit***
Top-20 not inlined C methods (71.1% of total 15,575,343):
Hash#[]: 4,519,778 (29.0%)
Kernel#is_a?: 1,030,758 ( 6.6%)
String#<<: 851,931 ( 5.5%)
Hash#[]=: 742,938 ( 4.8%)
Regexp#match?: 399,886 ( 2.6%)
String#empty?: 353,775 ( 2.3%)
Hash#key?: 349,127 ( 2.2%)
String#start_with?: 334,961 ( 2.2%)
Kernel#respond_to?: 316,529 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%)
TrueClass#===: 235,771 ( 1.5%)
FalseClass#===: 231,144 ( 1.5%)
Array#include?: 211,380 ( 1.4%)
Hash#fetch: 204,701 ( 1.3%)
Kernel#block_given?: 181,792 ( 1.2%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.2%)
Kernel#dup: 179,341 ( 1.2%)
BasicObject#!=: 175,997 ( 1.1%)
Class#new: 168,079 ( 1.1%)
Kernel#kind_of?: 165,600 ( 1.1%)
Top-20 not annotated C methods (71.6% of total 15,737,486):
Hash#[]: 4,519,788 (28.7%)
Kernel#is_a?: 1,212,649 ( 7.7%)
String#<<: 851,931 ( 5.4%)
Hash#[]=: 743,117 ( 4.7%)
Regexp#match?: 399,886 ( 2.5%)
String#empty?: 361,013 ( 2.3%)
Hash#key?: 349,127 ( 2.2%)
String#start_with?: 334,961 ( 2.1%)
Kernel#respond_to?: 316,529 ( 2.0%)
ObjectSpace::WeakKeyMap#[]: 238,978 ( 1.5%)
TrueClass#===: 235,771 ( 1.5%)
FalseClass#===: 231,144 ( 1.5%)
Array#include?: 211,380 ( 1.3%)
Hash#fetch: 204,701 ( 1.3%)
Kernel#block_given?: 191,661 ( 1.2%)
ActiveSupport::OrderedOptions#_get: 181,272 ( 1.2%)
Kernel#dup: 179,348 ( 1.1%)
BasicObject#!=: 176,181 ( 1.1%)
Class#new: 168,079 ( 1.1%)
Kernel#kind_of?: 165,634 ( 1.1%)
Top-2 not optimized method types for send (100.0% of total 72,318):
cfunc: 48,055 (66.4%)
iseq: 24,263 (33.6%)
Top-6 not optimized method types for send_without_block (100.0% of total 4,523,650):
iseq: 2,271,911 (50.2%)
bmethod: 985,636 (21.8%)
optimized: 949,696 (21.0%)
alias: 310,747 ( 6.9%)
null: 5,106 ( 0.1%)
cfunc: 554 ( 0.0%)
Top-13 not optimized instructions (100.0% of total 4,293,126):
invokesuper: 2,373,395 (55.3%)
invokeblock: 811,894 (18.9%)
sendforward: 505,449 (11.8%)
opt_eq: 451,754 (10.5%)
opt_plus: 74,403 ( 1.7%)
opt_minus: 36,228 ( 0.8%)
opt_send_without_block: 21,792 ( 0.5%)
opt_neq: 7,231 ( 0.2%)
opt_mult: 6,752 ( 0.2%)
opt_or: 3,753 ( 0.1%)
opt_lt: 348 ( 0.0%)
opt_ge: 91 ( 0.0%)
opt_gt: 36 ( 0.0%)
Top-9 send fallback reasons (100.0% of total 25,824,512):
send_without_block_polymorphic: 9,721,725 (37.6%)
send_no_profiles: 5,894,761 (22.8%)
send_without_block_not_optimized_method_type: 4,523,650 (17.5%)
not_optimized_instruction: 4,293,126 (16.6%)
send_without_block_no_profiles: 1,293,404 ( 5.0%)
send_not_optimized_method_type: 72,318 ( 0.3%)
send_without_block_cfunc_array_variadic: 15,134 ( 0.1%)
obj_to_string_not_string: 9,765 ( 0.0%)
send_without_block_direct_too_many_args: 629 ( 0.0%)
Top-9 unhandled YARV insns (100.0% of total 690,482):
expandarray: 328,490 (47.6%)
checkkeyword: 190,694 (27.6%)
getclassvariable: 59,901 ( 8.7%)
invokesuperforward: 49,503 ( 7.2%)
getblockparam: 48,651 ( 7.0%)
opt_duparray_send: 11,978 ( 1.7%)
getconstant: 952 ( 0.1%)
checkmatch: 290 ( 0.0%)
once: 23 ( 0.0%)
Top-3 compile error reasons (100.0% of total 3,752,504):
register_spill_on_alloc: 3,457,793 (92.1%)
register_spill_on_ccall: 176,348 ( 4.7%)
exception_handler: 118,363 ( 3.2%)
Top-14 side exit reasons (100.0% of total 10,860,754):
compile_error: 3,752,504 (34.6%)
guard_type_failure: 2,638,901 (24.3%)
guard_shape_failure: 1,917,198 (17.7%)
unhandled_yarv_insn: 690,482 ( 6.4%)
block_param_proxy_not_iseq_or_ifunc: 535,785 ( 4.9%)
unhandled_kwarg: 421,947 ( 3.9%)
patchpoint: 370,447 ( 3.4%)
unknown_newarray_send: 314,786 ( 2.9%)
unhandled_splat: 122,065 ( 1.1%)
unhandled_hir_insn: 76,395 ( 0.7%)
block_param_proxy_modified: 19,193 ( 0.2%)
obj_to_string_fallback: 566 ( 0.0%)
interrupt: 463 ( 0.0%)
guard_type_not_failure: 22 ( 0.0%)
send_count: 66,945,926
dynamic_send_count: 25,824,512 (38.6%)
optimized_send_count: 41,121,414 (61.4%)
iseq_optimized_send_count: 18,587,430 (27.8%)
inline_cfunc_optimized_send_count: 6,958,641 (10.4%)
non_variadic_cfunc_optimized_send_count: 12,911,166 (19.3%)
variadic_cfunc_optimized_send_count: 2,664,177 ( 4.0%)
dynamic_getivar_count: 7,365,985
dynamic_setivar_count: 7,245,942
compiled_iseq_count: 4,794
failed_iseq_count: 450
compile_time: 852ms
profile_time: 13ms
gc_time: 11ms
invalidation_time: 63ms
vm_write_pc_count: 64,284,194
vm_write_sp_count: 62,940,427
vm_write_locals_count: 62,940,427
vm_write_stack_count: 62,940,427
vm_write_to_parent_iseq_local_count: 292,447
vm_read_from_parent_iseq_local_count: 6,470,931
code_region_bytes: 23,019,520
side_exit_count: 10,860,754
total_insn_count: 517,576,267
vm_insn_count: 163,188,187
zjit_insn_count: 354,388,080
ratio_in_zjit: 68.5%
```
|
||
|
|
d7f2a1ec9a
|
ZJIT: Profile opt_aref (#14778)
* ZJIT: Profile opt_aref * ZJIT: Add test for opt_aref * ZJIT: Move test and add hash opt test * ZJIT: Update zjit bindgen * ZJIT: Add inspect calls to opt_aref tests |
||
|
|
4f47327287 |
Update current namespace management by using control frames and lexical contexts
to fix inconsistent and wrong current namespace detections. This includes: * Moving load_path and related things from rb_vm_t to rb_namespace_t to simplify accessing those values via namespace (instead of accessing either vm or ns) * Initializing root_namespace earlier and consolidate builtin_namespace into root_namespace * Adding VM_FRAME_FLAG_NS_REQUIRE for checkpoints to detect a namespace to load/require files * Removing implicit refinements in the root namespace which was used to determine the namespace to be loaded (replaced by VM_FRAME_FLAG_NS_REQUIRE) * Removing namespaces from rb_proc_t because its namespace can be identified by lexical context * Starting to use ep[VM_ENV_DATA_INDEX_SPECVAL] to store the current namespace when the frame type is MAGIC_TOP or MAGIC_CLASS (block handlers don't exist in this case) |
||
|
|
f7fe43610b
|
ZJIT: Optimize ObjToString with type guards (#14469)
* failing test for ObjToString optimization with GuardType * profile ObjToString receiver and rewrite with guard * adjust integration tests for objtostring type guard optimization * Implement new GuardTypeNot HIR; objtostring sends to_s directly on profiled nonstrings * codegen for GuardTypeNot * typo fixes * better name for tests; fix side exit reason for GuardTypeNot * revert accidental change * make bindgen * Fix is_string to identify subclasses of String; fix codegen for identifying if val is String |
||
|
|
799d57bd01 |
insns.def: Drop unused leafness_of_check_ints
It was used to let MJIT override the leafness of the instruction when it decides to remove check_ints for it. Now that MJIT is gone, nobody needs to "override" the leafness using this. |
||
|
|
b6f4b5399d
|
ZJIT: Specialize monomorphic GetIvar (#14388)
Specialize monomorphic `GetIvar` into: * `GuardType(HeapObject)` * `GuardShape` * `LoadIvarEmbedded` or `LoadIvarExtended` This requires profiling self for `getinstancevariable` (it's not on the operand stack). This also optimizes `GetIvar`s that happen as a result of inlining `attr_reader` and `attr_accessor`. Also move some (newly) shared JIT helpers into jit.c. |
||
|
|
4652879f43
|
ZJIT: Specialize some Sends (#14363)
* ZJIT: Profile and specialize Array#empty? * ZJIT: Specialize BasicObject#== * ZJIT: Specialize Hash#empty? * ZJIT: Specialize BasicObject#! Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com> |
||
|
|
fb6e3a8000 |
Remove opt_aref_with and opt_aset_with
When these instructions were introduced it was common to read from a hash with mutable string literals. However, these days, I think these instructions are fairly rare. I tested this with the lobsters benchmark, and saw no difference in speed. In order to be sure, I tracked down every use of this instruction in the lobsters benchmark, and there were only 4 places where it was used. Additionally, this patch fixes a case where "chilled strings" should emit a warning but they don't. ```ruby class Foo def self.[](x)= x.gsub!(/hello/, "hi") end Foo["hello world"] ``` Removing these instructions shows this warning: ``` > ./miniruby -vw test.rb ruby 3.5.0dev (2025-08-25T21:36:50Z rm-opt_aref_with dca08e286c) +PRISM [arm64-darwin24] test.rb:2: warning: literal string will be frozen in the future (run with --debug-frozen-string-literal for more information) ``` [Feature #21553] |
||
|
|
10b582dab6 |
ZJIT: Profile opt_and and opt_or instructions
|
||
|
|
79915e6f78 |
ZJIT: Profile nil? calls
This allows ZJIT to profile `nil?` calls and create type guards for its receiver. - Add `zjit_profile` to `opt_nil_p` insn - Start profiling `opt_nil_p` calls - Use `runtime_exact_ruby_class` instead of `exact_ruby_class` to determine the profiled receiver class |
||
|
|
55c9c75b47 |
Maintain same behavior regardless of tracepoint state
Always use opt_new behavior regardless of tracepoint state. |
||
|
|
b42c8398ba |
Don't support blockarg in opt_new
We don't calculate the correct argc so the bookkeeping slot is something else (unexpected) instead of Qnil (expected). |
||
|
|
ec3b48d3da | Deopt if iseq trace events are enabled | ||
|
|
8ac8225c50 |
Inline Class#new.
This commit inlines instructions for Class#new. To make this work, we
added a new YARV instructions, `opt_new`. `opt_new` checks whether or
not the `new` method is the default allocator method. If it is, it
allocates the object, and pushes the instance on the stack. If not, the
instruction jumps to the "slow path" method call instructions.
Old instructions:
```
> ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0004 leave
```
New instructions:
```
> ./miniruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 putnil
0003 swap
0004 opt_new <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11
0007 opt_send_without_block <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE>
0009 jump 14
0011 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0013 swap
0014 pop
0015 leave
```
This commit speeds up basic object allocation (`Foo.new`) by 60%, but
classes that take keyword parameters see an even bigger benefit because
no hash is allocated when instantiating the object (3x to 6x faster).
Here is an example that uses `Hash.new(capacity: 0)`:
```
> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 1.082 s ± 0.004 s [User: 1.074 s, System: 0.008 s]
Range (min … max): 1.076 s … 1.088 s 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 627.9 ms ± 3.5 ms [User: 622.7 ms, System: 4.8 ms]
Range (min … max): 622.7 ms … 633.2 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
1.72 ± 0.01 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
```
This commit changes the backtrace for `initialize`:
```
aaron@tc ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize
puts caller
end
end
def hello
Foo.new
end
hello
aaron@tc ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:8:in 'Class#new'
test.rb:8:in 'Object#hello'
test.rb:11:in '<main>'
aaron@tc ~/g/ruby (inline-new)> ./miniruby -v test.rb
ruby 3.5.0dev (2025-03-28T23:59:40Z inline-new c4157884e4) +PRISM [arm64-darwin24]
test.rb:8:in 'Object#hello'
test.rb:11:in '<main>'
```
It also increases memory usage for calls to `new` by 122 bytes:
```
aaron@tc ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
puts caller
end
end
def hello
Foo.new
end
puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:hello)))
aaron@tc ~/g/ruby (inline-new)> make runruby
RUBY_ON_BUG='gdb -x ./.gdbinit -p' ./miniruby -I./lib -I. -I.ext/common ./tool/runruby.rb --extout=.ext -- --disable-gems ./test.rb
656
aaron@tc ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
```
Thanks to @ko1 for coming up with this idea!
Co-Authored-By: John Hawthorn <john@hawthorn.email>
|
||
|
|
2cf769376b |
Add profiling for opt_send_without_block
Split out from the CCall changes since we discussed during pairing that this is useful to unblock some other changes. No tests since no one consumes this profiling data yet. |
||
|
|
8b2a4625cb |
Profile instructions for fixnum arithmetic (https://github.com/Shopify/zjit/pull/24)
* Profile instructions for fixnum arithmetic * Drop PartialEq from Type * Do not push PatchPoint onto the stack * Avoid pushing the output of the guards * Pop operands after guards * Test HIR from profiled runs * Implement Display for new instructions * Drop unused FIXNUM_BITS * Use a Rust function to split lines * Use Display for GuardType operands Co-authored-by: Max Bernstein <max@bernsteinbear.com> * Fix tests with Display-ed values --------- Co-authored-by: Max Bernstein <max@bernsteinbear.com> |
||
|
|
0a543daf15 |
Add zjit_* instructions to profile the interpreter (https://github.com/Shopify/zjit/pull/16)
* Add zjit_* instructions to profile the interpreter * Rename FixnumPlus to FixnumAdd * Update a comment about Invalidate * Rename Guard to GuardType * Rename Invalidate to PatchPoint * Drop unneeded debug!() * Plan on profiling the types * Use the output of GuardType as type refined outputs |
||
|
|
c9d433947e
|
Adjust style [ci skip] | ||
|
|
ae7890df33 |
Use the EC parameter in instructions.
The forwarding instructions should use the `ec` parameter passed to vm_exec_core instead of trying to look up the EC via `GET_EC()`. It's cheaper to get the local than to try looking up a global |
||
|
|
1dd40ec18a
|
Optimize instructions when creating an array just to call include? (#12123)
* Add opt_duparray_send insn to skip the allocation on `#include?` If the method isn't going to modify the array we don't need to copy it. This avoids the allocation / array copy for things like `[:a, :b].include?(x)`. This adds a BOP for include? and tracks redefinition for it on Array. Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com> * YJIT: Implement opt_duparray_send include_p Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com> * Update opt_newarray_send to support simple forms of include?(arg) Similar to opt_duparray_send but for non-static arrays. * YJIT: Implement opt_newarray_send include_p --------- Co-authored-by: Andrew Novoselac <andrew.novoselac@shopify.com> |
||
|
|
bf9879791a |
Optimized instruction for Hash#freeze
If a Hash which is empty or only using literals is frozen, we detect this as a peephole optimization and change the instructions to be `opt_hash_freeze`. [Feature #20684] Co-authored-by: Jean Boussier <byroot@ruby-lang.org> |
||
|
|
a99707cd9c |
Optimized instruction for Array#freeze
If an Array which is empty or only using literals is frozen, we detect this as a peephole optimization and change the instructions to be `opt_ary_freeze`. [Feature #20684] Co-authored-by: Jean Boussier <byroot@ruby-lang.org> |
||
|
|
525008cd78
|
Delete newarraykwsplat
The pushtoarraykwsplat instruction was designed to replace newarraykwsplat, and we now meet the condition for deletion mentioned in 77c1233f79a0f96a081b70da533fbbde4f3037fa. |
||
|
|
acbb8d4fb5 |
Expand opt_newarray_send to support Array#pack with buffer keyword arg
Use an enum for the method arg instead of needing to add an id
that doesn't map to an actual method name.
$ ruby --dump=insns -e 'b = "x"; [v].pack("E*", buffer: b)'
before:
```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,34)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] b@0
0000 putchilledstring "x" ( 1)[Li]
0002 setlocal_WC_0 b@0
0004 putself
0005 opt_send_without_block <calldata!mid:v, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0007 newarray 1
0009 putchilledstring "E*"
0011 getlocal_WC_0 b@0
0013 opt_send_without_block <calldata!mid:pack, argc:2, kw:[#<Symbol:0x000000000023110c>], KWARG>
0015 leave
```
after:
```
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,34)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] b@0
0000 putchilledstring "x" ( 1)[Li]
0002 setlocal_WC_0 b@0
0004 putself
0005 opt_send_without_block <calldata!mid:v, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0007 putchilledstring "E*"
0009 getlocal b@0, 0
0012 opt_newarray_send 3, 5
0015 leave
```
|
||
|
|
a661c82972 |
Refactor so we don't have _cd
This should make the diff more clean |
||
|
|
cc97a27008 |
Add two new instructions for forwarding calls
This commit adds `sendforward` and `invokesuperforward` for forwarding parameters to calls Co-authored-by: Matt Valentine-House <matt@eightbitraptor.com> |
||
|
|
cdf33ed5f3 |
Optimized forwarding callers and callees
This patch optimizes forwarding callers and callees. It only optimizes methods that only take `...` as their parameter, and then pass `...` to other calls.
Calls it optimizes look like this:
```ruby
def bar(a) = a
def foo(...) = bar(...) # optimized
foo(123)
```
```ruby
def bar(a) = a
def foo(...) = bar(1, 2, ...) # optimized
foo(123)
```
```ruby
def bar(*a) = a
def foo(...)
list = [1, 2]
bar(*list, ...) # optimized
end
foo(123)
```
All variants of the above but using `super` are also optimized, including a bare super like this:
```ruby
def foo(...)
super
end
```
This patch eliminates intermediate allocations made when calling methods that accept `...`.
We can observe allocation elimination like this:
```ruby
def m
x = GC.stat(:total_allocated_objects)
yield
GC.stat(:total_allocated_objects) - x
end
def bar(a) = a
def foo(...) = bar(...)
def test
m { foo(123) }
end
test
p test # allocates 1 object on master, but 0 objects with this patch
```
```ruby
def bar(a, b:) = a + b
def foo(...) = bar(...)
def test
m { foo(1, b: 2) }
end
test
p test # allocates 2 objects on master, but 0 objects with this patch
```
How does it work?
-----------------
This patch works by using a dynamic stack size when passing forwarded parameters to callees.
The caller's info object (known as the "CI") contains the stack size of the
parameters, so we pass the CI object itself as a parameter to the callee.
When forwarding parameters, the forwarding ISeq uses the caller's CI to determine how much stack to copy, then copies the caller's stack before calling the callee.
The CI at the forwarded call site is adjusted using information from the caller's CI.
I think this description is kind of confusing, so let's walk through an example with code.
```ruby
def delegatee(a, b) = a + b
def delegator(...)
delegatee(...) # CI2 (FORWARDING)
end
def caller
delegator(1, 2) # CI1 (argc: 2)
end
```
Before we call the delegator method, the stack looks like this:
```
Executing Line | Code | Stack
---------------+---------------------------------------+--------
1| def delegatee(a, b) = a + b | self
2| | 1
3| def delegator(...) | 2
4| # |
5| delegatee(...) # CI2 (FORWARDING) |
6| end |
7| |
8| def caller |
-> 9| delegator(1, 2) # CI1 (argc: 2) |
10| end |
```
The ISeq for `delegator` is tagged as "forwardable", so when `caller` calls in
to `delegator`, it writes `CI1` on to the stack as a local variable for the
`delegator` method. The `delegator` method has a special local called `...`
that holds the caller's CI object.
Here is the ISeq disasm fo `delegator`:
```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself ( 1)[LiCa]
0001 getlocal_WC_0 "..."@0
0003 send <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave [Re]
```
The local called `...` will contain the caller's CI: CI1.
Here is the stack when we enter `delegator`:
```
Executing Line | Code | Stack
---------------+---------------------------------------+--------
1| def delegatee(a, b) = a + b | self
2| | 1
3| def delegator(...) | 2
-> 4| # | CI1 (argc: 2)
5| delegatee(...) # CI2 (FORWARDING) | cref_or_me
6| end | specval
7| | type
8| def caller |
9| delegator(1, 2) # CI1 (argc: 2) |
10| end |
```
The CI at `delegatee` on line 5 is tagged as "FORWARDING", so it knows to
memcopy the caller's stack before calling `delegatee`. In this case, it will
memcopy self, 1, and 2 to the stack before calling `delegatee`. It knows how much
memory to copy from the caller because `CI1` contains stack size information
(argc: 2).
Before executing the `send` instruction, we push `...` on the stack. The
`send` instruction pops `...`, and because it is tagged with `FORWARDING`, it
knows to memcopy (using the information in the CI it just popped):
```
== disasm: #<ISeq:delegator@-e:1 (1,0)-(1,39)>
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] "..."@0
0000 putself ( 1)[LiCa]
0001 getlocal_WC_0 "..."@0
0003 send <calldata!mid:delegatee, argc:0, FCALL|FORWARDING>, nil
0006 leave [Re]
```
Instruction 001 puts the caller's CI on the stack. `send` is tagged with
FORWARDING, so it reads the CI and _copies_ the callers stack to this stack:
```
Executing Line | Code | Stack
---------------+---------------------------------------+--------
1| def delegatee(a, b) = a + b | self
2| | 1
3| def delegator(...) | 2
4| # | CI1 (argc: 2)
-> 5| delegatee(...) # CI2 (FORWARDING) | cref_or_me
6| end | specval
7| | type
8| def caller | self
9| delegator(1, 2) # CI1 (argc: 2) | 1
10| end | 2
```
The "FORWARDING" call site combines information from CI1 with CI2 in order
to support passing other values in addition to the `...` value, as well as
perfectly forward splat args, kwargs, etc.
Since we're able to copy the stack from `caller` in to `delegator`'s stack, we
can avoid allocating objects.
I want to do this to eliminate object allocations for delegate methods.
My long term goal is to implement `Class#new` in Ruby and it uses `...`.
I was able to implement `Class#new` in Ruby
[here](https://github.com/ruby/ruby/pull/9289).
If we adopt the technique in this patch, then we can optimize allocating
objects that take keyword parameters for `initialize`.
For example, this code will allocate 2 objects: one for `SomeObject`, and one
for the kwargs:
```ruby
SomeObject.new(foo: 1)
```
If we combine this technique, plus implement `Class#new` in Ruby, then we can
reduce allocations for this common operation.
Co-Authored-By: John Hawthorn <john@hawthorn.email>
Co-Authored-By: Alan Wu <XrXr@users.noreply.github.com>
|
||
|
|
730e3b2ce0 |
Stop exposing rb_str_chilled_p
[Feature #20205] Now that chilled strings no longer appear as frozen, there is no need to offer an API to check for chilled strings. We however need to change `rb_check_frozen_internal` to no longer be a macro, as it needs to check for chilled strings. |
||
|
|
49fcd33e13 |
Introduce a specialize instruction for Array#pack
Instructions for this code:
```ruby
# frozen_string_literal: true
[a].pack("C")
```
Before this commit:
```
== disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,13)>
0000 putself ( 3)[Li]
0001 opt_send_without_block <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0003 newarray 1
0005 putobject "C"
0007 opt_send_without_block <calldata!mid:pack, argc:1, ARGS_SIMPLE>
0009 leave
```
After this commit:
```
== disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,13)>
0000 putself ( 3)[Li]
0001 opt_send_without_block <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0003 putobject "C"
0005 opt_newarray_send 2, :pack
0008 leave
```
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
|
||
|
|
12be40ae6b |
Implement chilled strings
[Feature #20205] As a path toward enabling frozen string literals by default in the future, this commit introduce "chilled strings". From a user perspective chilled strings pretend to be frozen, but on the first attempt to mutate them, they lose their frozen status and emit a warning rather than to raise a `FrozenError`. Implementation wise, `rb_compile_option_struct.frozen_string_literal` is no longer a boolean but a tri-state of `enabled/disabled/unset`. When code is compiled with frozen string literals neither explictly enabled or disabled, string literals are compiled with a new `putchilledstring` instruction. This instruction is identical to `putstring` except it marks the String with the `STR_CHILLED (FL_USER3)` and `FL_FREEZE` flags. Chilled strings have the `FL_FREEZE` flag as to minimize the need to check for chilled strings across the codebase, and to improve compatibility with C extensions. Notes: - `String#freeze`: clears the chilled flag. - `String#-@`: acts as if the string was mutable. - `String#+@`: acts as if the string was mutable. - `String#clone`: copies the chilled flag. Co-authored-by: Jean Boussier <byroot@ruby-lang.org> |
||
|
|
77c1233f79 |
Add pushtoarraykwsplat instruction to avoid unnecessary array allocation
This is designed to replace the newarraykwsplat instruction, which is
no longer used in the parse.y compiler after this commit. This avoids
an unnecessary array allocation in the case where ARGSCAT is followed
by LIST with keyword:
```ruby
a = []
kw = {}
[*a, 1, **kw]
```
Previous Instructions:
```
0000 newarray 0 ( 1)[Li]
0002 setlocal_WC_0 a@0
0004 newhash 0 ( 2)[Li]
0006 setlocal_WC_0 kw@1
0008 getlocal_WC_0 a@0 ( 3)[Li]
0010 splatarray true
0012 putobject_INT2FIX_1_
0013 putspecialobject 1
0015 newhash 0
0017 getlocal_WC_0 kw@1
0019 opt_send_without_block <calldata!mid:core#hash_merge_kwd, argc:2, ARGS_SIMPLE>
0021 newarraykwsplat 2
0023 concattoarray
0024 leave
```
New Instructions:
```
0000 newarray 0 ( 1)[Li]
0002 setlocal_WC_0 a@0
0004 newhash 0 ( 2)[Li]
0006 setlocal_WC_0 kw@1
0008 getlocal_WC_0 a@0 ( 3)[Li]
0010 splatarray true
0012 putobject_INT2FIX_1_
0013 pushtoarray 1
0015 putspecialobject 1
0017 newhash 0
0019 getlocal_WC_0 kw@1
0021 opt_send_without_block <calldata!mid:core#hash_merge_kwd, argc:2, ARGS_SIMPLE>
0023 pushtoarraykwsplat
0024 leave
```
pushtoarraykwsplat is designed to be simpler than newarraykwsplat.
It does not take a variable number of arguments from the stack, it
pops the top of the stack, and appends it to the second from the top,
unless the top of the stack is an empty hash.
During this work, I found the ARGSPUSH followed by HASH with keyword
did not compile correctly, as it pushed the generated hash to the
array even if the hash was empty. This fixes the behavior, to use
pushtoarraykwsplat instead of pushtoarray in that case:
```ruby
a = []
kw = {}
[*a, **kw]
[{}] # Before
[] # After
```
This does not remove the newarraykwsplat instruction, as it is still
referenced in the prism compiler (which should be updated similar
to this), YJIT (only in the bindings, it does not appear to be
implemented), and RJIT (in a couple comments). After those are
updated, the newarraykwsplat instruction should be removed.
|
||
|
|
e878bbd641 |
Allow foo(**nil, &block_arg)
Previously, `**nil` by itself worked, but if you add a block argument, it raised a conversion error. The presence of the block argument shouldn't change how keyword splat works. See: <https://bugs.ruby-lang.org/issues/20064> |
||
|
|
361b3efe16
|
Use UNDEF_P
|
||
|
|
b8516d6d01 |
Add pushtoarray VM instruction
This instruction is similar to concattoarray, but it takes the number of arguments to push to the array, removes that number of arguments from the stack, and adds them to the array now at the top of the stack. This allows `f(*a, 1)` to allocate only a single array on the caller side (which can be reused on the callee side in the case of `def f(*a)`). Prior to this commit, `f(*a, 1)` would generate 3 arrays: * a dupped by splatarray true * 1 wrapped in array by newarray * a dupped again by concatarray Instructions Before for `a = []; f(*a, 1)`: ``` 0000 newarray 0 ( 1)[Li] 0002 setlocal_WC_0 a@0 0004 putself 0005 getlocal_WC_0 a@0 0007 splatarray true 0009 putobject_INT2FIX_1_ 0010 newarray 1 0012 concatarray 0013 opt_send_without_block <calldata!mid:f, argc:1, ARGS_SPLAT|FCALL> 0015 leave ``` Instructions After for `a = []; f(*a, 1)`: ``` 0000 newarray 0 ( 1)[Li] 0002 setlocal_WC_0 a@0 0004 putself 0005 getlocal_WC_0 a@0 0007 splatarray true 0009 putobject_INT2FIX_1_ 0010 pushtoarray 1 0012 opt_send_without_block <calldata!mid:f, argc:1, ARGS_SPLAT|ARGS_SPLAT_MUT|FCALL> 0014 leave ``` With these changes, method calls to Ruby methods should implicitly allocate at most one array. Ignore typeprof bundled gem failure due to unrecognized instruction. |
||
|
|
6e06d0d180 |
Add concattoarray VM instruction
This instruction is similar to concatarray, but assumes the first object is already an array, and appends to it directly. This is different than concatarray, which will create a new array instead of appending to an existing array. Additionally, for both concatarray and concattoarray, if the second argument cannot be converted to an array, then just push it onto the array, instead of creating a new array to wrap it, and then using concat array. This saves an array allocation in that case. This allows `f(*a, *a, *1)` to allocate only a single array on the caller side (which can be reused on the callee side in the case of `def f(*a)`). Prior to this commit, `f(*a, *a, *1)` would generate 4 arrays: * a dupped by splatarray true * a dupped again by first concatarray * 1 wrapped in array by third splatarray * result of [*a, *a] dupped by second concatarray Instructions Before for `a = []; f(*a, *a, *1)`: ``` 0000 newarray 0 ( 1)[Li] 0002 setlocal_WC_0 a@0 0004 putself 0005 getlocal_WC_0 a@0 0007 splatarray true 0009 getlocal_WC_0 a@0 0011 splatarray false 0013 concatarray 0014 putobject_INT2FIX_1_ 0015 splatarray false 0017 concatarray 0018 opt_send_without_block <calldata!mid:g, argc:1, ARGS_SPLAT|ARGS_SPLAT_MUT|FCALL> 0020 leave ``` Instructions After for `a = []; f(*a, *a, *1)`: ``` 0000 newarray 0 ( 1)[Li] 0002 setlocal_WC_0 a@0 0004 putself 0005 getlocal_WC_0 a@0 0007 splatarray true 0009 getlocal_WC_0 a@0 0011 concattoarray 0012 putobject_INT2FIX_1_ 0013 concattoarray 0014 opt_send_without_block <calldata!mid:f, argc:1, ARGS_SPLAT|ARGS_SPLAT_MUT|FCALL> 0016 leave ``` |
||
|
|
a950f23078 |
Ensure f(**kw, &block) calls kw.to_hash before block.to_proc
Previously, block.to_proc was called first, by vm_caller_setup_arg_block. kw.to_hash was called later inside CALLER_SETUP_ARG or setup_parameters_complex. This adds a splatkw instruction that is inserted before sends with ARGS_BLOCKARG and KW_SPLAT and without KW_SPLAT_MUT. This is not needed in the KW_SPLAT_MUT case, because then you know the value is a hash, and you don't need to call to_hash on it. The splatkw instruction checks whether the second to top block is a hash, and if not, replaces it with the value of calling to_hash on it (using rb_to_hash_type). As it is always before a send with ARGS_BLOCKARG and KW_SPLAT, second to top is the keyword splat, and top is the passed block. |
||
|
|
0aed37b973 |
Make expandarray compaction safe
The expandarray instruction can allocate an array, which can trigger a GC compaction. However, since it does not increment the sp until the end of the instruction, the objects it places on the stack are not marked or reference updated by the GC, which can cause the objects to move which leaves broken or incorrect objects on the stack. This commit changes the instruction to be handles_sp so the sp is incremented inside of the instruction right after the object is written on the stack. |
||
|
|
5808999d30
|
YJIT: Fallback opt_getconstant_path for const_missing (#8623)
* YJIT: Fallback opt_getconstant_path for const_missing * Fix a comment [ci skip] * Remove a wrapper function |
||
|
|
1702b0f438
|
Remove unused opt_call_c_function insn (#7750) | ||
|
|
c5fc1ce975 |
Emit special instruction for array literal + .(hash|min|max)
This commit introduces a new instruction `opt_newarray_send` which is used when there is an array literal followed by either the `hash`, `min`, or `max` method. ``` [a, b, c].hash ``` Will emit an `opt_newarray_send` instruction. This instruction falls back to a method call if the "interested" method has been monkey patched. Here are some examples of the instructions generated: ``` $ ./miniruby --dump=insns -e '[@a, @b].max' == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: FALSE) 0000 getinstancevariable :@a, <is:0> ( 1)[Li] 0003 getinstancevariable :@b, <is:1> 0006 opt_newarray_send 2, :max 0009 leave $ ./miniruby --dump=insns -e '[@a, @b].min' == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: FALSE) 0000 getinstancevariable :@a, <is:0> ( 1)[Li] 0003 getinstancevariable :@b, <is:1> 0006 opt_newarray_send 2, :min 0009 leave $ ./miniruby --dump=insns -e '[@a, @b].hash' == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,13)> (catch: FALSE) 0000 getinstancevariable :@a, <is:0> ( 1)[Li] 0003 getinstancevariable :@b, <is:1> 0006 opt_newarray_send 2, :hash 0009 leave ``` [Feature #18897] [ruby-core:109147] Co-authored-by: John Hawthorn <jhawthorn@github.com> |
||
|
|
9947574b9c |
Refactor jit_func_t and jit_exec
I closed https://github.com/ruby/ruby/pull/7543, but part of the diff seems useful regardless, so I extracted it. |
||
|
|
9a43c63d43
|
YJIT: Implement throw instruction (#7491)
* Break up jit_exec from vm_sendish * YJIT: Implement throw instruction * YJIT: Explain what rb_vm_throw does [ci skip] |