mirror of https://github.com/ruby/ruby.git synced 2026-01-27 12:34:21 +00:00

Jeremy Evans e4f85bfc31 Implement Set as a core class

Set has been an autoloaded standard library since Ruby 3.2.
The standard library Set is less efficient than it could be, as it
uses Hash for storage, which stores unnecessary values for each key.

Implementation details:

* Core Set uses a modified version of `st_table`, named `set_table`.
  Other than `s/st_/set_/`, the main difference is that the stored records
  do not have values, making them 1/3 smaller. `st_table_entry` stores
  `hash`, `key`, and `record` (value), while `set_table_entry` only
  stores `hash` and `key`.  This results in large sets using ~33% less
  memory compared to stdlib Set.  For small sets, core Set uses 12% more
  memory (160 byte object slot and 64 malloc bytes, while stdlib Set
  uses 40 for Set and 160 for Hash).  More memory is used because
  the set_table is embedded and 72 bytes in the object slot are
  currently wasted. Hopefully we can make this more efficient and have
  it stored in an 80 byte object slot in the future.

* All methods are implemented as cfuncs, except the pretty_print
  methods, which were moved to `lib/pp.rb` (which is where the
  pretty_print methods for other core classes are defined).  As is
  typical for core classes, internal calls call C functions and
  not Ruby methods.  For example, to check if something is a Set,
  `rb_obj_is_kind_of` is used, instead of calling `is_a?(Set)` on the
  related object.

* Almost all methods use the same algorithm that the pure-Ruby
  implementation used.  The exception is when calling `Set#divide` with a
  block with 2-arity.  The pure-Ruby method used tsort to implement this.
  I developed an algorithm that only allocates a single intermediate
  hash and does not need tsort.

* The `flatten_merge` protected method is no longer necessary, so it
  is not implemented (it could be).

* Similar to Hash/Array, subclasses of Set are no longer reflected in
  `inspect` output.

* RDoc from stdlib Set was moved to core Set, with minor updates.
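
The size figures in the first bullet can be checked with simple arithmetic. The sketch below assumes a 64-bit build in which each field of a table entry occupies one 8-byte word (the word size is an assumption, not something stated above). The slot and malloc sizes are the ones quoted in the bullet.

```ruby
# Assumption: 64-bit platform, one 8-byte word per entry field.
word_bytes = 8

st_entry_bytes  = 3 * word_bytes # hash, key, record
set_entry_bytes = 2 * word_bytes # hash, key

# Fraction of per-entry memory saved by dropping the value field.
entry_saving = Rational(st_entry_bytes - set_entry_bytes, st_entry_bytes)

# Small-set totals quoted above, in bytes.
core_small   = 160 + 64 # object slot + malloc
stdlib_small = 40 + 160 # Set object + Hash object
small_overhead = Rational(core_small, stdlib_small) - 1

puts "per-entry saving:   #{(entry_saving * 100).to_f.round(1)}%"
puts "small-set overhead: #{(small_overhead * 100).to_f.round(1)}%"
```

Under these assumptions the per-entry saving is exactly one third, which is consistent with the ~33% figure for large sets, and the small-set totals (224 vs. 200 bytes) give the 12% overhead.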

This includes a comprehensive benchmark suite for all public Set
methods.  As you would expect, the native version is faster in the
vast majority of cases, and multiple times faster in many cases.
There are a few cases where it is significantly slower:

* Set.new with no arguments (~1.6x)
* Set#compare_by_identity for small sets (~1.3x)
* Set#clone for small sets (~1.5x)
* Set#dup for small sets (~1.7x)

These are slower as Set does not currently use the AR table
optimization that Hash does, so a new set_table is initialized for
each call.  I'm not sure it's worth the complexity to have an AR
table-like optimization for small sets (for hashes it makes sense,
as small hashes are used everywhere in Ruby).
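
For a rough feel for three of the small-set cases listed above, a plain timing loop like the following can be used. It is a minimal sketch, not part of the benchmark suite: the `time_it` helper and the iteration count are made up for illustration, and the absolute numbers will vary by machine and Ruby build.

```ruby
require "set" # harmless where Set is already loaded

# Hypothetical helper: run the block `iterations` times, return elapsed seconds.
def time_it(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

iterations = 100_000
small = Set[1, 2, 3]

timings = {
  "Set.new"   => time_it(iterations) { Set.new },
  "Set#dup"   => time_it(iterations) { small.dup },
  "Set#clone" => time_it(iterations) { small.clone },
}

timings.each { |name, secs| puts format("%-10s %.4fs", name, secs) }
```

Running the same script under two Ruby builds gives a crude comparison. The suite's benchmark_driver definitions remain the more accurate way to measure.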

The rbs and repl_type_completor bundled gems will need updates to
support core Set.  The pull request marks them as allowed failures.

This passes all set tests with no changes.  The following specs
needed modification:

* Modifying frozen set error message (changed for the better)
* `Set#divide` when passed a 2-arity block no longer yields the same
  object as both the first and second argument (this seems like an issue
  with the previous implementation).
* Set-like objects that override `is_a?` such that `is_a?(Set)` returns
  `true` are no longer treated as Set instances.
* `Set.allocate.hash` is no longer the same as `nil.hash`
* `Set#join` no longer calls `Set#to_a` (it calls the underlying C
   function).
* `Set#flatten_merge` protected method is not implemented.
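
As an illustration of the `Set#divide` item, the sketch below groups integers that are linked through neighbours differing by 1. The block is symmetric, so the resulting grouping should not depend on which implementation is in use. Which argument pairs the block receives may differ, per the list above, so the sketch makes no assumption about that.

```ruby
require "set" # harmless where Set is already loaded

numbers = Set[1, 3, 4, 6, 9, 10, 11]

# With a 2-arity block, two elements land in the same subset when they are
# connected through a chain of pairs for which the block returns true.
groups = numbers.divide { |a, b| (a - b).abs == 1 }

groups.each { |group| p group.to_a.sort }
```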

Previously, `set.rb` added a `SortedSet` autoload, which loads
`set/sorted_set.rb`.  This replaces the `Set` autoload in `prelude.rb`
with a `SortedSet` autoload, but I recommend removing it and
`set/sorted_set.rb`.

This moves `test/set/test_set.rb` to `test/ruby/test_set.rb`,
reflecting the switch to a core class.  This does not move the spec
files, as I'm not sure how they should be handled.

Internally, this uses the st_* types and functions as much as
possible, and only adds set_* types and functions as needed.
The underlying set_table implementation is stored in st.c, but
there is no public C-API for it, nor is there one planned, in
order to keep the ability to change the internals going forward.

For internal uses of st_table with Qtrue values, those can
probably be replaced with set_table.  To do that, include
internal/set_table.h.  To handle symbol visibility (rb_ prefix),
internal/set_table.h uses the same macro approach that
include/ruby/st.h uses.
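
Replacing a table whose values are always true with a set has a direct Ruby-level analogue, sketched below purely as an illustration. It does not use or show the internal `set_table` C functions, and the variable names are made up.

```ruby
require "set" # harmless where Set is already loaded

words = %w[a b a c b]

# Hash used as a set: every value is a placeholder `true`.
seen_hash = {}
words.each { |w| seen_hash[w] = true }

# Set stores only the keys.
seen_set = Set.new
words.each { |w| seen_set << w }

# Both keep one copy of each key, in insertion order.
p seen_hash.keys == seen_set.to_a
```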

The Set class (rb_cSet) and all methods are defined in set.c.
There isn't currently a C-API for the Set class, though C-API
functions can be added as needed going forward.

Implements [Feature #21216]

Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Co-authored-by: Oliver Nutter <mrnoname1000@riseup.net>

2025-04-26 10:31:11 +09:00

…

lib

…

other-lang

…

app_answer.rb

…

app_aobench.rb

typo otherBasis -> orthoBasis

2024-09-13 14:35:25 +09:00

app_erb.yml

…

app_factorial.rb

…

app_fib.rb

* remove trailing spaces. [ci skip]

2025-04-18 00:10:03 +00:00

app_lc_fizzbuzz.rb

…

app_mandelbrot.rb

…

app_pentomino.rb

…

app_raise.rb

…

app_strconcat.rb

…

app_tak.rb

…

app_tarai.rb

…

app_uri.rb

…

array_flatten.yml

…

array_intersection.yml

…

array_large_literal.yml

Optimize compilation of large literal arrays

2024-01-27 10:16:52 -08:00

array_max_float.yml

…

array_max_int.yml

…

array_max_str.yml

…

array_min.yml

…

array_sample_100k_10.rb

…

array_sample_100k_11.rb

…

array_sample_100k__1k.rb

…

array_sample_100k__6k.rb

…

array_sample_100k__100.rb

…

array_sample_100k___10k.rb

…

array_sample_100k___50k.rb

…

array_sample.yml

…

array_shift.rb

…

array_small_and.rb

…

array_small_diff.rb

…

array_small_or.rb

…

array_sort_block.rb

…

array_sort_float.rb

…

array_sort_int.yml

…

array_values_at_int.rb

…

array_values_at_range.rb

…

attr_accessor.yml

…

bighash.rb

…

buffer_each.yml

…

buffer_get.yml

…

cgi_escape_html.yml

…

complex_float_add.yml

…

complex_float_div.yml

…

complex_float_mul.yml

…

complex_float_new.yml

…

complex_float_power.yml

…

complex_float_sub.yml

…

constant_invalidation.rb

…

dir_empty_p.rb

…

enum_lazy_flat_map.yml

…

enum_lazy_grep_v_20.rb

…

enum_lazy_grep_v_50.rb

…

enum_lazy_grep_v_100.rb

…

enum_lazy_uniq_20.rb

…

enum_lazy_uniq_50.rb

…

enum_lazy_uniq_100.rb

…

enum_lazy_zip.yml

…

enum_minmax.yml

…

enum_sort_by.yml

…

enum_sort.yml

…

enum_tally.yml

…

erb_escape_html.yml

…

erb_render.yml

…

fiber_chain.yml

…

fiber_locals.yml

…

file_chmod.rb

…

file_rename.rb

…

float_methods.yml

…

float_neg_posi.yml

…

float_to_s.yml

…

hash_aref_array.rb

…

hash_aref_dsym_long.rb

…

hash_aref_dsym.rb

…

hash_aref_fix.rb

…

hash_aref_flo.rb

…

hash_aref_miss.rb

…

hash_aref_str_lit.yml

Precompute embedded string literals hash code

2024-05-28 07:32:41 +02:00

hash_aref_str.rb

…

hash_aref_sym_long.rb

…

hash_aref_sym.rb

…

hash_defaults.yml

…

hash_dup.yml

…

hash_first.yml

…

hash_flatten.rb

…

hash_ident_flo.rb

…

hash_ident_num.rb

…

hash_ident_obj.rb

…

hash_ident_str.rb

…

hash_ident_sym.rb

…

hash_key.yml

Optimize Hash methods with Kernel#hash (#10160)

2024-03-01 11:16:31 -08:00

hash_keys.rb

…

hash_literal_small2.rb

…

hash_literal_small4.rb

…

hash_literal_small8.rb

…

hash_long.rb

…

hash_new.yml

Implement Hash.new(capacity:)

2024-07-08 12:24:33 +02:00

hash_shift_u16.rb

…

hash_shift_u24.rb

…

hash_shift_u32.rb

…

hash_shift.rb

…

hash_small2.rb

…

hash_small4.rb

…

hash_small8.rb

…

hash_to_proc.rb

…

hash_values.rb

…

int_quo.rb

…

io_copy_stream_write_socket.rb

…

io_copy_stream_write.rb

…

io_file_create.rb

…

io_file_read.rb

…

io_file_write.rb

…

io_nonblock_noex2.rb

…

io_nonblock_noex.rb

…

io_pipe_rw.rb

…

io_select2.rb

…

io_select3.rb

…

io_select.rb

…

io_write.rb

…

irb_color.yml

…

irb_exec.yml

…

iseq_load_from_binary.yml

…

ivar_extend.yml

…

kernel_clone.yml

…

kernel_float.yml

…

kernel_tap.yml

…

kernel_then.yml

…

keyword_arguments.yml

…

loop_each.yml

Rewrite Array#each in Ruby using Primitive (#9533)

2024-01-23 20:09:57 +00:00

loop_for.rb

…

loop_generator.rb

…

loop_times_megamorphic.yml

YJIT: Allow inlining ISEQ calls with a block (#9622)

2024-01-23 19:36:23 +00:00

loop_times.rb

…

loop_whileloop2.rb

…

loop_whileloop.rb

…

marshal_dump_flo.rb

…

marshal_dump_load_geniv.rb

…

marshal_dump_load_integer.yml

…

marshal_dump_load_time.rb

…

masgn.yml

…

match_gt4.rb

…

match_small.rb

…

method_bind_call.yml

…

module_eqq.yml

…

nil_p.yml

…

nilclass.yml

Avoid array allocation for *nil, by not calling nil.to_a

2025-03-27 11:17:40 -07:00

num_zero_p.yml

…

numeric_methods.yml

…

object_allocate.yml

add allocation benchmark

2024-04-15 11:29:48 -07:00

object_id.yml

Lazily create objspace->id_to_obj_tbl

2025-04-15 07:57:39 +09:00

objspace_dump_all.yml

…

pm_array.yml

…

ractor_const.yml

…

ractor_float_to_s.yml

…

ractor_string_fstring.yml

Add benchmarks for fstring de-duplication

2025-04-18 13:03:54 +09:00

range_bsearch_bignum.yml

Add benchmarks for Range#bsearch

2023-09-26 17:31:10 +09:00

range_bsearch_endpointless.yml

Add benchmarks for Range#bsearch

2023-09-26 17:31:10 +09:00

range_bsearch_fixnum.yml

Add benchmarks for Range#bsearch

2023-09-26 17:31:10 +09:00

range_count.yml

Optimize Range#count by using range_size if possible

2023-10-05 00:19:55 +09:00

range_last.yml

…

range_min.yml

…

range_overlap.yml

[Feature #19839] Fix Range#overlap? for empty ranges

2023-09-16 17:24:21 +09:00

range_reverse_each.yml

Add benchmarks for Range#reverse_each

2023-10-12 17:34:49 +09:00

README.md

[DOC] Improve formatting in Markdown files (#12322)

2024-12-12 17:49:45 -08:00

realpath.yml

Fix a benchmark to avoid leaving a garbage file

2024-02-08 17:08:23 -08:00

regexp_dup.yml

…

regexp_new.yml

…

require_thread.yml

…

require.yml

…

scan.yaml

[ruby/strscan] Add scan and search benchmark

2024-11-27 09:24:06 +09:00

search.yaml

[ruby/strscan] Add scan and search benchmark

2024-11-27 09:24:06 +09:00

securerandom.rb

…

set.yml

Implement Set as a core class

2025-04-26 10:31:11 +09:00

so_ackermann.rb

…

so_array.rb

…

so_binary_trees.rb

…

so_concatenate.rb

…

so_count_words.yml

Clean up temporary file, wc.input [ci skip]

2023-10-24 12:30:10 +09:00

so_exception.rb

…

so_fannkuch.rb

…

so_fasta.rb

…

so_k_nucleotide.yml

…

so_lists.rb

…

so_mandelbrot.rb

…

so_matrix.rb

…

so_meteor_contest.rb

…

so_nbody.rb

…

so_nested_loop.rb

…

so_nsieve_bits.rb

…

so_nsieve.rb

…

so_object.rb

…

so_partial_sums.rb

…

so_pidigits.rb

…

so_random.rb

…

so_reverse_complement.yml

…

so_sieve.rb

…

so_spectralnorm.rb

…

string_capitalize.yml

…

string_casecmp_p.yml

…

string_casecmp.yml

…

string_concat.yml

Rename size_pool -> heap

2024-10-03 21:20:09 +01:00

string_downcase.yml

…

string_dup.yml

Specialize String#dup

2023-11-20 14:33:20 +01:00

string_fstring.yml

Add benchmarks for fstring de-duplication

2025-04-18 13:03:54 +09:00

string_gsub.yml

Elide string allocation when using String#gsub in MAP mode

2025-02-12 10:23:50 +01:00

string_index.rb

…

string_rpartition.yml

…

string_scan_re.rb

…

string_scan_str.rb

…

string_slice.yml

…

string_split.yml

…

string_swapcase.yml

…

string_upcase.yml

…

struct_accessor.yml

Support tracing of struct member accessor methods

2023-12-07 10:29:33 -08:00

time_at.yml

…

time_new.yml

…

time_now.yml

…

time_parse.yml

…

time_strftime.yml

Time#strftime: grow the buffer faster

2024-09-04 14:52:55 +02:00

time_strptime.yml

…

time_subsec.rb

…

time_xmlschema.yml

Refactor Time#xmlschema

2024-09-23 14:29:25 +09:00

vm_array.yml

…

vm_attr_ivar_set.yml

…

vm_attr_ivar.yml

…

vm_backtrace.rb

…

vm_bigarray.yml

…

vm_bighash.yml

…

vm_block_handler.yml

…

vm_block.yml

…

vm_blockparam_call.yml

…

vm_blockparam_pass.yml

…

vm_blockparam_yield.yml

…

vm_blockparam.yml

…

vm_call_bmethod.yml

…

vm_call_kw_and_kw_splat.yml

Fix crash when passing large keyword splat to method accepting keywords and keyword splat

2024-02-11 22:48:38 -08:00

vm_call_method_missing.yml

…

vm_call_send_iseq.yml

…

vm_call_symproc.yml

…

vm_case_classes.yml

…

vm_case_lit.yml

…

vm_case.yml

…

vm_clearmethodcache.rb

…

vm_const.yml

…

vm_cvar.yml

…

vm_defined_method.yml

…

vm_dstr_ary.rb

…

vm_dstr_bool.rb

…

vm_dstr_class_module.rb

…

vm_dstr_digit.rb

…

vm_dstr_int.rb

…

vm_dstr_nil.rb

…

vm_dstr_obj_def.rb

…

vm_dstr_obj.rb

…

vm_dstr_str.rb

…

vm_dstr_sym.rb

…

vm_dstr.yml

…

vm_ensure.yml

…

vm_eval.yml

…

vm_fiber_allocate.yml

…

vm_fiber_count.yml

…

vm_fiber_reuse_gc.yml

…

vm_fiber_reuse.yml

…

vm_fiber_switch.yml

…

vm_float_simple.yml

…

vm_freezeobj.yml

…

vm_freezestring.yml

…

vm_gc_old_full.rb

…

vm_gc_old_immediate.rb

…

vm_gc_old_lazy.rb

…

vm_gc_short_lived.yml

…

vm_gc_short_with_complex_long.yml

…

vm_gc_short_with_long.yml

…

vm_gc_short_with_symbol.yml

…

vm_gc_wb_ary_promoted.yml

…

vm_gc_wb_ary.yml

…

vm_gc_wb_obj_promoted.yml

…

vm_gc_wb_obj.yml

…

vm_gc.rb

…

vm_iclass_super.yml

…

vm_ivar_embedded_obj_init.yml

…

vm_ivar_extended_obj_init.yml

…

vm_ivar_generic_get.yml

…

vm_ivar_generic_set.yml

…

vm_ivar_get_unintialized.yml

…

vm_ivar_get.yml

…

vm_ivar_ic_miss.yml

Update benchmark/vm_ivar_ic_miss.yml

2023-10-24 10:52:06 -07:00

vm_ivar_lazy_set.yml

…

vm_ivar_memoize.yml

vm_getivar: assume the cached shape_id like have a common ancestor

2023-11-03 12:47:43 +01:00

vm_ivar_of_class_set.yml

…

vm_ivar_of_class.yml

…

vm_ivar_set_on_instance.yml

…

vm_ivar_set_subclass.yml

…

vm_ivar_set.yml

…

vm_ivar.yml

…

vm_length.yml

…

vm_lvar_cond_set.yml

…

vm_lvar_init.yml

…

vm_lvar_set.yml

…

vm_method_missing.yml

…

vm_method_splat_calls2.yml

Avoid allocation when passing no keywords to anonymous kwrest methods

2024-02-13 11:05:26 -05:00

vm_method_splat_calls.yml

Add benchmark for recent optimization to avoid implicit allocations

2023-12-07 11:27:55 -08:00

vm_method_with_block.yml

…

vm_method.yml

…

vm_module_ann_const_set.yml

…

vm_module_const_set.yml

…

vm_mutex.yml

…

vm_neq.yml

…

vm_newlambda.yml

…

vm_not.yml

…

vm_poly_method_ov.yml

…

vm_poly_method.yml

…

vm_poly_same_method.yml

…

vm_poly_singleton.yml

…

vm_proc.yml

…

vm_raise1.yml

…

vm_raise2.yml

…

vm_regexp.yml

…

vm_rescue.yml

…

vm_send_cfunc.yml

…

vm_send.yml

…

vm_simplereturn.yml

…

vm_string_literal.yml

…

vm_struct_big_aref_hi.yml

…

vm_struct_big_aref_lo.yml

…

vm_struct_big_aset.yml

…

vm_struct_big_href_hi.yml

…

vm_struct_big_href_lo.yml

…

vm_struct_big_hset.yml

…

vm_struct_small_aref.yml

…

vm_struct_small_aset.yml

…

vm_struct_small_href.yml

…

vm_struct_small_hset.yml

…

vm_super_splat_calls.yml

Add benchmarks for super and zsuper calls of different types

2024-03-01 07:10:25 -08:00

vm_super.yml

…

vm_swap.yml

…

vm_symbol_block_pass.rb

…

vm_thread_alive_check.yml

…

vm_thread_close.rb

…

vm_thread_condvar1.rb

…

vm_thread_condvar2.rb

…

vm_thread_create_join.rb

…

vm_thread_mutex1.rb

…

vm_thread_mutex2.rb

…

vm_thread_mutex3.rb

…

vm_thread_pass_flood.rb

…

vm_thread_pass.rb

…

vm_thread_pipe.rb

…

vm_thread_queue.rb

…

vm_thread_sized_queue2.rb

…

vm_thread_sized_queue3.rb

…

vm_thread_sized_queue4.rb

…

vm_thread_sized_queue.rb

…

vm_thread_sleep.yml

…

vm_unif1.yml

…

vm_yield.yml

…

vm_zsuper_splat_calls.yml

Add benchmarks for super and zsuper calls of different types

2024-03-01 07:10:25 -08:00

vm_zsuper.yml

…

README.md

ruby/benchmark

This directory has benchmark definitions to be run with benchmark_driver.gem.

Normal usage

Execute `gem install benchmark_driver` and run a command like:

# Run a benchmark script with the ruby in the $PATH
benchmark-driver benchmark/app_fib.rb

# Run benchmark scripts with multiple Ruby executables or options
benchmark-driver benchmark/*.rb -e /path/to/ruby -e '/path/to/ruby --jit'

# Or compare Ruby versions managed by rbenv
benchmark-driver benchmark/*.rb --rbenv '2.5.1;2.6.0-preview2 --jit'

# You can collect many metrics in many ways
benchmark-driver benchmark/*.rb --runner memory --output markdown

# Some are defined with YAML for complex setup or accurate measurement
benchmark-driver benchmark/*.yml
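
For orientation, a `.rb` benchmark in this directory is an ordinary Ruby script. The following is a minimal sketch in the style of `app_fib.rb`, not a copy of it. The argument is kept small so that it finishes quickly.

```ruby
# Naive recursive Fibonacci: deliberately slow, so it stresses method calls.
def fib(n)
  if n < 3
    1
  else
    fib(n - 1) + fib(n - 2)
  end
end

result = fib(20)
```

Saved under a hypothetical name such as `benchmark/my_fib.rb`, it could be passed to `benchmark-driver` in the same way as `benchmark/app_fib.rb` above.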

make benchmark

Running `make benchmark` first invokes `make update-benchmark-driver`, which automatically downloads the supported version of benchmark_driver; the benchmarks then run with that downloaded copy.

# Run all benchmarks with the ruby in the $PATH and the built ruby
make benchmark

# Or compare with specific ruby binary
make benchmark COMPARE_RUBY="/path/to/ruby --jit"

# Run vm benchmarks
make benchmark ITEM=vm

# Run some limited benchmarks in ITEM-matched files
make benchmark ITEM=vm OPTS=--filter=block

# You can specify the benchmark by an exact filename instead of using the default argument:
# ARGS = $$(find $(srcdir)/benchmark -maxdepth 1 -name '*$(ITEM)*.yml' -o -name '*$(ITEM)*.rb')
make benchmark ARGS=benchmark/erb_render.yml

# You can specify any option via $OPTS
make benchmark OPTS="--help"

# With `make benchmark`, some special runner plugins are available:
#   -r peak, -r size, -r total, -r utime, -r stime, -r cutime, -r cstime
make benchmark ITEM=vm_bigarray OPTS="-r peak"