mirror/ruby - ruby - Maple Linux Source

mirror of https://github.com/ruby/ruby.git synced 2026-01-26 12:14:51 +00:00

Author	SHA1	Message	Date
Nobuyoshi Nakada	8ca2f6489b	Revert "Fix rb_interned_str: create strings with BINARY (aka ASCII_8BIT) encoding" This reverts commit 1f3c52dc155fb7fbc42fc8e146924091ba1dfa20.	2026-01-17 13:38:55 +09:00
Jean Boussier	1f3c52dc15	Fix rb_interned_str: create strings with BINARY (aka ASCII_8BIT) encoding [Bug #21842] The documentation always stated as much, and it's consistent with the rb_str_* family of functions.	2026-01-16 22:44:38 +01:00
Nobuyoshi Nakada	b27d9353a7	Use `is_obj_encoding` instead of `is_data_encoding` The argument to `is_data_encoding` is assumed to be `T_DATA`.	2025-12-31 23:56:58 +09:00
John Hawthorn	176e384bcf	Cache filesystem_encindex	2025-12-12 14:03:37 -08:00
John Hawthorn	e7a38b32e0	Store Encoding#name as an attribute When debugging the fstring table, I found "UTF-8" to be the most common interned strings in many benchmarks. We have a fixed, limited number of these strings, so we might as well permanently cache their fstrings.	2025-12-12 13:53:06 -08:00
John Hawthorn	41ee65899a	Always treat encoding as TYPEDDATA Encodings are RTypedData, not the deprecated RData. Although the structures are compatible we should use the correct API.	2025-12-10 09:49:18 -08:00
Koichi Sasada	bc00c4468e	use `SET_SHAREABLE` to adopt strict shareable rule. * (basically) shareable objects only refer to shareable objects * (exception) shareable objects can refer to unshareable objects but should not leak references to unshareable objects to the Ruby world	2025-10-23 13:08:26 +09:00
John Hawthorn	02d5b8443a	Simplify enc_autoload_body Previously we were looping over the enc_table, but when I added an assertion the only thing that loop was doing is the equivalent of ENC_TO_ENCINDEX(base). However we don't even need the index of base. Instead we should be able to just use the base directly.	2025-09-19 15:35:15 -07:00
John Hawthorn	f048f77c4a	Extract enc_load_from_base from enc_register_at Previously we would sometimes call enc_register_at several times in order to update the encoding values after the base encoding may have been loaded. This updates enc_register_at to only be used via enc_register, when an actual new encoding at a new index is being registered. Other callers (which in all cases found the index by the name matching) now call enc_load_from_base which is only responsible for loading the encoding from the base values.	2025-09-19 15:35:15 -07:00
John Hawthorn	7c51ce5ff6	Mark list as frozen and shareable	2025-09-19 15:35:15 -07:00
John Hawthorn	0bb6a8bea4	Avoid racing ruby_encoding_index with base index Previously when we copied base_encoding on top of the encoding, other threads could briefly see the name and ruby_encoding_index of the base encoding.	2025-09-17 16:24:22 -07:00
John Hawthorn	71fa9809a3	Avoid duplicate autoloading of encodings	2025-09-17 16:24:22 -07:00
Luke Gruber	9db54a1a98	Fixes to encoding/transcoding for ractors. Not all ractor-related encoding issues were fixed by 1afc07e815051e2f73493f055f2130cb642ba12a. I found more by running my test-all branch with 3 ractors for each test.	2025-08-22 10:49:44 -07:00
Luke Gruber	7c67060dad	Fix enc_list across ractors Calling rb_ary_replace(copy, orig) can modify orig, which is not safe across ractors because orig is shared (it's the global encoding list). Hoping to address CI failures such as https://ci.rvm.jp/results/trunk-gc-asserts@ruby-sp2-noble-docker/5890058	2025-08-15 13:25:17 -04:00
Luke Gruber	1afc07e815	Allow encodings to be autoloaded through transcoding functions Make sure VM lock is not held when calling `load_transcoder_entry`, as that causes deadlock inside ractors. `String#encode` now works inside ractors, among others. Atomic load the rb_encoding_list Without this, wbcheck would sometimes hit a missing write barrier. Co-authored-by: John Hawthorn <john.hawthorn@shopify.com> Hold VM lock when iterating over global_enc_table.names This st_table can be inserted into at runtime when autoloading encodings. minor optimization when calling Encoding.list	2025-08-12 15:19:02 -07:00
Nobuyoshi Nakada	6179cc0118	[DOC] Fill undocumented documents	2025-08-04 02:23:43 +09:00
BurdetteLamar	54a578e72a	[DOC] Tweaks for String#encoding	2025-07-23 17:09:33 -04:00
Jean Boussier	8541dec8c4	encoding.c: check for autoload before checking index Otherwise we may be checking the index while the encoding is being autoloaded by another ractor.	2025-07-22 12:58:49 +02:00
Jean Boussier	1fb4929ace	Make `rb_enc_autoload_p` atomic Using `encoding->max_enc_len` as a way to check if the encoding has been loaded isn't atomic, because it's not atomically set last. Instead we can use a dedicated atomic value inside the encoding table.	2025-07-10 17:18:20 +02:00
Jean Boussier	482f4cad82	Autoload encodings on the main ractor None of the datastructures involved in the require process are safe to call on a secondary ractor, however when autoloading encodings, we do so from the current ractor. So all sorts of corruption can happen when using an autoloaded encoding for the first time from a secondary ractor.	2025-07-07 12:44:21 +02:00
John Hawthorn	24ac9f11de	Revert "Add locks around accesses/modifications to global encodings table" This reverts commit cf4d37fbc079116453e69cf08ea8007d0e1c73e6.	2025-07-03 21:39:10 -07:00
John Hawthorn	50704fe8e6	Revert "Make get/set default internal/external encoding lock-free" This reverts commit dda5a04f2b4835582dba09ba33797258a61efafe.	2025-07-03 21:39:10 -07:00
Luke Gruber	dda5a04f2b	Make get/set default internal/external encoding lock-free Also, make sure autoloading of encodings is safe across ractors.	2025-07-03 13:33:10 -07:00
Luke Gruber	cf4d37fbc0	Add locks around accesses/modifications to global encodings table This fixes segfaults and errors of the type "Encoding not found" when using encoding-related methods and internal encoding c functions across ractors. Example of a possible segfault in release mode or assertion error in debug mode: ```ruby rs = [] 100.times do rs << Ractor.new do "abc".force_encoding(Encoding.list.shuffle.first) end end while rs.any? r, obj = Ractor.select(*rs) rs.delete(r) end ```	2025-07-03 13:33:10 -07:00
Nobuyoshi Nakada	fc518fe1ff	Delimit the scopes using encoding/symbol tables	2025-05-25 15:22:43 +09:00
Nobuyoshi Nakada	b4417ff665	Add Encoding::UNICODE_VERSION constant	2025-04-23 14:14:36 +09:00
Jean Boussier	fae86a701e	string.c: Directly create strings with the correct encoding While profiling msgpack-ruby I noticed a very substantial amount of time spent in `rb_enc_associate_index`, called by `rb_utf8_str_new`. On that benchmark, `rb_utf8_str_new` is 33% of the total runtime, in big part because it causes GC to trigger often, but even then `5.3%` of the total runtime is spent in `rb_enc_associate_index` called by `rb_utf8_str_new`. After closer inspection, it appears that it's performing a lot of safety checks we can assert we don't need, and other extra useless operations, because strings are first created and filled as ASCII-8BIT and then later reassociated to the desired encoding. Directly allocating the string with the right encoding allows skipping a lot of duplicated and useless operations. After this change, the time spent in `rb_utf8_str_new` is down to `28.4%` of total runtime, and most of that is GC.	2024-11-13 13:32:32 +01:00
Nobuyoshi Nakada	4b065bbe2b	Move common code to `enc_compatible_latter`	2024-10-05 16:06:54 +09:00
Peter Zhu	176c4bb3c7	Fix corruption of internal encoding string [Bug #20598] Just like [Bug #20595], Encoding#name_list and Encoding#aliases can have their strings corrupted when Encoding.default_internal is set to nil. Co-authored-by: Matthew Valentine-House <matt@eightbitraptor.com>	2024-06-27 14:06:40 -04:00
Peter Zhu	c6a0d03649	Fix corruption of encoding name string [Bug #20595] enc_set_default_encoding will free the C string if the encoding is nil, but the C string can be used by the encoding name string. This will cause the encoding name string to be corrupted. Consider the following code: Encoding.default_internal = Encoding::ASCII_8BIT names = Encoding.default_internal.names p names Encoding.default_internal = nil p names It outputs: ["ASCII-8BIT", "BINARY", "internal"] ["ASCII-8BIT", "BINARY", "\x00\x00\x00\x00\x00\x00\x00\x00"] Co-authored-by: Matthew Valentine-House <matt@eightbitraptor.com>	2024-06-27 09:47:22 -04:00
Jean Boussier	3a7846b1aa	Add a hint of `ASCII-8BIT` being `BINARY` [Feature #18576] Since outright renaming `ASCII-8BIT` is deemed too backward incompatible, the next best thing would be to only change its `#inspect`, particularly in exception messages.	2024-04-18 10:17:26 +02:00
Jean Boussier	d4f3dcf4df	Refactor VM root modules This `st_table` is used to both mark and pin classes defined from the C API. But `vm->mark_object_ary` already does both much more efficiently. Currently a Ruby process starts with 252 rooted classes, which uses `7224B` in an `st_table` or `2016B` in an `RArray`. So a baseline of 5kB saved, but since `mark_object_ary` is preallocated with `1024` slots but only uses `405` of them, it's a net `7kB` save. `vm->mark_object_ary` is also being refactored. Prior to this change, `mark_object_ary` was a regular `RArray`, but since this allows for references to be moved, it was marked a second time from `rb_vm_mark()` to pin these objects. This has the detrimental effect of marking these references on every minor GC even though it's a mostly append-only list. But using a custom TypedData we can save from having to mark all the references on minor GC runs. Additionally, immediate values are now ignored and not appended to `vm->mark_object_ary` as it's just wasted space.	2024-03-06 15:33:43 -05:00
Peter Zhu	c7ce2f537f	Fix memory leak in setting encodings There is a memory leak in Encoding.default_external= and Encoding.default_internal= because the duplicated name is not freed when overwriting. 10.times do 1_000_000.times do Encoding.default_internal = nil end puts `ps -o rss= -p #{$$}` end Before: 25664 41504 57360 73232 89168 105056 120944 136816 152720 168576 After: 9648 9648 9648 9680 9680 9680 9680 9680 9680 9680	2024-01-03 13:31:43 -05:00
Adam Hess	6816e8efcf	Free everything at shutdown when the RUBY_FREE_ON_SHUTDOWN environment variable is set, manually free memory at shutdown. Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org> Co-authored-by: Peter Zhu <peter@peterzhu.ca>	2023-12-07 15:52:35 -05:00
Jean Boussier	60c924770d	Mark Encoding as Write Barrier protected It doesn't even have a mark function. It's only about a hundred objects, but there is no reason to scan them every time.	2023-02-07 11:48:57 +01:00
Benoit Daloze	6abe20e87b	Remove Encoding#replicate	2023-01-11 13:41:41 +01:00
Koichi Sasada	dbf77d420d	suppress warning now `enc_table->list` is not a pointer.	2022-12-16 11:12:37 +09:00
Koichi Sasada	ae19ac5b5b	fixed encoding table This reduces global lock acquiring for reading. https://bugs.ruby-lang.org/issues/18949	2022-12-16 10:04:37 +09:00
Benoit Daloze	6525b6f760	Remove get_actual_encoding() and the dynamic endian detection for dummy UTF-16/UTF-32 * And simplify callers of get_actual_encoding(). * See [Feature #18949]. * See https://github.com/ruby/ruby/pull/6322#issuecomment-1242758474	2022-09-12 14:02:34 +02:00
Benoit Daloze	14bcf69c9c	Deprecate Encoding#replicate * See [Feature #18949].	2022-09-10 19:02:15 +02:00
Takashi Kokubun	5b21e94beb	Expand tabs [ci skip] [Misc #18891]	2022-07-21 09:42:04 -07:00
Jean Boussier	d084585f01	Rename ENCINDEX_ASCII to ENCINDEX_ASCII_8BIT Otherwise it's way too easy to confuse it with US_ASCII.	2022-07-19 08:48:56 +02:00
Burdette Lamar	81741690a0	[DOC] Main doc for encodings moved from encoding.c to doc/encodings.rdoc (#5748) Main doc for encodings moved from encoding.c to doc/encodings.rdoc	2022-04-01 20:41:04 -05:00
Nobuyoshi Nakada	7459a32af3	suppress warnings for probable NULL dereferences	2021-10-24 19:24:50 +09:00
卜部昌平	5112a54846	include/ruby/encoding.h: convert macros into inline functions Less macros == huge win.	2021-10-05 14:18:23 +09:00
Jeremy Evans	3f7da458a7	Make encoding loading not issue warning Instead of relying on setting and unsetting ruby_verbose, which is not thread-safe, restructure require_internal and load_lock to accept a warn argument for whether to warn, and add rb_require_internal_silent to require without warnings. Use rb_require_internal_silent when loading encoding. Note this does not modify ruby_debug and errinfo handling, those remain thread-unsafe. Also silence requires when loading transcoders.	2021-10-02 05:51:29 -09:00
S-H-GAMELINKS	18031f4102	Add rb_encoding_check function	2021-08-22 10:39:14 +09:00
S.H	378e8cdad6	Using RBOOL macro	2021-08-02 12:06:44 +09:00
Jean Boussier	7e8a9af9db	rb_enc_interned_str: handle autoloaded encodings If called with an autoloaded encoding that was not yet initialized, `rb_enc_interned_str` would crash with a NULL pointer exception. See: https://github.com/ruby/ruby/pull/4119#issuecomment-800189841	2021-03-22 21:37:48 +09:00
Koichi Sasada	2a3324fcd2	No sync on ASCII/US_ASCII/UTF-8 rb_enc_from_index(index) doesn't need locking if index specifies ASCII/US_ASCII/UTF-8. rb_enc_from_index() is called frequently so it has impact. user system total real r_parallel/miniruby 174 0.000209 0.000000 5.559872 ( 1.811501) r_parallel/master_mini 175 0.000238 0.000000 12.664707 ( 3.523641) (repeat x1000 `s.split(/,/)` where s = '0,,' * 1000)	2020-12-17 03:44:23 +09:00

428 Commits