Hiroshi SHIBATA
93df966848
Mark development version for unreleased gems
2025-12-26 11:00:51 +09:00
Sutou Kouhei
70c7f3ad77
[ruby/strscan] Bump version
...
https://github.com/ruby/strscan/commit/747a3b5def
2025-12-26 11:00:51 +09:00
Hiroshi SHIBATA
df18f3baa6
Bundle strscan-3.1.6
2025-12-17 15:47:43 +09:00
Nobuyoshi Nakada
ae7415c27e
[ruby/strscan] [DOC] no doc for internal methods
...
https://github.com/ruby/strscan/commit/5614095d9c
2025-11-05 10:07:20 +00:00
Nobuyoshi Nakada
439ca0432e
[ruby/strscan] Deprecate constant Id
...
`$Id$` is for RCS, CVS, and SVN; it carries no information under Git.
https://github.com/ruby/strscan/commit/9e3db14fa2
2025-11-05 16:17:44 +09:00
Nobuyoshi Nakada
27b1500e70
[ruby/strscan] [DOC] Add document of StringScanner::Error
...
https://github.com/ruby/strscan/commit/16ec901356
2025-11-05 07:06:57 +00:00
Nobuyoshi Nakada
f8e9bccd03
[ruby/strscan] Deprecate undocumented toplevel constant ScanError
...
https://github.com/ruby/strscan/commit/b4ddc3a2a6
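A minimal sketch of rescuing the namespaced error class instead of the toplevel constant. It assumes that `unscan` without a preceding match raises `StringScanner::Error`; the helper name `unscan_error_class` is hypothetical and only serves the illustration.

```ruby
require 'strscan'

# Hypothetical helper: returns the class of the error raised when `unscan`
# is called without a preceding successful match, or nil if nothing is raised.
def unscan_error_class
  StringScanner.new("abc").unscan
  nil
rescue StringScanner::Error => e
  e.class
end

p unscan_error_class
```

Code that currently rescues the toplevel `ScanError` could likely name `StringScanner::Error` in the same way.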
2025-11-05 05:13:20 +00:00
Nobuyoshi Nakada
c85ef2ca9c
[ruby/strscan] ISO C90 forbids mixed declarations and code
...
C99 syntax cannot be used as long as Ruby 2.6 and earlier are supported.
https://github.com/ruby/strscan/commit/f6d178fda5
2025-11-05 04:54:56 +00:00
Nobuyoshi Nakada
13f1b432d2
[ruby/strscan] [DOC] Remove the statement rest? is obsolete
...
`eos?` is the opposite of `rest?`, so it cannot be used as a drop-in replacement.
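The relationship between the two predicates can be sketched as follows. The helper name `rest_and_eos` is hypothetical and only reports both values side by side.

```ruby
require 'strscan'

# Hypothetical helper: report both predicates for a scanner.
def rest_and_eos(scanner)
  { rest: scanner.rest?, eos: scanner.eos? }
end

scanner = StringScanner.new("ab")
before = rest_and_eos(scanner)  # unscanned text remains
scanner.scan(/ab/)              # consume the whole string
after = rest_and_eos(scanner)   # nothing is left to scan

p before
p after
```

Replacing `rest?` with `eos?` therefore requires negating the condition at every call site.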
https://github.com/ruby/strscan/commit/bee8cc547b
2025-11-05 02:03:10 +00:00
Takashi Kokubun
fffa4671a4
[ruby/strscan] Resurrect a method that has not been obsolete
...
(https://github.com/ruby/strscan/pull/169 )
Partially revert https://github.com/ruby/strscan/pull/168 because
strscan_rest_p did not have `rb_warning("StringScanner#rest? is
obsolete")`.
It is actively used by the latest tzinfo.gem, and we shouldn't remove it
without deprecating it.
https://github.com/ruby/strscan/commit/f3fdf21189
2025-11-04 21:34:04 +00:00
Nobuyoshi Nakada
e9e5a4a454
[ruby/strscan] Remove methods that have been obsolete for over two decades
...
https://github.com/ruby/strscan/commit/1387def685
2025-11-04 19:41:21 +00:00
Nobuyoshi Nakada
36cd985db4
[ruby/strscan] Remove no longer used variable
...
Since https://github.com/ruby/strscan/commit/92961cde2b42 .
https://github.com/ruby/strscan/commit/911f9c682a
2025-11-04 18:29:37 +00:00
Hiroshi SHIBATA
9f00044d0f
Bump up strscan version to 3.1.6.dev
2025-06-06 11:30:14 +09:00
Daniel Colson
deb70925a2
[ruby/strscan] Implement Write Barrier
...
(https://github.com/ruby/strscan/pull/156 )
StringScanner holds the string being scanned, and a regex for methods
like `match?`. Triggering the write barrier for those allows us to mark
this as WB protected.
https://github.com/ruby/strscan/commit/32fec70407
2025-06-06 11:29:42 +09:00
Hiroshi SHIBATA
0f06626915
Bump up strscan version to 3.1.5.dev
2025-05-02 10:11:09 +09:00
Sutou Kouhei
af6d6b64ea
[ruby/strscan] named_captures: fix incompatibility with
...
MatchData#named_captures
(https://github.com/ruby/strscan/pull/146 )
Fix https://github.com/ruby/strscan/pull/145
`MatchData#named_captures` uses the last matched value for each name.
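A small sketch of the reference behavior, using a pattern in which two groups share one name. The pattern and input are illustrative assumptions, not taken from the original report, and the scanner call is guarded because older strscan releases may not provide `named_captures`.

```ruby
require 'strscan'

# Two groups share the name "a"; only the first alternative matches "x".
pattern = /(?<a>x)|(?<a>y)/

# Reference behavior from MatchData.
expected = pattern.match("x").named_captures

scanner = StringScanner.new("x")
scanner.scan(pattern)
# Guarded: nil when this strscan release has no named_captures.
actual = scanner.respond_to?(:named_captures) ? scanner.named_captures : nil

p expected
p actual
```

With the fix applied, `actual` is expected to equal `expected`.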
Reported by Linus Sellberg. Thanks!!!
https://github.com/ruby/strscan/commit/a6086ea322
2025-05-02 09:52:38 +09:00
Hiroshi SHIBATA
4634a0042e
Mark development version for unreleased gems
2025-04-22 11:27:24 +09:00
Sutou Kouhei
067fc410fc
[ruby/strscan] Bump version
...
https://github.com/ruby/strscan/commit/8ff80150c4
2025-04-22 11:27:24 +09:00
Sutou Kouhei
ad8cb532d5
[ruby/strscan] Bump version
...
https://github.com/ruby/strscan/commit/7b1eb1e4ed
2025-04-14 16:18:48 +09:00
Jean byroot Boussier
0db87b8943
[ruby/strscan] Allow parsing strings larger than 2GiB
...
(https://github.com/ruby/strscan/pull/147 )
For an unknown reason, even though `pos` is stored as a `long`,
`#pos` and `#pos=` treat it as an `int`, which prevents seeking into
strings larger than 2GiB.
https://github.com/ruby/strscan/commit/b76368416e
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
2025-04-14 16:18:47 +09:00
NAITOH Jun
018943ba05
[ruby/strscan] Fix a bug that inconsistency of IndexError vs nil for
...
unknown capture group
(https://github.com/ruby/strscan/pull/143 )
Fix https://github.com/ruby/strscan/pull/139
Reported by Benoit Daloze. Thanks!!!
https://github.com/ruby/strscan/commit/bc8a0d2623
2025-02-25 15:36:46 +09:00
NAITOH Jun
36ab247e4d
[ruby/strscan] Fix a bug that scanning methods that don't use Regexp
...
don't clear named capture groups
(https://github.com/ruby/strscan/pull/142 )
Fix https://github.com/ruby/strscan/pull/135
https://github.com/ruby/strscan/commit/b957443e20
2025-02-25 15:36:46 +09:00
Jean Boussier
bf6c106d54
[ruby/strscan] scan_integer(base: 16) ignore x suffix if not
...
followed by hexadecimal
(https://github.com/ruby/strscan/pull/141 )
Fix: https://github.com/ruby/strscan/issues/140
`0x<EOF>`, `0xZZZ` should be parsed as `0` instead of not matching at
all.
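A hedged sketch of the hexadecimal form. The helper name `scan_hex` is hypothetical, and the `respond_to?` guard is an assumption so that the snippet also runs on strscan releases without `scan_integer`.

```ruby
require 'strscan'

# Hypothetical helper: parse a hexadecimal integer at the start of a string.
# Returns nil when scan_integer is unavailable or nothing matches.
def scan_hex(string)
  scanner = StringScanner.new(string)
  return nil unless scanner.respond_to?(:scan_integer)
  scanner.scan_integer(base: 16)
end

p scan_hex("0xff")   # well-formed input with the optional 0x prefix
p scan_hex("ff")     # the prefix may be omitted
p scan_hex("0xZZZ")  # per the commit message, 0 on releases that include this fix
```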
https://github.com/ruby/strscan/commit/c4e4795ed2
2025-02-21 11:31:36 +09:00
NAITOH Jun
eee9bd1aa4
[ruby/strscan] Fix a bug that scan_until behaves differently with
...
Regexp and String patterns
(https://github.com/ruby/strscan/pull/138 )
Fix https://github.com/ruby/strscan/pull/131
https://github.com/ruby/strscan/commit/e1cec2e726
2025-02-17 11:04:32 +09:00
Hiroshi SHIBATA
b4ed6db096
Removed trailing spaces
2025-02-14 16:16:55 +09:00
Jean Boussier
51004c3641
[ruby/strscan] Fix a bug that scan_integer doesn't update matched
...
data
(https://github.com/ruby/strscan/pull/133 )
Fix https://github.com/ruby/strscan/pull/130
Reported by Andrii Konchyn. Thanks!!!
https://github.com/ruby/strscan/commit/4e5f17f87a
2025-02-14 16:13:26 +09:00
Alexander Momchilov
41e24c2f3e
[ruby/strscan] [DOC] Add syntax highlighting to MarkDown code blocks
...
(https://github.com/ruby/strscan/pull/126 )
Split off from https://github.com/ruby/ruby/pull/12322
https://github.com/ruby/strscan/commit/9bee37e0f5
2024-12-16 10:10:34 +09:00
Sutou Kouhei
219c2eee5a
[ruby/strscan] Bump version
...
https://github.com/ruby/strscan/commit/fd140b8582
2024-12-16 10:10:34 +09:00
Hiroshi SHIBATA
78ca87f8a8
Lock released version of strscan-3.1.1
2024-12-12 16:14:25 +09:00
Jean Boussier
636d57bd1c
[ruby/strscan] Micro optimize encoding checks
...
(https://github.com/ruby/strscan/pull/117 )
Profiling shows a lot of time spent in various encoding check functions.
I'm working on optimizing them on the Ruby side, but if we assume most
strings are one of the simple 3 encodings, we can skip a lot of
overhead.
```ruby
require 'strscan'
require 'benchmark/ips'

source = 10_000.times.map { rand(9999999).to_s }.join(",").force_encoding(Encoding::UTF_8).freeze

def scan_to_i(source)
  scanner = StringScanner.new(source)
  while number = scanner.scan(/\d+/)
    number.to_i
    scanner.skip(",")
  end
end

def scan_integer(source)
  scanner = StringScanner.new(source)
  while scanner.scan_integer
    scanner.skip(",")
  end
end

Benchmark.ips do |x|
  x.report("scan.to_i") { scan_to_i(source) }
  x.report("scan_integer") { scan_integer(source) }
  x.compare!
end
```
Before:
```
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
scan.to_i 93.000 i/100ms
scan_integer 232.000 i/100ms
Calculating -------------------------------------
scan.to_i 933.191 (± 0.2%) i/s (1.07 ms/i) - 4.743k in 5.082597s
scan_integer 2.326k (± 0.8%) i/s (429.99 μs/i) - 11.832k in 5.087974s
Comparison:
scan_integer: 2325.6 i/s
scan.to_i: 933.2 i/s - 2.49x slower
```
After:
```
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
scan.to_i 96.000 i/100ms
scan_integer 274.000 i/100ms
Calculating -------------------------------------
scan.to_i 969.489 (± 0.2%) i/s (1.03 ms/i) - 4.896k in 5.050114s
scan_integer 2.756k (± 0.1%) i/s (362.88 μs/i) - 13.974k in 5.070837s
Comparison:
scan_integer: 2755.8 i/s
scan.to_i: 969.5 i/s - 2.84x slower
```
https://github.com/ruby/strscan/commit/c02b1ce684
2024-12-02 10:50:34 +09:00
Jean Boussier
79cc3d26ed
StringScanner#scan_integer support base 16 integers ( #116 )
...
Followup: https://github.com/ruby/strscan/pull/115
`scan_integer` is now implemented in Ruby so that it can handle
keyword arguments efficiently without allocating a Hash. Given that the
goal of `scan_integer` is to parse integers more efficiently without
having to allocate an intermediary object, using `rb_scan_args` would
defeat the purpose.
Additionally, the C implementation now uses `rb_isdigit` and
`rb_isxdigit`, because on Windows `isdigit` is locale dependent.
2024-12-02 10:50:34 +09:00
Jean Boussier
d5de1a5789
[ruby/strscan] Implement #scan_integer to efficiently parse Integer
...
(https://github.com/ruby/strscan/pull/115 )
Fix: https://github.com/ruby/strscan/issues/113
This allows parsing an Integer directly from a String without first
allocating a substring.
Notes:
The implementation is limited by design. It is meant as a first step:
only the most straightforward case, base 10 integers, is supported.
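A minimal usage sketch, assuming comma-separated base 10 input. The helper name `sum_csv` is hypothetical, and the fallback to `scan` plus `to_i` is an assumption added so that the snippet also runs where `scan_integer` is not available.

```ruby
require 'strscan'

# Hypothetical helper: sum comma-separated integers.
# With scan_integer no substring is allocated per number; the fallback
# branch allocates one String per number via scan.
def sum_csv(source)
  scanner = StringScanner.new(source)
  total = 0
  until scanner.eos?
    number =
      if scanner.respond_to?(:scan_integer)
        scanner.scan_integer
      else
        scanner.scan(/[+-]?\d+/)&.to_i
      end
    break if number.nil?  # stop at the first non-integer token
    total += number
    scanner.skip(/,/)
  end
  total
end

p sum_csv("10,20,-5")
```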
https://github.com/ruby/strscan/commit/6a3c74b4c8
2024-11-27 09:24:07 +09:00
NAITOH Jun
e73f35ddaf
[ruby/strscan] [CRuby] Optimize strscan_do_scan(): Remove
...
unnecessary use of `rb_enc_get()`
(https://github.com/ruby/strscan/pull/108 )
- before: #106
## Why?
In `rb_strseq_index()`, the result of `rb_enc_check()` is used.
-
6c7209cd37/string.c (L4335-L4368)
> enc = rb_enc_check(str, sub);
> return strseq_core(str_ptr, str_ptr_end, str_len, sub_ptr, sub_len,
offset, enc);
-
6c7209cd37/string.c (L4309-L4318)
```C
strseq_core(const char *str_ptr, const char *str_ptr_end, long str_len,
            const char *sub_ptr, long sub_len, long offset, rb_encoding *enc)
{
    const char *search_start = str_ptr;
    long pos, search_len = str_len - offset;
    for (;;) {
        const char *t;
        pos = rb_memsearch(sub_ptr, sub_len, search_start, search_len, enc);
```
## Benchmark
It shows String as a pattern is 1.24x faster than Regexp as a pattern.
```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
regexp 9.225M i/s - 9.328M times in 1.011068s (108.40ns/i)
regexp_var 9.327M i/s - 9.413M times in 1.009214s (107.21ns/i)
string 9.200M i/s - 9.355M times in 1.016840s (108.70ns/i)
string_var 11.249M i/s - 11.255M times in 1.000578s (88.90ns/i)
Calculating -------------------------------------
regexp 9.565M i/s - 27.676M times in 2.893476s (104.55ns/i)
regexp_var 10.111M i/s - 27.982M times in 2.767496s (98.90ns/i)
string 10.060M i/s - 27.600M times in 2.743465s (99.40ns/i)
string_var 12.519M i/s - 33.746M times in 2.695615s (79.88ns/i)
Comparison:
string_var: 12518707.2 i/s
regexp_var: 10111089.6 i/s - 1.24x slower
string: 10060144.4 i/s - 1.24x slower
regexp: 9565124.4 i/s - 1.31x slower
```
https://github.com/ruby/strscan/commit/ff2d7afa19
2024-10-26 18:44:15 +09:00
Nobuyoshi Nakada
d6046bccb7
[ruby/strscan] Use C90 as long as Ruby 2.6 or earlier is supported
...
(https://github.com/ruby/strscan/pull/101 )
https://github.com/ruby/strscan/commit/d31274f41b
2024-10-26 18:44:15 +09:00
NAITOH Jun
d81b0588bb
[ruby/strscan] Accept String as a pattern at non head
...
(https://github.com/ruby/strscan/pull/106 )
It supports non-head match cases such as StringScanner#scan_until.
If we use a String as a pattern, we can improve match performance.
Here are the results of the included benchmark.
## CRuby
It shows String as a pattern is 1.18x faster than Regexp as a pattern.
```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
regexp 9.403M i/s - 9.548M times in 1.015459s (106.35ns/i)
regexp_var 9.162M i/s - 9.248M times in 1.009479s (109.15ns/i)
string 8.966M i/s - 9.274M times in 1.034343s (111.54ns/i)
string_var 11.051M i/s - 11.190M times in 1.012538s (90.49ns/i)
Calculating -------------------------------------
regexp 10.319M i/s - 28.209M times in 2.733707s (96.91ns/i)
regexp_var 10.032M i/s - 27.485M times in 2.739807s (99.68ns/i)
string 9.681M i/s - 26.897M times in 2.778397s (103.30ns/i)
string_var 12.162M i/s - 33.154M times in 2.726046s (82.22ns/i)
Comparison:
string_var: 12161920.6 i/s
regexp: 10318949.7 i/s - 1.18x slower
regexp_var: 10031617.6 i/s - 1.21x slower
string: 9680843.7 i/s - 1.26x slower
```
## JRuby
It shows String as a pattern is 2.11x faster than Regexp as a pattern.
```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
regexp 7.591M i/s - 7.544M times in 0.993780s (131.74ns/i)
regexp_var 6.143M i/s - 6.125M times in 0.997038s (162.77ns/i)
string 14.135M i/s - 14.079M times in 0.996067s (70.75ns/i)
string_var 14.079M i/s - 14.057M times in 0.998420s (71.03ns/i)
Calculating -------------------------------------
regexp 9.409M i/s - 22.773M times in 2.420268s (106.28ns/i)
regexp_var 10.116M i/s - 18.430M times in 1.821820s (98.85ns/i)
string 21.389M i/s - 42.404M times in 1.982519s (46.75ns/i)
string_var 20.897M i/s - 42.237M times in 2.021187s (47.85ns/i)
Comparison:
string: 21389191.1 i/s
string_var: 20897327.5 i/s - 1.02x slower
regexp_var: 10116464.7 i/s - 2.11x slower
regexp: 9409222.3 i/s - 2.27x slower
```
See:
be7815ec02/core/src/main/java/org/jruby/util/StringSupport.java (L1706-L1736)
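A hedged sketch of passing a String pattern to a non-head method such as `scan_until`, as this commit describes. The helper name `scan_through` is hypothetical, and the `TypeError` fallback is an assumption about strscan releases that predate String patterns in `scan_until`.

```ruby
require 'strscan'

# Hypothetical helper: scan up to and including a delimiter and return the
# resulting scan position. A String pattern is tried first; if this strscan
# release rejects it with TypeError, an equivalent escaped Regexp is used.
def scan_through(source, delimiter)
  scanner = StringScanner.new(source)
  begin
    scanner.scan_until(delimiter)
  rescue TypeError
    scanner.scan_until(Regexp.new(Regexp.escape(delimiter)))
  end
  scanner.pos
end

p scan_through("key=value", "=")
```

The String form avoids regular expression matching for fixed delimiters, which is where the benchmark gains above come from.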
---------
https://github.com/ruby/strscan/commit/f9d96c446a
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
2024-09-17 15:12:25 +09:00
Hiroshi SHIBATA
32f134bb85
Added pre-release suffix for development version of default gems
...
https://github.com/ruby/stringio/issues/81
2024-08-31 14:22:17 +09:00
Hiroshi SHIBATA
3eda59e975
Sync strscan HEAD again.
...
https://github.com/ruby/strscan/pull/99 split document with multi-byte
chars.
2024-06-04 12:40:08 +09:00
Hiroshi SHIBATA
78bfde5d9f
Revert "[ruby/strscan] Doc for StringScanner"
...
This reverts commit 974ed1408c516d1e8f992f0b304e2de6f8bd5c1f.
2024-05-30 21:13:10 +09:00
Hiroshi SHIBATA
d70b0da482
Revert "Fix reference path for strscan documentation"
...
This reverts commit 1fa93fb9488a32018101689fd727965fd5874eb5.
2024-05-30 21:13:01 +09:00
Hiroshi SHIBATA
1fa93fb948
Fix reference path for strscan documentation
2024-05-30 14:29:25 +09:00
Burdette Lamar
974ed1408c
[ruby/strscan] Doc for StringScanner
...
(https://github.com/ruby/strscan/pull/96 )
#peek_byte and #scan_byte not updated (not available in my repo --
sorry).
---------
https://github.com/ruby/strscan/commit/0123da7352
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
2024-05-30 12:34:18 +09:00
Aaron Patterson
164e464b04
[ruby/strscan] Add a method for peeking and reading bytes as
...
integers
(https://github.com/ruby/strscan/pull/89 )
This commit adds `scan_byte` and `peek_byte`. `scan_byte` will scan the
current byte, return it as an integer, and advance the cursor.
`peek_byte` will return the current byte as an integer without advancing
the cursor.
Currently `StringScanner#get_byte` returns a string, but I want to get
the current byte without allocating a string. I think this will help
with writing high performance lexers.
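A small sketch of reading bytes as integers. The helper name `bytes_of` is hypothetical, and the `get_byte` fallback is an assumption for strscan releases that do not yet provide `scan_byte`.

```ruby
require 'strscan'

# Hypothetical helper: collect the bytes of a string as integers.
# scan_byte returns an Integer and advances the cursor; the fallback uses
# get_byte, which allocates a one-byte String for every step.
def bytes_of(source)
  scanner = StringScanner.new(source)
  bytes = []
  if scanner.respond_to?(:scan_byte)
    while (byte = scanner.scan_byte)
      bytes << byte
    end
  else
    while (chunk = scanner.get_byte)
      bytes << chunk.getbyte(0)
    end
  end
  bytes
end

p bytes_of("ab")
```

`peek_byte` would fit the same loop when a lexer needs to inspect the next byte before deciding whether to consume it.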
---------
https://github.com/ruby/strscan/commit/873aba2e5d
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
2024-02-26 15:54:54 +09:00
Sutou Kouhei
ce2618c628
[ruby/strscan] Bump version
...
https://github.com/ruby/strscan/commit/ba338b882c
2024-02-08 14:43:56 +09:00
Sutou Kouhei
5afae77ce9
[ruby/strscan] Bump version
...
https://github.com/ruby/strscan/commit/842845af1f
2024-02-08 14:43:56 +09:00
Sutou Kouhei
ac636f5709
[ruby/strscan] Bump version
...
https://github.com/ruby/strscan/commit/d6f97ec102
2024-01-19 10:49:12 +09:00
NAITOH Jun
338eb0065b
[ruby/strscan] StringScanner#captures: Return nil not "" for
...
unmatched capture
(https://github.com/ruby/strscan/pull/72 )
fix https://github.com/ruby/strscan/issues/70
When no substring matches a group (s[3] in the examples), `s[3]` and
`captures` behave differently: `s[3]` is nil, but `captures` returns ""
for that group.
The corresponding element of `captures` should be nil as well.
```
s = StringScanner.new('foobarbaz') #=> #<StringScanner 0/9 @ "fooba...">
s.scan /(foo)(bar)(BAZ)?/ #=> "foobar"
s[0] #=> "foobar"
s[1] #=> "foo"
s[2] #=> "bar"
s[3] #=> nil
s.captures #=> ["foo", "bar", ""]
s.captures.compact #=> ["foo", "bar", ""]
```
```
s = StringScanner.new('foobarbaz') #=> #<StringScanner 0/9 @ "fooba...">
s.scan /(foo)(bar)(BAZ)?/ #=> "foobar"
s[0] #=> "foobar"
s[1] #=> "foo"
s[2] #=> "bar"
s[3] #=> nil
s.captures #=> ["foo", "bar", nil]
s.captures.compact #=> ["foo", "bar"]
```
https://docs.ruby-lang.org/ja/latest/method/MatchData/i/captures.html
```
/(foo)(bar)(BAZ)?/ =~ "foobarbaz" #=> 0
$~.to_a #=> ["foobar", "foo", "bar", nil]
$~.captures #=> ["foo", "bar", nil]
$~.captures.compact #=> ["foo", "bar"]
```
* StringScanner#captures is not yet documented.
https://docs.ruby-lang.org/ja/latest/class/StringScanner.html
https://github.com/ruby/strscan/commit/1fbfdd3c6f
2024-01-14 22:27:24 +09:00
Hiroshi SHIBATA
f54369830f
Revert "Rollback to released version numbers of stringio and strscan"
...
This reverts commit 6a79e53823e328281b9e9eee53cd141af28f8548.
2023-12-25 21:12:49 +09:00
Hiroshi SHIBATA
6a79e53823
Rollback to released version numbers of stringio and strscan
2023-12-16 12:00:59 +08:00
Sutou Kouhei
ce8301084f
[ruby/strscan] Bump version
...
https://github.com/ruby/strscan/commit/1b3393be05
2023-11-08 09:26:58 +09:00
Peter Zhu
91e13a5207
[ruby/strscan] Fix indentation in strscan.c
...
[ci skip]
2023-07-28 10:12:52 -04:00