mirror/perl - perl - Maple Linux Source

mirror of https://github.com/Perl/perl5.git synced 2026-01-26 08:38:23 +00:00

Author	SHA1	Message	Date
Karl Williamson	d4247bb256	sbox32_hash.h: Add #undef's These case statements need not be visible outside this header. Putting these here avoids cluttering up embed.h, where the same #undef lines would otherwise be generated	2026-01-22 09:55:58 -07:00
Karl Williamson	4b67bbf7f6	embed.pl: Also consider #undef's This code looks to see what conditions must apply before a #define happens. This commit extends that to also look for #undef commands. The end result is that for symbols that are visible to XS code, but aren't supposed to be, embed.h contains an #undef so it isn't visible. But if it already has been #undef'ed, there is no need to do this. But a symbol can be defined and undefined many times, and the conditions for doing an #undef may be different than what the symbol was #defined under. The consequences of not realizing that a symbol gets undefined are simply that we generate an unnecessary #undef. The consequences of failing to generate one when the symbol is defined is that it is visible when not intended to be so. So, there are various restrictions to try to make sure that we don't err in the latter direction.	2026-01-22 09:55:58 -07:00
Karl Williamson	46bf0cf5e8	embed.pl: Consider symbols visible only to extensions Two commits ago, the code was extended to compare the C preprocessor visibility of a symbol with what the desired visibility of a symbol is. It assumed that everything not constrained to core was visible everywhere. This commit extends that to look for being visible only to extensions. As a result, the large number of symbols added to the override list in that commit are now removed.	2026-01-22 09:55:58 -07:00
Karl Williamson	6731bcd053	embed.pl: Compare cpp visibility with desired This commit creates a function that calculates what C preprocessor constraints there are on the visibility of a #define'd symbol. It then compares that with what the desired visibility is, based as prior commits have determined, and reconciles any discrepancies. It warns if the symbol is supposed to be visible, but cpp makes it not so. It adds it to the list of symbols to undefine if it is visible, but is not supposed to be so. In order to make this commit somewhat smaller with respect to code changes, it assumes anything that is visible to extensions is visible everywhere. This entailed adding a large number of symbols to the list of symbols to not #undef, in order to not change embed.h. The commit after the next one will fix this, and those symbols will be removed from the list in that commit.	2026-01-22 09:55:58 -07:00
Karl Williamson	1fd7f0165b	embed.pl: Handle case of multiple flags for an element Some symbols are #defined in multiple places in the input; based typically on different preprocessor conditionals. We want to use the definition which has the widest visibility. This change showed that USE_STDIO had wrongly been undefined for the past few commits in blead. It is no longer actually ever defined by perl.	2026-01-22 09:55:58 -07:00
Karl Williamson	cea46623dd	op.c: Don't hand-roll is_dup_mode() attributes This commit adds an entry in embed.fnc for S_is_dup_mode, removing the assert and __attribute lines in op.c. The proximal cause for this commit is that I tried compiling with Perl_assert() enabled, which resulted in a lot of compiler warnings because of the __attribute__non_null__ line. But I also think it is better to not hand-roll things unless absolutely necessary. Changes someone makes to the general scheme are not likely to be propagated to the hand-rolled items.	2026-01-22 06:58:57 -07:00
Tony Cook	e11f0edb1a	sv_numeq etc: don't do numify overloading with SV_SKIP_OVERLOAD	2026-01-22 13:09:13 +11:00
Tony Cook	747670eba8	add sv_numle(), sv_numlt(), sv_numge(), sv_numgt() APIs These are all needed because overloading may make them inconsistent with <=> overloading.	2026-01-22 13:07:35 +11:00
Tony Cook	01e16d6bee	add sv_numcmp() to the API	2026-01-22 13:07:34 +11:00
Tony Cook	22223d7d51	sv.c: extract the common parts of sv_numeq_flags and sv_numne_flags	2026-01-22 13:07:34 +11:00
Tony Cook	53bf030389	add sv_numne() to the API some refactoring next, since sv_numeq_flags and sv_numne_flags are similar. Used a separate test file since putting every sv_num*() variant in the one file would be ugly Addresses GH #23918 but isn't a direct fix	2026-01-22 13:07:34 +11:00
Paul "LeoNerd" Evans	b5e0af402b	Unify mg_free_struct() between mg.c and sv.c Rather than copy-pasted code in two places, define a helper function to call from both. This also accounts for minor (but currently inconsequential) differences in behaviour between the two copy-pasted locations, that seem to have drifted apart over time.	2026-01-21 23:49:41 +00:00
Karl Williamson	7e9ba10fa8	Document (and export) [IU]_(BITS\|DIG) Fixes #24083. Fixes #24084. In d957e95daa0143d60933d96d6cbfb69eee6d6269 I changed the definitions of IV_DIG and UV_DIG to depend on IV_BITS and UV_BITS respectively, creating the latter in perl.h. These had only visibility to the perl core. But I forgot that the _DIG forms were visible everywhere, so the _BITS forms needed to be as well. This commit merely documents all of them as public API (which should have been the case all along anyway), which automatically causes their visibility to be made everywhere.	2026-01-15 18:15:54 -07:00
Karl Williamson	c1becd7fcc	numeric.c: Make S_output_nonportable() callable from core Instead of being internal to this file, changing its name to Perl_output_nonportable	2026-01-15 09:49:49 -07:00
Karl Williamson	31c9996116	Add class for underscore character to l1_char_class_tab.h l1_char_class_tab.h categorizes characters in the Latin1 range into various classes, mostly into the POSIX classes like [:word:]. Each character has a bit set corresponding to every class it is a member of. These values are placed in a 256-element array and the ordinal value of a character is used as an index into it for quick determination of if a character is a member of a given class. Besides the POSIX classes, there are some classes that make it more convenient and/or faster for our code. For example, there is a class that allows us to quickly know if a given character is one that needs to be preceded by a backslash by quotemeta(). This commit adds a class for the single character underscore '_', and a macro that allows for seeing if a character is either an underscore or a member of any other class, using a single conditional. This means code that checks for if character X is either an underscore or a member of class Y can change to eliminate one conditional. Thus the reason to do this is efficiency. Currently, the only places that do this explicitly are in non-hot code. But I have wip that has hot code that could benefit from this. The only downside of doing this is that it uses up one bit of the 32 available (without shenanigans) for such classes, leaving 4 spare. But before this release, the last time any new bit had been used up was 5.32, so the rate of using these spare up is quite low. This bit could be reclaimed because the IDFIRST class in the Latin1 range is identical to ALPHA plus the underscore, so it could be rewritten as that combination and its bit freed up. However, this would require adding some macros that take two class parameters instead of one. I briefly thought about doing that now, but since we have spare bits and the rate of using them up is low, I didn't think it was worth it at this time. \w in this range is ALPHANUMERIC plus underscore. 
But its use is more embedded than IDFIRST is, so an attempt to reclaim its bit would require more effort.	2026-01-15 08:47:33 -07:00
Karl Williamson	6b6a828b0a	Inline Perl_grok_(bin\|oct\|hex)	2026-01-12 09:42:39 -07:00
Karl Williamson	d957e95daa	Add definitions for [IU]V_BITS to perl.h replacing the pp.c definition of IV_BITS	2026-01-12 09:18:06 -07:00
Karl Williamson	52a45f6eea	perlapio: USE_STDIO is no longer an option Since 5.16.0.	2026-01-09 12:50:24 -07:00
Karl Williamson	86cc3ae1e0	regen/embed.pl: Update copyright to 2026	2026-01-05 13:30:18 -07:00
Karl Williamson	24c7fb4c21	Convert Perl utf16 to utf8 functions to macros These functions are hereby removed in favor of calling the plain macros that already exist	2025-12-27 21:24:47 -07:00
Karl Williamson	7e1ae0c850	Remove SBOX case statements from external visibility I'm pretty sure there is no use case for these, and very unlikely to have any actual uses.	2025-12-10 08:50:19 -07:00
Karl Williamson	ebbe6ac0f7	Remove a few more macros from being visible to XS code These are a few macros dealing with inversion lists that were never intended to be visible to general XS code, and they actually can't be in use in cpan because the mechanisms to create inversion lists are private to perl.	2025-12-10 08:50:19 -07:00
Karl Williamson	92dcf59a90	Gain control of macro namespace visibility This commit adds the capability to undefine macros that are visible to XS code but shouldn't be. This can be used to stop macro namespace pollution by perl. It works by changing embed.h to have two modes, controlled by a #ifdef that is set by perl.h. perl.h now #includes embed.h twice. The first time works as it always has. The second sets the #ifdef, and causes embed.h to #undef the macros that shouldn't be visible. This call is just before perl.h returns to its includer, so that these macros have come and gone before the file that #included perl.h is affected by them. It comes after the inline headers get included, so they have access to all the symbols that are defined. The list of macros is determined by the visibility given by the apidoc lines documenting them, plus several exception lists that allow a symbol to be visible even though it is not documented as such. In this commit, the main exception list contains everything that is currently visible outside the Perl core, so this should not break any code. But it means that the visibility control is established for future changes to our code base. New macros will not be visible except when documented as needing to be such. We can no longer inadvertently add new names to pollute the user's. I expect that over time, the exception list will become smaller, as we go through it and remove the items that really shouldn't be visible. We can then see via smoking if someone is actually using them, and either decide that these should be visible, or work with the module author for another way to accomplish their needs. (I would hope this would lead to proper documentation of the ones that need to be visible.) There are currently four lists of symbols. One list is for symbols that are used by libc functions, and that Perl may redefine (usually so that code doesn't have to know if it is running on a platform that is lacking the given feature.) 
The algorithm added here catches most of these and keeps them visible, but there are a few items that currently must be manually listed. A second list is of symbols that the re extension to Perl requires, but no one else needs to. This list is currently empty, as everything initially is in the main exception list. A third list is for items that other Perl extensions require, but no one else needs to. This list is currently empty, as everything initially is in the main exception list. The final list is for items that currently are visible to the whole world. It contains thousands of items. This list should be examined for: 1) Names that shouldn't be so visible; and 2) Names that need to remain visible but should be changed so they are less likely to clash with anything the user might come up with. I have wanted this ability to happen for a long time; and now things have come together to enable it. This allows us to have a clear-cut boundary with CPAN. It means you can add macros that have internal-only use without having to worry about making them likely not to clash with user names. It shows precisely in one place what our names are that are visible to CPAN.	2025-12-10 08:50:19 -07:00
Karl Williamson	838b774823	Move hv_stores() declaration from embed.fnc to hv.h This is required for the next few commits that start automatically creating long Perl_name functions for the elements in embed.fnc that are macros and don't already have them in the source. Only macros can take a parameter that has to be a literal string, so don't fit with the next few commits. This is the only case in embed.fnc like that, so I'm deferring dealing with it for now.	2025-12-10 08:50:19 -07:00
Karl Williamson	32aaa22eec	embed.fnc: Drop Perl_ on do_aexec my_stat my_lstat These macros are not for external use, so don't need a Perl_ prefix	2025-12-10 08:50:19 -07:00
Karl Williamson	4092daf53e	Remove some special EBCDIC code The 'variant_byte_number' function was written to find the byte number in a word of the first byte whose meaning varies depending on if the string it is part of is encoded in UTF-8 or not. On ASCII machines, that is simply when the upper bit is set. On EBCDIC machines, there is no similar pattern, so this function hasn't been compiled on those. A long time ago, I realized that this function could also handle binary data by coercing that binary data into having the form of having that bit set or not depending on the pattern being looked for, and then calling that function. But I actually hadn't realized until now that it was binary data not tied to a character set that was being worked on. This commit rectifies that. A new alias is added for that function that emphasizes that it works on binary data, the function is now compiled for EBCDIC, and the EBCDIC-only code that avoided using it is now removed.	2025-11-01 21:02:37 -06:00
Paul "LeoNerd" Evans	f1a8d7d883	Implement named parameters in signatures (PPC0024) This adds a major new ability to subroutine signatures, allowing callers to pass parameters by name/value pairs rather than by position. sub f ($x, $y, :$alpha, :$beta = undef) { ... } f( 123, 456, alpha => 789 ); Originally specified in https://github.com/Perl/PPCs/blob/main/ppcs/ppc0024-signature-named-parameters.md This feature is currently considered experimental.	2025-10-31 11:31:29 +00:00
Branislav Zahradník	147d5f1b9e	[parser] new_block_statement - deduplicate "a block is a loop that happens once"	2025-10-22 17:23:56 +01:00
Branislav Zahradník	e6a443b294	[parser] package - deduplicate coupled call sequence Function combines call of original `package` and `package_version` when new namespace statement is detected. Instead of required three statements usage now consists of single function call.	2025-10-22 17:23:56 +01:00
Karl Williamson	935cdb76e8	embed.fnc: mv definition of more_sv This was in a #ifdef of being in sv.c, which it is, but since it is public, it needs to be moved out of this. This removes the need for a copy of its prototype to be in sv_inline.h	2025-10-21 18:58:48 -06:00
Karl Williamson	2e142e0d27	regen/embed.pl: Avoid use of hard-coded list The list consists of exactly the functions that have the O flag set in embed.fnc. No need to keep this data twice. The entries are trivially generatable from existing entries as we go along. And those generated entries have the added advantage of not using the short name, so potentially less name space pollution	2025-10-21 18:58:48 -06:00
Karl Williamson	bd4c0d1fc2	Add S_parse_ident_no_copy() This new function is for callers that are merely checking if the string being parsed is a legal identifier or not, and aren't interested in the normalized version of the identifier that parse_ident() generates. This new function allows callers to not have to think about this buffer; it just wraps plain parse_ident() using a throw-away buffer to hold the returned normalized text. This avoids introducing a bunch of conditionals inside parse_ident.	2025-10-17 12:26:04 -06:00
Karl Williamson	735e7cc211	toke.c: Change parse_ident to take any string Prior to this commit, the string passed to this function had to be pointing to somewhere in PL_bufptr. But this is only because it assumed that the initial position is less than PL_bufend. By passing the upper bound in, that assumption is automatically removed.	2025-10-17 12:26:00 -06:00
Karl Williamson	e4be402477	toke.c: Use flags parameter to S_parse_ident This makes it clearer at each call point what is happening, and prepares for future commits where more flags will be passed to this function.	2025-10-17 12:26:00 -06:00
Karl Williamson	bfbd5f7e35	toke.c: Use flags parameter for S_force_word This makes it clear at each call point what is happening, instead of having to jump to the S_force_word definition to know what 'false, true' vs 'true, false' actually means. And this prepares for future commits.	2025-10-17 12:25:59 -06:00
Karl Williamson	3450d19250	intuit_more: 'use strict' allows much better handling Most code these days runs under 'use strict'. That allows us to resolve ambiguity without resorting to heuristics in far more cases than before. This commit adds a parameter to intuit_more() that gives the context it is being called from. And when that call is to resolve what $foo[...] is supposed to mean, we can look up foo to see if it is an array or a scalar. If the former, the "..." must be a subscript; if a scalar, it must be a charclass. Only if there is both a $foo and an @foo is there ambiguity. If so, we drop down to using the heuristics	2025-10-17 12:09:03 -06:00
Karl Williamson	aa93969e9c	toke.c: Create function to see if an identifier is known This checks first if there is a lexical variable in scope with the given name, and if not, if there is a global	2025-10-17 12:09:03 -06:00
Karl Williamson	9fc9ec2818	Change invlist function names to be legal This continues the process started in #23592 to change names with leading underscores to be legal C. See that p.r. or 4bb3572f7a1c1f3944b7f58b22b6e7a9ef5faba6 for extensive discussion. This commit simply moves the leading underscore to be trailing	2025-10-12 16:56:21 -06:00
Karl Williamson	59bca40fd0	S_scan_ident: Convert parameter to bool All calls to it set it to TRUE or FALSE	2025-10-07 11:48:47 -06:00
Paul "LeoNerd" Evans	215e36f380	Add `cop_*_warning()` API This adds three new API functions: a pair to modify a COP by enabling or disabling a single warning bit within it, and a query function to ask if a given warning is already enabled. This API is provided for CPAN modules to use to modify the set of warnings present in a COP during compile-time. Currently modules need to use the `new_warnings_bitfield()` function, which was recently hidden by 09a0707. That change broke the `Syntax::Keyword::Try` module, as reported in https://github.com/Perl/perl5/issues/23609.	2025-09-23 13:43:47 +01:00
Karl Williamson	c14d142701	Make die() always expand to Perl_die_nocontext() See 03f24b8a082948e5b437394fa33d0af08d7b80b6 for the motivation. This commit changes plain die() to not use a thread context parameter. It and die_nocontext() now behave identically.	2025-09-21 06:55:45 -06:00
Karl Williamson	2cb0034ef5	Unroll valid_utf8_to_uv loop This gives a bit of performance boost in this function that can be called during pattern matching. Here are some cachegrind comparisons with blead:
Key: Ir = instruction read; Dr = data read; Dw = data write; COND = conditional branches; IND = indirect branches.
The numbers represent relative counts per loop iteration, compared to blead at 100.0%. Higher is better: for example, using half as many instructions gives 200%, while using twice as many gives 50%.

valid_utf8_to_uv(0x007f), length is 1
             GCC                CLANG
       blead    hacked    blead    hacked
Ir    100.00    100.69   100.00     99.11
Dr    100.00    101.47   100.00     99.74
Dw    100.00    100.00   100.00     99.57
COND  100.00    101.20   100.00    100.00
IND   100.00    100.00   100.00     94.12

valid_utf8_to_uv(0x07ff), length is 2
             GCC                CLANG
       blead    hacked    blead    hacked
Ir    100.00    100.68   100.00     99.04
Dr    100.00    101.47   100.00     99.74
Dw    100.00    100.00   100.00     99.57
COND  100.00    102.40   100.00    101.23
IND   100.00    100.00   100.00     94.12

valid_utf8_to_uv(0xfffd), length is 3
             GCC                CLANG
       blead    hacked    blead    hacked
Ir    100.00    100.83   100.00     99.04
Dr    100.00    101.47   100.00     99.75
Dw    100.00    100.00   100.00     99.57
COND  100.00    102.99   100.00    101.84
IND   100.00    100.00   100.00     94.12

valid_utf8_to_uv(0xffffd), length is 4
             GCC                CLANG
       blead    hacked    blead    hacked
Ir    100.00    100.91   100.00     99.13
Dr    100.00    101.46   100.00     99.75
Dw    100.00    100.00   100.00     99.57
COND  100.00    103.59   100.00    102.45
IND   100.00    100.00   100.00     94.12

valid_utf8_to_uv(0x3ffffff), length is 5
             GCC                CLANG
       blead    hacked    blead    hacked
Ir    100.00    101.28   100.00     99.29
Dr    100.00    101.46   100.00     99.75
Dw    100.00    100.00   100.00     99.57
COND  100.00    104.19   100.00    103.07
IND   100.00    100.00   100.00     94.12

valid_utf8_to_uv(0x7fffffff), length is 6
             GCC                CLANG
       blead    hacked    blead    hacked
Ir    100.00     89.83   100.00     88.83
Dr    100.00     95.22   100.00     92.94
Dw    100.00     92.44   100.00     91.63
COND  100.00     86.21   100.00     87.11
IND   100.00    100.00   100.00     88.89

Clang gives slightly worse results than gcc. But there is an improvement in both cases for conditionals for two-byte and longer characters. This shows that the performance is significantly worse for code points that take 6 bytes (or more, which I didn't include) to represent. These are all well outside the Unicode range; hence are very rarely encountered. Performance is improved a bit for the typical cases. The algorithm used could handle 6 and 7 byte characters, but that increases memory usage, and can lead to the compiler choosing to not inline this function. In blead, experiments with clang gave these results:

Max bytes inlined    Instances in the code where not inlined
3                    14
4                    19
5                    19
6                    19
7                    57

We really need to accommodate any Unicode code point, which is 4 bytes (5 on EBCDIC). But the others we don't care about. Even though 6 bytes doesn't show as being worse than 4, I chose to not include it, because we don't care about performance for these rare non-Unicode code points, and it just might cause non-inlining for different compilers or clang versions.	2025-09-20 10:21:33 -06:00
Karl Williamson	03f24b8a08	Make croak() always expand to Perl_croak_nocontext() Perl almost always opts for saving time over saving space. Hence, we have croak() that saves time at the expense of space, but needs thread context available; and croak_nocontext() that doesn't need that, but takes extra time. But, when we are about to die, time isn't that important. Even if we are doing eval after eval in a tight loop, the potential time savings of passing the thread context to Perl_croak is insignificant compared to the tear-down that follows. My claim then is that croak() never needed a thread context parameter to save a bit of time just before death. It is an optimization that isn't worth it. And having it do so required the invention of croak_nocontext(), and the extra cognitive load associated with two methods for the same task. This commit changes plain croak() to not use a thread context parameter. It and croak_nocontext() now behave identically. That means that going forward, people will likely choose croak() which requires less typing and occupies fewer columns on the screen, and they won't have to remember which form to use when.	2025-09-12 14:47:53 -06:00
Karl Williamson	8444d54d4b	Move prototype definition of SvPV_helper to embed.fnc It's usually a bad idea to try to work around a limitation in common code by copy-pasting and then modifying to taste. Fixes/improvements to the common code rarely get propagated to the outlier. I wrote code in 1ef9039bccb that did just this for the prototype definition of SvPV_helper, because the place where it really belongs, embed.fnc, couldn't (and still doesn't) handle function pointers as arguments (patches welcome). I should have at least added a comment to the common code noting the existence of this outlier. It turns out that that limitation can be worked around by declaring a typedef of the pointer, and then using that in embed.fnc. That's what this commit does. This commit removes the final instance of duplicating the work of embed.fnc in the core, except for some in the regex system whose comments say the reason is to avoid making a typedef public. I haven't investigated these further.	2025-09-01 10:50:08 -06:00
Karl Williamson	d8012228a9	Convert _is_utf8_FOO to legal name	2025-09-01 08:12:24 -06:00
Karl Williamson	8de60a95d1	Convert _is_uni_FOO to legal name	2025-09-01 08:12:23 -06:00
Karl Williamson	8b91a7e5f4	Convert _is_utf8_perl_idcont to legal name	2025-09-01 08:12:23 -06:00
Karl Williamson	ffc38ee761	Convert _is_uni_perl_idcont to legal name	2025-09-01 08:12:22 -06:00
Karl Williamson	9f11f6a038	Convert _is_utf8_perl_idstart to legal name	2025-09-01 08:12:21 -06:00
Karl Williamson	eb3ee9300b	Convert _is_uni_perl_idstart to legal name	2025-09-01 08:12:21 -06:00
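
Commit 31c9996116 above describes a 256-element table in which each character has one bit set per class it belongs to, plus a dedicated underscore bit so that "underscore or member of class Y" needs only a single conditional. The following is a minimal sketch of that idea, not the code in l1_char_class_tab.h: the names CC_ALPHA, CC_DIGIT, CC_UNDERSCORE, class_tab, init_class_tab, IS_CLASS and IS_CLASS_OR_UNDERSCORE are all hypothetical, and only three classes are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical class bits; the real names and bit positions in
 * l1_char_class_tab.h are different. */
#define CC_ALPHA      (UINT32_C(1) << 0)
#define CC_DIGIT      (UINT32_C(1) << 1)
#define CC_UNDERSCORE (UINT32_C(1) << 2)

/* One 32-bit mask per character, indexed by the character's ordinal. */
static uint32_t class_tab[256];

/* Set 'bit' for every character in the NUL-terminated string 'members'. */
static void add_class(const char *members, uint32_t bit)
{
    for (; *members; members++)
        class_tab[(unsigned char)*members] |= bit;
}

/* Populate the table for the three illustrative classes. */
static void init_class_tab(void)
{
    add_class("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", CC_ALPHA);
    add_class("0123456789", CC_DIGIT);
    add_class("_", CC_UNDERSCORE);
}

/* Plain membership test: one table lookup, one conditional. */
#define IS_CLASS(c, bit) \
    ((class_tab[(unsigned char)(c)] & (bit)) != 0)

/* "Underscore or member of the class": OR-ing the two masks keeps this
 * a single lookup and a single conditional. */
#define IS_CLASS_OR_UNDERSCORE(c, bit) \
    ((class_tab[(unsigned char)(c)] & ((bit) | CC_UNDERSCORE)) != 0)
```

After calling init_class_tab(), IS_CLASS_OR_UNDERSCORE(ch, CC_ALPHA) is true for letters and for '_', which replaces the two-test form (ch == '_' || IS_CLASS(ch, CC_ALPHA)).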


2272 Commits