The symptom was:
> [..]/expat/tests/alloc_tests.c:326:26: error: narrowing conversion from 'unsigned int' to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 326 | g_allocation_count = i;
> | ^
> [..]/expat/tests/alloc_tests.c:437:26: error: narrowing conversion from 'unsigned int' to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 437 | g_allocation_count = i;
> | ^
> [..]/expat/tests/basic_tests.c:415:47: error: narrowing conversion from 'unsigned int' to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 415 | if (_XML_Parse_SINGLE_BYTES(g_parser, text, first_chunk_bytes, XML_FALSE)
> | ^
> [..]/expat/tests/basic_tests.c:421:34: error: narrowing conversion from 'unsigned long' to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 421 | sizeof(text) - first_chunk_bytes - 1,
> | ^
> [..]/expat/tests/handlers.c:92:37: error: narrowing conversion from 'XML_Size' (aka 'unsigned long') to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 92 | StructData_AddItem(storage, name, XML_GetCurrentColumnNumber(g_parser),
> | ^
> [..]/expat/tests/handlers.c:93:22: error: narrowing conversion from 'XML_Size' (aka 'unsigned long') to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 93 | XML_GetCurrentLineNumber(g_parser), STRUCT_START_TAG);
> | ^
> [..]/expat/tests/handlers.c:99:37: error: narrowing conversion from 'XML_Size' (aka 'unsigned long') to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 99 | StructData_AddItem(storage, name, XML_GetCurrentColumnNumber(g_parser),
> | ^
> [..]/expat/tests/handlers.c:100:22: error: narrowing conversion from 'XML_Size' (aka 'unsigned long') to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 100 | XML_GetCurrentLineNumber(g_parser), STRUCT_END_TAG);
> | ^
> [..]/expat/tests/handlers.c:1279:26: error: narrowing conversion from 'unsigned int' to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 1279 | g_allocation_count = i;
> | ^
> [..]/expat/tests/misc_tests.c:73:26: error: narrowing conversion from 'unsigned int' to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 73 | g_allocation_count = i;
> | ^
> [..]/expat/tests/misc_tests.c:93:26: error: narrowing conversion from 'unsigned int' to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 93 | g_allocation_count = i;
> | ^
> [..]/expat/tests/nsalloc_tests.c:86:26: error: narrowing conversion from 'unsigned int' to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 86 | g_allocation_count = i;
> | ^
> [..]/expat/tests/nsalloc_tests.c:526:28: error: narrowing conversion from 'unsigned int' to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 526 | g_reallocation_count = i;
> | ^
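For reference, findings of this kind are typically addressed by making the
conversion explicit; a sketch of the pattern (not the exact change that was
applied):
    /* Explicit casts satisfy bugprone-narrowing-conversions, assuming the
       values are known to fit into an int. */
    g_allocation_count = (int)i;

    StructData_AddItem(storage, name,
                       (int)XML_GetCurrentColumnNumber(g_parser),
                       (int)XML_GetCurrentLineNumber(g_parser),
                       STRUCT_START_TAG);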
Please see commit 60dffa148c3ce26799cb933afdb0dc3581ad2098
("tests: Use normal XML_Parse in test_suspend_resume_internal_entity")
for more details on the related issue.
Related tests are:
- test_repeated_stop_parser_between_char_data_calls
- test_reset_in_entity
- test_resume_entity_with_syntax_error
- test_suspend_parser_between_cdata_calls
- test_suspend_parser_between_char_data_calls
- test_suspend_xdecl
In reaction to a finding by Berkay Eren Ürün.
Use of g_parser carries a risk of cross-test interference
and hence a risk of hard-to-catch bugs in the test suite,
so we want to get rid of g_parser altogether in the mid-term.
This removes the dependency on CLOCKS_PER_SEC that prevented this test
from running properly on some platforms, as well as the inherent
flakiness of time measurements.
Since later commits have introduced g_bytesScanned (and before that,
g_parseAttempts), we can use that value as a proxy for parse time
instead of clock().
The bypass works on the assumption that the application uses a
consistent fill size. Let's make some assertions about what should
happen when the application doesn't do that -- most importantly,
that parsing does happen eventually, and that the number of scanned
bytes doesn't explode.
The key is to have __attribute__((noreturn)) somewhere that clang-tidy
can see it. In this case, this is the _fail() function, which is
conditionally called from the assert_true() macro.
This will ensure that clang-tidy doesn't complain about NULL values
that we've asserted against in tests.
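A minimal sketch of the pattern (simplified; the signature of _fail() is
assumed for illustration and the real test harness differs in detail):
    /* With _fail() declared noreturn, clang-tidy knows that code after a
       failed assert_true() is unreachable, so values that were asserted
       non-NULL are not flagged as possibly NULL later on. */
    #if defined(__GNUC__) || defined(__clang__)
    __attribute__((noreturn))
    #endif
    void _fail(const char *file, int line, const char *msg);

    #define assert_true(cond)                                        \
      do {                                                            \
        if (! (cond)) {                                               \
          _fail(__FILE__, __LINE__, "expected true: " #cond);         \
        }                                                             \
      } while (0)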
...instead of only when approaching the maximum buffer size INT_MAX/2+1.
We'd like to give applications a chance to finish parsing a large token
before buffer reallocation, in case the reallocation fails.
By bypassing the reparse deferral heuristic when getting close to
filling the buffer, we give them this chance -- if the whole token is
present in the buffer, it will be parsed at that time.
This may come at the cost of some extra reparse attempts. For a token
of n bytes, these extra parses cause us to scan over a maximum of
2n bytes (... + n/8 + n/4 + n/2 + n). Therefore, parsing of big tokens
remains O(n) with regard to how many bytes we scan in attempts to parse. The
cost in reality is lower than that, since the reparses that happen due
to the bypass will affect m_partialTokenBytesBefore, delaying the next
ratio-based reparse. Furthermore, only the first token that "breaks
through" a buffer ceiling takes that extra reparse attempt; subsequent
large tokens will only bypass the heuristic if they manage to hit the
new buffer ceiling.
Note that this cost analysis depends on the assumption that Expat grows
its buffer by doubling it (or, more generally, grows it exponentially).
If this changes, the cost of this bypass may increase. Hopefully, this
would be caught by test_big_tokens_take_linear_time or the new test.
The bypass logic assumes that the application uses a consistent fill.
If the app increases its fill size, it may miss the bypass (and the
normal heuristic will apply). If the app decreases its fill size, the
bypass may be hit multiple times for the same buffer size. The very
worst case would be to always fill half of the remaining buffer space,
in which case parsing of a large n-byte token becomes O(n log n).
As an added bonus, the new test case should be faster than the old one,
since it doesn't have to go all the way to 1GiB to check the behavior.
Finally, this change necessitated a small modification to two existing
tests related to reparse deferral. These tests are testing the deferral
enabled setting, and assume that reparsing will not happen for any other
reason. By pre-growing the buffer, we make sure that this new bypass
does not affect those test cases.
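Pre-growing can be done with a single large XML_GetBuffer() request at the
start of the test; a sketch (the size is illustrative, not what the tests
actually use):
    /* Growing Expat's internal buffer once up front keeps the later, small
       fills in the test far away from any buffer ceiling, so the bypass
       cannot trigger and only the deferral setting itself is exercised. */
    void *buf = XML_GetBuffer(parser, 1024 * 1024); /* illustrative size */
    assert_true(buf != NULL);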
For huge tokens, we may end up in a situation where the partial token
parse deferral heuristic demands more bytes than Expat's maximum buffer
size (currently ~half of INT_MAX) could fit.
INT_MAX/2 is 1024 MiB on most systems. Clearly, a token of 950 MiB could
fit in that buffer, but the reparse threshold might be such that
callProcessor() will defer it, allowing the app to keep filling the
buffer until XML_GetBuffer() eventually returns a memory error.
By bypassing the heuristic when we're getting close to the maximum
buffer size, it will once again be possible to parse tokens in the size
range INT_MAX/2/ratio < size < INT_MAX/2 reliably.
We subtract the last buffer fill size as a way to detect that the next
XML_GetBuffer() call has a risk of returning a memory error -- assuming
that the application is likely to keep using the same (or smaller) fill.
We subtract XML_CONTEXT_BYTES because that's the maximum number of bytes
that could remain at the start of the buffer, preceding the partial
token. Technically, it could be fewer bytes, but XML_CONTEXT_BYTES is
normally small relative to INT_MAX, and is much simpler to use.
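Put together, the condition sketched below captures the idea; the names are
illustrative and the actual xmlparse.c logic differs in detail:
    #include <limits.h>
    #include <stddef.h>

    #ifndef XML_CONTEXT_BYTES
    #  define XML_CONTEXT_BYTES 1024 /* Expat's default context size */
    #endif

    /* Parse now (instead of deferring) if one more fill of the size the
       application last used, plus the context bytes that may precede the
       partial token, could push us past the maximum buffer size. */
    static int
    near_maximum_buffer_size(size_t bytes_in_buffer, size_t last_fill_size) {
      const size_t maximum_buffer_size = (size_t)INT_MAX / 2 + 1;
      const size_t margin = last_fill_size + XML_CONTEXT_BYTES;
      if (margin >= maximum_buffer_size)
        return 1;
      return bytes_in_buffer >= maximum_buffer_size - margin;
    }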
Co-authored-by: Sebastian Pipping <sebastian@pipping.org>
The test is essentially a copy of the existing test for the setter,
adapted to run on the external parser instead of the original one.
Suggested-by: Sebastian Pipping <sebastian@pipping.org>
CI-fighting-assistance-by: Sebastian Pipping <sebastian@pipping.org>
len=0 was previously only OK if a non-zero-length call had been made before.
It makes sense to allow an application to work the same way on a
newly-created parser, and not have to care if its incoming buffer
happens to be 0.
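A hedged sketch of the scenario this enables (the exact status codes involved
are not spelled out here):
    #include <expat.h>

    /* An application whose first read happens to return 0 bytes can now
       pass that straight to a fresh parser without special-casing the
       empty chunk. */
    static void
    zero_length_first_chunk(void) {
      XML_Parser parser = XML_ParserCreate(NULL);
      (void)XML_Parse(parser, "", 0, /*isFinal=*/XML_FALSE);
      XML_ParserFree(parser);
    }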
If we always run with the heuristic enabled, it may hide some bugs by
grouping up input into bigger parse attempts.
CI-fighting-assistance-by: Sebastian Pipping <sebastian@pipping.org>
When the parse buffer contains the starting bytes of a token but not
all of them, we cannot parse the token to completion. We call this a
partial token. When this happens, the parse position is reset to the
start of the token, and the parse() call returns. The client is then
expected to provide more data and call parse() again.
In extreme cases, this means that the bytes of a token may be parsed
many times: once for every buffer refill required before the full token
is present in the buffer.
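For context, a typical refill loop over the public API looks roughly like
this (FILL_SIZE and the use of a FILE stream are illustrative):
    #include <stdio.h>
    #include <expat.h>

    #define FILL_SIZE 4096 /* illustrative chunk size */

    /* With an incomplete token in the buffer, every XML_ParseBuffer() call
       below rescans that token from its first byte. */
    static enum XML_Status
    parse_stream(XML_Parser parser, FILE *stream) {
      for (;;) {
        void *buf = XML_GetBuffer(parser, FILL_SIZE);
        if (buf == NULL)
          return XML_STATUS_ERROR; /* out of memory */
        const size_t len = fread(buf, 1, FILL_SIZE, stream);
        const enum XML_Status status
            = XML_ParseBuffer(parser, (int)len, /*isFinal=*/len == 0);
        if (status != XML_STATUS_OK || len == 0)
          return status;
      }
    }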
Math:
Assume there's a token of T bytes
Assume the client fills the buffer in chunks of X bytes
We'll try to parse X, 2X, 3X, 4X ... until mX == T (technically >=)
That's (m²+m)X/2 = (T²/X+T)/2 bytes parsed (arithmetic progression)
While it is alleviated by larger refills, this amounts to O(T²)
Expat grows its internal buffer by doubling it when necessary, but has
no way to inform the client about how much space is available. Instead,
we add a heuristic that skips parsing when we've repeatedly stopped on
an incomplete token. Specifically:
* Only try to parse if we have a certain amount of data buffered
* Every time we stop on an incomplete token, double the threshold
* As soon as any token completes, the threshold is reset
This means that when we get stuck on an incomplete token, the threshold
grows exponentially, effectively making the client perform larger buffer
fills, limiting how many times we can end up re-parsing the same bytes.
Math:
Assume there's a token of T bytes
Assume the client fills the buffer in chunks of X bytes
We'll try to parse X, 2X, 4X, 8X ... until (2^k)X == T (or larger)
That's (2^(k+1)-1)X bytes parsed -- e.g. 15X if T = 8X
This is equal to 2T-X, which amounts to O(T)
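In pseudo-C, the heuristic amounts to something like this (illustrative
names, not the actual xmlparse.c fields):
    #include <stddef.h>

    static size_t threshold = 0; /* minimum buffered bytes before parsing */

    static int
    should_attempt_parse(size_t bytes_buffered) {
      return bytes_buffered >= threshold;
    }

    static void
    stopped_on_partial_token(size_t bytes_buffered) {
      /* Stopped on an incomplete token: demand twice as much data before
         the next attempt. */
      threshold = bytes_buffered * 2;
    }

    static void
    completed_a_token(void) {
      threshold = 0; /* any completed token resets the threshold */
    }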
We could've chosen a faster growth rate, e.g. 4 or 8. Those seem to
increase performance further, at the cost of further increasing the
risk of growing the buffer more than necessary. This can easily be
adjusted in the future, if desired.
This is all completely transparent to the client, except for:
1. possible delay of some callbacks (when our heuristic overshoots)
2. apps that never do isFinal=XML_TRUE could miss data at the end
For the affected testdata, this change shows a 100-400x speedup.
The recset.xml benchmark shows no clear change either way.
Before:
benchmark -n ../testdata/largefiles/recset.xml 65535 3
3 loops, with buffer size 65535. Average time per loop: 0.270223
benchmark -n ../testdata/largefiles/aaaaaa_attr.xml 4096 3
3 loops, with buffer size 4096. Average time per loop: 15.033048
benchmark -n ../testdata/largefiles/aaaaaa_cdata.xml 4096 3
3 loops, with buffer size 4096. Average time per loop: 0.018027
benchmark -n ../testdata/largefiles/aaaaaa_comment.xml 4096 3
3 loops, with buffer size 4096. Average time per loop: 11.775362
benchmark -n ../testdata/largefiles/aaaaaa_tag.xml 4096 3
3 loops, with buffer size 4096. Average time per loop: 11.711414
benchmark -n ../testdata/largefiles/aaaaaa_text.xml 4096 3
3 loops, with buffer size 4096. Average time per loop: 0.019362
After:
./run.sh benchmark -n ../testdata/largefiles/recset.xml 65535 3
3 loops, with buffer size 65535. Average time per loop: 0.269030
./run.sh benchmark -n ../testdata/largefiles/aaaaaa_attr.xml 4096 3
3 loops, with buffer size 4096. Average time per loop: 0.044794
./run.sh benchmark -n ../testdata/largefiles/aaaaaa_cdata.xml 4096 3
3 loops, with buffer size 4096. Average time per loop: 0.016377
./run.sh benchmark -n ../testdata/largefiles/aaaaaa_comment.xml 4096 3
3 loops, with buffer size 4096. Average time per loop: 0.027022
./run.sh benchmark -n ../testdata/largefiles/aaaaaa_tag.xml 4096 3
3 loops, with buffer size 4096. Average time per loop: 0.099360
./run.sh benchmark -n ../testdata/largefiles/aaaaaa_text.xml 4096 3
3 loops, with buffer size 4096. Average time per loop: 0.017956
When the parser is suspended, _XML_Parse_SINGLE_BYTES() will return
early. At that point, there could be some amount of bytes that haven't
been fed into Expat at all yet. This leaves us with an incomplete
document.
Furthermore, the last internal XML_Parse() call with isFinal=XML_TRUE
will not have happened, so the parser will not know that no more input
is to be expected. This is what allowed the test to pass when it was
originally changed to use SINGLE_BYTES.
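Roughly, the helper behaves like the following sketch (simplified; the real
helper in the test suite differs in detail):
    /* Feed one byte at a time; if any call does not return XML_STATUS_OK
       (e.g. XML_STATUS_SUSPENDED), return immediately. The remaining bytes
       and the final isFinal=XML_TRUE call are then never issued. */
    static enum XML_Status
    parse_single_bytes(XML_Parser parser, const char *s, int len) {
      for (int i = 0; i < len; i++) {
        const enum XML_Status res = XML_Parse(parser, s + i, 1, XML_FALSE);
        if (res != XML_STATUS_OK)
          return res;
      }
      return XML_Parse(parser, s + len, 0, XML_TRUE);
    }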
With the new partial token heuristic, the lack of a final parse call
means that we don't even reach the "Ho" text, and fail the test.
The simplest solution is to go back to using XML_Parse() in this test.
Another option would be to let SINGLE_BYTES expose how far it got in
its loop, allowing for later continuation, but it doesn't seem worth the
extra complexity.
Until now, the buffer size to grow to has been calculated based on the
distance from the current parse position to the end of the buffer. This
means that the size of any already-parsed data was not considered,
leading to inconsistent buffer growth.
There was also a special case in XML_Parse() when XML_CONTEXT_BYTES was
zero, where the buffer size would be set to twice the incoming string
length. This patch replaces that special case with an XML_GetBuffer() call.
Growing the buffer based on its total size makes its growth consistent.
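An illustrative contrast of the two calculations (made-up names, not the
actual xmlparse.c code):
    #include <stddef.h>

    /* Both versions double until the request fits, but they start from a
       different base. */
    static size_t
    grow_from(size_t base, size_t needed) {
      size_t size = base > 0 ? base : 1;
      while (size < needed)
        size *= 2;
      return size;
    }

    /* Before: grow_from(bufferEnd - parsePosition, needed)
         -- depends on how much data has already been parsed.
       After:  grow_from(totalBufferSize, needed)
         -- depends only on the buffer's total size, so growth is
            consistent regardless of previously parsed content. */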
The commit includes a test that checks that we can reach the max buffer
size (usually INT_MAX/2 + 1) regardless of previously parsed content.
GitHub CI couldn't allocate the full 1GiB with MinGW/wine32, though it
works locally with the same compiler and wine version. As a workaround,
the test tries to malloc 1GiB, and reduces `maxbuf` to 512MiB in case
of failure.
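A sketch of that workaround, as it might appear inside the test function
(variable names other than maxbuf are illustrative):
    /* Probe whether a full 1 GiB allocation works at all; if not, run the
       test with a 512 MiB ceiling instead. */
    int maxbuf = INT_MAX / 2 + 1; /* 1 GiB with a 32-bit int */
    void *probe = malloc((size_t)maxbuf);
    if (probe == NULL)
      maxbuf = 512 * 1024 * 1024; /* 512 MiB */
    free(probe); /* free(NULL) is a no-op */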
All tests now run one instance where SINGLE_BYTES is equivalent to a
single XML_Parse call. Using SINGLE_BYTES therefore gives more coverage,
as evidenced by the new failure we now have to avoid in the test, until
it can be fixed.
All tests now run one instance where SINGLE_BYTES is equivalent to a
single XML_Parse call. There is no longer a need for individual tests
to switch between them.
- Start treating -DXML_CONTEXT_BYTES=0 as "no context"
rather than "context of size 0". Was documented as
"must be set to a positive integer", previously.
- Enforce that macro XML_CONTEXT_BYTES is defined at build time to
avoid accidental misbuilds lacking context in environments that
bypass both of Expat's official build systems.
- Detect and reject use of negative context size at compile time.
Before a parse call with isFinal=XML_TRUE, there is no guarantee that
all supplied data has been parsed. Removing the first comment count
check removes the test's assumption of such a guarantee.
...instead of a full-string match.
These tests were depending on getting handler callbacks with exactly
one character of data at a time. For example, if test_abort_epilog got
"\n\r\n" in one callback, it would fail to match on the '\r', and would
not abort parsing as expected.
By searching the callback arg for the magic character rather than
expecting a full match, the test no longer depends on exact callback
timing.
`userData` is never NULL in these tests, so that check was left out of
the new version.
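The handler pattern now looks roughly like this (a sketch with an
illustrative name; the real handlers in the test suite differ in detail):
    #include <expat.h>

    extern XML_Parser g_parser; /* the test suite's shared parser */

    static void XMLCALL
    aborting_character_handler(void *userData, const XML_Char *s, int len) {
      (void)userData; /* never NULL in these tests, but not needed here */
      /* Abort as soon as the magic character appears anywhere in the
         callback data, instead of requiring an exact one-character match. */
      for (int i = 0; i < len; i++) {
        if (s[i] == '\r') {
          XML_StopParser(g_parser, /*resumable=*/XML_FALSE);
          return;
        }
      }
    }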
Instead of testing the exact number and sequence of callbacks, we now
test that we get the exact data lengths and sequence of callbacks. The
checks become much more verbose, but will now accept any buffer fill
strategy -- single bytes, multiple bytes, or any combination thereof.