If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().
_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
(cherry picked from commit 9f69a58623bd01349a18ba0c7a9cb1dad6a51e8e)
(cherry picked from commit 6279eb8c076d89d3739a6edb393e43c7929b429d)
(cherry picked from commit a75953b347716fff694aa59a7c7c2489fa50d1f5)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
gh-113602: Bail out when the parser tries to override existing errors (GH-113607)
(cherry picked from commit 9ed36d533ab8b256f0a589b5be6d7a2fdcf4aff2)
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
(cherry picked from commit 48c49739f5502fc7aa82f247ab2e4d7b55bdca62)
(cherry picked from commit d58a5f453f59f44ccf09b1a9b11a0b879ac6f35b)
Co-authored-by: Yilei Yang <yileiyang@google.com>
Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
gh-112388: Fix an error that was causing the parser to try to overwrite tokenizer errors (GH-112410)
(cherry picked from commit 2c8b19174274c183eb652932871f60570123fe99)
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
gh-111380: Show SyntaxWarnings only once when parsing if invalid syntax is encouintered (GH-111381)
(cherry picked from commit 3d2f1f0b830d86f16f42c42b54d3ea4453dac318)
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
It now points on the invalid non-ASCII character, not on the valid numerical literal.
(cherry picked from commit b2729e93e9d73503b1fda4ea4fecd77c58909091)
There are some warnings if build python via clang:
Parser/pegen.c:812:31: warning: a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes]
_PyPegen_clear_memo_statistics()
^
void
Parser/pegen.c:820:29: warning: a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes]
_PyPegen_get_memo_statistics()
^
void
Fix it to make clang happy.
(cherry picked from commit 7703def37e4fa7d25c3d23756de8f527daa4e165)
Signed-off-by: Chenxi Mao <chenxi.mao@suse.com>
Co-authored-by: Chenxi Mao <chenxi.mao@suse.com>
This makes tokenizer.c:valid_utf8 match stringlib/codecs.h:decode_utf8.
It also fixes an off-by-one error introduced in 3.10 for the line number when the tokenizer reports bad UTF8.
(cherry picked from commit 8bc356a7dd50cbdb46d10b8c7e457832431f5d9e)
Co-authored-by: Michael Droettboom <mdboom@gmail.com>
Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.
This PR comes fresh from a pile of work done in our private PSRT security response team repo.
This backports https://github.com/python/cpython/pull/96499 aka 511ca9452033ef95bc7d7fc404b8161068226002
Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).
<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->
I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#).
* gh-94949: Disallow parsing parenthesised ctx manager with old feature_version
* 📜🤖 Added by blurb_it.
* Allow it with feature_version=(3, 9) as well
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
(cherry picked from commit 0daba822212cd5d6c63384a27f390f0945330c2b)
Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
(cherry picked from commit ee70c70aa93d7a41cbe47a0b361b17f9d7ec8acd)
Co-authored-by: Eric V. Smith <ericvsmith@users.noreply.github.com>
Co-authored-by: Eric V. Smith <ericvsmith@users.noreply.github.com>
Currently, calling Py_EnterRecursiveCall() and
Py_LeaveRecursiveCall() may use a function call or a static inline
function call, depending if the internal pycore_ceval.h header file
is included or not. Use a different name for the static inline
function to ensure that the static inline function is always used in
Python internals for best performance. Similar approach than
PyThreadState_GET() (function call) and _PyThreadState_GET() (static
inline function).
* Rename _Py_EnterRecursiveCall() to _Py_EnterRecursiveCallTstate()
* Rename _Py_LeaveRecursiveCall() to _Py_LeaveRecursiveCallTstate()
* pycore_ceval.h: Rename Py_EnterRecursiveCall() to
_Py_EnterRecursiveCall() and Py_LeaveRecursiveCall() and
_Py_LeaveRecursiveCall()
The warning emitted by the Python parser for a numeric literal
immediately followed by keyword has been changed from deprecation
warning to syntax warning.