cpython

mirror of https://github.com/python/cpython.git synced 2026-01-27 05:05:50 +00:00

Author	SHA1	Message	Date
Serhiy Storchaka	bb2b9ba49d	gh-143897: Remove the isxidstart() and isxidcontinue() methods of unicodedata.ucd_3_2_0 (GH-143898) They are now only exposed as the unicodedata function.	2026-01-19 12:37:41 +00:00
Serhiy Storchaka	85013d7a55	Fix refleaks in new unicodedata classes added in gh-74902 (GH-143843)	2026-01-14 19:55:11 +00:00
Serhiy Storchaka	bab1d7a561	gh-74902: Add Unicode Grapheme Cluster Break algorithm (GH-143076) Add the unicodedata.iter_graphemes() function to iterate over grapheme clusters according to rules defined in Unicode Standard Annex #29. Add unicodedata.grapheme_cluster_break(), unicodedata.indic_conjunct_break() and unicodedata.extended_pictographic() functions to get the properties of the character which are related to the above algorithm. Co-authored-by: Guillaume "Vermeille" Sanchez <guillaume.v.sanchez@gmail.com>	2026-01-14 14:37:57 +00:00
Stan Ulbrych	dbe3950a76	gh-129117: Add unicodedata.isxidstart() function (#140269) Expose `_PyUnicode_IsXidContinue/Start` in `unicodedata`: add isxidstart() and isxidcontinue() functions. Co-authored-by: Victor Stinner <vstinner@python.org>	2025-10-30 10:18:12 +00:00
Victor Stinner	c4e7d245d6	gh-138342: Move _PyObject_VisitType() to the internal C API (#139734)	2025-10-08 12:10:58 +02:00
Benjamin Peterson	5bd4bf04c4	closes gh-138706: update Unicode to 17.0.0 (#138719)	2025-09-11 09:58:39 -07:00
Peter Bierma	4f6ecd10c2	gh-138342: Use a common utility for visiting an object's type (GH-138343) Add `_PyObject_VisitType` in place of `tp_traverse` functions that only visit the object's type.	2025-09-01 16:20:33 +00:00
Adam Turner	918e3ba6c0	GH-137623: Use an AC decorator for docstring line length enforcement (#137690)	2025-08-18 18:29:00 +01:00
Sergey Miryanov	3a7f17c7e2	gh-130790: Remove references about unicode's readiness from comments (#130801)	2025-03-03 19:18:09 +00:00
Bénédikt Tran	c1478d1ebb	gh-111178: fix UBSan failures in `Modules/unicodedata.c` (GH-129801) fix UBSan failures for `PreviousDBVersion`	2025-02-25 13:13:47 +01:00
Hizuru	c359fcd2f5	gh-129569: The function unicodedata.normalize() always returns built-in str (#129570) Co-authored-by: Victor Stinner <vstinner@python.org>	2025-02-21 14:51:13 +01:00
Victor Stinner	b9a8ca0a6a	gh-115754: Use Py_GetConstant(Py_CONSTANT_EMPTY_STR) (#125194) Replace PyUnicode_New(0, 0), PyUnicode_FromString("") and PyUnicode_FromStringAndSize("", 0) with Py_GetConstant(Py_CONSTANT_EMPTY_STR).	2024-10-09 17:15:23 +02:00
Brett Simmers	c2627d6eea	gh-116322: Add Py_mod_gil module slot (#116882) This PR adds the ability to enable the GIL if it was disabled at interpreter startup, and modifies the multi-phase module initialization path to enable the GIL when loading a module, unless that module's spec includes a slot indicating it can run safely without the GIL. PEP 703 called the constant for the slot `Py_mod_gil_not_used`; I went with `Py_MOD_GIL_NOT_USED` for consistency with gh-104148. A warning will be issued up to once per interpreter for the first GIL-using module that is loaded. If `-v` is given, a shorter message will be printed to stderr every time a GIL-using module is loaded (including the first one that issues a warning).	2024-05-03 11:30:55 -04:00
CF Bolz-Tereick	9573d14215	gh-96954: use a directed acyclic word graph for storing the unicodedata codepoint names (#97906) Co-authored-by: Łukasz Langa <lukasz@langa.pl> Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com> Co-authored-by: Dennis Sweeney <36520290+sweeneyde@users.noreply.github.com>	2023-11-04 15:56:58 +01:00
James Gerity	def828995a	fixes gh-109559: Update `unicodedata` for Unicode 15.1.0 (GH-109560) --------- Co-authored-by: Benjamin Peterson <benjamin@python.org>	2023-09-19 22:07:47 -07:00
Victor Stinner	1a3faba9f1	gh-106869: Use new PyMemberDef constant names (#106871) * Remove '#include "structmember.h"'. * If needed, add <stddef.h> to get offsetof() function. * Update Parser/asdl_c.py to regenerate Python/Python-ast.c. * Replace: * T_SHORT => Py_T_SHORT * T_INT => Py_T_INT * T_LONG => Py_T_LONG * T_FLOAT => Py_T_FLOAT * T_DOUBLE => Py_T_DOUBLE * T_STRING => Py_T_STRING * T_OBJECT => _Py_T_OBJECT * T_CHAR => Py_T_CHAR * T_BYTE => Py_T_BYTE * T_UBYTE => Py_T_UBYTE * T_USHORT => Py_T_USHORT * T_UINT => Py_T_UINT * T_ULONG => Py_T_ULONG * T_STRING_INPLACE => Py_T_STRING_INPLACE * T_BOOL => Py_T_BOOL * T_OBJECT_EX => Py_T_OBJECT_EX * T_LONGLONG => Py_T_LONGLONG * T_ULONGLONG => Py_T_ULONGLONG * T_PYSSIZET => Py_T_PYSSIZET * T_NONE => _Py_T_NONE * READONLY => Py_READONLY * PY_AUDIT_READ => Py_AUDIT_READ * READ_RESTRICTED => Py_AUDIT_READ * PY_WRITE_RESTRICTED => _Py_WRITE_RESTRICTED * RESTRICTED => (READ_RESTRICTED | _Py_WRITE_RESTRICTED)	2023-07-25 15:28:30 +02:00
Serhiy Storchaka	329e4a1a3f	gh-86493: Modernize modules initialization code (GH-106858) Use PyModule_Add() or PyModule_AddObjectRef() instead of soft deprecated PyModule_AddObject().	2023-07-25 14:34:49 +03:00
Serhiy Storchaka	a293fa5915	gh-86493: Use PyModule_Add() instead of PyModule_AddObjectRef() (GH-106860)	2023-07-18 23:59:53 +03:00
Inada Naoki	d5bd32fb48	gh-104922: remove PY_SSIZE_T_CLEAN (#106315)	2023-07-02 15:07:46 +09:00
Victor Stinner	ef300937c2	gh-92536: Remove PyUnicode_READY() calls (#105210) Since Python 3.12, PyUnicode_READY() does nothing and always returns 0.	2023-06-02 01:33:17 +02:00
Eric Snow	a9c6e0618f	gh-99113: Add Py_MOD_PER_INTERPRETER_GIL_SUPPORTED (gh-104205) Here we are doing no more than adding the value for Py_mod_multiple_interpreters and using it for stdlib modules. We will start checking for it in gh-104206 (once PyInterpreterState.ceval.own_gil is added in gh-104204).	2023-05-05 21:11:27 +00:00
Dong-hee Na	9ef7e75434	gh-101372: Fix unicodedata.is_normalized to properly handle the UCD 3… (gh-101388)	2023-02-06 13:58:00 +09:00
Victor Stinner	65dd745f1a	gh-99300: Use Py_NewRef() in Modules/ directory (#99473) Replace Py_INCREF() and Py_XINCREF() with Py_NewRef() and Py_XNewRef() in test C files of the Modules/ directory.	2022-11-14 16:21:40 +01:00
Benjamin Peterson	fd1e477f53	closes gh-96734: Update to Unicode 15.0.0. (GH-96809)	2022-09-13 15:45:12 -07:00
Dong-hee Na	2bf5f64455	Remove usage of _Py_IDENTIFIER from unicodedata module. (GH-91532)	2022-04-15 10:44:05 +09:00
Eric Snow	81c72044a1	bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized global objects. (gh-30928) We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code. It is still used in a number of non-builtin stdlib modules. The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime. A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings). https://bugs.python.org/issue46541#msg411799 explains the rationale for this change. The core of the change is in: * (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros * Include/internal/pycore_runtime_init.h - added the static initializers for the global strings * Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState * Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings. That check is added to the PR CI config. The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _PyId functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()). This includes adding a few functions where there wasn't already an alternative to _PyId(), replacing the _Py_Identifier * parameter with PyObject *. The following are not changed (yet): * stop using _Py_IDENTIFIER() in the stdlib modules * (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API * (maybe) intern the strings during runtime init https://bugs.python.org/issue46541	2022-02-08 13:39:07 -07:00
Christian Heimes	03e9f5dc75	bpo-43974: Move Py_BUILD_CORE_MODULE into module code (GH-29157) setup.py no longer defines Py_BUILD_CORE_MODULE. Instead every module defines the macro before #include "Python.h" unless Py_BUILD_CORE_BUILTIN is already defined. Py_BUILD_CORE_BUILTIN is defined for every module that is built by Modules/Setup. The PR also simplifies Modules/Setup. Makefile and makesetup already define Py_BUILD_CORE_BUILTIN and include Modules/internal for us. Signed-off-by: Christian Heimes <christian@python.org>	2021-10-22 15:36:28 +02:00
Benjamin Peterson	024fda47d4	closes bpo-45190: Update Unicode data to version 14.0.0. (GH-28336)	2021-09-14 11:00:38 -07:00
Dong-hee Na	9abd07e596	bpo-44987: Speed up unicode normalization of ASCII strings (GH-28283)	2021-09-11 18:04:38 +03:00
Srinivas Reddy Thatiparthy (శ్రీనివాస్ రెడ్డి తాటిపర్తి)	7b21108445	Remove irrelevant comment which was added in 2a70a3a (GH-27044)	2021-07-08 21:57:25 -07:00
Erlend Egeberg Aasland	00710e6346	bpo-43908: Make heap types converted during 3.10 alpha immutable (GH-26351) * Make functools types immutable * Multibyte codec types are now immutable * pyexpat.xmlparser is now immutable * array.arrayiterator is now immutable * _thread types are now immutable * _csv types are now immutable * _queue.SimpleQueue is now immutable * mmap.mmap is now immutable * unicodedata.UCD is now immutable * sqlite3 types are now immutable * _lsprof.Profiler is now immutable * _overlapped.Overlapped is now immutable * _operator types are now immutable * winapi__overlapped.Overlapped is now immutable * _lzma types are now immutable * _bz2 types are now immutable * _dbm.dbm and _gdbm.gdbm are now immutable	2021-06-17 11:06:09 +01:00
Erlend Egeberg Aasland	59af59c2df	bpo-42972: Fully support GC for pyexpat, unicodedata, and dbm/gdbm heap types (GH-26376) * bpo-42972: pyexpat * bpo-42972: unicodedata * bpo-42972: dbm/gdbm	2021-05-27 17:29:00 +10:00
Inada Naoki	4d4be47705	Do not use Py_ssize_clean_t (GH-25940)	2021-05-08 10:17:37 +09:00
Erlend Egeberg Aasland	9746cda705	bpo-43916: Apply Py_TPFLAGS_DISALLOW_INSTANTIATION to selected types (GH-25748) Apply Py_TPFLAGS_DISALLOW_INSTANTIATION to the following types: * _dbm.dbm * _gdbm.gdbm * _multibytecodec.MultibyteCodec * _sre..SRE_Scanner * _thread._localdummy * _thread.lock * _winapi.Overlapped * array.arrayiterator * functools.KeyWrapper * functools._lru_list_elem * pyexpat.xmlparser * re.Match * re.Pattern * unicodedata.UCD * zlib.Compress * zlib.Decompress	2021-04-30 16:04:57 +02:00
Erlend Egeberg Aasland	61d26394f9	bpo-41798: Allocate unicodedata CAPI on the heap (GH-24128)	2021-01-20 12:03:53 +01:00
Victor Stinner	32bd68c839	bpo-42519: Replace PyObject_MALLOC() with PyObject_Malloc() (GH-23587) No longer use deprecated aliases to functions: * Replace PyObject_MALLOC() with PyObject_Malloc() * Replace PyObject_REALLOC() with PyObject_Realloc() * Replace PyObject_FREE() with PyObject_Free() * Replace PyObject_Del() with PyObject_Free() * Replace PyObject_DEL() with PyObject_Free()	2020-12-01 10:37:39 +01:00
Victor Stinner	84f7382215	bpo-42157: Rename unicodedata.ucnhash_CAPI (GH-22994) Removed the unicodedata.ucnhash_CAPI attribute which was an internal PyCapsule object. The related private _PyUnicode_Name_CAPI structure was moved to the internal C API. Rename unicodedata.ucnhash_CAPI as unicodedata._ucnhash_CAPI.	2020-10-27 04:36:22 +01:00
Victor Stinner	c8c4200b65	bpo-42157: Convert unicodedata.UCD to heap type (GH-22991) Convert the unicodedata extension module to the multiphase initialization API (PEP 489) and convert the unicodedata.UCD static type to a heap type. Co-Authored-By: Mohamed Koubaa <koubaa.m@gmail.com>	2020-10-26 23:19:22 +01:00
Victor Stinner	920cb647ba	bpo-42157: unicodedata avoids references to UCD_Type (GH-22990) * UCD_Check() uses PyModule_Check() * Simplify the internal _PyUnicode_Name_CAPI structure: * Remove size and state members * Remove state and self parameters of getcode() and getname() functions * Remove global_module_state	2020-10-26 19:19:36 +01:00
Victor Stinner	47e1afd2a1	bpo-1635741: _PyUnicode_Name_CAPI moves to internal C API (GH-22713) The private _PyUnicode_Name_CAPI structure of the PyCapsule API unicodedata.ucnhash_CAPI moves to the internal C API. Moreover, the structure gets a new state member which must be passed to the getcode() and getname() functions. * Move Include/ucnhash.h to Include/internal/pycore_ucnhash.h * unicodedata module is now built with Py_BUILD_CORE_MODULE. * unicodedata: move hashAPI variable into unicodedata_module_state.	2020-10-26 16:43:47 +01:00
Victor Stinner	e6b8c5263a	bpo-1635741: Add a global module state to unicodedata (GH-22712) Prepare unicodedata to add a state per module: start with a global "module" state, pass it to subfunctions which access &UCD_Type. This change also prepares the conversion of the UCD_Type static type to a heap type.	2020-10-15 16:22:19 +02:00
Mohamed Koubaa	ddc0dd001a	bpo-1635741, unicodedata: add ucd_type parameter to UCD_Check() macro (GH-22328) Co-authored-by: Victor Stinner <vstinner@python.org>	2020-09-23 12:38:16 +02:00
Victor Stinner	4a21e57fe5	bpo-40268: Remove unused structmember.h includes (GH-19530) If only offsetof() is needed: include stddef.h instead. When structmember.h is used, add a comment explaining that PyMemberDef is used.	2020-04-15 02:35:41 +02:00
Serhiy Storchaka	cd8295ff75	bpo-39943: Add the const qualifier to pointers on non-mutable PyUnicode data. (GH-19345)	2020-04-11 10:48:40 +03:00
Andy Lester	982307b9cc	bpo-39943: Remove unused self from find_nfc_index() (GH-18973)	2020-03-17 17:38:12 +01:00
Benjamin Peterson	051b9d08d1	closes bpo-39926: Update Unicode to 13.0.0. (GH-18910)	2020-03-10 20:41:34 -07:00
Dong-hee Na	1b55b65638	bpo-39573: Clean up modules and headers to use Py_IS_TYPE() function (GH-18521)	2020-02-17 11:09:15 +01:00
Victor Stinner	d2ec81a8c9	bpo-39573: Add Py_SET_TYPE() function (GH-18394) Add Py_SET_TYPE() function to set the type of an object.	2020-02-07 09:17:07 +01:00
Jordon Xu	2ec7010206	bpo-37752: Delete redundant Py_CHARMASK in normalizestring() (GH-15095)	2019-09-10 17:04:08 +01:00
Greg Price	7669cb8b21	bpo-38043: Use `bool` for boolean flags on is_normalized_quickcheck. (GH-15711)	2019-09-09 02:16:31 -07:00
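
Several commits in this history touch `unicodedata.normalize()` and `unicodedata.is_normalized()` (gh-129569, gh-101372, bpo-44987, bpo-38043). As a rough illustration of the Python-level behavior those commits concern, here is a minimal sketch. The `Tagged` subclass and the sample strings are hypothetical and chosen only for this example; it assumes Python 3.8 or later, where `is_normalized()` is available.

```python
import unicodedata


class Tagged(str):
    """Hypothetical str subclass, used only for this illustration."""


# 'e' followed by U+0301 COMBINING ACUTE ACCENT: a decomposed sequence.
text = Tagged("cafe\u0301")

# NFC composes the pair into the single code point U+00E9.
nfc = unicodedata.normalize("NFC", text)
composed_ok = nfc == "caf\u00e9"

# is_normalized() reports whether a string is already in the given form.
was_nfc = unicodedata.is_normalized("NFC", text)
is_nfc = unicodedata.is_normalized("NFC", nfc)

# When normalization changes the text, the result is a built-in str,
# not an instance of the subclass that was passed in.
result_type = type(nfc)

# Pure-ASCII input is already in NFC, so it comes back with equal content.
ascii_same = unicodedata.normalize("NFC", "plain ascii") == "plain ascii"

print(composed_ok, was_nfc, is_nfc, result_type.__name__, ascii_same)
```

Judging from the gh-129569 commit title, an input that is already normalized may come back as the subclass instance on older interpreters, so the sketch only inspects the result type for input that normalization actually changes.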


175 Commits