cpython

mirror of https://github.com/python/cpython.git synced 2026-01-28 05:35:31 +00:00

Author	SHA1	Message	Date
Miss Islington (bot)	c7064e7d6b	[3.13] gh-140875: Fix handling of unclosed charrefs before EOF in HTMLParser (GH-140904) (GH-141746) (cherry picked from commit 95296a9d40aa2d58502a09e86e2a93c03df23366) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2025-11-19 12:17:54 +00:00
Miss Islington (bot)	0329bd11c7	[3.13] gh-137836: Support more RAWTEXT and PLAINTEXT elements in HTMLParser (GH-137837) (GH-140842) * the "plaintext" element * the RAWTEXT elements "xmp", "iframe", "noembed" and "noframes" * optionally RAWTEXT (if scripting=True) element "noscript" (cherry picked from commit a17c57eee5b5cc81390750d07e4800b19c0c3084) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2025-10-31 16:08:42 +00:00
Miss Islington (bot)	f2b7954ce0	[3.13] gh-135661: Fix parsing unterminated bogus comments in HTMLParser (GH-137873) (GH-137875) Bogus comments that start with "<![CDATA[" should not include the starting "!" in its value. (cherry picked from commit 7636a66635a0da849cfccd06a52d0a21fb692271) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2025-08-17 10:59:24 +00:00
Miss Islington (bot)	a33596765b	[3.13] gh-135661: Fix CDATA section parsing in HTMLParser (GH-135665) (GH-137773) "] ]>" and "]] >" no longer end the CDATA section. Make CDATA section parsing context depending. Add private method HTMLParser._set_support_cdata() to change the context. If called with True, "<[CDATA[" starts a CDATA section which ends with "]]>". If called with False, "<[CDATA[" starts a bogus comments which ends with ">". (cherry picked from commit 0cbbfc462119b9107b373c24d2bda5a1271bed36) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2025-08-14 21:44:16 +03:00
Miss Islington (bot)	8de88e0840	[3.13] gh-118350: Fix support of elements "textarea" and "title" in HTMLParser (GH-135310) (GH-136985) (cherry picked from commit 4d02f31cdd45d81b95540d9076222b709d4f2335) Co-authored-by: Timon Viola <44016238+timonviola@users.noreply.github.com> Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> Co-authored-by: Łukasz Langa <lukasz@langa.pl>	2025-07-22 14:17:59 +02:00
Miss Islington (bot)	853b5c43d0	[3.13] gh-135661: Fix parsing attributes with whitespaces around the "=" separator in HTMLParser (GH-136908) (GH-136918) This fixes a regression introduced in GH-135930. (cherry picked from commit dee650189497735edbc08a54edabb5b06ef1bd09) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2025-07-22 11:56:10 +02:00
Miss Islington (bot)	7038e4a243	[3.13] gh-102555: Fix comment parsing in HTMLParser according to the HTML5 standard (GH-135664) (GH-136272) * "--!>" now ends the comment. * "-- >" no longer ends the comment. * Support abnormally ended empty comments "<-->" and "<--->". --------- (cherry picked from commit 8ac7613dc8b8f82253d7c0e2b6ef6ed703a0a1ee) Co-author: Kerim Kabirov <the.privat33r+gh@pm.me> Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>	2025-07-04 07:22:33 +00:00
Miss Islington (bot)	b041a456f1	[3.13] gh-135661: Fix parsing start and end tags in HTMLParser according to the HTML5 standard (GH-135930) (GH-136256) * Whitespaces no longer accepted between `</` and the tag name. E.g. `</ script>` does not end the script section. * Vertical tabulation (`\v`) and non-ASCII whitespaces no longer recognized as whitespaces. The only whitespaces are `\t\n\r\f `. * Null character (U+0000) no longer ends the tag name. * Attributes and slashes after the tag name in end tags are now ignored, instead of terminating after the first `>` in quoted attribute value. E.g. `</script/foo=">"/>`. * Multiple slashes and whitespaces between the last attribute and closing `>` are now ignored in both start and end tags. E.g. `<a foo=bar/ //>`. * Multiple `=` between attribute name and value are no longer collapsed. E.g. `<a foo==bar>` produces attribute "foo" with value "=bar". * Whitespaces between the `=` separator and attribute name or value are no longer ignored. E.g. `<a foo =bar>` produces two attributes "foo" and "=bar", both with value None; `<a foo= bar>` produces two attributes: "foo" with value "" and "bar" with value None. --------- (cherry picked from commit 0243f97cbadec8d985e63b1daec5d1cbc850cae3) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>	2025-07-03 21:07:40 +00:00
Miss Islington (bot)	4455cbabf9	[3.13] gh-135462: Fix quadratic complexity in processing special input in HTMLParser (GH-135464) (GH-135482) End-of-file errors are now handled according to the HTML5 specs -- comments and declarations are automatically closed, tags are ignored. (cherry picked from commit 6eb6c5dbfb528bd07d77b60fd71fd05d81d45c41) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2025-06-13 17:20:30 +00:00
Miss Islington (bot)	e76ff560b0	[3.13] gh-86155: Fix data loss after unclosed script or style tag in HTMLParser (GH-22658) (GH-133845) When calling .close() the HTMLParser should flush all remaining content, even when that content is in an unclosed script or style tag. (cherry picked from commit 53383e90e4df7029f792b7aa81aa2e4cff348ed0) Co-authored-by: Waylan Limberg <waylan.limberg@icloud.com>	2025-05-10 17:58:29 +00:00
Miss Islington (bot)	aa0c3d1098	[3.13] gh-77057: Fix handling of invalid markup declarations in HTMLParser (GH-9295) (GH-133834) (cherry picked from commit 76c0b01bc401c3e976011bbc69cec56dbebe0ad5) Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com> Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2025-05-10 14:55:12 +00:00
Miss Islington (bot)	3e55441090	[3.13] gh-69426: HTMLParser: only unescape properly terminated character entities in attribute values (GH-95215) (GH-133586) According to the HTML5 spec, named character references in attribute values should only be processed if they are not followed by an ASCII alphanumeric, or an equals sign. (cherry picked from commit 77b14a6d58e527f915966446eb0866652a46feb5) https://html.spec.whatwg.org/multipage/parsing.html#named-character-reference-state Co-authored-by: Sascha Ißbrücker <sascha.issbruecker@googlemail.com>	2025-05-09 09:43:54 +03:00
Jean-Christophe Amiel	9a07eff628	gh-100210: Correct the comment link for unescaping HTML (#100212) gh-100210: correct the comment link for unescaping HTML	2023-02-19 11:18:12 +01:00
Victor Stinner	1863302d61	gh-97669: Create Tools/build/ directory (#97963) Create Tools/build/ directory. Move the following scripts from Tools/scripts/ to Tools/build/: * check_extension_modules.py * deepfreeze.py * freeze_modules.py * generate_global_objects.py * generate_levenshtein_examples.py * generate_opcode_h.py * generate_re_casefix.py * generate_sre_constants.py * generate_stdlib_module_names.py * generate_token.py * parse_html5_entities.py * smelly.py * stable_abi.py * umarshal.py * update_file.py * verify_ensurepip_wheels.py Update references to these scripts.	2022-10-17 12:01:00 +02:00
Dong-hee Na	157aef79b0	gh-95813: Improve HTMLParser from the view of inheritance (#95874) * gh-95813: Improve HTMLParser from the view of inheritance * gh-95813: Add unittest * Address code review	2022-08-18 13:16:33 +02:00
Ezio Melotti	f28ec34c5c	gh-82927: Update files related to HTML entities. (GH-92504)	2022-06-21 22:03:12 +02:00
slateny	d707d073be	Add source for character mappings (#92014)	2022-05-06 12:28:09 +02:00
Alberto Mardegan	562c0d7398	bpo-45421: Remove dead code from html.parser (GH-28847) Support for HtmlParserError was removed back in 2014 with commit 73a4359eb0eb624c588c5d52083ea4944f9787ea, however this small block was missed.	2021-10-12 10:12:21 -07:00
Christian Clauss	745c9d9dfc	Fix typos in the Lib directory (GH-28775) Fix typos in the Lib directory as identified by codespell. Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>	2021-10-06 16:13:48 -07:00
Karl Dubost	9eb11a139f	bpo-41748: Handles unquoted attributes with commas (#24072) * bpo-41748: Adds tests for unquoted attributes with comma * bpo-41748: Handles unquoted attributes with comma * bpo-41748: Addresses review comments * bpo-41748: Addresses review comments * Adds more test cases * Simplifies the regex for handling spaces * bpo-41748: Moves attributes tests under the right class * bpo-41748: Addresses review about duplicate attributes * bpo-41748: Adds NEWS.d entry for this patch	2021-02-01 21:32:50 +01:00
Inada Naoki	fae0ed5099	bpo-37328: remove deprecated HTMLParser.unescape (GH-14186) It is deprecated since Python 3.4.	2019-08-27 11:48:06 +09:00
Motoki Naruse	3358d589fb	bpo-30629: Remove second call of str.lower() in html.parser.parse_endtag. (#2099) elem is the result of .lower() 6 lines above the handle_endtag call. Patch by Motoki Naruse	2017-06-16 21:15:25 -04:00
Serhiy Storchaka	c842efc6ae	Revert "Fixed a typo in the HTMLParser.feed docstrings" (#1771) * Revert "Fixed a typo in the HTMLParser.feed docstrings. The docstring started with an 'r', like a The docstring was correct. I read the patch in opposite direction, as adding the "r" prefix. This reverts commit 5ba185039f1bd465d3f82531324fd3fe1ee42f0c.	2017-05-24 07:20:45 +03:00
Jani Šumak	5ba185039f	Fixed a typo in the HTMLParser.feed docstrings. The docstring started with an 'r', like a rawstring. (#1759)	2017-05-23 16:40:54 +03:00
R David Murray	44b548dda8	#27364: fix "incorrect" uses of escape character in the stdlib. And most of the tools. Patch by Emanual Barry, reviewed by me, Serhiy Storchaka, and Martin Panter.	2016-09-08 13:59:53 -04:00
Martin Panter	46f50726a0	Issue #27076: Doc, comment and tests spelling fixes Most fixes to Doc/ and Lib/ directories by Ville Skyttä.	2016-05-26 05:35:26 +00:00
Martin Panter	4827e488a4	Merge spelling fixes from 3.4 into 3.5	2015-10-31 12:16:18 +00:00
Martin Panter	1f1177d69a	Fix some spelling errors in documentation and code comments	2015-10-31 11:48:53 +00:00
Ezio Melotti	20a2c6482e	#23144: merge with 3.4.	2015-09-06 21:44:45 +03:00
Ezio Melotti	6f2bb98966	#23144: Make sure that HTMLParser.feed() returns all the data, even when convert_charrefs is True.	2015-09-06 21:38:06 +03:00
Serhiy Storchaka	82e07b92b3	Issue #23181: More "codepoint" -> "code point".	2015-01-18 11:33:31 +02:00
Serhiy Storchaka	d3faf43f9b	Issue #23181: More "codepoint" -> "code point".	2015-01-18 11:28:37 +02:00
Ezio Melotti	6fc16d81af	#21047: set the default value for the convert_charrefs argument of HTMLParser to True. Patch by Berker Peksag.	2014-08-02 18:36:12 +03:00
Ezio Melotti	11bec7a1b8	Add an __all__ to html.entities.	2014-08-02 15:15:02 +03:00
Ezio Melotti	73a4359eb0	#15114: the strict mode and argument of HTMLParser, HTMLParser.error, and the HTMLParserError exception have been removed.	2014-08-02 14:10:30 +03:00
Ezio Melotti	153d97b24e	#20288: merge with 3.3.	2014-02-01 21:22:26 +02:00
Ezio Melotti	f27b9a741a	#20288: fix handling of invalid numeric charrefs in HTMLParser.	2014-02-01 21:21:01 +02:00
Ezio Melotti	95401c5f6b	#13633: Added a new convert_charrefs keyword arg to HTMLParser that, when True, automatically converts all character references.	2013-11-23 19:52:05 +02:00
Ezio Melotti	f6de9eb2bb	#19688: add back and deprecate the internal HTMLParser.unescape() method.	2013-11-22 05:49:29 +02:00
Ezio Melotti	4a9ee26750	#2927: Added the unescape() function to the html module.	2013-11-19 20:28:45 +02:00
Ezio Melotti	b7038817fe	#19480: merge with 3.3.	2013-11-07 18:35:27 +02:00
Ezio Melotti	7165d8b9ba	#19480: HTMLParser now accepts all valid start-tag names as defined by the HTML5 standard.	2013-11-07 18:33:24 +02:00
Ezio Melotti	88ebfb129b	#15114: The html.parser module now raises a DeprecationWarning when the strict argument of HTMLParser or the HTMLParser.error method are used.	2013-11-02 17:08:24 +02:00
Ezio Melotti	4603487dc9	#18020: improve html.escape speed by an order of magnitude. Patch by Matt Bryant.	2013-07-07 11:11:24 +02:00
Ezio Melotti	f6ca26fbff	#17802: merge with 3.3.	2013-05-01 16:20:00 +03:00
Ezio Melotti	8e596a765c	#17802: Fix an UnboundLocalError in html.parser. Initial tests by Thomas Barlow.	2013-05-01 16:18:25 +03:00
Ezio Melotti	1698babd1b	#14679: add an __all__ (that contains only HTMLParser) to html.parser.	2013-05-01 16:09:34 +03:00
Ezio Melotti	e6e96eea51	#16245: Fix the value of a few entities in html.entities.html5.	2012-10-23 15:51:27 +02:00
Ezio Melotti	518dbfd7b5	Reorder html.entities.html5 entities to make updates easier. Patch by Iuliia Proskurnia.	2012-10-23 14:45:58 +02:00
Ezio Melotti	46495182d0	#15156: HTMLParser now uses the new "html.entities.html5" dictionary.	2012-06-24 22:02:56 +02:00
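
Several of the commits above concern character reference handling (#13633, #21047, #23144, #2927). As a rough illustration only, the sketch below shows how those documented behaviours look from user code: with the default convert_charrefs=True, references in text and attribute values arrive already unescaped, and html.unescape() does the same for plain strings. The EventCollector class and parse() helper are hypothetical names invented for this example and are not part of the stdlib. The input is deliberately well-formed so the result should not depend on which of the later HTML5 fixes are present.

```python
from html import unescape
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records parser callbacks as tuples."""

    def __init__(self):
        # convert_charrefs defaults to True (see #21047 above).
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse(markup):
    """Feed the markup, then close() so any buffered text is flushed."""
    parser = EventCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


# Character references in the attribute value and in the text are
# both converted before the handlers see them.
events = parse('<a href="x?a=1&amp;b=2">R&amp;D</a>')

# Trailing text with no following tag; join the data events in case
# the text is delivered in more than one handle_data() call.
tail = "".join(e[1] for e in parse("tail &lt; text") if e[0] == "data")

# The module-level function added in #2927 works on plain strings.
plain = unescape("&lt;p&gt;")
```

Calling close() after the last feed() is the documented way to force processing of any buffered data, which is why parse() always does so.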

76 Commits