cpython

mirror of https://github.com/python/cpython.git synced 2026-01-28 21:55:51 +00:00

Author	SHA1	Message	Date
Serhiy Storchaka	3a623c6c55	[3.10] gh-137836: Support more RAWTEXT and PLAINTEXT elements in HTMLParser (GH-137837) (GH-140842) (GH-140853) (cherry picked from commit a17c57eee5b5cc81390750d07e4800b19c0c3084) (cherry picked from commit 0329bd11c7e98484727bbb9062d53a8fa53ac7fd)	2025-10-31 17:55:58 +01:00
Miss Islington (bot)	7317e0bbb7	[3.10] gh-135661: Fix CDATA section parsing in HTMLParser (GH-135665) (GH-137774) (GH-139660) "] ]>" and "]] >" no longer end the CDATA section. Make CDATA section parsing context depending. Add private method HTMLParser._set_support_cdata() to change the context. If called with True, "<[CDATA[" starts a CDATA section which ends with "]]>". If called with False, "<[CDATA[" starts a bogus comments which ends with ">". (cherry picked from commit 0cbbfc462119b9107b373c24d2bda5a1271bed36) (cherry picked from commit dcf24768c918c41821cda6fe6a1aa20ce26545dd) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2025-10-07 14:12:23 +02:00
Serhiy Storchaka	9b51801581	[3.10] gh-118350: Fix support of elements "textarea" and "title" in HTMLParser (GH-135310) (GH-137783) (cherry picked from commit 4d02f31cdd45d81b95540d9076222b709d4f2335) Co-authored-by: Timon Viola <44016238+timonviola@users.noreply.github.com> Co-authored-by: Łukasz Langa <lukasz@langa.pl>	2025-09-13 22:36:51 +02:00
Miss Islington (bot)	1df5d00145	[3.10] gh-135661: Fix parsing attributes with whitespaces around the "=" separator in HTMLParser (GH-136908) (GH-136921) This fixes a regression introduced in GH-135930. (cherry picked from commit dee650189497735edbc08a54edabb5b06ef1bd09) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2025-07-22 11:57:56 +02:00
Miss Islington (bot)	151e0f00f7	[3.10] gh-135661: Fix parsing start and end tags in HTMLParser according to the HTML5 standard (GH-135930) (GH-136268) (#136292) * Whitespaces no longer accepted between `</` and the tag name. E.g. `</ script>` does not end the script section. * Vertical tabulation (`\v`) and non-ASCII whitespaces no longer recognized as whitespaces. The only whitespaces are `\t\n\r\f `. * Null character (U+0000) no longer ends the tag name. * Attributes and slashes after the tag name in end tags are now ignored, instead of terminating after the first `>` in quoted attribute value. E.g. `</script/foo=">"/>`. * Multiple slashes and whitespaces between the last attribute and closing `>` are now ignored in both start and end tags. E.g. `<a foo=bar/ //>`. * Multiple `=` between attribute name and value are no longer collapsed. E.g. `<a foo==bar>` produces attribute "foo" with value "=bar". * Whitespaces between the `=` separator and attribute name or value are no longer ignored. E.g. `<a foo =bar>` produces two attributes "foo" and "=bar", both with value None; `<a foo= bar>` produces two attributes: "foo" with value "" and "bar" with value None. * Fix data loss after unclosed script or style tag (gh-86155). Also backport test.support.subTests() (gh-135120). --------- (cherry picked from commit 0243f97cbadec8d985e63b1daec5d1cbc850cae3) (cherry picked from commit c555f889c3558a0a8cd8d8ecc2b493014b88a700) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com> Co-authored-by: Waylan Limberg <waylan.limberg@icloud.com>	2025-07-12 14:26:58 +02:00
Miss Islington (bot)	85766db07e	[3.10] gh-102555: Fix comment parsing in HTMLParser according to the HTML5 standard (GH-135664) (GH-136275) * "--!>" now ends the comment. * "-- >" no longer ends the comment. * Support abnormally ended empty comments "<-->" and "<--->". --------- (cherry picked from commit 8ac7613dc8b8f82253d7c0e2b6ef6ed703a0a1ee) Co-author: Kerim Kabirov <the.privat33r+gh@pm.me> Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>	2025-07-12 14:24:27 +02:00
Serhiy Storchaka	fdc9d214c0	[3.10] gh-135462: Fix quadratic complexity in processing special input in HTMLParser (GH-135464) (GH-135485) End-of-file errors are now handled according to the HTML5 specs -- comments and declarations are automatically closed, tags are ignored. (cherry picked from commit 6eb6c5dbfb528bd07d77b60fd71fd05d81d45c41)	2025-07-03 23:05:53 +02:00
Christian Clauss	cfca4a6774	[3.10] Fix typos in the Lib directory (GH-28775) (GH-28804) Fix typos in the Lib directory as identified by codespell. Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>. (cherry picked from commit 745c9d9dfc1ad6fdfdf1d07420c6273ff67fa5be) Co-authored-by: Christian Clauss <cclauss@me.com>	2021-10-07 11:49:47 -04:00
Karl Dubost	9eb11a139f	bpo-41748: Handles unquoted attributes with commas (#24072) * bpo-41748: Adds tests for unquoted attributes with comma * bpo-41748: Handles unquoted attributes with comma * bpo-41748: Addresses review comments * bpo-41748: Addresses review comments * Adds more test cases * Simplifies the regex for handling spaces * bpo-41748: Moves attributes tests under the right class * bpo-41748: Addresses review about duplicate attributes * bpo-41748: Adds NEWS.d entry for this patch	2021-02-01 21:32:50 +01:00
Inada Naoki	fae0ed5099	bpo-37328: remove deprecated HTMLParser.unescape (GH-14186) It is deprecated since Python 3.4.	2019-08-27 11:48:06 +09:00
Motoki Naruse	3358d589fb	bpo-30629: Remove second call of str.lower() in html.parser.parse_endtag. (#2099) elem is the result of .lower() 6 lines above the handle_endtag call. Patch by Motoki Naruse	2017-06-16 21:15:25 -04:00
Serhiy Storchaka	c842efc6ae	Revert "Fixed a typo in the HTMLParser.feed docstrings" (#1771) * Revert "Fixed a typo in the HTMLParser.feed docstrings. The docstring started with an 'r', like a The docstring was correct. I read the patch in opposite direction, as adding the "r" prefix. This reverts commit 5ba185039f1bd465d3f82531324fd3fe1ee42f0c.	2017-05-24 07:20:45 +03:00
Jani Šumak	5ba185039f	Fixed a typo in the HTMLParser.feed docstrings. The docstring started with an 'r', like a rawstring. (#1759)	2017-05-23 16:40:54 +03:00
R David Murray	44b548dda8	#27364: fix "incorrect" uses of escape character in the stdlib. And most of the tools. Patch by Emanual Barry, reviewed by me, Serhiy Storchaka, and Martin Panter.	2016-09-08 13:59:53 -04:00
Martin Panter	46f50726a0	Issue #27076: Doc, comment and tests spelling fixes Most fixes to Doc/ and Lib/ directories by Ville Skyttä.	2016-05-26 05:35:26 +00:00
Ezio Melotti	20a2c6482e	#23144: merge with 3.4.	2015-09-06 21:44:45 +03:00
Ezio Melotti	6f2bb98966	#23144: Make sure that HTMLParser.feed() returns all the data, even when convert_charrefs is True.	2015-09-06 21:38:06 +03:00
Ezio Melotti	6fc16d81af	#21047: set the default value for the convert_charrefs argument of HTMLParser to True. Patch by Berker Peksag.	2014-08-02 18:36:12 +03:00
Ezio Melotti	73a4359eb0	#15114: the strict mode and argument of HTMLParser, HTMLParser.error, and the HTMLParserError exception have been removed.	2014-08-02 14:10:30 +03:00
Ezio Melotti	153d97b24e	#20288: merge with 3.3.	2014-02-01 21:22:26 +02:00
Ezio Melotti	f27b9a741a	#20288: fix handling of invalid numeric charrefs in HTMLParser.	2014-02-01 21:21:01 +02:00
Ezio Melotti	95401c5f6b	#13633: Added a new convert_charrefs keyword arg to HTMLParser that, when True, automatically converts all character references.	2013-11-23 19:52:05 +02:00
Ezio Melotti	f6de9eb2bb	#19688: add back and deprecate the internal HTMLParser.unescape() method.	2013-11-22 05:49:29 +02:00
Ezio Melotti	4a9ee26750	#2927: Added the unescape() function to the html module.	2013-11-19 20:28:45 +02:00
Ezio Melotti	b7038817fe	#19480: merge with 3.3.	2013-11-07 18:35:27 +02:00
Ezio Melotti	7165d8b9ba	#19480: HTMLParser now accepts all valid start-tag names as defined by the HTML5 standard.	2013-11-07 18:33:24 +02:00
Ezio Melotti	88ebfb129b	#15114: The html.parser module now raises a DeprecationWarning when the strict argument of HTMLParser or the HTMLParser.error method are used.	2013-11-02 17:08:24 +02:00
Ezio Melotti	f6ca26fbff	#17802: merge with 3.3.	2013-05-01 16:20:00 +03:00
Ezio Melotti	8e596a765c	#17802: Fix an UnboundLocalError in html.parser. Initial tests by Thomas Barlow.	2013-05-01 16:18:25 +03:00
Ezio Melotti	1698babd1b	#14679: add an __all__ (that contains only HTMLParser) to html.parser.	2013-05-01 16:09:34 +03:00
Ezio Melotti	46495182d0	#15156: HTMLParser now uses the new "html.entities.html5" dictionary.	2012-06-24 22:02:56 +02:00
Ezio Melotti	3861d8b271	#15114: the strict mode of HTMLParser and the HTMLParseError exception are deprecated now that the parser is able to parse invalid markup.	2012-06-23 15:27:51 +02:00
Ezio Melotti	0780b6bc58	#14538: HTMLParser can now parse correctly start tags that contain a bare /.	2012-04-18 19:18:22 -06:00
Ezio Melotti	29877e8e04	HTMLParser is now able to handle slashes in the start tag.	2012-02-21 09:25:00 +02:00
Ezio Melotti	e31ddedb0e	Fix an index and clean up comments.	2012-02-13 20:20:00 +02:00
Ezio Melotti	f4ab491901	Improve handling of declarations in HTMLParser.	2012-02-13 15:50:37 +02:00
Ezio Melotti	5211ffe4df	#13993: HTMLParser is now able to handle broken end tags when strict=False.	2012-02-13 11:24:50 +02:00
Ezio Melotti	fa3702dc28	#13960: HTMLParser is now able to handle broken comments when strict=False.	2012-02-10 10:45:44 +02:00
Ezio Melotti	15cb489234	#13358: HTMLParser now calls handle_data only once for each CDATA.	2011-11-18 18:01:49 +02:00
Ezio Melotti	c2fe57762b	#1745761, #755670, #13357, #12629, #1200313: improve attribute handling in HTMLParser.	2011-11-14 18:53:33 +02:00
Ezio Melotti	7de56f6a04	#670664: Fix HTMLParser to correctly handle the content of ``<script>...</script>`` and ``<style>...</style>``.	2011-11-01 14:12:22 +02:00
Ezio Melotti	f50ffa94ab	#13273: fix a bug that prevented HTMLParser to properly detect some tags when strict=False.	2011-10-28 13:21:09 +03:00
Ezio Melotti	d9e0b068af	#12888: Fix a bug in HTMLParser.unescape that prevented it to escape more than 128 entities. Patch by Peter Otten.	2011-09-05 17:11:06 +03:00
Éric Araujo	51b7aedadd	Merge 3.1	2011-05-25 18:13:49 +02:00
Éric Araujo	39f180bb1f	Fix display of html.parser.HTMLParser.feed docstring	2011-05-04 15:55:47 +02:00
Ezio Melotti	2e3607c1e7	#7311: fix html.parser to accept non-ASCII attribute values.	2011-04-07 22:03:31 +03:00
Senthil Kumaran	6c85838489	Merged revisions 87542 via svnmerge from svn+ssh://pythondev@svn.python.org/python/branches/py3k ........ r87542 \| senthil.kumaran \| 2010-12-28 23:55:16 +0800 (Tue, 28 Dec 2010) \| 3 lines Fix Issue10759 - html.parser.unescape() fails on HTML entities with incorrect syntax ........	2010-12-28 16:10:56 +00:00
Senthil Kumaran	164540fee1	Fix Issue10759 - html.parser.unescape() fails on HTML entities with incorrect syntax	2010-12-28 15:55:16 +00:00
R. David Murray	b579dba119	#1486713: Add a tolerant mode to HTMLParser. The motivation for adding this option is that the functionality it provides used to be provided by sgmllib in Python2, and was used by, for example, BeautifulSoup. Without this option, the Python3 version of BeautifulSoup and the many programs that use it are crippled. The original patch was by 'kxroberto'. I modified it heavily but kept his heuristics and test. I also added additional heuristics to fix #975556, #1046092, and part of #6191. This patch should be completely backward compatible: the behavior with the default strict=True is unchanged.	2010-12-03 04:06:39 +00:00
Victor Stinner	30c223cff5	Merged revisions 81504 via svnmerge from svn+ssh://pythondev@svn.python.org/python/branches/py3k ................ r81504 \| victor.stinner \| 2010-05-24 23:46:25 +0200 (lun., 24 mai 2010) \| 13 lines Recorded merge of revisions 81500-81501 via svnmerge from svn+ssh://pythondev@svn.python.org/python/trunk ........ r81500 \| victor.stinner \| 2010-05-24 23:33:24 +0200 (lun., 24 mai 2010) \| 2 lines Issue #6662: Fix parsing of malformatted charref (&#bad;) ........ r81501 \| victor.stinner \| 2010-05-24 23:37:28 +0200 (lun., 24 mai 2010) \| 2 lines Add the author of the last fix (Issue #6662) ........ ................	2010-05-24 21:48:07 +00:00


54 Commits
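
Several of the commits listed above change behavior that is visible through the public `html.parser` API. The following is a minimal sketch, not taken from the repository, that exercises three of the long-standing ones: character references converted by default (#21047), `<script>` content delivered as raw data (#670664), and the module-level `html.unescape()` function (#2927). The `EventCollector` class and the `parse` and `text_of` helpers are hypothetical names introduced only for this illustration. The newer HTML5 fixes from 2025 are left out on purpose, because their results depend on which patch release of Python is running.

```python
from html import unescape
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records parser callbacks as tuples."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse(source, **kwargs):
    """Feed a complete document and return the recorded events."""
    parser = EventCollector(**kwargs)
    parser.feed(source)
    parser.close()  # flush any text still buffered by the parser
    return parser.events


def text_of(events):
    """Join the payloads of all data events."""
    return "".join(event[1] for event in events if event[0] == "data")


# convert_charrefs defaults to True, so "&amp;" reaches handle_data as "&".
charref_events = parse("<p>a &amp; b</p>")

# Inside <script>, "<" does not start a tag; the body is passed as data.
script_events = parse("<script>if (a < b) {}</script>")

if __name__ == "__main__":
    print(charref_events)
    print(text_of(script_events))
    print(unescape("&lt;b&gt; &amp; &#38;"))
```

Passing `convert_charrefs=False` to `parse` would route references to `handle_entityref` and `handle_charref` instead, which this sketch does not override.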