"] ]>" and "]] >" no longer end the CDATA section.
Make CDATA section parsing context depending.
Add private method HTMLParser._set_support_cdata() to change the context.
If called with True, "<[CDATA[" starts a CDATA section which ends with "]]>".
If called with False, "<[CDATA[" starts a bogus comments which ends with ">".
(cherry picked from commit 0cbbfc462119b9107b373c24d2bda5a1271bed36)
(cherry picked from commit dcf24768c918c41821cda6fe6a1aa20ce26545dd)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
This fixes a regression introduced in GH-135930.
(cherry picked from commit dee650189497735edbc08a54edabb5b06ef1bd09)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* Whitespaces no longer accepted between `</` and the tag name.
E.g. `</ script>` does not end the script section.
* Vertical tabulation (`\v`) and non-ASCII whitespaces no longer recognized
as whitespaces. The only whitespaces are `\t\n\r\f `.
* Null character (U+0000) no longer ends the tag name.
* Attributes and slashes after the tag name in end tags are now ignored,
instead of terminating after the first `>` in quoted attribute value.
E.g. `</script/foo=">"/>`.
* Multiple slashes and whitespaces between the last attribute and closing `>`
are now ignored in both start and end tags. E.g. `<a foo=bar/ //>`.
* Multiple `=` between attribute name and value are no longer collapsed.
E.g. `<a foo==bar>` produces attribute "foo" with value "=bar".
* Whitespaces between the `=` separator and attribute name or value are no
longer ignored. E.g. `<a foo =bar>` produces two attributes "foo" and
"=bar", both with value None; `<a foo= bar>` produces two attributes:
"foo" with value "" and "bar" with value None.
* Fix data loss after unclosed script or style tag (gh-86155).
Also backport test.support.subTests() (gh-135120).
---------
(cherry picked from commit 0243f97cbadec8d985e63b1daec5d1cbc850cae3)
(cherry picked from commit c555f889c3558a0a8cd8d8ecc2b493014b88a700)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Co-authored-by: Waylan Limberg <waylan.limberg@icloud.com>
End-of-file errors are now handled according to the HTML5 specs --
comments and declarations are automatically closed, tags are ignored.
(cherry picked from commit 6eb6c5dbfb528bd07d77b60fd71fd05d81d45c41)
Fix typos in the Lib directory as identified by codespell.
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>.
(cherry picked from commit 745c9d9dfc1ad6fdfdf1d07420c6273ff67fa5be)
Co-authored-by: Christian Clauss <cclauss@me.com>
* bpo-41748: Adds tests for unquoted attributes with comma
* bpo-41748: Handles unquoted attributes with comma
* bpo-41748: Addresses review comments
* bpo-41748: Addresses review comments
* Adds more test cases
* Simplifies the regex for handling spaces
* bpo-41748: Moves attributes tests under the right class
* bpo-41748: Addresses review about duplicate attributes
* bpo-41748: Adds NEWS.d entry for this patch
* Revert "Fixed a typo in the HTMLParser.feed docstrings. The docstring started with an 'r', like a The docstring was correct. I read the patch in opposite direction, as *adding* the "r" prefix.
This reverts commit 5ba185039f1bd465d3f82531324fd3fe1ee42f0c.
The motivation for adding this option is that the the functionality it
provides used to be provided by sgmllib in Python2, and was used by,
for example, BeautifulSoup. Without this option, the Python3 version
of BeautifulSoup and the many programs that use it are crippled.
The original patch was by 'kxroberto'. I modified it heavily but kept his
heuristics and test. I also added additional heuristics to fix#975556,
#1046092, and part of #6191. This patch should be completely backward
compatible: the behavior with the default strict=True is unchanged.