gh-119451: Fix a potential denial of service in http.client (GH-119454)
Reading the whole body of the HTTP response could cause OOM if
the Content-Length value is too large even if the server does not send
a large amount of data. Now the HTTP client reads large data by chunks,
therefore the amount of consumed memory is proportional to the amount
of sent data.
(cherry picked from commit 5a4c4a033a4a54481be6870aa1896fad732555b5)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
[3.14] gh-119452: Fix a potential virtual memory allocation denial of service in http.server (GH-142216)
The CGI server on Windows could consume the amount of memory specified
in the Content-Length header of the request even if the client does not
send such much data. Now it reads the POST request body by chunks,
therefore the memory consumption is proportional to the amount of sent
data.
(cherry picked from commit 0e4f4f1a4633f2d215fb5a803cae278aeea31845)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
gh-104049: do not expose on-disk location from SimpleHTTPRequestHandler (GH-104067)
Do not expose the local server's on-disk location from `SimpleHTTPRequestHandler` when generating a directory index. (unnecessary information disclosure)
---------
(cherry picked from commit c7c3a60c88de61a79ded9fdaf6bc6a29da4efb9a)
Co-authored-by: Ethan Furman <ethan@stoneleaf.us>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
(cherry picked from commit d052a383f1a0c599c176a12c73a761ca00436d8b)
Co-authored-by: Bernhard Wagner <github.comNotification20120125@xmlizer.net>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
Co-authored-by: Éric <merwok@netwok.org>
Also \ escape \s in the http.server BaseHTTPRequestHandler.log_message so
that it is technically possible to parse the line and reconstruct what the
original data was. Without this a \xHH is ambiguious as to if it is a hex
replacement we put in or the characters r"\x" came through in the original
request line.
(cherry picked from commit 7e29398407dbd53b714702abb89aa2fd7baca48a)
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Replace control characters in http.server.BaseHTTPRequestHandler.log_message with an escaped \xHH sequence to avoid causing problems for the terminal the output is printed to.
(cherry picked from commit d8ab0a4dfa48f881b4ac9ab857d2e9de42f72828)
Co-authored-by: Gregory P. Smith <greg@krypto.org>
MozillaCookieJar works for curl's cookies
(cherry picked from commit 0ea8b925d096629852d1045c2c53ff6ad63199cc)
Co-authored-by: Boris Verkhovskiy <boris.verk@gmail.com>
Reindent files which were not properly formatted (PEP 8: 4 spaces).
Remove also some trailing spaces.
(cherry picked from commit e87ada48a9e5d9d03f9759138869216df0d7383a)
Fix an open redirection vulnerability in the `http.server` module when
an URI path starts with `//` that could produce a 301 Location header
with a misleading target. Vulnerability discovered, and logic fix
proposed, by Hamza Avvan (@hamzaavvan).
Test and comments authored by Gregory P. Smith [Google].
(cherry picked from commit 4abab6b603dd38bec1168e9a37c40a48ec89508e)
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Fix command-line option -d/--directory in http.server main
function that was ignored when combined with --cgi.
Automerge-Triggered-By: GH:merwok
(cherry picked from commit 2d080347d74078a55c47715d232d1ab8dc8cd603)
Co-authored-by: Géry Ogam <gery.ogam@gmail.com>
Co-authored-by: Géry Ogam <gery.ogam@gmail.com>
Operating systems without support for TCP_NODELAY will raise an OSError
when trying to set the socket option, but the show can still go on.
(cherry picked from commit 0571b934f5f9198c3461a7b631d7073ac0a5676f)
Co-authored-by: rtobar <rtobarc@gmail.com>
* Set content-length for simple http server 301s
When http.server.SimpleHTTPRequestHandler sends a 301 (Moved
Permanently) due to a missing file, it does not set a Content-Length
of 0. Unfortunately, certain clients can be left waiting for the
connection to be closed in this circumstance, even though no body
will be sent. At time of writing, both curl and Firefox demonstrate
this behavior.
* Test Content-Length on simple http server redirect
When serving a redirect, the SimpleHTTPRequestHandler will now send
`Content-Length: 0`. Several tests for http.server already cover
various behaviors and checks including redirection. This change only
adds one check for the expected Content-Length on the simplest case
for a redirect.
* Add news entry for SimpleHTTPRequestHandler fix
* Clarify the specific kind of 301
Co-authored-by: Senthil Kumaran <skumaran@gatech.edu>
(cherry picked from commit fb427255614fc1f740e7785554c1da8ca39116c2)
Co-authored-by: Stephen Rosen <sirosen@globus.org>
Fixes http.client potential denial of service where it could get stuck reading lines from a malicious server after a 100 Continue response.
Co-authored-by: Gregory P. Smith <greg@krypto.org>
(cherry picked from commit 47895e31b6f626bc6ce47d175fe9d43c1098909d)
Co-authored-by: Gen Xu <xgbarry@gmail.com>
add:
* `_simple_enum` decorator to transform a normal class into an enum
* `_test_simple_enum` function to compare
* `_old_convert_` to enable checking `_convert_` generated enums
`_simple_enum` takes a normal class and converts it into an enum:
@simple_enum(Enum)
class Color:
RED = 1
GREEN = 2
BLUE = 3
`_old_convert_` works much like` _convert_` does, using the original logic:
# in a test file
import socket, enum
CheckedAddressFamily = enum._old_convert_(
enum.IntEnum, 'AddressFamily', 'socket',
lambda C: C.isupper() and C.startswith('AF_'),
source=_socket,
)
`_test_simple_enum` takes a traditional enum and a simple enum and
compares the two:
# in the REPL or the same module as Color
class CheckedColor(Enum):
RED = 1
GREEN = 2
BLUE = 3
_test_simple_enum(CheckedColor, Color)
_test_simple_enum(CheckedAddressFamily, socket.AddressFamily)
Any important differences will raise a TypeError
add:
_simple_enum decorator to transform a normal class into an enum
_test_simple_enum function to compare
_old_convert_ to enable checking _convert_ generated enums
_simple_enum takes a normal class and converts it into an enum:
@simple_enum(Enum)
class Color:
RED = 1
GREEN = 2
BLUE = 3
_old_convert_ works much like _convert_ does, using the original logic:
# in a test file
import socket, enum
CheckedAddressFamily = enum._old_convert_(
enum.IntEnum, 'AddressFamily', 'socket',
lambda C: C.isupper() and C.startswith('AF_'),
source=_socket,
)
test_simple_enum takes a traditional enum and a simple enum and
compares the two:
# in the REPL or the same module as Color
class CheckedColor(Enum):
RED = 1
GREEN = 2
BLUE = 3
_test_simple_enum(CheckedColor, Color)
_test_simple_enum(CheckedAddressFamily, socket.AddressFamily)
Any important differences will raise a TypeError
We now buffer the CONNECT request + tunnel HTTP headers into a single
send call. This prevents the OS from generating multiple network
packets for connection setup when not necessary, improving efficiency.
I've done the implementation for both non-chunked and chunked reads. I haven't benchmarked chunked reads because I don't currently have a convenient way to generate a high-bandwidth chunked stream, but I don't see any reason that it shouldn't enjoy the same benefits that the non-chunked case does. I've used the benchmark attached to the bpo bug to verify that performance now matches the unsized read case.
Automerge-Triggered-By: @methane
CGIHTTPRequestHandler of http.server now logs the CGI script exit
code, rather than the CGI script exit status of os.waitpid().
For example, if the script is killed by signal 11, it now logs:
"CGI script exit code -11."
The regex http.cookiejar.LOOSE_HTTP_DATE_RE was vulnerable to regular
expression denial of service (REDoS).
LOOSE_HTTP_DATE_RE.match is called when using http.cookiejar.CookieJar
to parse Set-Cookie headers returned by a server.
Processing a response from a malicious HTTP server can lead to extreme
CPU usage and execution will be blocked for a long time.
The regex contained multiple overlapping \s* capture groups.
Ignoring the ?-optional capture groups the regex could be simplified to
\d+-\w+-\d+(\s*\s*\s*)$
Therefore, a long sequence of spaces can trigger bad performance.
Matching a malicious string such as
LOOSE_HTTP_DATE_RE.match("1-c-1" + (" " * 2000) + "!")
caused catastrophic backtracking.
The fix removes ambiguity about which \s* should match a particular
space.
You can create a malicious server which responds with Set-Cookie headers
to attack all python programs which access it e.g.
from http.server import BaseHTTPRequestHandler, HTTPServer
def make_set_cookie_value(n_spaces):
spaces = " " * n_spaces
expiry = f"1-c-1{spaces}!"
return f"b;Expires={expiry}"
class Handler(BaseHTTPRequestHandler):
def do_GET(self):
self.log_request(204)
self.send_response_only(204) # Don't bother sending Server and Date
n_spaces = (
int(self.path[1:]) # Can GET e.g. /100 to test shorter sequences
if len(self.path) > 1 else
65506 # Max header line length 65536
)
value = make_set_cookie_value(n_spaces)
for i in range(99): # Not necessary, but we can have up to 100 header lines
self.send_header("Set-Cookie", value)
self.end_headers()
if __name__ == "__main__":
HTTPServer(("", 44020), Handler).serve_forever()
This server returns 99 Set-Cookie headers. Each has 65506 spaces.
Extracting the cookies will pretty much never complete.
Vulnerable client using the example at the bottom of
https://docs.python.org/3/library/http.cookiejar.html :
import http.cookiejar, urllib.request
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
r = opener.open("http://localhost:44020/")
The popular requests library was also vulnerable without any additional
options (as it uses http.cookiejar by default):
import requests
requests.get("http://localhost:44020/")
* Regression test for http.cookiejar REDoS
If we regress, this test will take a very long time.
* Improve performance of http.cookiejar.ISO_DATE_RE
A string like
"444444" + (" " * 2000) + "A"
could cause poor performance due to the 2 overlapping \s* groups,
although this is not as serious as the REDoS in LOOSE_HTTP_DATE_RE was.
is_cgi() function of http.server library does not currently handle a
cgi script if one of the cgi_directories is located at the
sub-directory of given path. Since is_cgi() in CGIHTTPRequestHandler
class separates given path into (dir, rest) based on the first seen
'/', multi-level directories like /sub/dir/cgi-bin/hello.py is divided
into head=/sub, rest=dir/cgi-bin/hello.py then check whether '/sub'
exists in cgi_directories = [..., '/sub/dir/cgi-bin'].
This patch makes the is_cgi() keep expanding dir part to the next '/'
then checking if that expanded path exists in the cgi_directories.
Signed-off-by: Siwon Kang <kkangshawn@gmail.com>
https://bugs.python.org/issue38863
* bpo-38216: Allow bypassing input validation
* bpo-36274: Also allow the URL encoding to be overridden.
* bpo-38216, bpo-36274: Add tests demonstrating a hook for overriding validation, test demonstrating override encoding, and a test to capture expectation of the interface for the URL.
* Call with skip_host to avoid tripping on the host checking in the URL.
* Remove obsolete comment.
* Make _prepare_path_encoding its own attr.
This makes overriding just that simpler.
Also, don't use the := operator to make backporting easier.
* Add a news entry.
* _prepare_path_encoding -> _encode_prepared_path()
* Once again separate the path validation and request encoding, drastically simplifying the behavior. Drop the guarantee that all processing happens in _prepare_path.
Handle time comparison for cookies with `expires` attribute when `CookieJar.make_cookies` is called.
Co-authored-by: Demian Brecht <demianbrecht@gmail.com>
https://bugs.python.org/issue12144
Automerge-Triggered-By: @asvetlov
Disallow control chars in http URLs in urllib.urlopen. This addresses a potential security problem for applications that do not sanity check their URLs where http request headers could be injected.