docs: fold long lines

Also:
- replace a 'will' found by badwords.
- drop duplicate empty lines.

Closes #19930
Viktor Szakats 2025-12-11 02:13:46 +01:00
parent 421f931e7a
commit 8ff5222b4e
10 changed files with 346 additions and 162 deletions

@@ -35,7 +35,8 @@ space separated fields.
4. The ALPN id for the destination host
5. The hostname for the destination host
6. The port number for the destination host
7. The expiration date and time of this entry within double quotes.
The date format is "YYYYMMDD HH:MM:SS" and the time zone is GMT.
8. Boolean (1 or 0) if "persist" was set for this entry
9. Integer priority value (not currently used)
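As an illustration, a full cache line following this layout might look like
this (a hypothetical entry; fields 1-3 are assumed to be the ALPN id,
hostname and port of the source host, elided from the excerpt above):

```
h2 example.com 443 h3 alt.example.com 443 "20301231 23:59:59" 1 0
```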

@@ -14,7 +14,6 @@ to and read from. It manages read and write positions and has a maximum size.
Its basic read/write functions have a signature and return code handling
similar to many internal curl read and write ones.
```
ssize_t Curl_bufq_write(struct bufq *q, const unsigned char *buf, size_t len, CURLcode *err);
@@ -150,7 +149,6 @@ reports **full**, but one can **still** write. This option is necessary, if
partial writes need to be avoided. It means that you need other checks to keep
the `bufq` from growing ever larger and larger.
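As a usage sketch (not from the curl sources; it assumes the
`Curl_bufq_write()` prototype shown above and the `CURLcode`/`CURLE_AGAIN`
conventions of curl's internal write functions):

```c
/* Hypothetical helper: write all of `buf` into `q`, bailing out when
   the bufq cannot take more data right now. */
static CURLcode flush_into_bufq(struct bufq *q,
                                const unsigned char *buf, size_t len)
{
  while(len) {
    CURLcode result = CURLE_OK;
    ssize_t n = Curl_bufq_write(q, buf, len, &result);
    if(n < 0)
      return result; /* e.g. CURLE_AGAIN when the bufq reports full */
    buf += (size_t)n;
    len -= (size_t)n;
  }
  return CURLE_OK;
}
```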
## pools
A `struct bufc_pool` may be used to create chunks for a `bufq` and keep spare

@@ -6,26 +6,41 @@ SPDX-License-Identifier: curl
# curl client readers
Client readers is a design in the internals of libcurl, not visible in its
public API. They were started in curl v8.7.0. This document describes
the concepts, its high level implementation and the motivations.
## Naming
`libcurl` operates between clients and servers. A *client* is the application
using libcurl, like the command line tool `curl` itself. Data to be uploaded
to a server is **read** from the client and **sent** to the server; the server's
response is **received** by `libcurl` and then **written** to the client.
With this naming established, client readers are concerned with providing data
from the application to the server. Applications register callbacks via
`CURLOPT_READFUNCTION`, data via `CURLOPT_POSTFIELDS` and other options to be
used by `libcurl` when the request is sent.
## Invoking
The transfer loop that sends and receives uses `Curl_client_read()` to get
more data to send for a transfer. If no specific reader has been installed yet,
the default one that uses `CURLOPT_READFUNCTION` is added. The prototype is
```
CURLcode Curl_client_read(struct Curl_easy *data, char *buf, size_t blen,
size_t *nread, bool *eos);
```
The arguments are the transfer to read for, a buffer to hold the read data, its
length, the actual number of bytes placed into the buffer and the `eos` (*end of
stream*) flag indicating that no more data is available. The `eos` flag may be
set for a read amount, if that amount was the last. That way curl can avoid
reading an additional time.
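A minimal calling sketch (illustrative only; the buffer size and the sending
step are placeholders, not actual curl transfer code):

```c
/* Hypothetical send loop: pull data through the reader chain until `eos`. */
static CURLcode send_request_body(struct Curl_easy *data)
{
  char buf[16 * 1024];
  size_t nread;
  bool eos = FALSE;
  CURLcode result = CURLE_OK;

  while(!result && !eos) {
    result = Curl_client_read(data, buf, sizeof(buf), &nread, &eos);
    if(!result) {
      /* hand the `nread` bytes to the connection for sending */
    }
  }
  return result;
}
```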
The implementation of `Curl_client_read()` uses a chain of *client reader*
instances to get the data. This is similar to the design of *client writers*.
The chain of readers allows processing of the data to send.
The definition of a reader is:
@@ -51,11 +66,17 @@ struct Curl_creader {
};
```
`Curl_creader` is a reader instance with a `next` pointer to form the chain.
It has a type `crt` which provides the implementation. The main callback is
`do_read()` which provides the data to the caller. The others are for setup
and tear down. `needs_rewind()` is explained further below.
## Phases and Ordering
Since client readers may transform the data being read through the chain,
the order in which they are called is relevant for the outcome. When a reader
is created, it gets the `phase` property in which it operates. Reader phases
are defined like:
```
typedef enum {
@@ -67,65 +88,115 @@ typedef enum {
} Curl_creader_phase;
```
If a reader for phase `PROTOCOL` is added to the chain, it is always added
*after* any `NET` or `TRANSFER_ENCODE` readers and *before* any
`CONTENT_ENCODE` and `CLIENT` readers. If there is already a reader for the
same phase, the new reader is added before the existing one(s).
### Example: `chunked` reader
In `http_chunks.c` a client reader for chunked uploads is implemented. This
one operates at phase `CURL_CR_TRANSFER_ENCODE`. Any data coming from the
reader "below" has the HTTP/1.1 chunk handling applied and returned to the
caller.
When this reader sees an `eos` from below, it generates the terminal chunk,
adding trailers if provided by the application. When that last chunk is fully
returned, it also sets `eos` to the caller.
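For reference, the wire format this reader produces is standard HTTP/1.1
chunked framing. A 4-byte body followed by the terminal chunk looks like:

```
4\r\n
Wiki\r\n
0\r\n
\r\n
```

Trailer header fields, if the application provided any, are emitted between
the `0` chunk line and the final empty line.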
### Example: `lineconv` reader
In `sendf.c` a client reader that does line-end conversions is implemented. It
operates at `CURL_CR_CONTENT_ENCODE` and converts any "\n" to "\r\n". This is
used for FTP ASCII uploads or when the general `crlf` option has been set.
### Example: `null` reader
Implemented in `sendf.c` for phase `CURL_CR_CLIENT`, this reader has the
simple job of providing transfer bytes of length 0 to the caller, immediately
indicating an `eos`. This reader is installed by HTTP for all GET/HEAD
requests and when authentication is being negotiated.
### Example: `buf` reader
Implemented in `sendf.c` for phase `CURL_CR_CLIENT`, this reader gets a buffer
pointer and a length and provides exactly these bytes. This one is used in
HTTP for sending `postfields` provided by the application.
## Request retries
Sometimes it is necessary to send a request with client data again. Transfer
handling can inquire via `Curl_client_read_needs_rewind()` if a rewind (e.g. a
reset of the client data) is necessary. This asks all installed readers if
they need it and gives `FALSE` if none does.
## Upload Size
Many protocols need to know the number of bytes delivered by the client
readers in advance. They may invoke `Curl_creader_total_length(data)` to
retrieve that. However, not all reader chains know the exact value beforehand.
In that case, the call returns `-1` for "unknown".
Even if the length of the "raw" data is known, the length that is send may not. Example: with option `--crlf` the uploaded content undergoes line-end conversion. The line converting reader does not know in advance how many newlines it may encounter. Therefore it must return `-1` for any positive raw content length.
Even if the length of the "raw" data is known, the length that is send may
not. Example: with option `--crlf` the uploaded content undergoes line-end
conversion. The line converting reader does not know in advance how many
newlines it may encounter. Therefore it must return `-1` for any positive raw
content length.
In HTTP, once the correct client readers are installed, the protocol asks the
readers for the total length. If that is known, it can set `Content-Length:`
accordingly. If not, it may choose to add an HTTP "chunked" reader.
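Sketched in code, that decision might look like this (hypothetical; it
assumes `Curl_creader_total_length()` returns a `curl_off_t` and leaves the
header handling as a placeholder):

```c
/* Illustrative only: choose between Content-Length and chunked upload. */
curl_off_t total = Curl_creader_total_length(data);
if(total >= 0) {
  /* known size: announce it in a Content-Length header */
}
else {
  /* unknown size: send "Transfer-Encoding: chunked" and install the
     chunked reader at phase CURL_CR_TRANSFER_ENCODE */
}
```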
In addition, there is `Curl_creader_client_length(data)` which gives the total
length as reported by the reader in phase `CURL_CR_CLIENT` without asking
other readers that may transform the raw data. This is useful in estimating
the size of an upload. The HTTP protocol uses this to determine if `Expect:
100-continue` shall be done.
## Resuming
Uploads can start at a specific offset, if so requested. The "resume from" that offset. This applies to the reader in phase `CURL_CR_CLIENT` that delivers the "raw" content. Resumption can fail if the installed reader does not support it or if the offset is too large.
Uploads can start at a specific offset, if so requested. The "resume from"
that offset. This applies to the reader in phase `CURL_CR_CLIENT` that
delivers the "raw" content. Resumption can fail if the installed reader does
not support it or if the offset is too large.
The total length reported by the reader changes when resuming. Example:
resuming an upload of 100 bytes by 25 reports a total length of 75 afterwards.
If `resume_from()` is invoked twice, it is additive. There is currently no way
to undo a resume.
## Rewinding
When a request is retried, installed client readers are discarded and replaced
by new ones. This works only if the new readers upload the same data. For many
readers, this is not an issue. The "null" reader always does the same. Also
the `buf` reader, initialized with the same buffer, does this.
Readers operating on callbacks to the application need to "rewind" the underlying content. For example, when reading from a `FILE*`, the reader needs to `fseek()` to the beginning. The following methods are used:
1. `Curl_creader_needs_rewind(data)`: tells if a rewind is necessary, given the current state of the reader chain. If nothing really has been read so far, this returns `FALSE`.
2. `Curl_creader_will_rewind(data)`: tells if the reader chain rewinds at the start of the next request.
3. `Curl_creader_set_rewind(data, TRUE)`: marks the reader chain for rewinding at the start of the next request.
4. `Curl_client_start(data)`: tells the readers that a new request starts and they need to rewind if requested.
Readers operating on callbacks to the application need to "rewind" the
underlying content. For example, when reading from a `FILE*`, the reader needs
to `fseek()` to the beginning. The following methods are used:
1. `Curl_creader_needs_rewind(data)`: tells if a rewind is necessary, given
the current state of the reader chain. If nothing really has been read so
far, this returns `FALSE`.
2. `Curl_creader_will_rewind(data)`: tells if the reader chain rewinds at
the start of the next request.
3. `Curl_creader_set_rewind(data, TRUE)`: marks the reader chain for rewinding
at the start of the next request.
4. `Curl_client_start(data)`: tells the readers that a new request starts and
they need to rewind if requested.
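Put together, a retry might use these calls roughly like this (a sketch; the
surrounding retry logic is hypothetical):

```c
/* before re-sending a request with the same client data */
if(Curl_creader_needs_rewind(data))
  Curl_creader_set_rewind(data, TRUE);

/* ... later, when the next request starts ... */
result = Curl_client_start(data); /* rewinds the readers if marked */
```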
## Summary and Outlook
By adding the client reader interface, any protocol can control how/if it wants
the curl transfer to send bytes for a request. The transfer loop then becomes
blissfully ignorant of the specifics.
The protocols, on the other hand, no longer have to take care to package data
most efficiently. At any time, should more data be needed, it can be read from
the client. This is used when sending HTTP request headers to add as much
request body data to the initial send as there is room for.
Future enhancements based on the client readers:
* `expect-100` handling: place that into an HTTP specific reader at

@@ -6,24 +6,35 @@ SPDX-License-Identifier: curl
# curl client writers
Client writers is a design in the internals of libcurl, not visible in its
public API. They were started in curl v8.5.0. This document describes the
concepts, its high level implementation and the motivations.
## Naming
`libcurl` operates between clients and servers. A *client* is the application
using libcurl, like the command line tool `curl` itself. Data to be uploaded
to a server is **read** from the client and **sent** to the server; the
server's response is **received** by `libcurl` and then **written** to the
client.
With this naming established, client writers are concerned with writing
responses from the server to the application. Applications register callbacks
via `CURLOPT_WRITEFUNCTION` and `CURLOPT_HEADERFUNCTION` to be invoked by
`libcurl` when the response is received.
## Invoking
All code in `libcurl` that handles response data is ultimately expected to
forward this data via `Curl_client_write()` to the application. The exact
prototype of this function is:
```
CURLcode Curl_client_write(struct Curl_easy *data, int type, const char *buf, size_t blen);
```
The `type` argument specifies what the bytes in `buf` actually are.
The following bits are defined:
```
#define CLIENTWRITE_BODY (1<<0) /* non-meta information, BODY */
#define CLIENTWRITE_INFO (1<<1) /* meta information, not a HEADER */
@@ -35,11 +46,15 @@ The `type` argument specifies what the bytes in `buf` actually are. The followin
```
The main types here are `CLIENTWRITE_BODY` and `CLIENTWRITE_HEADER`. They are
mutually exclusive. The other bits are enhancements to `CLIENTWRITE_HEADER`
to specify what the header is about. They are only used in HTTP and related
protocols (RTSP and WebSocket).
The implementation of `Curl_client_write()` uses a chain of *client writer*
instances to process the call and make sure that the bytes reach the proper
application callbacks. This is similar to the design of connection filters:
client writers can be chained to process the bytes written through them. The
definition is:
```
struct Curl_cwtype {
@@ -60,11 +75,17 @@ struct Curl_cwriter {
};
```
`Curl_cwriter` is a writer instance with a `next` pointer to form the chain.
It has a type `cwt` which provides the implementation. The main callback is
`do_write()` that processes the data and then calls the `next` writer. The
others are for setup and tear down.
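As an illustration, a minimal pass-through writer callback might look like
this (a sketch; it assumes a `Curl_cwriter_write()` helper for invoking a
given writer, which the chain design suggests but this document does not
show):

```c
/* Hypothetical do_write(): inspect or transform `buf` here, then
   forward the bytes to the next writer in the chain. */
static CURLcode cw_example_write(struct Curl_easy *data,
                                 struct Curl_cwriter *writer, int type,
                                 const char *buf, size_t blen)
{
  return Curl_cwriter_write(data, writer->next, type, buf, blen);
}
```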
## Phases and Ordering
Since client writers may transform the bytes written through them, the order
in which they are called is relevant for the outcome. When a writer is created,
one property it gets is the `phase` in which it operates. Writer phases are
defined like:
```
typedef enum {
@@ -76,15 +97,28 @@ typedef enum {
} Curl_cwriter_phase;
```
If a writer for phase `PROTOCOL` is added to the chain, it is always added
*after* any `RAW` or `TRANSFER_DECODE` and *before* any `CONTENT_DECODE` and
`CLIENT` phase writer. If there is already a writer for the same phase
present, the new writer is inserted just before that one.
All transfers have a chain of 3 writers by default. A specific protocol
handler may alter that by adding additional writers. The 3 standard writers
are (name, phase):
1. `"raw", CURL_CW_RAW `: if the transfer is verbose, it forwards the body data to the debug function.
1. `"download", CURL_CW_PROTOCOL`: checks that protocol limits are kept and updates progress counters. When a download has a known length, it checks that it is not exceeded and errors otherwise.
1. `"client", CURL_CW_CLIENT`: the main work horse. It invokes the application callbacks or writes to the configured file handles. It chops large writes into smaller parts, as documented for `CURLOPT_WRITEFUNCTION`. If also handles *pausing* of transfers when the application callback returns `CURL_WRITEFUNC_PAUSE`.
1. `"raw", CURL_CW_RAW `: if the transfer is verbose, it forwards the body data
to the debug function.
1. `"download", CURL_CW_PROTOCOL`: checks that protocol limits are kept and
updates progress counters. When a download has a known length, it checks
that it is not exceeded and errors otherwise.
1. `"client", CURL_CW_CLIENT`: the main work horse. It invokes the application
callbacks or writes to the configured file handles. It chops large writes
into smaller parts, as documented for `CURLOPT_WRITEFUNCTION`. It also
handles *pausing* of transfers when the application callback returns
`CURL_WRITEFUNC_PAUSE`.
With these writers always in place, libcurl's protocol handlers automatically
have these implemented.
## Enhanced Use
@@ -112,12 +146,23 @@ which always is ordered before writers in phase `CURL_CW_CONTENT_DECODE`.
What else?
Well, HTTP servers may also apply a `Transfer-Encoding` to the body of a
response. The most well-known one is `chunked`, but algorithms like `gzip` and
friends could also be applied. The difference to content encodings is that
decoding needs to happen *before* protocol checks, for example on length, are
done.
That is why transfer decoding writers are added for phase
`CURL_CW_TRANSFER_DECODE`, which makes them operate *before* phase
`CURL_CW_PROTOCOL`, where length may be checked.
## Summary
By adding the common behavior of all protocols into `Curl_client_write()`, we
make sure that it applies everywhere. Protocol handlers have less to worry
about. Changes to default behavior can be done without affecting handler
implementations.
Having a writer chain as implementation allows protocol handlers with extra
needs, like HTTP, to add to this for special behavior. The common way of
writing the actual response data stays the same.

@@ -271,9 +271,14 @@ conn[curl.se] --> SETUP[TCP] --> HAPPY-EYEBALLS --> TCP[2a04:4e42:c00::347]:443
* transfer
```
The modular design of connection filters and that we can plug them into each
other is used to control the parallel attempts. When a `TCP` filter does not
connect (in time), it is torn down and another one is created for the next
address. This keeps the `TCP` filter simple.
The `HAPPY-EYEBALLS` on the other hand stays focused on its side of the
problem. We can also use it to make other types of connections by giving it
another filter type to try, e.g. to have happy eyeballing for QUIC:
```
* create connection for --http3-only https://curl.se/

@@ -12,7 +12,6 @@ A plain "GET" subscribes to the topic and prints all published messages.
Doing a "POST" publishes the post data to the topic and exits.
### Subscribing
Command usage:
@@ -26,9 +25,10 @@ Example subscribe:
This sends an MQTT SUBSCRIBE packet for the topic `bedroom/temp` and listens
for incoming PUBLISH packets.
You can set the upkeep interval ms option to make curl send MQTT ping requests
to the server at an interval, to prevent the connection from being closed due
to idleness. You might then need to use the progress callback to cancel the
operation.
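In application code this might look like the following sketch (the option
values and the `progress_cb` callback are illustrative):

```c
/* subscribe and keep the connection alive with MQTT pings */
curl_easy_setopt(curl, CURLOPT_URL, "mqtt://example.com/bedroom/temp");
curl_easy_setopt(curl, CURLOPT_UPKEEP_INTERVAL_MS, 30000L);
/* a progress callback allows aborting the long-running subscribe */
curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0L);
curl_easy_setopt(curl, CURLOPT_XFERINFOFUNCTION, progress_cb);
```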
### Publishing
@@ -45,8 +45,8 @@ payload `75`.
## What does curl deliver as a response to a subscribe
Whenever a PUBLISH packet is received, curl outputs the two-byte topic length
(MSB | LSB), then the topic, followed by the payload.
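For example, a PUBLISH for the 12-byte topic `bedroom/temp` with payload `75`
would be output as the raw bytes (spaces added for readability):

```
0x00 0x0C  bedroom/temp  75
```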
## Caveats

@@ -12,14 +12,14 @@ curl/libcurl in a reproducible fashion to judge improvements or detect
regressions. They are not intended to represent real world scenarios
as such.
This script is not part of any official interface and we may change it in
the future according to the project's needs.
## setup
When you are able to run curl's `pytest` suite, scorecard should work for you
as well. It starts a local Apache httpd or Caddy server and invokes the
locally built `src/curl` (by default).
## invocation
@@ -29,8 +29,9 @@ A typical invocation for measuring performance of HTTP/2 downloads would be:
curl> python3 tests/http/scorecard.py -d h2
```
and this prints a table with the results. The last argument is the protocol to
test and it can be `h1`, `h2` or `h3`. You can add `--json` to get results in
JSON instead of text.
Help for all command line options is available via:
@@ -40,11 +41,12 @@ curl> python3 tests/http/scorecard.py -h
## scenarios
Apart from `-d/--downloads` there is `-u/--uploads` and `-r/--requests`. These
are run with a variation of resource sizes and parallelism by default. You can
restrict sizes and parallelism explicitly if you are just interested in a
particular case.
For example, to run downloads of a 1 MB resource only, 100 times with at max 6
parallel transfers, use:
```
curl> python3 tests/http/scorecard.py -d --download-sizes=1mb --download-count=100 --download-parallel=6 h2
@@ -61,21 +63,28 @@ involved. (Note: this does not work for HTTP/3)
## flame graphs
With the excellent [Flame Graph](https://github.com/brendangregg/FlameGraph)
by Brendan Gregg, scorecard can turn `perf`/`dtrace` samples into an
interactive SVG. Either clone the `FlameGraph` repository next to your `curl`
project or set the environment variable `FLAMEGRAPH` to the location of your
clone. Then run scorecard with the `--flame` option, like
```
curl> FLAMEGRAPH=/Users/sei/projects/FlameGraph python3 tests/http/scorecard.py \
-r --request-count=50000 --request-parallels=100 --samples=1 --flame h2
```
and the SVG of the run is in `tests/http/gen/curl/curl.flamegraph.svg`. You
can open that in Firefox and zoom in/out of stacks of interest.
The flame graph is about the last run of `curl`. That is why you should add
scorecard arguments that restrict measurements to a single run.
### Measures/Privileges
The `--flame` option uses `perf` on Linux and `dtrace` on macOS. Since both
tools require special privileges, they are run via the `sudo` command by
scorecard. This means you need to have run `sudo` recently enough before
starting scorecard, so that no new password prompt is needed.
There is no support right now for measurements on other platforms.

@@ -68,7 +68,6 @@ peer keys for this reason.
a previous ticket, curl might trust a server which no longer has a root
certificate in the file.
## Session Cache Access
#### Lookups

@@ -6,27 +6,26 @@ SPDX-License-Identifier: curl
# `uint32_t` Sets
The multi handle tracks added easy handles via a `uint32_t` it calls an
`mid`. There are four data structures for `uint32_t` optimized for the multi
use case.
## `uint32_tbl`
`uint32_tbl`, implemented in `uint-table.[ch]`, manages an array of `void *`.
The `uint32_t` is the index into this array. It is created with a *capacity*
which can be *resized*. The table assigns the index when a `void *` is
*added*. It keeps track of the last assigned index and uses the next available
larger index for a subsequent add. Reaching *capacity* it wraps around.
The table *cannot* store `NULL` values. The largest possible index
is `UINT32_MAX - 1`.
The table is iterated over by asking for the *first* existing index, meaning
the smallest number that has an entry, if the table is not empty. To get
the *next* entry, one passes the index of the previous iteration step. It does
not matter if the previous index is still in the table. Sample code for a table
iteration would look like this:
```c
uint32_t mid;
@@ -50,46 +49,50 @@ This iteration has the following properties:
### Memory
For storing 1000 entries, the table would allocate one block of 8KB on
a 64-bit system, plus the 2 pointers and 3 `uint32_t` in its base `struct
uint32_tbl`. A resize allocates a completely new pointer array, copies
the existing entries and frees the previous one.
### Performance
Lookups of entries are only an index into the array, O(1) with a tiny constant.
Adding entries and iterations are more work:
1. adding an entry means "find the first free index larger than the previous assigned
one". Worst case for this is a table with only a single free index where `capacity - 1`
checks on `NULL` values would be performed, O(N). If the single free index is randomly
distributed, this would be O(N/2).
2. iterating a table scans for the first not `NULL` entry after the start index. This
makes a complete iteration O(N) work.
1. adding an entry means "find the first free index larger than the previous
assigned one". Worst case for this is a table with only a single free index
where `capacity - 1` checks on `NULL` values would be performed, O(N). If
the single free index is randomly distributed, this would be O(N/2).
2. iterating a table scans for the first not `NULL` entry after the start
index. This makes a complete iteration O(N) work.
In the multi use case, point 1 is remedied by growing the table so that a good
chunk of free entries always exists.
Point 2 is less of an issue for a multi, since it does not really matter when
the number of transfers is relatively small. A multi managing a larger set
needs to operate event-based anyway and table iterations are rarely needed.
For these reasons, the simple implementation was preferred. Should this become
a concern, there are options like "free index lists" or, alternatively, an internal
bitset that scans better.
a concern, there are options like "free index lists" or, alternatively, an
internal bitset that scans better.
## `uint32_bset`
A bitset for `uint32_t` values, allowing fast add/remove operations. It is
initialized with a *capacity*, meaning it can store only the numbers in the
range `[0, capacity-1]`. It can be *resized* and safely *iterated*.
`uint32_bset` is designed to operate in combination with `uint32_tbl`.
The bitset keeps an array of `uint64_t`. The first array entry keeps the
numbers 0 to 63, the second 64 to 127 and so on. A bitset with capacity 1024
would therefore allocate an array of 16 64-bit values (128 bytes). Operations
for an unsigned int divide it by 64 for the array index and then
check/set/clear the bit of the remainder.
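The described check/set/clear operations amount to this kind of bit
addressing (an illustrative sketch, not the actual curl code):

```c
/* hypothetical helpers operating on the bitset's uint64_t array */
static bool bset_contains(const uint64_t *slots, uint32_t i)
{
  return (slots[i / 64] >> (i % 64)) & 1;
}

static void bset_add(uint64_t *slots, uint32_t i)
{
  slots[i / 64] |= ((uint64_t)1 << (i % 64));
}
```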
Iteration works the same as with `uint32_tbl`: ask the bitset for the *first*
number present and then use that to get the *next* higher number present. Like
the table, this is safe for adds/removes and growing the set while iterating.
### Memory
@@ -98,31 +101,36 @@ A bitset for 40000 transfers occupies 5KB of memory.
### Performance
Operations for add/remove/check are O(1). Iteration needs to scan for the next
bit set. The number of scans is small (see memory footprint) and, for checking
bits, many compilers offer primitives for special CPU instructions.
## `uint32_spbset`
While the memory footprint of `uint32_bset` is good, it still needs 5KB to
store the single number 40000. This is not optimal when many are needed. For
example, in event-based processing, each socket needs to keep track of the
transfers involved. There are many sockets potentially, but each one mostly
tracks a single transfer or a few (on an HTTP/2 connection, up to around 100).
For such use cases, the `uint32_spbset` is intended: track a small number of
unsigned ints, potentially rather "close" together. It keeps "chunks" with an
offset and has no capacity limit.
Example: adding the number 40000 to an empty sparse bitset would have one
chunk with offset 39936, keeping track of the numbers 39936 to 40191 (a chunk
has 4 64-bit values). The numbers in that range can be handled without further
allocations.
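The chunk offset in this example follows from rounding down to the chunk size
of 4 x 64 = 256 numbers (illustrative math; the actual chunk layout in curl
may differ):

```c
#define CHUNK_NUMS  (4 * 64)                          /* 256 numbers per chunk */
uint32_t offset = (40000 / CHUNK_NUMS) * CHUNK_NUMS;  /* -> 39936 */
```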
The worst case is then storing 100 numbers that lie in separate intervals.
Then 100 chunks would need to be allocated and linked, resulting in overall 4
KB of memory used.
Iterating a sparse bitset works the same as for bitset and table.
## `uint32_hash`
Lastly, there are places in libcurl such as the HTTP/2 and HTTP/3 protocol
implementations that need to store their own data related to a transfer.
`uint32_hash` then allows them to associate an unsigned int, e.g. the
transfer's `mid`, with their own data.
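A hypothetical use might look like this (the function name and signature are
assumed for illustration, not taken from the curl sources):

```c
/* associate HTTP/2 stream state with a transfer's `mid` */
struct h2_stream_ctx *ctx = calloc(1, sizeof(*ctx));
if(!ctx)
  return CURLE_OUT_OF_MEMORY;
Curl_uint32_hash_set(&stream_hash, data->mid, ctx); /* assumed API */
```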

@@ -6,7 +6,9 @@ SPDX-License-Identifier: curl
# The curl HTTP Test Suite
This is an additional test suite using a combination of Apache httpd and
nghttpx servers to perform various tests beyond the capabilities of the
standard curl test suite.
# Usage
@@ -23,13 +25,17 @@ collected 5 items
tests/http/test_01_basic.py .....
```
Pytest takes arguments. `-v` increases its verbosity and can be used several
times. `-k <expr>` can be used to run only matching test cases. The `expr` can
be something resembling a python test or just a string that needs to match
test cases in their names.
```
curl/tests/http> pytest -vv -k test_01_02
```
runs all test cases that have `test_01_02` in their name. This does not have
to be the start of the name.
Depending on your setup, some test cases may be skipped and appear as `s` in
the output. If you run pytest verbose, it also gives you the reason for
@@ -40,7 +46,8 @@ skipping.
You need:
1. a recent Python, the `cryptography` module and, of course, `pytest`
2. an apache httpd development version. On Debian/Ubuntu, the package
`apache2-dev` has this
3. a local `curl` project build
3. optionally, a `nghttpx` with HTTP/3 enabled or h3 test cases are skipped
@@ -48,33 +55,49 @@ You need:
Via curl's `configure` script you may specify:
* `--with-test-nghttpx=<path-of-nghttpx>` if you have nghttpx to use
somewhere outside your `$PATH`.
* `--with-test-httpd=<httpd-install-path>` if you have an Apache httpd
installed somewhere else. On Debian/Ubuntu it otherwise looks into
`/usr/bin` and `/usr/sbin` to find those.
* `--with-test-caddy=<caddy-install-path>` if you have a Caddy web server
installed somewhere else.
* `--with-test-vsftpd=<vsftpd-install-path>` if you have a vsftpd ftp
server installed somewhere else.
* `--with-test-danted=<danted-path>` if you have `dante-server` installed
## Usage Tips
Several test cases are parameterized, for example with the HTTP version to
use. If you want to run a test with a particular protocol only, use a command
line like:
```
curl/tests/http> pytest -k "test_02_06 and h2"
```
Test cases can be repeated with the `pytest-repeat` module (`pip install
pytest-repeat`). Like in:
```
curl/tests/http> pytest -k "test_02_06 and h2" --count=100
```
which then runs this test case a hundred times. In case of flaky tests, you
can make pytest stop on the first one with:
```
curl/tests/http> pytest -k "test_02_06 and h2" --count=100 --maxfail=1
```
which allows you to inspect output and log files for the failed run. Speaking
of log files, the verbosity of pytest is also used to collect curl trace
output. If you specify `-v` three times, the `curl` command is started with
`--trace`:
```
curl/tests/http> pytest -vvv -k "test_02_06 and h2" --count=100 --maxfail=1
@@ -84,7 +107,10 @@ all of curl's output and trace file are found in `tests/http/gen/curl`.
## Writing Tests
There is a lot of [`pytest` documentation](https://docs.pytest.org/) with
examples. No use in repeating that here. Assuming you are somewhat familiar
with it, it is useful to know how *this* test suite is set up, especially if
you want to add test cases.
### Servers
@@ -110,22 +136,44 @@ left behind.
### Test Cases
Tests making use of these fixtures have them in their parameter list. This
tells pytest that a particular test needs them, so it has to create them.
Since one can invoke pytest for just a single test, it is important that a
test references the ones it needs.
All test cases start with `test_` in their name. We use a double number scheme
to group them. This makes it easy to run only specific tests and also gives a
short mnemonic to communicate trouble with others in the project. Otherwise
you are free to name test cases as you think fitting.
Tests are grouped thematically in a file with a single Python test class. This
is convenient if you need a special "fixture" for several tests. "fixtures"
can have "class" scope.
There is a curl helper class that knows how to invoke curl and interpret its
output. Among other things, it does add the local CA to the command line, so
that SSL connections to the test servers are verified. Nothing prevents anyone
from running curl directly, for specific uses not covered by the `CurlClient`
class.
### mod_curltest
The module source code is found in `testenv/mod_curltest`. It is compiled
using the `apxs` command, commonly provided via the `apache2-dev` package.
Compilation is quick and done once at the start of a test run.
The module adds 2 "handlers" to the Apache server (right now). Handler are pieces of code that receive HTTP requests and generate the response. Those handlers are:
The module adds 2 "handlers" to the Apache server (right now). Handler are
pieces of code that receive HTTP requests and generate the response. Those
handlers are:
* `curltest-echo`: hooked up on the path `/curltest/echo`. This one echoes
a request and copies all data from the request body to the response body.
Useful for simulating upload and checking that the data arrived as intended.
* `curltest-tweak`: hooked up on the path `/curltest/tweak`. This handler is
more of a Swiss army knife. It interprets parameters from the URL query
string to drive its behavior.
* `status=nnn`: generate a response with HTTP status code `nnn`.
* `chunks=n`: generate `n` chunks of data in the response body, defaults to 3.
* `chunk_size=nnn`: each chunk should contain `nnn` bytes of data. Maximum is 16KB right now.