- utils_math.c:136:20: error: 'UINT_MAX' undeclared (first use in this function)
- tc_core.c:51:22: error: 'UINT_MAX' undeclared (first use in this function)
Signed-off-by: Akhilesh Nema <nemaakhilesh@gmail.com>
Modify tc police to permit burst sizes up to the limit of the
kernel API, which may exceed 4 GB of burst size at higher rates.
As presently implemented, the tc police burst option limits the
size of the burst to 4 GB in size. This is a reasonable limit for the
rates common when this was developed. However, the underlying
implementation of burst is expressed in terms of time at the specified
rate, and for higher rates, a burst size exceeding 4 GB is feasible
without modification to the kernel.
The kernel API specifies the burst size as the number of "psched
ticks" needed to send the burst at the specified rate. As each psched
tick is 64 nsec, the actual kernel limit on burst size is approximately
274.88 seconds (UINT_MAX * 64 / NSEC_PER_SEC).
For example, at a rate of 10 Gbit/sec, the current 4 GB size limit
corresponds to just under 3.5 seconds.
Additionally, overflows (burst values that exceed UINT_MAX psched
ticks) are now correctly detected, and flagged as an error, rather than
passing arbitrary psched tick values to the kernel.
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
In preparation for accepting 64-bit burst sizes, modify
tc_calc_xmittime and tc_calc_xmitsize to handle 64-bit values.
tc_calc_xmittime continues to return a 32-bit value, as its range
is limited by the kernel API, but overflow is now detected and the return
value is limited to UINT_MAX.
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
In preparation for accepting 64 bit burst sizes, create 64-bit
versions of get_size and get_size_and_cell. The 32-bit versions become
wrappers around the 64-bit versions.
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
When build with -DDEBUG, tc build fails with:
q_gred.c: In function ‘init_gred’:
q_gred.c:53:17: error: passing argument 2 of ‘fprintf’ from incompatible pointer type [-Wincompatible-pointer-types]
53 | DPRINTF(stderr, "init_gred: invoked with %s\n", *argv);
| ^~~~~~~
| |
| FILE *
This is due to the DPRINTF macro call. Indeed DPRINTF is defined as a
two-args macro when -DDEBUG is used, while it uses 3 args in this call.
Fix it simply dropping the useless first arg.
Fixes: aba5acdfdb34 ("(Logical change 1.3)")
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
DUALPI2 AQM is a combination of the DUALQ Coupled-AQM with a PI2
base-AQM. The PI2 AQM is in turn both an extension and a simplification
of the PIE AQM. PI2 makes quite some PIE heuristics unnecessary, while
being able to control scalable congestion controls like TCP-Prague.
With PI2, both Reno/Cubic can be used in parallel with Prague,
maintaining window fairness. DUALQ provides latency separation between
low latency Prague flows and Reno/Cubic flows that need a bigger queue.
This patch adds support to tc to configure it through its netlink
interface.
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Co-developed-by: Olga Albisser <olga@albisser.org>
Signed-off-by: Olga Albisser <olga@albisser.org>
Co-developed-by: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
Signed-off-by: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
Co-developed-by: Oliver Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Oliver Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Bob Briscoe <research@bobbriscoe.net>
Co-developed-by: Henrik Steen <henrist@henrist.net>
Signed-off-by: Henrik Steen <henrist@henrist.net>
Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Currently, NEXT_ARG() is called twice resulting in the first
weight being skipped. This results in the following errors:
$ sudo tc qdisc replace dev enP64183s1 root fq weights 589824 196608 65536
Not enough elements in weights
$ sudo tc qdisc replace dev enP64183s1 root fq weights 589824 196608 65536 nopacing
Illegal "weights" element, positive number expected
Fixes: 567eb4e41045 ("tc: fq: add TCA_FQ_WEIGHTS handling")
Signed-off-by: Hemanth Malla <vmalla@microsoft.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
As a preparatory step for supporting the NO_COLOR environment
variable, replace the direct use of CONF_COLOR with a
default_color_opt() function which initially returns CONF_COLOR.
Signed-off-by: Ben Hutchings <benh@debian.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
In print_nat the mask length is calculated as
len = ffs(sel->mask);
len = len ? 33 - len : 0;
The mask is stored in network byte order, it should be converted
to host byte order before calculating first bit set.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
In parse_nat_args the network mask is calculated as
sel->mask = htonl(~0u << (32 - addr.bitlen));
According to ISO/IEC 9899:TC3 6.5.7 Bitwise shift operators:
The integer promotions are performed on each of the operands.
The type of the result is that of the promoted left operand.
If the value of the right operand is negative or is greater
than or equal to the width of the promoted left operand,
the behavior is undefined
Specifically this means that the mask is undefined for
addr.bitlen = 0
On x86_64 the result is 0xffffffff, on armv7l it is 0.
Promoting the left operand of the shift operator solves this issue.
Signed-off-by: Torben Nielsen <torben.nielsen@prevas.dk>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Currently, tc_calc_xmittime and tc_calc_xmitsize round from double to
int three times — once when they call tc_core_time2tick /
tc_core_tick2time (whose argument is int), once when those functions
return (their return value is int), and then finally when the tc_calc_*
functions return. This leads to extremely granular and inaccurate
conversions.
As a result, for example, on my test system (where tick_in_usec=15.625,
clock_factor=1, and hz=1000000000) for a bitrate of 1Gbps, all tc htb
burst values between 0 and 999 bytes get encoded as 0 ticks; all values
between 1000 and 1999 bytes get encoded as 15 ticks (equivalent to 960
bytes); all values between 2000 and 2999 bytes as 31 ticks (1984 bytes);
etc.
The patch changes the code so these calculations are done internally in
floating-point, and only rounded to integer values when the value is
returned. It also changes tc_calc_xmittime to round its calculated value
up, rather than down, to ensure that the calculated time is actually
sufficient for the requested size.
Signed-off-by: Jonathan Lennox <jonathan.lennox@8x8.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
The netlink nest that carriers tc action statistics looks as follows:
[TCA_ACT_STATS]
[TCA_STATS_BASIC]
[TCA_STATS_BASIC_HW]
Where 'TCA_STATS_BASIC' carries the combined software and hardware
packets (32-bits) and bytes (64-bit) counters and 'TCA_STATS_BASIC_HW'
carries the hardware statistics.
When the number of packets exceeds 0xffffffff, the kernel emits the
'TCA_STATS_PKT64' attribute:
[TCA_ACT_STATS]
[TCA_STATS_BASIC]
[TCA_STATS_PKT64]
[TCA_STATS_BASIC_HW]
[TCA_STATS_PKT64]
This layout is not ideal as the only way for user space to know what
each 'TCA_STATS_PKT64' attribute carries is to check which attribute
precedes it, which is exactly what some applications are doing [1].
Do the same in iproute2 so that users with existing kernels could read
the 64-bit hardware packets counter of tc actions instead of reading the
truncated 32-bit counter.
Before:
$ tc -s filter show dev swp2 ingress
filter protocol all pref 1 flower chain 0
filter protocol all pref 1 flower chain 0 handle 0x1
skip_sw
in_hw in_hw_count 1
action order 1: mirred (Egress Redirect to device swp1) stolen
index 1 ref 1 bind 1 installed 47 sec used 23 sec
Action statistics:
Sent 368689092544 bytes 5760767071 pkt (dropped 0, overlimits 0 requeues 0)
Sent software 0 bytes 0 pkt
Sent hardware 368689092544 bytes 1465799775 pkt
backlog 0b 0p requeues 0
used_hw_stats immediate
Where 5760767071 - 1465799775 = 0x100000000
After:
$ tc -s filter show dev swp2 ingress
filter protocol all pref 1 flower chain 0
filter protocol all pref 1 flower chain 0 handle 0x1
skip_sw
in_hw in_hw_count 1
action order 1: mirred (Egress Redirect to device swp1) stolen
index 1 ref 1 bind 1 installed 71 sec used 47 sec
Action statistics:
Sent 368689092544 bytes 5760767071 pkt (dropped 0, overlimits 0 requeues 0)
Sent software 0 bytes 0 pkt
Sent hardware 368689092544 bytes 5760767071 pkt
backlog 0b 0p requeues 0
used_hw_stats immediate
[1] 006e1c6dbf
Reported-by: Joe Botha <joe@atomic.ac>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
In linux-6.13, we added the ability to offload pacing on
capable devices.
tc qdisc add ... fq ... offload_horizon 100ms
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
The flower tc parser was using XATTR_SIZE_MAX from linux/limits.h,
but this constant is intended to before extended filesystem attributes
not for TC. Replace it with a local define.
This fixes issue on systems with musl and XATTR_SIZE_MAX is not
defined in limits.h there.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This commit enhances the q_taprio module by adding support for the
Hold/Release mechanism in Time-Sensitive Networking (TSN), as specified
in the IEEE 802.1Q-2018 standard.
Changes include:
- Addition of `TC_TAPRIO_CMD_SET_AND_HOLD` and `TC_TAPRIO_CMD_SET_AND_RELEASE`
cases in the `entry_cmd_to_str` function to return "H" and "R" respectively.
- Addition of corresponding string comparisons in the `str_to_entry_cmd`
function to map "H" and "R" to `TC_TAPRIO_CMD_SET_AND_HOLD` and
`TC_TAPRIO_CMD_SET_AND_RELEASE`.
The Hold/Release feature works as follows:
- Set-And-Hold-MAC (H): This command sets the gates and holds the current
configuration, preventing any further changes until a release command is
issued.
- Set-And-Release-MAC (R): This command releases the hold, allowing
subsequent gate configuration changes to take effect.
These changes ensure that the q_taprio module can correctly interpret and
handle the Hold/Release commands, aligning with the IEEE 802.1Q-2018 standard
for enhanced TSN configuration.
Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com>
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
`print_num()` was born in `ip/ipaddress.c` but considering it has
nothing to do with IP addresses it should really live in `lib/utils.c`.
(I've had reason to call it from bridge/* on some random hackery.)
Signed-off-by: David Lamparter <equinox@diac24.net>
Signed-off-by: David Ahern <dsahern@kernel.org>
extend TC flower for matching on tunnel metadata.
Changes since v2:
- split uAPI changes and TC code in separate patches, as per David's request [2]
Changes since v1:
- fix incostintent naming in explain() and in tc-flower.8 (Asbjørn)
Changes since RFC:
- update uAPI bits to Asbjørn's most recent code [1]
- add 'tun' prefix to all flag names (Asbjørn)
- allow parsing 'enc_flags' multiple times, without clearing the match
mask every time, like happens for 'ip_flags' (Asbjørn)
- don't use "matches()" for parsing argv[] (Stephen)
- (hopefully) improve usage() printout (Asbjørn)
- update man page
[1] https://lore.kernel.org/netdev/20240709163825.1210046-1-ast@fiberby.net/
[2] https://lore.kernel.org/netdev/cc73004c-9aa8-9cd3-b46e-443c0727c34d@kernel.org/
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Expression 'ttl & ~(255 >> 0)' is always zero, because right operand
has 8 trailing zero bits, which is greater or equal than the size
of the left operand == 8 bits.
Found by RASU JSC.
Signed-off-by: Maks Mishin <maks.mishinFZ@gmail.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
There is a helper in utilities to handle missing argument,
but it was not being used consistently.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Allow adding tc filter for PFCP header.
Add support for parsing TCA_FLOWER_KEY_ENC_OPTS_PFCP.
Options are as follows: TYPE:SEID.
TYPE is a 8-bit value represented in hex and can be 1
for session header and 0 for node header. In PFCP packet
this is S flag in header.
SEID is a 64-bit session id value represented in hex.
This patch enables adding hardware filters using PFCP fields, see [1].
[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d823265dd45bbf14bd67aa476057108feb4143ce
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
The function doesn't use the FILE handle.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
The pretty printing routines no longer use the file handle.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
The removal of tick usage in netem, means that some of the
helper functions in tc are no longer used and can be safely removed.
Other functions can be made static.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The current version of netem in iproute2 has a maximum of 4.3 seconds
because of scaled 32 bit clock values. Some users would like to be
able to use larger delays to emulate things like storage delays.
Since kernel version 4.15, netem qdisc had netlink parameters
to express wider range of delays in nanoseconds. But the iproute2
side was never updated to use them.
This does break compatibility with older kernels (4.14 and earlier).
With these out of support kernels, the latency/delay parameter
will end up being ignored.
Reported-by: Marc Blanchet <marc.blanchet@viagenie.ca>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The callbacks in exec_util should not be modifying underlying
qdisc operations structure.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The callbacks in action_util should not be modifying underlying
qdisc operations structure.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The callbacks in filter_util should not be modifying underlying
qdisc operations structure.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The callbacks in qdisc_util should not be modifying underlying
qdisc operations structure.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Fix json corruption when using the "-json" option in some cases
Signed-off-by: Takanori Hirano <me@hrntknr.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
In the case of a process such as mapping a json to a structure,
it can be difficult if the keys have the same name but different types.
Since handle is used in hex string, change it to fw.
Signed-off-by: Takanori Hirano <me@hrntknr.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add assertion to check for case of snprintf failing (bad format?)
or buffer getting full.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>