iproute2

mirror of https://git.kernel.org/pub/scm/network/iproute2/iproute2.git synced 2026-01-26 22:22:18 +00:00

Author	SHA1	Message	Date
Ido Schimmel	f2b24087e1	devlink: Fix resource show output When the user asks to show device resources, devlink first queries the device's dpipe tables so that it will be able to show the association between resources and dpipe tables. In this flow, 'ctx->resources' is always NULL as resources have yet to be retrieved. As a result, the dpipe tables are not associated with a resource identifier and the resource show command does not show any dpipe tables: $ devlink resource show pci/0000:03:00.0 pci/0000:03:00.0: name kvd size 258048 unit entry dpipe_tables none resources: name linear size 98304 occ 1 unit entry size_min 0 size_max 159744 size_gran 128 dpipe_tables none resources: name singles size 16384 occ 1 unit entry size_min 0 size_max 159744 size_gran 1 dpipe_tables none name chunks size 49152 occ 0 unit entry size_min 0 size_max 159744 size_gran 32 dpipe_tables none name large_chunks size 32768 occ 0 unit entry size_min 0 size_max 159744 size_gran 512 dpipe_tables none name hash_double size 65408 unit entry size_min 32768 size_max 192512 size_gran 128 dpipe_tables none name hash_single size 94336 unit entry size_min 65536 size_max 225280 size_gran 128 dpipe_tables none name span_agents size 3 occ 0 unit entry dpipe_tables none name counters size 32766 occ 4 unit entry dpipe_tables none resources: name rif size 8192 occ 0 unit entry dpipe_tables none name flow size 24574 occ 4 unit entry dpipe_tables none name global_policers size 1000 unit entry dpipe_tables none resources: name single_rate_policers size 968 occ 0 unit entry dpipe_tables none name rif_mac_profiles size 1 occ 0 unit entry dpipe_tables none name rifs size 1000 occ 1 unit entry dpipe_tables none name port_range_registers size 16 occ 0 unit entry dpipe_tables none name physical_ports size 64 occ 32 unit entry dpipe_tables none Fix by moving the check against 'ctx->resources' to the place where it is actually used. 
Output after the fix: $ devlink resource show pci/0000:03:00.0 pci/0000:03:00.0: name kvd size 258048 unit entry dpipe_tables none resources: name linear size 98304 occ 1 unit entry size_min 0 size_max 159744 size_gran 128 dpipe_tables: table_name mlxsw_adj resources: name singles size 16384 occ 1 unit entry size_min 0 size_max 159744 size_gran 1 dpipe_tables none name chunks size 49152 occ 0 unit entry size_min 0 size_max 159744 size_gran 32 dpipe_tables none name large_chunks size 32768 occ 0 unit entry size_min 0 size_max 159744 size_gran 512 dpipe_tables none name hash_double size 65408 unit entry size_min 32768 size_max 192512 size_gran 128 dpipe_tables: table_name mlxsw_host6 name hash_single size 94336 unit entry size_min 65536 size_max 225280 size_gran 128 dpipe_tables: table_name mlxsw_host4 name span_agents size 3 occ 0 unit entry dpipe_tables none name counters size 32766 occ 4 unit entry dpipe_tables none resources: name rif size 8192 occ 0 unit entry dpipe_tables none name flow size 24574 occ 4 unit entry dpipe_tables none name global_policers size 1000 unit entry dpipe_tables none resources: name single_rate_policers size 968 occ 0 unit entry dpipe_tables none name rif_mac_profiles size 1 occ 0 unit entry dpipe_tables none name rifs size 1000 occ 1 unit entry dpipe_tables none name port_range_registers size 16 occ 0 unit entry dpipe_tables none name physical_ports size 64 occ 32 unit entry dpipe_tables none Fixes: 0e7e1819453c ("devlink: relax dpipe table show dependency on resources") Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2026-01-20 15:27:47 -08:00
Petr Oros	42f2f219c6	lib: Add str_to_bool helper function Add str_to_bool() helper function to lib/utils.c that uses parse_one_of() to parse boolean values. Update devlink to use this common implementation. Signed-off-by: Petr Oros <poros@redhat.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2025-11-21 09:10:22 -07:00
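As a rough illustration of the kind of helper the commit above describes, here is a standalone sketch of a string-to-boolean parser. It is not the code from lib/utils.c: the function name parse_bool_word() and the accepted word list are assumptions made for this example, and it does not call parse_one_of().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the iproute2 str_to_bool()): map a word to a
 * boolean. The accepted words below are an assumption for illustration.
 * Returns 0 on success and stores the value in *out, -EINVAL otherwise.
 */
static int parse_bool_word(const char *s, bool *out)
{
	static const char *const true_words[] = { "true", "on", "enable", "1" };
	static const char *const false_words[] = { "false", "off", "disable", "0" };
	size_t i;

	assert(s != NULL && out != NULL);

	for (i = 0; i < sizeof(true_words) / sizeof(true_words[0]); i++) {
		if (strcmp(s, true_words[i]) == 0) {
			*out = true;
			return 0;
		}
	}
	for (i = 0; i < sizeof(false_words) / sizeof(false_words[0]); i++) {
		if (strcmp(s, false_words[i]) == 0) {
			*out = false;
			return 0;
		}
	}
	return -EINVAL;
}
```

Returning an error code and writing the result through a pointer keeps "false" distinguishable from "could not parse", which is the main reason to centralize such parsing in one helper.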
Petr Oros	0d61015ba9	lib: Move mnlg to lib for shared use Move mnlg.c to lib/ and mnlg.h to include/ to allow code reuse across multiple tools. Signed-off-by: Petr Oros <poros@redhat.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2025-11-21 09:10:22 -07:00
Saeed Mahameed	da3525408f	devlink: Support DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE Add support for the new inactive switchdev mode [1]. A user can start the eswitch in switchdev or switchdev_inactive mode. Active: Traffic is enabled on this eswitch FDB. Inactive: Traffic is ignored/dropped on this eswitch FDB. An example use case: $ devlink dev eswitch set pci/0000:08:00.1 mode switchdev_inactive Setup FDB pipeline and netdev representors ... Once ready to start receiving traffic $ devlink dev eswitch set pci/0000:08:00.1 mode switchdev [1] https://lore.kernel.org/all/20251107000831.157375-1-saeed@kernel.org/ Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2025-11-14 18:17:04 -07:00
Ivan Vecera	cacba59f9a	devlink: Add support for 64bit parameters Kernel commit c0ef144695910 ("devlink: Add support for u64 parameters") added support for 64bit devlink parameters, add the support for them also into devlink utility userspace counterpart. Tested on Microchip EDS2 development board... Prior patch: root@eds2:~# devlink dev param set i2c/1-0070 name clock_id value 1234 cmode driverinit Value type not supported root@eds2:~# After patch: root@eds2:~# devlink dev param set i2c/1-0070 name clock_id value 1234 cmode driverinit root@eds2:~# Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2025-11-03 09:52:18 -07:00
David Ahern	0df5ebb38a	Merge remote-tracking branch 'main' into next Signed-off-by: David Ahern <dsahern@kernel.org>	2025-11-03 09:44:50 -07:00
Ivan Vecera	ab785eff0b	devlink: fix devlink flash error reporting Currently, devlink silently exits when a non-existent device is specified for flashing or when the user lacks sufficient permissions. This makes it hard to diagnose the problem. Print an appropriate error message in these cases to improve user feedback. Prior: $ devlink dev flash foo/bar file test $ sudo devlink dev flash foo/bar file test $ After patch: $ devlink/devlink dev flash foo/bar file test devlink answers: Operation not permitted $ sudo devlink/devlink dev flash foo/bar file test devlink answers: No such device Fixes: 9b13cddfe268 ("devlink: implement flash status monitoring") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2025-10-29 17:02:36 -07:00
Shahar Shitrit	56d3c99658	devlink: Introduce burst period for health reporter Add a new devlink health set option to configure the health reporter’s burst period. The burst period defines a time window during which recovery attempts for reported errors are allowed. Once this period expires, the configured grace period begins. This feature addresses cases where multiple errors occur simultaneously due to a common root cause. Without a burst period, the grace period starts immediately after the first error recovery attempt finishes. This means that only the first error might be recovered, while subsequent errors are blocked during the grace period. With the burst period, the reporter initiates a recovery attempt for every error reported within this time window before the grace period starts. Example: $ devlink health set pci/0000:00:09.0 reporter tx burst_period 500 Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2025-10-16 09:26:34 -06:00
Carolina Jubran	1a60e903d9	devlink: Update TC bandwidth parsing Kernel commit 1bbdb81a9836 ("devlink: Fix excessive stack usage in rate TC bandwidth parsing") introduced a dedicated attribute set (DEVLINK_RATE_TC_ATTR_*) for entries nested under DEVLINK_ATTR_RATE_TC_BWS. Update the parser to reflect this change by validating the nested attributes and sync the UAPI header to include the changes. Fixes: c83d1477f8b2 ("Add support for 'tc-bw' attribute in devlink-rate") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2025-07-28 16:42:40 +00:00
Carolina Jubran	c83d1477f8	Add support for 'tc-bw' attribute in devlink-rate Introduce a new attribute 'tc-bw' to devlink-rate, allowing users to set the bandwidth allocation per traffic class. The new attribute enables fine-grained QoS configurations by assigning relative bandwidth shares to each traffic class, supporting more precise traffic shaping, which helps in achieving more precise bandwidth management across traffic streams. Add support for configuring 'tc-bw' via the devlink userspace utility and parse the 'tc-bw' arguments for accurate bandwidth assignment per traffic class. This feature supports 8 traffic classes as defined by the IEEE 802.1Qaz standard. Example commands: - devlink port function rate add pci/0000:08:00.0/group \ tx_share 10Gbit tx_max 50Gbit tc-bw 0:20 1:0 2:0 3:0 4:0 5:80 6:0 7:0 - devlink port function rate set pci/0000:08:00.0/group \ tc-bw 0:20 1:0 2:0 3:0 4:0 5:80 6:0 7:0 Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2025-07-11 16:41:34 +00:00
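The 'tc-bw' arguments in the commit above take the form index:share for each of the 8 traffic classes. A minimal sketch of parsing one such token might look like the following. This is not the devlink parser: parse_tc_bw(), the duplicate check via a bitmask, and the 0..100 limit on the share are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define TC_COUNT 8 /* the commit message states 8 traffic classes */

/* Hypothetical helper: parse one "tc:share" token such as "5:80".
 * bw[] receives the share for that traffic class; *seen is a bitmask of
 * classes already assigned, used to reject duplicates.
 * Assumption: shares are limited to 0..100.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
static int parse_tc_bw(const char *arg, unsigned int bw[TC_COUNT],
		       unsigned int *seen)
{
	unsigned int tc, share;
	char trailing;

	assert(arg != NULL && bw != NULL && seen != NULL);

	/* Exactly two conversions must succeed; a third means trailing junk. */
	if (sscanf(arg, "%u:%u%c", &tc, &share, &trailing) != 2)
		return -1;
	if (tc >= TC_COUNT || share > 100)
		return -1;
	if (*seen & (1u << tc))
		return -1;

	*seen |= 1u << tc;
	bw[tc] = share;
	return 0;
}
```

A caller would loop over the remaining command-line words, calling the helper once per token, and could then check that all 8 classes were seen before building the netlink message.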
Saeed Mahameed	413cf4f03a	devlink: use the correct handle flag for port param show Port param show command arg parser used the devlink dev flag instead of the port flag, so the port device argument was not identified, causing the following error: $ devlink port param show eth0 name link_type Wrong identification string format. Devlink identification ("bus_name/dev_name") expected Use the correct devlink handle flag. Fixes: 70faecdca8f5 ("devlink: implement dump selector for devlink objects show commands") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2024-12-02 11:19:26 -08:00
Jiri Pirko	8c4918859e	devlink: do dry parse for extended handle with selector When parsing with selector, there's a list of extended handles (devname/busname/x) which require special treatment. DL_OPT_HANDLEP is one of them. The code tries to parse devname/busname handle and in case it is successful, it goes the "dump" way. However if it's not, parsing is directly done. That is wrong, as the options may still be incomplete. Do break in that case instead allowing to do dry parse and possibly go the "dump" way in case the option list is not complete. Fixes: 70faecdca8f5 ("devlink: implement dump selector for devlink objects show commands") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2024-12-02 11:19:26 -08:00
Minhong He	dc283e7b79	devlink: fix memory leak in ifname_map_rtnl_init() When the return value of rtnl_talk() is greater than or equal to 0, 'answer' will be allocated. 'answer' should be freed after use; otherwise it causes a memory leak. Signed-off-by: Minhong He <heminhong@kylinos.cn> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2024-11-27 19:58:54 -08:00
David Ahern	af9559b233	Merge remote-tracking branch 'main/main' into next Signed-off-by: David Ahern <dsahern@kernel.org>	2024-07-08 22:36:13 +00:00
Przemek Kitszel	77241a525b	devlink: print missing params even if an unknown one is present Print all of the missing parameters, also in the presence of unknown ones. Take for example a correct command: $ devlink resource set pci/0000:01:00.0 path /kvd/linear size 98304 And remove the "size" keyword: $ devlink resource set pci/0000:01:00.0 path /kvd/linear 98304 That yields output: Resource size expected. Unknown option "98304" Prior to the patch only the last line of output was present. If the user also forgot the "path" keyword, there would be an additional line in stderr: Resource path expected. Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2024-07-07 16:48:51 +00:00
Parav Pandit	c6c39f3c6d	devlink: Fix setting max_io_eqs as the sole attribute The dl_opts_put() function did not consider the IO eqs option flag. Due to this, the max_io_eqs setting is applied only when it is combined with other attributes such as roce/hw_addr. When max_io_eqs is the only attribute set, it is not applied. Fix it by adding the missing flag. Fixes: e8add23c59b7 ("devlink: Support setting max_io_eqs") Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2024-06-11 08:05:24 -07:00
William Tu	459ddd094d	devlink: trivial: fix err format on max_io_eqs Add missing ']'. Signed-off-by: William Tu <witu@nvidia.com> Fixes: e8add23c59b7 ("devlink: Support setting max_io_eqs") Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2024-06-11 08:05:16 -07:00
Parav Pandit	e8add23c59	devlink: Support setting max_io_eqs Devices send event notifications for the IO queues, such as tx and rx queues, through event queues. Enable a privileged owner, such as a hypervisor PF, to set the number of IO event queues for the VF and SF during the provisioning stage. example: Get maximum IO event queues of the VF device:: $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 10 Set maximum IO event queues of the VF device:: $ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32 $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 32 Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2024-04-13 16:34:21 +00:00
Stephen Hemminger	0c3400cc8f	spelling fixes Use codespell and ispell to fix some spelling errors in comments and README's. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2024-01-25 16:49:10 -08:00
Jiri Pirko	1ac0c4450f	devlink: print nested devlink handle for devlink dev Devlink dev may contain one or more nested devlink instances. Print them using previously introduced pr_out_nested_handle_obj() helper. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2023-11-11 17:31:40 +00:00
Jiri Pirko	3e90f377f4	devlink: print nested handle for port function If port function contains nested handle attribute, print it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2023-11-11 17:31:34 +00:00
Jiri Pirko	2ded9c18a3	devlink: introduce support for netns id for nested handle Nested handle may contain DEVLINK_ATTR_NETNS_ID attribute that indicates the network namespace where the nested devlink instance resides. Process this converting to netns name if possible and print to user. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2023-11-11 17:31:29 +00:00
Jiri Pirko	e98d5084f7	devlink: extend pr_out_nested_handle() to print object For existing pr_out_nested_handle() user (line card), the output stays the same. For the new users, introduce __pr_out_nested_handle() to allow to print devlink instance as object allowing to carry attributes in it (like netns). Note that as __pr_out_handle_start() and pr_out_handle_end() are newly used, the function is moved below the definitions. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2023-11-11 17:31:24 +00:00
Jiri Pirko	fb47796cd6	devlink: do conditional new line print in pr_out_port_handle_end() Instead of printing out new line unconditionally, use __pr_out_newline() to print it only when needed avoiding double prints. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2023-11-11 17:31:19 +00:00
Jiri Pirko	8265b39f0c	devlink: use snprintf instead of sprintf Use snprintf instead of sprintf to ensure only valid memory is printed to and the output string is properly terminated. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2023-11-11 17:31:13 +00:00
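The difference the commit above relies on can be sketched as follows. This is an illustration rather than code from the patch; format_handle() is a hypothetical helper. snprintf() never writes more than the given size, always terminates the output when the size is non-zero, and returns the length the full string would have had, which lets the caller detect truncation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format "bus/dev" into a fixed-size buffer.
 * Returns 0 on success, -1 if the result did not fit. In the -1 case
 * the buffer still holds a NUL-terminated, truncated string.
 */
static int format_handle(char *dst, size_t size, const char *bus,
			 const char *dev)
{
	int n;

	assert(dst != NULL && size > 0);

	n = snprintf(dst, size, "%s/%s", bus, dev);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

With sprintf() the same call would write past the end of a too-small buffer; with snprintf() the worst case is a truncated but valid string plus a detectable error.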
Dima Chumak	994e80e9c9	devlink: Support setting port function ipsec_packet cap Support port function commands to enable / disable IPsec packet offloads, this is used to control the port IPsec device capabilities. When IPsec packet capability is disabled for a function of the port (default), function cannot offload IPsec operation. When enabled, IPsec operation can be offloaded by the function of the port. Enabling IPsec packet offloads lets the kernel delegate encrypt/decrypt operations, as well as encapsulation and SA/policy and state to the device hardware. Example of a PCI VF port which supports IPsec packet offloads: $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto disable ipsec_packet disable $ devlink port function set pci/0000:06:00.0/1 ipsec_packet enable $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto disable ipsec_packet enable Signed-off-by: Dima Chumak <dchumak@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2023-10-04 09:23:11 -06:00
Dima Chumak	27fd1bfa1b	devlink: Support setting port function ipsec_crypto cap Support port function commands to enable / disable IPsec crypto offloads, this is used to control the port IPsec device capabilities. When IPsec crypto capability is disabled for a function of the port (default), function cannot offload IPsec operation. When enabled, IPsec operation can be offloaded by the function of the port. Enabling IPsec crypto offloads lets the kernel delegate XFRM state processing and encrypt/decrypt operation to the device hardware. Example of a PCI VF port which supports IPsec crypto offloads: $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto disable $ devlink port function set pci/0000:06:00.0/1 ipsec_crypto enable $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto enable Signed-off-by: Dima Chumak <dchumak@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2023-10-04 09:23:03 -06:00
Jiri Pirko	70faecdca8	devlink: implement dump selector for devlink objects show commands Introduce a new helper dl_argv_parse_with_selector() to be used by show() functions instead of dl_argv(). Implement it to check if all needed options for get commands are specified. In case they are not, ask kernel for dump passing only the options (attributes) that are present, creating a sort of partial key to instruct kernel to do partial dump. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2023-09-11 09:16:51 -06:00
Jiri Pirko	fd1c2af8cb	devlink: return -ENOENT if argument is missing In preparation to the follow-up dump selector patch, make sure that the command line arguments parsing function returns -ENOENT in case the option is missing so the caller can distinguish. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2023-09-11 09:16:39 -06:00
Jiri Pirko	8eb894eda6	devlink: implement command line args dry parsing In preparation to the follow-up dump selector patch, introduce function dl_argv_dry_parse() which allows to do dry parsing of command line arguments without printing out any error messages to the user. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2023-09-11 09:16:27 -06:00
Jiri Pirko	5d9f42124c	devlink: make parsing of handle non-destructive to argv Currently, handle parsing is destructive as the "\0" string ends are being put in certain positions during parsing. That prevents it from being used repeatedly. This is problematic with the follow-up patch implementing dry-parsing. Fix by making a copy of handle argv during parsing. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2023-09-11 09:16:16 -06:00
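The idea in the commit above, parsing a handle from a copy so that argv stays intact and can be parsed again, can be sketched like this. It is an illustration under assumptions, not the devlink implementation: split_handle() and copy_range() are hypothetical names, and only the two-part "bus_name/dev_name" form is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate len bytes of src as a NUL-terminated string. */
static char *copy_range(const char *src, size_t len)
{
	char *p = malloc(len + 1);

	if (p) {
		memcpy(p, src, len);
		p[len] = '\0';
	}
	return p;
}

/* Hypothetical sketch: split "bus_name/dev_name" without writing to the
 * input string, so the same argv word can be parsed repeatedly.
 * On success returns 0 and stores heap copies in *bus and *dev, which the
 * caller must free(). Returns -1 on malformed input or allocation failure.
 */
static int split_handle(const char *handle, char **bus, char **dev)
{
	const char *slash;
	char *b, *d;

	assert(handle != NULL && bus != NULL && dev != NULL);

	slash = strchr(handle, '/');
	if (!slash || slash == handle || slash[1] == '\0')
		return -1;

	b = copy_range(handle, (size_t)(slash - handle));
	d = copy_range(slash + 1, strlen(slash + 1));
	if (!b || !d) {
		free(b);
		free(d);
		return -1;
	}
	*bus = b;
	*dev = d;
	return 0;
}
```

Because the input is only read, a first "dry" pass and a later real pass over the same arguments see identical strings.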
Jiri Pirko	158215c536	devlink: move DL_OPT_SB into required options This is basically a cosmetic change. The SB index is not required to be passed by user and implicitly index 0 is used. This is ensured by special treating at the end of dl_argv_parse(). Move this option from optional to required options. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2023-09-11 09:15:32 -06:00
David Ahern	ce67bbcccb	Merge remote-tracking branch 'main' into next Signed-off-by: David Ahern <dsahern@kernel.org>	2023-08-20 10:42:35 -06:00
Jiri Pirko	872148f54e	devlink: spell out STATE in devlink port function help Be in-sync with port help and port man page and spell out the possible states instead of "STATE". Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2023-08-19 09:13:36 -07:00
Jiri Pirko	27724f3cbb	devlink: accept "name" command line option instead of "trap"/"group" It is common for all iproute2 apps to have command line option names matching with show command outputs. However, that is not true in case of trap and trap group devlink objects. Correct would be to have "trap" and "group" in the outputs, but that is not possible to change now. Instead of that, accept "name" instead of "trap" and "group" options. Examples: $ devlink trap show netdevsim/netdevsim1 netdevsim/netdevsim1: name source_mac_is_multicast type drop generic true action drop group l2_drops name vlan_tag_mismatch type drop generic true action drop group l2_drops name ingress_vlan_filter type drop generic true action drop group l2_drops name ingress_spanning_tree_filter type drop generic true action drop group l2_drops name port_list_is_empty type drop generic true action drop group l2_drops name port_loopback_filter type drop generic true action drop group l2_drops name fid_miss type exception generic false action trap group l2_drops name blackhole_route type drop generic true action drop group l3_drops name ttl_value_is_too_small type exception generic true action trap group l3_exceptions name tail_drop type drop generic true action drop group buffer_drops name ingress_flow_action_drop type drop generic true action drop group acl_drops name egress_flow_action_drop type drop generic true action drop group acl_drops name igmp_query type control generic true action mirror group mc_snooping name igmp_v1_report type control generic true action trap group mc_snooping $ devlink trap show netdevsim/netdevsim1 trap source_mac_is_multicast netdevsim/netdevsim1: name source_mac_is_multicast type drop generic true action drop group l2_drops $ devlink trap show netdevsim/netdevsim1 name source_mac_is_multicast netdevsim/netdevsim1: name source_mac_is_multicast type drop generic true action drop group l2_drops $ devlink trap group netdevsim/netdevsim1: name l2_drops generic true name 
l3_drops generic true policer 1 name l3_exceptions generic true policer 1 name buffer_drops generic true policer 2 name acl_drops generic true policer 3 name mc_snooping generic true policer 3 $ devlink trap group show netdevsim/netdevsim1 group l2_drops netdevsim/netdevsim1: name l2_drops generic true $ devlink trap group show netdevsim/netdevsim1 name l2_drops name l2_drops generic true Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2023-08-13 10:22:12 -06:00
zhaoshuang	7e8cdfa2ea	iproute2: optimize code and fix some mem-leak risk Signed-off-by: zhaoshuang <izhaoshuang@163.com> Reviewed-by: Pawel Chmielewski <pawel.chmielewski@intel.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2023-05-11 14:15:12 -07:00
Ido Schimmel	b6f4a62ba7	devlink: Fix dumps where interface map is used The devlink utility stores an interface map that can be used to map an interface name to a devlink port and vice versa. The map is populated by issuing a devlink port dump via 'DEVLINK_CMD_PORT_GET' command. Cited commits started to populate the map only when it is actually needed. One such case is when a dump (e.g., shared buffer dump) only returns devlink port handles. When pretty printing is required, the utility will consult the map to translate the devlink port handles to the corresponding interface names. The above is problematic as it means that the port dump response(s) will be queued to the same receive buffer as the response(s) of the dump that triggered the port dump, resulting in a failed dump [1]. Fix by using a different netlink socket for the population of the interface map. [1] $ devlink sb tc bind show kernel answers: Device or resource busy Failed to create index map //0: sb 0 tc 4 type egress pool 4 threshold 9 kernel answers: Device or resource busy [...] $ echo $? 1 Fixes: 5cddbb274eab ("devlink: load port-ifname map on demand") Fixes: 63d84b1fc98d ("devlink: load ifname map on demand from ifname_map_rev_lookup() as well") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2023-04-27 08:53:27 -07:00
Stephen Hemminger	46686c563b	add space after keyword The style standard is to use space after keywords. Example: if (expr) versus if(expr) Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2023-01-16 09:18:58 -08:00
Stephen Hemminger	18a13ec516	devlink: use SPDX Add SPDX tag instead of GPL 2.0 or later boilerplate Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2023-01-14 09:00:34 -08:00
Jiri Pirko	1e994cf69c	devlink: fix mon json output for trap-policer There is a json footer missed for trap-policer output in "devlink mon". So add it and fix the json output. Fixes: a66af5569337 ("devlink: Add devlink trap policer set and show commands") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2022-12-19 08:35:48 -08:00
David Ahern	3715a146e8	Merge branch 'main' into next Conflicts: devlink/devlink.c Signed-off-by: David Ahern <dsahern@kernel.org>	2022-12-16 09:12:38 -07:00
Shay Drory	32168d8a88	devlink: Support setting port function migratable cap Support port function commands to enable / disable migratable capability, this is used to set the port function as migratable. Live migration is the process of transferring a live virtual machine from one physical host to another without disrupting its normal operation. In order for a VM to be able to perform LM, all the VM components must be able to perform migration, i.e., be migratable. In order for VF to be migratable, VF must be bound to VFIO driver with migration support. When migratable capability is enabled for a function of the port, the device is making the necessary preparations for the function to be migratable, which might include disabling features which cannot be migrated. Example of LM with migratable function configuration: Set migratable of the VF's port function. $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 migratable disable $ devlink port function set pci/0000:06:00.0/2 migratable enable $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 migratable enable Bind VF to VFIO driver with migration support: $ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/unbind $ echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:08:00.0/driver_override $ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/bind Attach VF to the VM. Start the VM. Perform LM. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2022-12-14 08:59:33 -07:00
Shay Drory	bb2eea918b	devlink: Support setting port function roce cap Support port function commands to enable / disable RoCE, this is used to control the port RoCE device capabilities. When RoCE is disabled for a function of the port, function cannot create any RoCE specific resources (e.g. GID table). It also reduces system memory utilization. For example, disabling RoCE enables a VF/SF to save 1 Mbyte of system memory per function. Example of a PCI VF port which supports a port function: $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 roce enabled $ devlink port function set pci/0000:06:00.0/2 roce disable $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 roce disabled Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2022-12-14 08:55:10 -07:00
Jiri Pirko	42b27dfc6e	devlink: update ifname map when message contains DEVLINK_ATTR_PORT_NETDEV_NAME Recent kernels send a PORT_NEW message when the ifname changes, so benefit from that by keeping ifnames updated. Whenever there is a message containing the DEVLINK_ATTR_PORT_NETDEV_NAME attribute, use it to update the ifname map. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2022-12-08 10:47:48 -07:00
Jiri Pirko	18ff3ccbc8	devlink: push common code to __pr_out_port_handle_start_tb() There is a common code in pr_out_port_handle_start() and pr_out_port_handle_start_arr(). As the next patch is going to extend it even more, push the code into common helper. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2022-12-08 10:47:44 -07:00
Jiri Pirko	d5ae4c3fdb	devlink: get devlink port for ifname using RTNL get link command Currently, when user specifies ifname as a handle on command line of devlink, the related devlink port is looked-up in previously taken dump of all devlink ports on the system. There are 3 problems with that: 1) The dump iterates over all devlink instances in kernel and takes a devlink instance lock for each. 2) Dumping all devlink ports would not scale. 3) Alternative ifnames are not exposed by devlink netlink interface. Instead, benefit from RTNL get link command extension and get the devlink port handle info from IFLA_DEVLINK_PORT attribute, if supported. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2022-12-08 10:47:37 -07:00
Jiri Pirko	f04f5e1d08	devlink: add ifname_map_add/del() helpers Add couple of helpers to alloc/free of map object alongside with list addition/removal. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2022-12-08 10:46:59 -07:00
Jacob Keller	a74e7181c6	devlink: support direct region read requests The kernel has gained support for reading from regions without needing to create a snapshot. To use this support, the DEVLINK_ATTR_REGION_DIRECT attribute must be added to the command. For the "read" command, if the user did not specify a snapshot, add the new attribute to request a direct read. The "dump" command will still require a snapshot. While technically a dump could be performed without a snapshot it is not guaranteed to be atomic unless the region size is no larger than 256 bytes. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2022-12-08 09:21:10 -07:00
Michal Wilczynski	68db921a0a	devlink: Fix setting parent for 'rate add' Setting a parent during creation of the node doesn't work, despite documentation [1] clearly saying that it should. [1] man/man8/devlink-rate.8 Example: $ devlink port function rate add pci/0000:4b:00.0/node_custom parent node_0 Unknown option "parent" Fix this by passing DL_OPT_PORT_FN_RATE_PARENT as an argument to dl_argv_parse() when it gets called from cmd_port_fn_rate_add(). Fixes: 6c70aca76ef2 ("devlink: Add port func rate support") Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2022-12-04 09:56:35 -08:00
Michal Wilczynski	0f71480932	devlink: Introduce new attribute 'tx_weight' to devlink-rate To fully utilize hierarchical QoS algorithm new attribute 'tx_weight' needs to be introduced. Weight attribute allows for usage of Weighted Fair Queuing arbitration scheme among siblings. This arbitration scheme can be used simultaneously with the strict priority. Introduce ability to configure tx_weight from devlink userspace utility. Make the new attribute optional. Example commands: $ devlink port function rate add pci/0000:4b:00.0/node_custom \ tx_weight 50 parent node_0 $ devlink port function rate set pci/0000:4b:00.0/2 tx_weight 20 Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: David Ahern <dsahern@kernel.org>	2022-12-03 10:47:06 -07:00

275 Commits