During testing we noticed significant memory leak that is easily
reproducible and detectable with valgrind:
==2006284== 393,216 bytes in 12 blocks are definitely lost in loss record 5 of 5
==2006284== at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==2006284== by 0x18C73E: rtnl_recvmsg (libnetlink.c:830)
==2006284== by 0x18CF9E: __rtnl_talk_iov (libnetlink.c:1032)
==2006284== by 0x18D3CE: __rtnl_talk (libnetlink.c:1140)
==2006284== by 0x18D4DE: rtnl_talk (libnetlink.c:1168)
==2006284== by 0x11BF04: tc_filter_modify (tc_filter.c:224)
==2006284== by 0x11DD70: do_filter (tc_filter.c:748)
==2006284== by 0x116B06: do_cmd (tc.c:210)
==2006284== by 0x116C7C: tc_batch_cmd (tc.c:231)
==2006284== by 0x1796F2: do_batch (utils.c:1701)
==2006284== by 0x116D05: batch (tc.c:246)
==2006284== by 0x117327: main (tc.c:331)
==2006284==
==2006284== LEAK SUMMARY:
==2006284== definitely lost: 884,736 bytes in 27 blocks
In case nlmsg_type == NLMSG_ERROR and if answer set to NULL, we
should free(buf) too.
Signed-off-by: Denys Fedoryshchenko <denys.f@collabora.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
If `__rtnl_talk_iov` fails then callers are not expected to free `answer`.
Currently if `NLMSG_ERROR` was received with an error then the netlink
buffer was stored in `answer`, while still returning an error
This leak can be observed by running this snippet over time.
This triggers an `NLMSG_ERROR` because for each neighbour update, `ip`
will try to query for the name of interface 9999 in the wrong netns.
(which in itself is a separate bug)
set -e
ip netns del test-a || true
ip netns add test-a
ip netns del test-b || true
ip netns add test-b
ip -n test-a netns set test-b auto
ip -n test-a link add veth_a index 9999 type veth \
peer name veth_b netns test-b
ip -n test-b link set veth_b up
ip -n test-a monitor link address prefix neigh nsid label all-nsid \
> /dev/null &
monitor_pid=$!
clean() {
kill $monitor_pid
ip netns del test-a
ip netns del test-b
}
trap clean EXIT
while true; do
ip -n test-b neigh add dev veth_b 1.2.3.4 lladdr AA:AA:AA:AA:AA:AA
ip -n test-b neigh del dev veth_b 1.2.3.4
done
Fixes: 55870dfe7f8b ("Improve batch and dump times by caching link lookups")
Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
Signed-off-by: Gilad Naaman <gnaaman@drivenets.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add a new function rtnl_echo_talk() that could be used when the
sub-component supports NLM_F_ECHO flag. With this function we can
remove the redundant code added by commit b264b4c6568c7 ("ip: add
NLM_F_ECHO support").
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
There is no rule to have an error code after NLMSG_DONE msg. The only reason
we has this offset is that kernel function netlink_dump_done() has an error
code followed by the netlink message header.
Making nl_dump_ext_ack_done() has an offset parameter. So we can adjust
this for NLMSG_DONE message without error code.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
This patch adds bridge command to manage
recently added vnifilter on a collect metadata
vxlan device.
examples:
$bridge vni add dev vxlan0 vni 400
$bridge vni add dev vxlan0 vni 200 group 239.1.1.101
$bridge vni del dev vxlan0 vni 400
$bridge vni show
$bridge -s vni show
Signed-off-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
A number of functions in the rtnl_*_req family accept a caller-provided
callback to set up arbitrary filtering. rtnl_statsdump_req_filter()
currently only allows setting a field in the IFSM header, not custom
attributes. So far these were not necessary, but with introduction of more
detailed filtering settings, the callback becomes necessary.
To that end, add a filter_fn and filter_data arguments to the function.
Unlike the other filters, this one is typed to expect an IFSM pointer, to
permit tweaking the header itself as well.
Pass NULLs in the existing callers.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
rtnl_open_byproto() does not close the opened socket in case of
errors, and the socket is returned to the caller in the `fd` field of
the struct. However, none of the callers care about the socket, so
close it in the function immediately to avoid any potential resource
leaks.
Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
A successful call to recvmsg() causes msg.msg_controllen to contain the length
of the received ancillary data. However, the current code in the 'ip' utility
doesn't reset this value after each recvmsg().
This means that if a call to recvmsg() doesn't have ancillary data, then
'msg.msg_controllen' will be set to 0, causing future recvmsg() which do
contain ancillary data to get MSG_CTRUNC set in msg.msg_flags.
This fixes 'ip monitor' running with the all-nsid option - With this option the
kernel passes the nsid as ancillary data. If while 'ip monitor' is running an
even on the current netns is received, then no ancillary data will be sent,
causing 'msg.msg_controllen' to be set to 0, which causes 'ip monitor' to
indefinitely print "[nsid current]" instead of the real nsid.
Fixes: 449b824ad196 ("ipmonitor: allows to monitor in several netns")
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Fix nullptr dereference of errhndlr from rtnl_dump_filter_arg
struct in rtnl_dump_done and rtnl_dump_error functions.
Fixes: 459ce6e3d792 ("ip route: ignore ENOENT during save if RT_TABLE_MAIN is being dumped")
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Roi Dayan <roid@nvidia.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Reported-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Don't initialize arguments that are NULL, and format initialization
in a more logical way.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
We started to use in-kernel filtering feature which allows to get only
needed tables (see iproute_dump_filter()). From the kernel side it's
implemented in net/ipv4/fib_frontend.c (inet_dump_fib), net/ipv6/ip6_fib.c
(inet6_dump_fib). The problem here is that behaviour of "ip route save"
was changed after
c7e6371bc ("ip route: Add protocol, table id and device to dump request").
If filters are used, then kernel returns ENOENT error if requested table
is absent, but in newly created net namespace even RT_TABLE_MAIN table
doesn't exist. It is really allocated, for instance, after issuing
"ip l set lo up".
Reproducer is fairly simple:
$ unshare -n ip route save > dump
Error: ipv4: FIB table does not exist.
Dump terminated
Expected result here is to get empty dump file (as it was before this
change).
v2: reworked, so, now it takes into account NLMSGERR_ATTR_MSG
(see nl_dump_ext_ack_done() function). We want to suppress error messages
in stderr about absent FIB table from kernel too.
v3: reworked to make code clearer. Introduced rtnl_suppressed_errors(),
rtnl_suppress_error() helpers. User may suppress up to 3 errors (may be
easily extended by changing SUPPRESS_ERRORS_INIT macro).
v4: reworked, rtnl_dump_filter_errhndlr() was introduced. Thanks
to Stephen Hemminger for comments and suggestions
v5: space fixes, commit message reformat, empty initializers
Fixes: c7e6371bc ("ip route: Add protocol, table id and device to dump request")
Cc: David Ahern <dsahern@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add rtnl bridge vlan dump request helper which will be used to retrieve
bridge vlan information and options.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Add ability to dump multiple nexthop buckets and get a specific one.
Example:
# ip nexthop add id 10 group 1/2 type resilient buckets 8
# ip nexthop
id 1 via 192.0.2.2 dev dummy10 scope link
id 2 via 192.0.2.19 dev dummy20 scope link
id 10 group 1/2 type resilient buckets 8 idle_timer 120 unbalanced_timer 0 unbalanced_time 0
# ip nexthop bucket
id 10 index 0 idle_time 28.1 nhid 2
id 10 index 1 idle_time 28.1 nhid 2
id 10 index 2 idle_time 28.1 nhid 2
id 10 index 3 idle_time 28.1 nhid 2
id 10 index 4 idle_time 28.1 nhid 1
id 10 index 5 idle_time 28.1 nhid 1
id 10 index 6 idle_time 28.1 nhid 1
id 10 index 7 idle_time 28.1 nhid 1
# ip nexthop bucket show nhid 1
id 10 index 4 idle_time 53.59 nhid 1
id 10 index 5 idle_time 53.59 nhid 1
id 10 index 6 idle_time 53.59 nhid 1
id 10 index 7 idle_time 53.59 nhid 1
# ip nexthop bucket get id 10 index 5
id 10 index 5 idle_time 81 nhid 1
# ip -j -p nexthop bucket get id 10 index 5
[ {
"id": 10,
"bucket": {
"index": 5,
"idle_time": 104.89,
"nhid": 1
},
"flags": [ ]
} ]
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
On some systems (e.g. current Debian/stable) the inclusion
of utils.h pulled in some other things that may end up
defining __aligned, in a possibly different way than what
we had here.
Use our own definition only if there isn't one already.
Fixes: d5acae244f9d ("libnetlink: add nl_print_policy() helper")
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This prints out the data from the given nested attribute
to the given FILE pointer, interpreting the firmware that
the kernel has for showing netlink policies.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
fread(3) returns size_t data type which is unsigned, thus check
`if (fread(...) < 0)' is always false. To check if fread(3) has
failed, user should check error indicator with ferror(3).
This commit also changes read logic a little bit by being less
forgiving for errors. Previous logic was checking if fread(3)
read *at least* required ammount of data, now code checks if
fread(3) read *exactly* expected ammount of data. This makes
sense because code parses very specific binary file, and reading
even 1 less/more byte than expected, will later corrupt data anyway.
Signed-off-by: Michał Łyszczek <michal.lyszczek@bofc.pl>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Devlink commands which implements the dumpit callback may return error.
The netlink function netlink_dump() sends the errno value as the payload
of the message, while answering user space with NLMSG_DONE.
To enable receiving errno value for dumpit commands we have to check for
it in the message. If it is a negative value then the dump returned an
error so we should set errno accordingly and check for ext_ack in case
it was set.
Fixes: 049c58539f5d ("devlink: mnlg: Add support for extended ack")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
groups > 31 have to be joined using the setsockopt. Since the nexthop
group is 32, add a helper to allow 'ip monitor' to listen for nexthop
messages.
Signed-off-by: David Ahern <dsahern@gmail.com>
In the past, we tried to increase the buffer size up to 32 KB in order
to reduce number of syscalls per dump.
Commit 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
brought the size back to 4KB because the kernel can not know the application
is ready to receive bigger requests.
See kernel commits 9063e21fb026 ("netlink: autosize skb lengthes") and
d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()")
for more details.
Fixes: 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hangbin Liu <liuhangbin@gmail.com>
Cc: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
While iproute2 correctly uses ifinfomsg struct as the ancillary header
when requesting an FDB dump on old kernels, it sets the message type to
RTM_GETLINK. This results in wrong reply being returned.
Fix this by using RTM_GETNEIGH instead.
Before:
$ bridge fdb show brport dummy0
Not RTM_NEWNEIGH: 00000158 00000010 00000002
After:
$ bridge fdb show brport dummy0
2a:0b:41:1c:92:d3 vlan 1 master br0 permanent
2a:0b:41:1c:92:d3 master br0 permanent
33:33:00:00:00:01 self permanent
01:00:5e:00:00:01 self permanent
Fixes: 05880354c2cf ("bridge: fdb: Fix filtering with strict checking disabled")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: LiLiang <liali@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Without this fix, the VF info can't be showed using command
"ip link".
146: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 24:8a:07:ad:78:52 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 02:25:d0:12:01:01, spoof checking off, link-state auto, trust off, query_rss off
vf 1 MAC 02:25:d0:12:01:02, spoof checking off, link-state auto, trust off, query_rss off
Fixes: d97b16b2c906 ("libnetlink: linkdump_req: Only AF_UNSPEC family expects an ext_filter_mask")
Signed-off-by: Chris Mi <chrism@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The bridge command 'vlan show' calls rtnl_linkdump_req_filter for
family AF_BRIDGE. Update rtnl_linkdump_req_filter to send the filter
for that family as well.
Fixes: d97b16b2c906 ("libnetlink: linkdump_req: Only AF_UNSPEC family expects an ext_filter_mask")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Add RTNL_HANDLE_F_STRICT_CHK flag and set in rth flags to let know
commands know if the kernel supports strict checking.
Extracted from patch from Ido to fix filtering with strict checking
enabled.
Cc: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Add filter function to rtnl_neighdump_req and a buffer to the
request for the filter functions to append attributes.
Signed-off-by: David Ahern <dsahern@gmail.com>
iproute2 has been updated for the new strict policy in the kernel. Add a
helper to call setsockopt to enable the feature. Add a call to ip.c and
bridge.c
The setsockopt fails on older kernels and the error can be safely ignored
- any new fields or attributes are ignored by the older kernel.
Signed-off-by: David Ahern <dsahern@gmail.com>
Add a filter function to rtnl_addrdump_req to set device index in the
address dump request if the user is filtering addresses by device. In
addition, add a new ipaddr_link_get to do a single RTM_GETLINK request
instead of a device dump yet still store the data in the linfo list.
Signed-off-by: David Ahern <dsahern@gmail.com>
Add a filter option to rtnl_routedump_req and use it to set rtm_flags
removing the need for rtnl_rtcache_request for dump requests.
Signed-off-by: David Ahern <dsahern@gmail.com>
Only AF_UNSPEC handled by rtnl_dump_ifinfo expects an ext_filter_mask
on a dump request. Update the linkdump request functions to only set
and send ext_filter_mask for AF_UNSPEC.
Signed-off-by: David Ahern <dsahern@gmail.com>
Change nlmsg_len from sizeof(req) to use NLMSG_LENGTH on the header.
2 of the inner headers are not 4-byte aligned, so add a 0-length buf
after the header with the __aligned(NLMSG_ALIGNTO) to ensure the size
of the request is large enough. Use NLMSG_ALIGN in NLMSG_LENGTH to set
nlmsg_len.
Signed-off-by: David Ahern <dsahern@gmail.com>
Print any extack message that has been appended to a NLMSG_DONE message.
To avoid duplication, move the existing print code to a new helper.
Signed-off-by: David Ahern <dsahern@gmail.com>
When no error is reported in the first iov, do not prematurely return,
but process further iovs. This fixes batch processing.
Fixes: c60389e4f9ea ("libnetlink: fix leak and using unused memory on error")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
rntl_talk_extack and parse_rtattr_index not used in current code.
rtnl_dump_filter_l is only used in this file.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
No function, filter, or print function uses the sockaddr_nl arg,
so just drop it.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
In __rtnl_talk_iov() main loop, err is a pointer to memory in dynamically
allocated 'buf' that is used to store netlink messages. If netlink message
is an error message, buf is deallocated before returning with error code.
However, on return err->error code is checked one more time to generate
return value, after memory which err points to has already been
freed. Save error code in temporary variable and use the variable to
generate return value.
Fixes: c60389e4f9ea ("libnetlink: fix leak and using unused memory on error")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
rtnl_wilddump_stats_req_filter only takes RTM_GETSTATS as the type argument
so rename to rtnl_statsdump_req_filter for consistency with other request
functions and hardcode the type argument.
Signed-off-by: David Ahern <dsahern@gmail.com>
Rename rtnl_wilddump_req_filter to rtnl_linkdump_req_filter,
rtnl_wilddump_request to rtnl_linkdump_req and
rtnl_wilddump_req_filter_fn to rtnl_linkdump_req_filter_fn.
In all cases drop the type argument which at this point is only
RTM_GETLINK and hardcode in the functions.
Signed-off-by: David Ahern <dsahern@gmail.com>
Add rtnl_nsiddump_req for namespace id dumps using the proper rtgenmsg
as the header. Convert existing RTM_GETNSID dumps to use it.
Signed-off-by: David Ahern <dsahern@gmail.com>
Add rtnl_neightbldump_req for neighbor table dumps using the proper ndtmsg
as the header. Convert existing RTM_GETNEIGHTBL dumps to use it.
Signed-off-by: David Ahern <dsahern@gmail.com>
Add rtnl_neighdump_req for neighbor dumps using the proper ndmsg
as the header. Convert existing rtnl_wilddump_request for RTM_GETNEIGH
to use it.
Signed-off-by: David Ahern <dsahern@gmail.com>
Add rtnl_ruledump_req for fib fule dumps using the proper fib_rule_hdr
as the header. Convert existing RTM_GETRULE dumps to use it.
Signed-off-by: David Ahern <dsahern@gmail.com>
Add rtnl_netconfdump_req for netconf dumps using the proper netconfmsg
as the header. Convert existing RTM_GETNETCONF dumps to use it.
Signed-off-by: David Ahern <dsahern@gmail.com>