drivers/net: fixup comments after "Future-proof tunnel offload handlers"
Some comments weren't updated to reflect the renaming of ndo's and the
change of arguments.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Sun, 10 Jul 2016 01:20:11 +0000 (10:20 +0900)]
tunnels: correct conditional build of MPLS and IPv6
Using a combination if #if conditionals and goto labels to unwind
tunnel4_init seems unwieldy. This patch takes a simpler approach of
directly unregistering previously registered protocols when an error
occurs.
This fixes a number of problems with the current implementation
including the potential presence of labels when they are unused
and the potential absence of unregister code when it is needed.
Fixes: 8afe97e5d416 ("tunnels: support MPLS over IPv4 tunnels") Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 11 Jul 2016 20:25:39 +0000 (13:25 -0700)]
Merge branch 'sctp-rfc7496-support'
Xin Long says:
====================
sctp: implement rfc7496 in sctp
This patchset implements "Additional Policies for the Partially Reliable
Stream Control Transmission Protocol Extension" described on RFC7496.
The Partially Reliable SCTP (PR-SCTP) extension defined in [RFC3758]
provides a generic method for senders to abandon user messages. The
decision to abandon a user message is sender side only, and the exact
condition is called a "PR-SCTP policy". This patchset implements 3
policies:
1. Timed Reliability: This allows the sender to specify a timeout for
a user message after which the SCTP stack abandons the user message.
2. Limited Retransmission Policy: Allows limitation of the number of
retransmissions.
3. Priority Policy: Allows removal of lower-priority messages if space
for higher-priority messages is needed in the send buffer.
Patch 1-3 add some sockopts in sctp to set/get pr_sctp policy status.
Patch 4-6 implement these 3 policies one by one.
====================
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 9 Jul 2016 11:47:45 +0000 (19:47 +0800)]
sctp: implement prsctp PRIO policy
prsctp PRIO policy is a policy to abandon lower priority chunks when
asoc doesn't have enough snd buffer, so that the current chunk with
higher priority can be queued successfully.
Similar to TTL/RTX policy, we will set the priority of the chunk to
prsctp_param with sinfo->sinfo_timetolive in sctp_set_prsctp_policy().
So if PRIO policy is enabled, msg->expire_at won't work.
asoc->sent_cnt_removable will record how many chunks can be checked to
remove. If priority policy is enabled, when the chunk is queued into
the out_queue, we will increase sent_cnt_removable. When the chunk is
moved to abandon_queue or dequeue and free, we will decrease
sent_cnt_removable.
In sctp_sendmsg, we will check if there is enough snd buffer for current
msg and if sent_cnt_removable is not 0. Then try to abandon chunks in
sctp_prune_prsctp when sendmsg from the retransmit/transmited queue, and
free chunks from out_queue in right order until the abandon+free size >
msg_len - sctp_wfree. For the abandon size, we have to wait until it
sends FORWARD TSN, receives the sack and the chunks are really freed.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 9 Jul 2016 11:47:44 +0000 (19:47 +0800)]
sctp: implement prsctp RTX policy
prsctp RTX policy is a policy to abandon chunks when they are
retransmitted beyond the max count.
This patch uses sent_count to count how many times one chunk has
been sent, and prsctp_param is the max rtx count, which is from
sinfo->sinfo_timetolive in sctp_set_prsctp_policy(). So similar
to TTL policy, if RTX policy is enabled, msg->expire_at won't
work.
Then in sctp_chunk_abandoned, this patch checks if chunk->sent_count
is bigger than chunk->prsctp_param to abandon this chunk.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 9 Jul 2016 11:47:43 +0000 (19:47 +0800)]
sctp: implement prsctp TTL policy
prsctp TTL policy is a policy to abandon chunks when they expire
at the specific time in local stack. It's similar with expires_at
in struct sctp_datamsg.
This patch uses sinfo->sinfo_timetolive to set the specific time for
TTL policy. sinfo->sinfo_timetolive is also used for msg->expires_at.
So if prsctp_enable or TTL policy is not enabled, msg->expires_at
still works as before.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 9 Jul 2016 11:47:42 +0000 (19:47 +0800)]
sctp: add SCTP_PR_ASSOC_STATUS on sctp sockopt
This patch adds SCTP_PR_ASSOC_STATUS to sctp sockopt, which is used
to dump the prsctp statistics info from the asoc. The prsctp statistics
includes abandoned_sent/unsent from the asoc. abandoned_sent is the
count of the packets we drop packets from retransmit/transmited queue,
and abandoned_unsent is the count of the packets we drop from out_queue
according to the policy.
Note: another option for prsctp statistics dump described in rfc is
SCTP_PR_STREAM_STATUS, which is used to dump the prsctp statistics
info from each stream. But by now, linux doesn't yet have per stream
statistics info, it needs rfc6525 to be implemented. As the prsctp
statistics for each stream has to be based on per stream statistics,
we will delay it until rfc6525 is done in linux.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 9 Jul 2016 11:47:41 +0000 (19:47 +0800)]
sctp: add SCTP_DEFAULT_PRINFO into sctp sockopt
This patch adds SCTP_DEFAULT_PRINFO to sctp sockopt. It is used
to set/get sctp Partially Reliable Policies' default params,
which includes 3 policies (ttl, rtx, prio) and their values.
Still, if we set policy params in sndinfo, we will use the params
of sndinfo against chunks, instead of the default params.
In this patch, we will use 5-8bit of sp/asoc->default_flags
to store prsctp policies, and reuse asoc->default_timetolive
to store their values. It means if we enable and set prsctp
policy, prior ttl timeout in sctp will not work any more.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 9 Jul 2016 11:47:40 +0000 (19:47 +0800)]
sctp: add SCTP_PR_SUPPORTED on sctp sockopt
According to section 4.5 of rfc7496, prsctp_enable should be per asoc.
We will add prsctp_enable to both asoc and ep, and replace the places
where it used net.sctp->prsctp_enable with asoc->prsctp_enable.
ep->prsctp_enable will be initialized with net.sctp->prsctp_enable, and
asoc->prsctp_enable will be initialized with ep->prsctp_enable. We can
also modify it's value through sockopt SCTP_PR_SUPPORTED.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Fri, 8 Jul 2016 22:54:47 +0000 (00:54 +0200)]
Revert "net: ethernet: bcmgenet: use phy_ethtool_{get|set}_link_ksettings"
This reverts commit 4386f5662e63 ("net: ethernet: bcmgenet: use
phy_ethtool_{get|set}_link_ksettings")
This patch is wrong, the function phy_ethtool_{get|set}_link_ksettings
don't check if the device is running, but the driver bcmgenet need this
check.
The function {get|set}_settings need to access the mdio bus, and this
bus may only be used when the device is running. Otherwise, the clock
is disable and a mdio access will fail.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 11 Jul 2016 19:52:03 +0000 (12:52 -0700)]
Merge branch 'b53-nsp-switch'
Florian Fainelli says:
====================
net: dsa: b53: Add Broadcom NSP switch support
This patch series updates the B53 driver to support Broadcom's Northstar Plus
Soc integrated switch.
Unlike the version of the core present in BCM5301x/Northstar, we cannot read the
full chip id of the switch, so we need to get the information about our switch
id from Device Tree.
Other than that, this is a regular Broadcom Ethernet switch which is register
compatible for all practical purposes with the existing switch driver.
Since DSA requires a working CPU Ethernet MAC driver this depends on Jon
Mason's AMAC/BGMAC driver changes to support NSP. Board specific changes depend
on patches present in Broadcom's ARM SoC branches and will be posted in a short
while.
====================
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: b53: Allow SRAB driver to specify platform data
For Northstart Plus SoCs, we cannot detect the switch because only the
revision information is provied in the Management page, instead, rely on
Device Tree to tell us the chip id, and pass it down using the
b53_platform_data structure.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 8 Jul 2016 03:46:04 +0000 (05:46 +0200)]
ipv6: do not abuse GFP_ATOMIC in inet6_netconf_notify_devconf()
All inet6_netconf_notify_devconf() callers are in process context,
so we can use GFP_KERNEL allocations if we take care of not holding
a rwlock while not needed in ip6mr (we hold RTNL there)
Fixes: d67b8c616b48 ("netconf: advertise mc_forwarding status") Fixes: f3a1bfb11ccb ("rtnl/ipv6: use netconf msg to advertise forwarding status") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 8 Jul 2016 03:18:24 +0000 (05:18 +0200)]
ipv4: do not abuse GFP_ATOMIC in inet_netconf_notify_devconf()
inet_forward_change() runs with RTNL held.
We are allowed to sleep if required.
If we use __in_dev_get_rtnl() instead of __in_dev_get_rcu(),
we no longer have to use GFP_ATOMIC allocations in
inet_netconf_notify_devconf(), meaning we are less likely to miss
notifications under memory pressure, and wont touch precious memory
reserves either and risk dropping incoming packets.
inet_netconf_get_devconf() can also use GFP_KERNEL allocation.
Fixes: edc9e748934c ("rtnl/ipv4: use netconf msg to advertise forwarding status") Fixes: 9e5511106f99 ("rtnl/ipv4: add support of RTM_GETNETCONF") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 9 Jul 2016 22:10:48 +0000 (18:10 -0400)]
Merge branch 'bgmac-platform-device'
Jon Mason says:
====================
net: ethernet: bgmac: Add platform device support
David Miller, Please consider including patches 1-5 in net-next
Florian Fainelli, Please consider including patches 6 & 7 in
devicetree/next
Changes in v2:
* Made device tree binding changes suggested by Sergei Shtylyov,
Ray Jui, Rob Herring, Florian Fainelli, and Arnd Bergmann
* Removed devm_* error paths in the bgmac_platform.c suggested by
Florian Fainelli
* Added Arnd Bergmann's Acked-by to the first 5 (there were changes
outlined in the bullets above, but I believe them to be minor enough
for him to not revoke his acks)
This patch series adds support for other, non-bcma iProc SoC's to the
bgmac driver. This series only adds NSP support, but we are interested
in adding support for the Cygnus and NS2 families (with more possible
down the road).
To support non-bcma enabled SoCs, we need to add the standard device
tree "platform device" support. Unfortunately, this driver is very
tighly coupled with the bcma bus and much unwinding is needed. I tried
to break this up into a number of patches to make it more obvious what
was being done to add platform device support. I was able to verify
that the bcma code still works using a 53012K board (NS SoC), and that
the platform code works using a 58625K board (NSP SoC).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Thu, 7 Jul 2016 23:08:57 +0000 (19:08 -0400)]
net: ethernet: bgmac: Add platform device support
The bcma portion of the driver has been split off into a bcma specific
driver. This has been mirrored for the platform driver. The last
references to the bcma core struct have been changed into a generic
function call. These function calls are wrappers to either the original
bcma code or new platform functions that access the same areas via MMIO.
This necessitated adding function pointers for both platform and bcma to
hide which backend is being used from the generic bgmac code.
Signed-off-by: Jon Mason <jon.mason@broadcom.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Thu, 7 Jul 2016 23:08:56 +0000 (19:08 -0400)]
net: ethernet: bgmac: convert to feature flags
The bgmac driver is using the bcma provides device ID and revision, as
well as the SoC ID and package, to determine which features are
necessary to enable, reset, etc in the driver. In anticipation of
removing the bcma requirement for this driver, these must be changed to
not reference that struct. In place of that, each "feature" has been
given a flag, and the flags are enabled for their respective device and
SoC.
Signed-off-by: Jon Mason <jon.mason@broadcom.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Thu, 7 Jul 2016 23:08:55 +0000 (19:08 -0400)]
net: ethernet: bgmac: move BCMA MDIO Phy code into a separate file
Move the BCMA MDIO phy into a separate file, as it is very tightly
coupled with the BCMA bus. This will help with the upcoming BCMA
removal from the bgmac driver. Optimally, this should be moved into
phy drivers, but it is too tightly coupled with the bgmac driver to
effectively move it without more changes to the driver.
Note: the phy_reset was intentionally removed, as the mdio phy subsystem
automatically resets the phy if a reset function pointer is present. In
addition to the moving of the driver, this reset function is added.
Signed-off-by: Jon Mason <jon.mason@broadcom.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Thu, 7 Jul 2016 23:08:54 +0000 (19:08 -0400)]
net: ethernet: bgmac: add dma_dev pointer
The dma buffer allocation, etc references a dma_dev device pointer from
the bcma core. In anticipation of removing the bcma requirement for
this driver, these must be changed to not reference that struct. Add a
dma_dev device pointer to the bgmac stuct and reference that instead.
Signed-off-by: Jon Mason <jon.mason@broadcom.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Thu, 7 Jul 2016 23:08:53 +0000 (19:08 -0400)]
net: ethernet: bgmac: change bgmac_* prints to dev_* prints
The bgmac_* print wrappers call dev_* prints with the dev pointer from
the bcma core. In anticipation of removing the bcma requirement for
this driver, these must be changed to not reference that struct. So,
simply change all of the bgmac_* prints to their dev_* counterparts. In
some cases netdev_* prints are more appropriate, so change those as
well.
Signed-off-by: Jon Mason <jon.mason@broadcom.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: tracepoint napi:napi_poll add work and budget
An important information for the napi_poll tracepoint is knowing
the work done (packets processed) by the napi_poll() call. Add
both the work done and budget, as they are related.
Handle trace_napi_poll() param change in dropwatch/drop_monitor
and in python perf script netdev-times.py in backward compat way,
as python fortunately supports optional parameter handling.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
r8152: remove a netif_carrier_off in rtl8152_open function
After commit 90186af404ad ("r8152: fix lockup when runtime PM is enabled"),
the autoresume wouldn't start the device before rtl8152_open() is finished.
Therefore, we don't have to reset the linking status before and after
autoresume. That is, one of netif_carrier_off() in rtl8152_open() could be
removed.
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In rtl_hw_phy_work_func_t(), the flag of PHY_RESET is set in
rtl_ops.hw_phy_cfg() and cleared in rtl8152_set_speed(). Therefore,
the rtl_phy_reset() is never run and is unnecessary.
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 9 Jul 2016 21:46:02 +0000 (17:46 -0400)]
Merge branch 'mpls-in-ipv4-and-udp'
Simon Horman says:
====================
net: support MPLS in IPv4 and UDP
This short series provides support for MPLS in IPv4 (RFC4023), and by
virtue of FOU, MPLS in UDP (RFC7510).
The changes are as follows:
1. Teach tunnel4.c about AF_MPLS, it already understands AF_INET and
AF_INET6
2. Enhance IPIP and SIT to handle MPLS. Both already handle IPv4.
SIT also already handles IPv6.
3. Trivially enhance MPLS to allow routes over SIT and IPIP tunnels.
A corresponding patch set for iproute2 has also been provided.
Changes since v1
* Correct inverted IPIP protocol logic in SIT patch
* Provide usage example below
Sample configuration follows:
* The following creates a tunnel and routes MPLS packets whose outermost
label is 100 over it. The forwarded packets will have the outermost label
stack entry, 100, removed and two label stack entries added, the
outermost having label 200 and the next having label 300.
The local end-point for the tunnel is 10.0.99.192 and the remote
endpoint is 10.0.99.193.
The local address for encapsulated packets is 10.0.98.192 and the
remote address is 10.0.98.193.
# Create an MPLS over IPv4 tunnel using the IPIP driver
ip link add name tun1 type ipip remote 10.0.99.193 local 10.0.99.192 \
ttl 225 mode mplsip
# Bring the tunnel up and an add an IPv4 address and route
ip link set up dev tun1
ip addr add 10.0.98.192/24 dev tun1
# Set MPLS route
# Allow MPLS forwarding of packets recieved on eth0
echo 1 > /proc/sys/net/mpls/conf/eth0/input
# Larger than label to be routed (100)
echo 101 > /proc/sys/net/mpls/platform_labels
ip -f mpls route add 100 as 200/300 via inet 10.0.98.193
* For FOU (in this case MPLS over UDP) a tunnel may created using:
# Packets recieved on UDP port 6635 are MPLS over UDP (IP proto 137)
ip fou add port 6635 ipproto 137
# Create the tunnel netdev
ip link add name tun1 type ipip remote 10.0.99.193 local 10.0.99.192 \
ttl 225 mode mplsip encap fou encap-sport auto encap-dport 6635
IPv4 address, link and route, and MPLS routing commands are as per
the MPLS over IPv4 example
* To use the SIT driver instead of the IPIP driver "ipip" may be substituted
for "sit" in the above examples.
* To create a tunnel that forwards and receives all supported
inner-protocols "mplsip" may be substituted for "any" in the above
examples.
For the IPIP driver this configures both IPv4 and MPLS over IPv4.
For the SIT driver this configures IPv6, IPv4 and MPLS over IPv4.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Thu, 7 Jul 2016 05:56:15 +0000 (07:56 +0200)]
mpls: allow routes on ipip and sit devices
Allow MPLS routes on IPIP and SIT devices now that they
support forwarding MPLS packets.
Signed-off-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Thu, 7 Jul 2016 05:56:14 +0000 (07:56 +0200)]
ipip: support MPLS over IPv4
Extend the IPIP driver to support MPLS over IPv4. The implementation is an
extension of existing support for IPv4 over IPv4 and is based of multiple
inner-protocol support for the SIT driver.
Signed-off-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Thu, 7 Jul 2016 05:56:13 +0000 (07:56 +0200)]
sit: support MPLS over IPv4
Extend the SIT driver to support MPLS over IPv4. This implementation
extends existing support for IPv6 over IPv4 and IPv4 over IPv4.
Signed-off-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Thu, 7 Jul 2016 05:56:12 +0000 (07:56 +0200)]
tunnels: support MPLS over IPv4 tunnels
Extend tunnel support to MPLS over IPv4. The implementation extends the
existing differentiation between IPIP and IPv6 over IPv4 to also cover MPLS
over IPv4.
Signed-off-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
As was suggested this patch adds support for the different versions of MLD
and IGMP query types. Since the user visible structure is still in net-next
we can augment it instead of adding netlink attributes.
The distinction between the different IGMP/MLD query types is done as
suggested in Section 7.1, RFC 3376 [1] and Section 8.1, RFC 3810 [2] based
on query payload size and code for IGMP. Since all IGMP packets go through
multicast_rcv() and it uses ip_mc_check_igmp/ipv6_mc_check_mld we can be
sure that at least the ip/ipv6 header can be directly used.
Suggested-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
When we introduced GSO support, if using auth the auth chunk was being
left queued on the packet even after the final segment was generated.
Later on sctp_transmit_packet it calls sctp_packet_reset, which zeroed
the packet len while not accounting for this left-over. This caused more
space to be used the next packet due to the chunk still being queued,
but space which wasn't allocated as its size wasn't accounted.
The fix is to only queue it back when we know that we are going to
generate another segment.
Fixes: 90017accff61 ("sctp: Add GSO support") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 7 Jul 2016 08:23:09 +0000 (11:23 +0300)]
bnxt: fix a condition
This code generates as static checker warning because htons(ETH_P_IPV6)
is always true. From the context it looks like the && was intended to
be !=.
Fixes: 94758f8de037 ('bnxt_en: Add GRO logic for BCM5731X chips.') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
over time there were multiple requests to access different data
structures and fields of task_struct current, so finally add
the helper to access 'current' as-is. Tracing bpf programs will do
the rest of walking the pointers via bpf_probe_read().
Note that current can be null and bpf program has to deal it with,
but even dumb passing null into bpf_probe_read() is still safe.
Suggested-by: Brendan Gregg <brendan.d.gregg@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 7 Jul 2016 00:03:54 +0000 (20:03 -0400)]
net: dsa: initialize the routing table
The routing table of every switch in a tree is currently initialized to
all zeros. This is an issue since 0 is a valid port number.
Add a DSA_RTABLE_NONE=-1 constant to initialize the signed values of the
routing table pointing to other switches.
This fixes the device mapping of the mv88e6xxx driver where the port
pointing to the switch itself and to non-existent switches was wrongly
configured to be 0. It is now set to the expected 0xf value.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
The referenced change added a netlink notifier for processing
device queue size events. These events are fired for all devices
but the registered callback assumed they only occurred for tun
devices. This fix adds a check (borrowed from macvtap.c) to discard
non-tun device events.
For reference, this fixes the following splat:
[ 71.505935] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 71.513870] IP: [<ffffffff8153c1a0>] tun_device_event+0x110/0x340
[ 71.519906] PGD 3f41f56067 PUD 3f264b7067 PMD 0
[ 71.524497] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[ 71.529374] gsmi: Log Shutdown Reason 0x03
[ 71.533417] Modules linked in:[ 71.533826] mlx4_en: eth1: Link Up
Fixes: 1576d9860599 ("tun: switch to use skb array for tx") Signed-off-by: Craig Gallek <kraig@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 9 Jul 2016 03:52:12 +0000 (23:52 -0400)]
Merge tag 'rxrpc-rewrite-20160706' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Improve conn/call lookup and fix call number generation [ver #3]
I've fixed a couple of patch descriptions and excised the patch that
duplicated the connections list for reconsideration at a later date.
For reference, the excised patch is sitting on the rxrpc-experimental
branch of my git tree, based on top of the rxrpc-rewrite branch. Diffing
it against yesterday's tag shows no differences.
Would you prefer the patch set to be emailed afresh instead of a git-pull
request?
David
---
Here's the next part of the AF_RXRPC rewrite. The two main purposes of
this set are to fix the call number handling and to make use of RCU when
looking up the connection or call to pass a received packet to.
Important changes in this set include:
(1) Avoidance of placing stack data into SG lists in rxkad so that kernel
stacks can become vmalloc'd (Herbert Xu).
(2) Calls cease pinning the connection they used as soon as possible,
which allows the connection to be discarded sooner and allows the call
channel on that connection to be reused earlier.
(3) Make each call channel on a connection have a separate and independent
call number space rather than having a shared number space for the
connection. Call numbers should increment monotonically per channel
on the client, and the server should ignore a call with a lower call
number for that channel than the latest it has seen. The RESPONSE
packet sets the minimum values of each call ID counter on a
connection.
(4) Look up calls by indexing the channel array on a connection rather
than by keeping calls in an rbtree on that connection. Also look up
calls using the channel array rather than using a hashtable.
The call hashtable can then be removed.
(5) Call terminal statuses are cached in the channel array for the last
call. It is assumed that if we the server have seen call N, then the
client no longer cares about call N-1 on the same channel.
This will allow retransmission of the terminal status in future
without the need to keep the rxrpc_call struct around.
(6) Peer lookups are moved out of common connection handling code and into
service connection handling code as client connections (a) must point
to a peer before they can be used and (b) are looked up by a
machine-unique connection ID directly, so we only need to look up the
peer first if we're going to deal with a service call.
(7) The reference count on a connection is held elevated by 1 whilst it is
alive (ie. idle unused connections have a refcount of 1). The reaper
will attempt to change the refcount from 1->0 and skip if this cannot
be done, whilst look ups only increment the refcount if it's non-zero.
This makes the implementation of RCU lookups easier as we don't have
to get a ref on the connection or a lock on the connection list to
prevent a connection being reaped whilst we're contemplating queueing
a packet that initiates a new service call upon it.
If we need to get a connection, but there's a dead connection in the
tree, we use rb_replace_node() to replace the dead one with a new one.
(8) Use a seqlock to validate the walk over the service connection rbtree
attached to a peer when it's being walked in RCU mode.
(9) Make the incoming call/connection packet handling code use RCU mode
and locks and make it only take a reference if the call/connection
gets queued on a workqueue.
The intention is that the next set will introduce the connection lifetime
management and capacity limits to prevent clients from overloading the
server.
There are some fixes too:
(1) Verifying that a packet coming in to a client connection came from the
expected source.
(2) Fix handling of connection failure in client call creation where we
don't reinitialise the list linkage block and a second attempt to
unlink the failed connection oopses and also we don't set the state
correctly, which causes an assertion failure.
(3) New service calls were being added to the socket's accept queue under
the wrong lock.
Changes:
(V2) In rxrpc_find_service_conn_rcu() initialised the sequence number to 0.
Fixed the RCU handling in conn_service.c by introducing and using
rb_replace_node_rcu() as an RCU-safe alternative in
rxrpc_publish_service_conn().
Modified and used rcu_dereference_raw() to avoid RCU sparse warnings
in rxrpc_find_service_conn_rcu().
Added in some missing RCU dereference wrappers. It seems to be
necessary to turn on CONFIG_PROVE_RCU_REPEATEDLY as well as
CONFIG_SPARSE_RCU_POINTER to get the static __rcu annotation checking
to happen.
Fixed some other sparse warnings, including a missing ntohs() in
jumbo packet processing.
(V3) Fixed some commit descriptions.
Excised the patch that duplicated the connection list to separate out
the procfs list for reconsideration at a later date.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
netvsc: Use the new in-place consumption APIs in the rx path
Use the new APIs for eliminating a copy on the receive path. These new APIs also
help in minimizing the number of memory barriers we end up issuing (in the
ringbuffer code) since we can better control when we want to expose the ring
state to the host.
The patch is being resent to address earlier email issues.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
hfsc_sched is huge (size: 920, cachelines: 15), but we can get it to 14
cachelines by placing level after filter_cnt (covering 4 byte hole) and
reducing period/nactive/flags to u32 (period is just a counter,
incremented when class becomes active -- 2**32 is plenty for this
purpose, also, long is only 32bit wide on 32bit platforms anyway).
cl_vtperiod is exported to userspace via tc_hfsc_stats, but its period
member is already u32, so no precision is lost there either.
Cc: Michal Soltys <soltys@ziu.info> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 7 Jul 2016 05:32:15 +0000 (22:32 -0700)]
Merge tag 'mac80211-next-for-davem-2016-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
One more set of new features:
* beacon report (for radio measurement) support in cfg80211/mac80211
* hwsim: allow wmediumd in namespaces
* mac80211: extend 160MHz workaround to CSA IEs
* mesh: properly encrypt group-addressed privacy action frames
* mesh: allow setting peer AID
* first steps for MU-MIMO monitor mode
* along with various other cleanups and improvements
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
1) All users of AF_PACKET's fanout feature want a symmetric packet
header hash for load balancing purposes, so give it to them.
2) Fix vlan state synchronization in e1000e, from Jarod Wilson.
3) Use correct socket pointer in ip_skb_dst_mtu(), from Shmulik
Ladkani.
4) mlx5 bug fixes from Mohamad Haj Yahia, Daniel Jurgens, Matthew
Finlay, Rana Shahout, and Shaker Daibes. Mostly to do with
operation timeouts and PCI error handling.
5) Fix checksum handling in mirred packet action, from WANG Cong.
6) Set skb->dev correctly when transmitting in !protect_frames case of
macsec driver, from Daniel Borkmann.
7) Fix MTU calculation in geneve driver, from Haishuang Yan.
8) Missing netif_napi_del() in unregister path of qeth driver, from
Ursula Braun.
9) Handle malformed route netlink messages in decnet properly, from
Vergard Nossum.
10) Memory leak of percpu data in ipv6 routing code, from Martin KaFai
Lau.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
ipv6: Fix mem leak in rt6i_pcpu
net: fix decnet rtnexthop parsing
cxgb4: update latest firmware version supported
net/mlx5: Avoid setting unused var when modifying vport node GUID
bonding: fix enslavement slave link notifications
r8152: fix runtime function for RTL8152
qeth: delete napi struct when removing a qeth device
Revert "fsl/fman: fix error handling"
fsl/fman: fix error handling
cdc_ncm: workaround for EM7455 "silent" data interface
RDS: fix rds_tcp_init() error path
geneve: fix max_mtu setting
net: phy: dp83867: Fix initialization of PHYCR register
enc28j60: Fix race condition in enc28j60 driver
net: stmmac: Fix null-function call in ISR on stmmac1000
tipc: fix nl compat regression for link statistics
net: bcmsysport: Device stats are unsigned long
macsec: set actual real device for xmit when !protect_frames
net_sched: fix mirrored packets checksum
packet: Use symmetric hash for PACKET_FANOUT_HASH.
...
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next,
they are:
1) Don't use userspace datatypes in bridge netfilter code, from
Tobin Harding.
2) Iterate only once over the expectation table when removing the
helper module, instead of once per-netns, from Florian Westphal.
3) Extra sanitization in xt_hook_ops_alloc() to return error in case
we ever pass zero hooks, xt_hook_ops_alloc():
4) Handle NFPROTO_INET from the logging core infrastructure, from
Liping Zhang.
5) Autoload loggers when TRACE target is used from rules, this doesn't
change the behaviour in case the user already selected nfnetlink_log
as preferred way to print tracing logs, also from Liping Zhang.
6) Conntrack slabs with SLAB_HWCACHE_ALIGN to allow rearranging fields
by cache lines, increases the size of entries in 11% per entry.
From Florian Westphal.
7) Skip zone comparison if CONFIG_NF_CONNTRACK_ZONES=n, from Florian.
8) Remove useless defensive check in nf_logger_find_get() from Shivani
Bhardwaj.
9) Remove zone extension as place it in the conntrack object, this is
always include in the hashing and we expect more intensive use of
zones since containers are in place. Also from Florian Westphal.
10) Owner match now works from any namespace, from Eric Bierdeman.
11) Make sure we only reply with TCP reset to TCP traffic from
nf_reject_ipv4, patch from Liping Zhang.
12) Introduce --nflog-size to indicate amount of network packet bytes
that are copied to userspace via log message, from Vishwanath Pai.
This obsoletes --nflog-range that has never worked, it was designed
to achieve this but it has never worked.
13) Introduce generic macros for nf_tables object generation masks.
14) Use generation mask in table, chain and set objects in nf_tables.
This allows fixes interferences with ongoing preparation phase of
the commit protocol and object listings going on at the same time.
This update is introduced in three patches, one per object.
15) Check if the object is active in the next generation for element
deactivation in the rbtree implementation, given that deactivation
happens from the commit phase path we have to observe the future
status of the object.
16) Support for deletion of just added elements in the hash set type.
17) Allow to resize hashtable from /proc entry, not only from the
obscure /sys entry that maps to the module parameter, from Florian
Westphal.
18) Get rid of NFT_BASECHAIN_DISABLED, this code is not exercised
anymore since we tear down the ruleset whenever the netdevice
goes away.
19) Support for matching inverted set lookups, from Arturo Borrero.
20) Simplify the iptables_mangle_hook() by removing a superfluous
extra branch.
21) Introduce ether_addr_equal_masked() and use it from the netfilter
codebase, from Joe Perches.
22) Remove references to "Use netfilter MARK value as routing key"
from the Netfilter Kconfig description given that this toggle
doesn't exists already for 10 years, from Moritz Sichert.
23) Introduce generic NF_INVF() and use it from the xtables codebase,
from Joe Perches.
24) Setting logger to NONE via /proc was not working unless explicit
nul-termination was included in the string. This fixes seems to
leave the former behaviour there, so we don't break backward.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'sound-4.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Here are a collection of small fixes: at this time, we've got a
slightly high amount, but all small and trivial fixes, and nothing
scary can be seen there"
* tag 'sound-4.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (21 commits)
ALSA: hda/realtek: Add Lenovo L460 to docking unit fixup
ALSA: timer: Fix negative queue usage by racy accesses
ASoC: rt5645: fix reg-2f default value.
ASoC: fsl_ssi: Fix number of words per frame for I2S-slave mode
ALSA: au88x0: Fix calculation in vortex_wtdma_bufshift()
ALSA: hda - Add PCI ID for Kabylake-H
ALSA: echoaudio: Fix memory allocation
ASoC: Intel: atom: fix missing breaks that would cause the wrong operation to execute
ALSA: hda - fix read before array start
ASoC: cx20442: set tty->receiver_room in v253_open
ASoC: ak4613: Enable cache usage to fix crashes on resume
ASoC: wm8940: Enable cache usage to fix crashes on resume
ASoC: Intel: Skylake: Initialize module list for Broxton
ASoC: wm5102: Correct supported channels on trace compressed DAI
ASoC: wm5110: Add missing route from OUT3R to SYSCLK
ASoC: rt5670: fix HP Playback Volume control
ASoC: hdmi-codec: select CONFIG_HDMI
ASoC: davinci-mcasp: Fix dra7 DMA offset when using CFG port
ASoC: hdac_hdmi: Fix potential NULL dereference
ASoC: ak4613: Remove owner assignment from platform_driver
...
Previously, mesh power management functionality works only with kernel
MPM. Because user space MPM did not report mesh peer AID to kernel,
the kernel could not identify the bit in TIM element. So this patch
adds mesh peer AID setting API.
Signed-off-by: Masashi Honma <masashi.honma@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Tue, 5 Jul 2016 12:23:14 +0000 (15:23 +0300)]
mac80211: parse wide bandwidth channel switch IE with workaround
Continuing the workaround implemented in commit 23665aaf9170
("mac80211: Interoperability workaround for 80+80 and 160 MHz channels")
use the same code to parse the Wide Bandwidth Channel Switch element
by converting to VHT Operation element since the spec also just refers
to that for parsing semantics, particularly with the workaround.
While at it, remove some dead code - the IEEE80211_STA_DISABLE_40MHZ
flag can never be set at this point since it's checked earlier and the
wide_bw_chansw_ie pointer is set to NULL if it's set.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
mac80211: Add support for beacon report radio measurement
Add the following to support beacon report radio measurement
with the measurement mode field set to passive or active:
1. Propagate the required scan duration to the device
2. Report the scan start time (in terms of TSF)
3. Report each BSS's detection time (also in terms of TSF)
TSF times refer to the BSS that the interface that requested the
scan is connected to.
Signed-off-by: Assaf Krauss <assaf.krauss@intel.com> Signed-off-by: Avraham Stern <avraham.stern@intel.com>
[changed ath9k/10k, at76c59x-usb, iwlegacy, wl1251 and wlcore to match
the new API] Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Beacon report radio measurement requires reporting observed BSSs
on the channels specified in the beacon request. If the measurement
mode is set to passive or active, it requires actually performing a
scan (passive or active, accordingly), and reporting the time that
the scan was started and the time each beacon/probe was received
(both in terms of TSF of the BSS of the requesting AP). If the
request mode is table, this information is optional.
In addition, the radio measurement request specifies the channel
dwell time for the measurement.
In order to use scan for beacon report when the mode is active or
passive, add a parameter to scan request that specifies the
channel dwell time, and add scan start time and beacon received time
to scan results information.
Supporting beacon report is required for Multi Band Operation (MBO).
Signed-off-by: Assaf Krauss <assaf.krauss@intel.com> Signed-off-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Ilan Peer [Tue, 5 Jul 2016 12:23:09 +0000 (15:23 +0300)]
mac80211_hwsim: Add radar bandwidths to the P2P Device combination
Add radar_detect_widths to the interface combination that allows
concurrent P2P Device dedicated interface and AP interfaces, to enable
testing of radar detection when P2P Device interface is used.
Clear the radar_detect_widths in case of multi channel contexts
as this is not currently supported.
As radar_detect_widths are now supported in all combinations,
remove the hwsim_if_dfs_limits definition since it is no longer
needed.
Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
nl80211: Add API to support VHT MU-MIMO air sniffer
add API to support VHT MU-MIMO air sniffer.
in MU-MIMO there are parallel frames on the air while the HW
has only one RX.
add the capability to sniff one of the MU-MIMO parallel frames by
giving the sniffer additional information so it'll know which
of the parallel frames it shall follow.
Add attribute - NL80211_ATTR_MU_MIMO_GROUP_DATA - for getting
a MU-MIMO groupID in order to monitor packets from that group
using VHT MU-MIMO.
And add attribute -NL80211_ATTR_MU_MIMO_FOLLOW_ADDR - for passing
MAC address to monitor mode.
that option will be used by VHT MU-MIMO air sniffer to follow a
station according to it's MAC address using VHT MU-MIMO.
Johannes Berg [Wed, 6 Jul 2016 12:44:14 +0000 (14:44 +0200)]
mac80211: agg-rx: refuse ADDBA Request with timeout update
The current implementation of handling ADDBA Request while a session
is already active with the peer is wrong - in case the peer is using
the existing session's dialog token this should be treated as update
to the session, which can update the timeout value.
We don't really have a good way of supporting that, so reject, but
implement the required behaviour in the spec of "Even if the updated
ADDBA Request frame is not accepted, the original Block ACK setup
remains active." (802.11-2012 10.5.4)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
David Howells [Fri, 1 Jul 2016 06:51:50 +0000 (07:51 +0100)]
rxrpc: Use RCU to access a peer's service connection tree
Move to using RCU access to a peer's service connection tree when routing
an incoming packet. This is done using a seqlock to trigger retrying of
the tree walk if a change happened.
Further, we no longer get a ref on the connection looked up in the
data_ready handler unless we queue the connection's work item - and then
only if the refcount > 0.
Note that I'm avoiding the use of a hash table for service connections
because each service connection is addressed by a 62-bit number
(constructed from epoch and connection ID >> 2) that would allow the client
to engage in bucket stuffing, given knowledge of the hash algorithm.
Peers, however, are hashed as the network address is less controllable by
the client. The total number of peers will also be limited in a future
commit.
Signed-off-by: David Howells <dhowells@redhat.com>
rcu: Suppress sparse warnings for rcu_dereference_raw()
Data structures that are used both with and without RCU protection
are difficult to write in a sparse-clean manner. If you mark the
relevant pointers with __rcu, sparse will complain about all non-RCU
uses, but if you don't mark those pointers, sparse will complain about
all RCU uses.
This commit therefore suppresses sparse warnings for rcu_dereference_raw(),
allowing mixed-protection data structures to avoid these warnings.
Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 30 Jun 2016 09:45:22 +0000 (10:45 +0100)]
rxrpc: Maintain an extra ref on a conn for the cache list
Overhaul the usage count accounting for the rxrpc_connection struct to make
it easier to implement RCU access from the data_ready handler.
The problem is that currently we're using a lock to prevent the garbage
collector from trying to clean up a connection that we're contemplating
unidling. We could just stick incoming packets on the connection we find,
but we've then got a problem that we may race when dispatching a work item
to process it as we need to give that a ref to prevent the rxrpc_connection
struct from disappearing in the meantime.
Further, incoming packets may get discarded if attached to an
rxrpc_connection struct that is going away. Whilst this is not a total
disaster - the client will presumably resend - it would delay processing of
the call. This would affect the AFS client filesystem's service manager
operation.
To this end:
(1) We now maintain an extra count on the connection usage count whilst it
is on the connection list. This mean it is not in use when its
refcount is 1.
(2) When trying to reuse an old connection, we only increment the refcount
if it is greater than 0. If it is 0, we replace it in the tree with a
new candidate connection.
(3) Two connection flags are added to indicate whether or not a connection
is in the local's client connection tree (used by sendmsg) or the
peer's service connection tree (used by data_ready). This makes sure
that we don't try and remove a connection if it got replaced.
The flags are tested under lock with the removal operation to prevent
the reaper from killing the rxrpc_connection struct whilst someone
else is trying to effect a replacement.
This could probably be alleviated by using memory barriers between the
flag set/test and the rb_tree ops. The rb_tree op would still need to
be under the lock, however.
(4) When trying to reap an old connection, we try to flip the usage count
from 1 to 0. If it's not 1 at that point, then it must've come back
to life temporarily and we ignore it.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 29 Jun 2016 13:40:39 +0000 (14:40 +0100)]
rxrpc: Move peer lookup from call-accept to new-incoming-conn
Move the lookup of a peer from a call that's being accepted into the
function that creates a new incoming connection. This will allow us to
avoid incrementing the peer's usage count in some cases in future.
Note that I haven't bother to integrate rxrpc_get_addr_from_skb() with
rxrpc_extract_addr_from_skb() as I'm going to delete the former in the very
near future.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 4 Apr 2016 13:00:40 +0000 (14:00 +0100)]
rxrpc: Split service connection code out into its own file
Split the service-specific connection code out into into its own file. The
client-specific code has already been split out. This will leave just the
common code in the original file.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 4 Apr 2016 13:00:40 +0000 (14:00 +0100)]
rxrpc: Split client connection code out into its own file
Split the client-specific connection code out into its own file. It will
behave somewhat differently from the service-specific connection code, so
it makes sense to separate them.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 27 Jun 2016 13:39:44 +0000 (14:39 +0100)]
rxrpc: Call channels should have separate call number spaces
Each channel on a connection has a separate, independent number space from
which to allocate callNumber values. It is entirely possible, for example,
to have a connection with four active calls, each with call number 1.
Note that the callNumber values for any particular channel don't have to
start at 1, but they are supposed to increment monotonically for that
channel from a client's perspective and may not be reused once the call
number is transmitted (until the epoch cycles all the way back round).
Currently, however, call numbers are allocated on a per-connection basis
and, further, are held in an rb-tree. The rb-tree is redundant as the four
channel pointers in the rxrpc_connection struct are entirely capable of
pointing to all the calls currently in progress on a connection.
To this end, make the following changes:
(1) Handle call number allocation independently per channel.
(2) Get rid of the conn->calls rb-tree. This is overkill as a connection
may have a maximum of four calls in progress at any one time. Use the
pointers in the channels[] array instead, indexed by the channel
number from the packet.
(3) For each channel, save the result of the last call that was in
progress on that channel in conn->channels[] so that the final ACK or
ABORT packet can be replayed if necessary. Any call earlier than that
is just ignored. If we've seen the next call number in a packet, the
last one is most definitely defunct.
(4) When generating a RESPONSE packet for a connection, the call number
counter for each channel must be included in it.
(5) When parsing a RESPONSE packet for a connection, the call number
counters contained therein should be used to set the minimum expected
call numbers on each channel.
To do in future commits:
(1) Replay terminal packets based on the last call stored in
conn->channels[].
(2) Connections should be retired before the callNumber space on any
channel runs out.
(3) A server is expected to disregard or reject any new incoming call that
has a call number less than the current call number counter. The call
number counter for that channel must be advanced to the new call
number.
Note that the server cannot just require that the next call that it
sees on a channel be exactly the call number counter + 1 because then
there's a scenario that could cause a problem: The client transmits a
packet to initiate a connection, the network goes out, the server
sends an ACK (which gets lost), the client sends an ABORT (which also
gets lost); the network then reconnects, the client then reuses the
call number for the next call (it doesn't know the server already saw
the call number), but the server thinks it already has the first
packet of this call (it doesn't know that the client doesn't know that
it saw the call number the first time).
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 27 Jun 2016 16:11:19 +0000 (17:11 +0100)]
rxrpc: Add RCU destruction for connections and calls
Add RCU destruction for connections and calls as the RCU lookup from the
transport socket data_ready handler is going to come along shortly.
Whilst we're at it, move the cleanup workqueue flushing and RCU barrierage
into the destruction code for the objects that need it (locals and
connections) and add the extra RCU barrier required for connection cleanup.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 4 Apr 2016 13:00:38 +0000 (14:00 +0100)]
rxrpc: Release a call's connection ref on call disconnection
When a call is disconnected, clear the call's pointer to the connection and
release the associated ref on that connection. This means that the call no
longer pins the connection and the connection can be discarded even before
the call is.
As the code currently stands, the call struct is effectively pinned by
userspace until userspace has enacted a recvmsg() to retrieve the final
call state as sk_buffs on the receive queue pin the call to which they're
related because:
(1) The rxrpc_call struct contains the userspace ID that recvmsg() has to
include in the control message buffer to indicate which call is being
referred to. This ID must remain valid until the terminal packet is
completely read and must be invalidated immediately at that point as
userspace is entitled to immediately reuse it.
(2) The final ACK to the reply to a client call isn't sent until the last
data packet is entirely read (it's probably worth altering this in
future to be send the ACK as soon as all the data has been received).
This change requires a bit of rearrangement to make sure that the call
isn't going to try and access the connection again after protocol
completion:
(1) Delete the error link earlier when we're releasing the call. Possibly
network errors should be distributed via connections at the cost of
adding in an access to the rxrpc_connection struct.
(2) Remove the call from the connection's call tree before disconnecting
the call. The call tree needs to be removed anyway and incoming
packets delivered by channel pointer instead.
(3) The release call event should be considered last after all other
events have been processed so that we don't need access to the
connection again.
(4) Move the channel_lock taking from rxrpc_release_call() to
rxrpc_disconnect_call() where it will be required in future.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 4 Apr 2016 13:00:39 +0000 (14:00 +0100)]
rxrpc: Fix handling of connection failure in client call creation
If rxrpc_connect_call() fails during the creation of a client connection,
there are two bugs that we can hit that need fixing:
(1) The call state should be moved to RXRPC_CALL_DEAD before the call
cleanup phase is invoked. If not, this can cause an assertion failure
later.
(2) call->link should be reinitialised after being deleted in
rxrpc_new_client_call() - which otherwise leads to a failure later
when the call cleanup attempts to delete the link again.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 27 Jun 2016 09:32:03 +0000 (10:32 +0100)]
rxrpc: Move usage count getting into rxrpc_queue_conn()
Rather than calling rxrpc_get_connection() manually before calling
rxrpc_queue_conn(), do it inside the queue wrapper.
This allows us to do some important fixes:
(1) If the usage count is 0, do nothing. This prevents connections from
being reanimated once they're dead.
(2) If rxrpc_queue_work() fails because the work item is already queued,
retract the usage count increment which would otherwise be lost.
(3) Don't take a ref on the connection in the work function. By passing
the ref through the work item, this is unnecessary. Doing it in the
work function is too late anyway. Previously, connection-directed
packets held a ref on the connection, but that's not really the best
idea.
And another useful changes:
(*) Don't need to take a refcount on the connection in the data_ready
handler unless we invoke the connection's work item. We're using RCU
there so that's otherwise redundant.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 27 Jun 2016 09:32:02 +0000 (10:32 +0100)]
rxrpc: Check that the client conns cache is empty before module removal
Check that the client conns cache is empty before module removal and bug if
not, listing any offending connections that are still present. Unfortunately,
if there are connections still around, then the transport socket is still
unexpectedly open and active, so we can't just unallocate the connections.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 27 Jun 2016 09:32:02 +0000 (10:32 +0100)]
rxrpc: Provide queuing helper functions
Provide queueing helper functions so that the queueing of local and
connection objects can be fixed later.
The issue is that a ref on the object needs to be passed to the work queue,
but the act of queueing the object may fail because the object is already
queued. Testing the queuedness of an object before hand doesn't work
because there can be a race with someone else trying to queue it. What
will have to be done is to adjust the refcount depending on the result of
the queue operation.
Signed-off-by: David Howells <dhowells@redhat.com>
Herbert Xu [Sun, 26 Jun 2016 21:55:24 +0000 (14:55 -0700)]
rxrpc: Avoid using stack memory in SG lists in rxkad
rxkad uses stack memory in SG lists which would not work if stacks were
allocated from vmalloc memory. In fact, in most cases this isn't even
necessary as the stack memory ends up getting copied over to kmalloc
memory.
This patch eliminates all the unnecessary stack memory uses by supplying
the final destination directly to the crypto API. In two instances where a
temporary buffer is actually needed we also switch use a scratch area in
the rxrpc_call struct (only one DATA packet will be being secured or
verified at a time).
Finally there is no need to split a split-page buffer into two SG entries
so code dealing with that has been removed.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 30 Jun 2016 10:34:30 +0000 (11:34 +0100)]
rxrpc: Check the source of a packet to a client conn
When looking up a client connection to which to route a packet, we need to
check that the packet came from the correct source so that a peer can't try
to muck around with another peer's connection.
Signed-off-by: David Howells <dhowells@redhat.com>
It was first reported and reproduced by Petr (thanks!) in
https://bugzilla.kernel.org/show_bug.cgi?id=119581
free_percpu(rt->rt6i_pcpu) used to always happen in ip6_dst_destroy().
However, after fixing a deadlock bug in
commit 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt"),
free_percpu() is not called before setting non_pcpu_rt->rt6i_pcpu to NULL.
It is worth to note that rt6i_pcpu is protected by table->tb6_lock.
kmemleak somehow did not report it. We nailed it down by
observing the pcpu entries in /proc/vmallocinfo (first suggested
by Hannes, thanks!).
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Fixes: 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt") Reported-by: Petr Novopashenniy <pety@rusnet.ru> Tested-by: Petr Novopashenniy <pety@rusnet.ru> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Petr Novopashenniy <pety@rusnet.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
dn_fib_count_nhs() could enter an infinite loop if nhp->rtnh_len == 0
(i.e. if userspace passes a malformed netlink message).
Let's use the helpers from net/nexthop.h which take care of all this
stuff. We can do exactly the same as e.g. fib_count_nexthops() and
fib_get_nhs() from net/ipv4/fib_semantics.c.
This fixes the softlockup for me.
Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 21 Jun 2016 13:58:46 +0000 (16:58 +0300)]
platform/chrome: cros_ec_dev - double fetch bug in ioctl
We verify "u_cmd.outsize" and "u_cmd.insize" but we need to make sure
that those values have not changed between the two copy_from_user()
calls. Otherwise it could lead to a buffer overflow.
Additionally, cros_ec_cmd_xfer() can set s_cmd->insize to a lower value.
We should use the new smaller value so we don't copy too much data to
the user.
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com> Fixes: a841178445bb ('mfd: cros_ec: Use a zero-length array for command data') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Gwendal Grignou <gwendal@chromium.org> Cc: <stable@vger.kernel.org> # v4.2+ Signed-off-by: Olof Johansson <olof@lixom.net>
Or Gerlitz [Tue, 5 Jul 2016 09:17:12 +0000 (12:17 +0300)]
net/mlx5: Avoid setting unused var when modifying vport node GUID
GCC complains on unused-but-set-variable, clean this up.
Fixes: 23898c763f4a ('net/mlx5: E-Switch, Modify node guid on vf set MAC') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, link notifications are not sent by
bond_set_slave_link_state() upon enslavement if
the slave is enslaved when up.
This happens because slave->link default init value
is 0, which is the same as BOND_LINK_UP, resulting
in bond_set_slave_link_state() ignoring this transition.
This patch sets the default value of slave->link to
BOND_LINK_NOCHANGE, assuring it will count as a state
transition and thus trigger notification logic.
Signed-off-by: Aviv Heller <avivh@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 5 Jul 2016 01:47:41 +0000 (18:47 -0700)]
net: vrf: Add support for PREROUTING rules on vrf device
Add support for PREROUTING rules with skb->dev set to the vrf device.
INPUT rules are already allowed. Provides symmetry with the output path
which allows POSTROUTING rules.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Gortmaker [Mon, 4 Jul 2016 21:50:58 +0000 (17:50 -0400)]
connector: make cn_proc explicitly non-modular
The Kconfig controlling build of this code is currently:
drivers/connector/Kconfig:config PROC_EVENTS
drivers/connector/Kconfig: bool "Report process events to userspace"
...meaning that it currently is not being built as a module by anyone.
Lets remove the two modular references, so that when reading the driver
there is no doubt it is builtin-only.
Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit.
Cc: Evgeniy Polyakov <zbr@ioremap.net> Cc: netdev@vger.kernel.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns: fix return value check in hns_dsaf_get_cfg()
In case of error, function devm_ioremap_resource() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check should
be replaced with IS_ERR().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset enables IPv4 unicast routing in the Mellanox Spectrum ASIC
switch driver. This builds upon the work that was done by a couple of
previous patchsets.
Patches 1,2,6 add a couple of dependencies outside the driver. Namely, the
ability to propagate ndo_neigh_construct()/destroy() through stacked devices and
a notification whenever DELAY_PROBE_TIME changes. When propagated down, the
ndos allow drivers to add and remove neighbour entries from their private
neighbour table. The DELAY_PROBE_TIME notification gives drivers the ability to
correctly configure their polling interval for neighbour activity, so that
active neighbour won't be marked as STALE.
Patches 3-5,7-8 add the neighbour offloading infrastructure, where patch 7 uses
the DELAY_PROBE_TIME notification in order to correctly configure the device's
polling interval. Patch 8 finally programs neighbours to the device's table
based on NEIGH_UPDATE notifications, so that directly connected routes can
be used.
Patches 9-16 build upon the previous patches and extend the router with
remote routes (nexthop) support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now, the driver sends arp probes for all unresolved neighbours that are
currently a nexthop for some route on the system. The job is set
periodically every 5 seconds.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_router: Add the nexthop neigh activity update
For nexthop neighbours we need to make kernel to think there is a traffic
flowing to them preventing it from going to stale state. Otherwise
kernel would stale it and eventually the neigh would be removed from HW
and nexthop as well. That would reduce ECMP group in HW.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Implement next-hop routing offload including ECMP. To make it possible,
introduce next-hop group entity. This entity keeps track of resolved
neighbours and updates HW adjacency table accordingly. Note that HW
next-hops are stored in this adjacency table, in form of MAC.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: Introduce simplistic KVD linear area manager
This is a very simple manager for KVD linear area. Currently, the
allocator will either allocate a single entry from pre-defined sub-area,
or in case more than one entry is needed, it will allocate 32-entry chunk
in other pre-defined sub-area.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>