net: neighbour: Add mcast_resolicit to configure the number of multicast resolicitations in PROBE state.
We send unicast neighbor (ARP or NDP) solicitations ucast_probes
times in PROBE state. Zhu Yanjun reported that some implementation
does not reply against them and the entry will become FAILED, which
is undesirable.
We had been dealt with such nodes by sending multicast probes mcast_
solicit times after unicast probes in PROBE state. In 2003, I made
a change not to send them to improve compatibility with IPv6 NDP.
Let's introduce per-protocol per-interface sysctl knob "mcast_
reprobe" to configure the number of multicast (re)solicitation for
reconfirmation in PROBE state. The default is 0, since we have
been doing so for 10+ years.
Reported-by: Zhu Yanjun <Yanjun.Zhu@windriver.com> CC: Ulf Samuelsson <ulf.samuelsson@ericsson.com> Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Rafał Miłecki [Fri, 20 Mar 2015 22:14:31 +0000 (23:14 +0100)]
bgmac: register fixed PHY for ARM BCM470X / BCM5301X chipsets
On ARM SoCs with bgmac Ethernet hardware we don't have any normal PHY.
There is always a switch attached but it's not even controlled over MDIO
like in case of MIPS devices.
We need a fixed PHY to be able to send/receive packets from the switch.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 21 Mar 2015 01:40:47 +0000 (21:40 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-03-20
This series contains updates to ixgb, e1000e, igb and igbvf.
Eliezer and Todd provide patches to fix a potential issue found during code
inspection. When bringing down an interface netif_carrier_off() should
be one of the first things we do, since this will prevent the stack from
queueing more packets to this interface.
Yanir provides a fix for e1000e that was found in validating i219,
where the call to e1000e_write_protect_nvm_ich8lan() is no longer
supported in newer hardware. Access to these registers causes a
system freeze in early steppings and is ignored in later steppings.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mathieu Olivari [Sat, 21 Mar 2015 01:31:03 +0000 (18:31 -0700)]
net: dsa: make NET_DSA manually selectable from the config
Change bd76a116707bd2381da36cf7c3183df11293f1d6 made all DSA drivers
depend on NET_DSA rather than selecting them. However, as the only way
to select this option was to actually select a driver, it made DSA
impossible to enable at all.
This patch adds an explicit entry which the user will have to enable
prior selecting a driver.
Signed-off-by: Mathieu Olivari <mathieu@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Scott Feldman [Sat, 21 Mar 2015 00:42:46 +0000 (17:42 -0700)]
switchdev: kernel-doc cleanup on swithdev ops
Suggested-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Todd Fujinaka [Sat, 21 Mar 2015 00:41:54 +0000 (17:41 -0700)]
igbvf: use netif_carrier_off earlier when bringing if down
Use netif_carrier_off() first, since that will prevent the stack from
queuing more packets to this IF. This operation is fast, and should
behave much nicer when trying to bring down an interface under load.
Todd Fujinaka [Sat, 21 Mar 2015 00:41:53 +0000 (17:41 -0700)]
igb: use netif_carrier_off earlier when bringing if down
Use netif_carrier_off() first, since that will prevent the stack from
queuing more packets to this IF. This operation is fast, and should
behave much nicer when trying to bring down an interface under load.
Yanir Lubetkin [Sat, 21 Mar 2015 00:41:53 +0000 (17:41 -0700)]
e1000e: NVM write protect access removed from SPT HW
The call to e1000e_write_protect_nvm_ich8lan() is no longer supported by HW.
Access to these registers causes a system freeze in A step hardware and is
ignored in B step hardware. This function must not be called in hardware
newer than LPT.
Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Eliezer Tamir [Sat, 21 Mar 2015 00:41:52 +0000 (17:41 -0700)]
e1000e: call netif_carrier_off early on down
When bringing down an interface netif_carrier_off() should be
one the first things we do, since this will prevent the stack
from queuing more packets to this interface.
This operation is very fast, and should make the device behave
much nicer when trying to bring down an interface under load.
Also, this would Do The Right Thing (TM) if this device has some
sort of fail-over teaming and redirect traffic to the other IF.
Move netif_carrier_off as early as possible.
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Eliezer Tamir [Sat, 21 Mar 2015 00:41:52 +0000 (17:41 -0700)]
ixgb: call netif_carrier_off early on down
When bringing down an interface netif_carrier_off() should be
one the first things we do, since this will prevent the stack
from queuing more packets to this interface.
This operation is very fast, and should make the device behave
much nicer when trying to bring down an interface under load.
Also, this would Do The Right Thing (TM) if this device has some
sort of fail-over teaming and redirect traffic to the other IF.
Move netif_carrier_off as early as possible.
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Fri, 20 Mar 2015 23:10:50 +0000 (19:10 -0400)]
Merge branch 'ebpf-next'
Daniel Borkmann says:
====================
This set adds native eBPF support also to act_bpf and thus covers tc
with eBPF in the classifier *and* action part.
A link to iproute2 preview has been provided in patch 2 and the code
will be pushed out after Stephen has processed the classifier part
and helper bits for tc.
This set depends on ced585c83b27 ("act_bpf: allow non-default TC_ACT
opcodes as BPF exec outcome"), so a net into net-next merge would be
required first. Hope that's fine by you, Dave. ;)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 20 Mar 2015 14:11:12 +0000 (15:11 +0100)]
act_bpf: add initial eBPF support for actions
This work extends the "classic" BPF programmable tc action by extending
its scope also to native eBPF code!
Together with commit e2e9b6541dd4 ("cls_bpf: add initial eBPF support
for programmable classifiers") this adds the facility to implement fully
flexible classifier and actions for tc that can be implemented in a C
subset in user space, "safely" loaded into the kernel, and being run in
native speed when JITed.
Also, since eBPF maps can be shared between eBPF programs, it offers the
possibility that cls_bpf and act_bpf can share data 1) between themselves
and 2) between user space applications. That means that, f.e. customized
runtime statistics can be collected in user space, but also more importantly
classifier and action behaviour could be altered based on map input from
the user space application.
For the remaining details on the workflow and integration, see the cls_bpf
commit e2e9b6541dd4. Preliminary iproute2 part can be found under [1].
Daniel Borkmann [Fri, 20 Mar 2015 14:11:11 +0000 (15:11 +0100)]
ebpf: add sched_act_type and map it to sk_filter's verifier ops
In order to prepare eBPF support for tc action, we need to add
sched_act_type, so that the eBPF verifier is aware of what helper
function act_bpf may use, that it can load skb data and read out
currently available skb fields.
This is bascially analogous to 96be4325f443 ("ebpf: add sched_cls_type
and map it to sk_filter's verifier ops").
BPF_PROG_TYPE_SCHED_CLS and BPF_PROG_TYPE_SCHED_ACT need to be
separate since both will have a different set of functionality in
future (classifier vs action), thus we won't run into ABI troubles
when the point in time comes to diverge functionality from the
classifier.
The future plan for act_bpf would be that it will be able to write
into skb->data and alter selected fields mirrored in struct __sk_buff.
For an initial support, it's sufficient to map it to sk_filter_ops.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Pirko <jiri@resnulli.us> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The be_main.c conflict resolution was really tricky. The conflict
hunks generated by GIT were very unhelpful, to say the least. It
split functions in half and moved them around, when the real actual
conflict only existed solely inside of one function, that being
be_map_pci_bars().
So instead, to resolve this, I checked out be_main.c from the top
of net-next, then I applied the be_main.c changes from 'net' since
the last time I merged. And this worked beautifully.
The inet_diag.c and sysctl_net_core.c conflicts were simple
overlapping changes, and were easily to resolve.
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 20 Mar 2015 22:18:45 +0000 (18:18 -0400)]
rhashtable: Fix undeclared EEXIST build error on ia64
We need to include linux/errno.h in rhashtable.h since it doesn't
always get included otherwise.
Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
The following series of patches includes functional updates and changes
to the driver.
- Use the phydev->advertising field instead of the phydev->supported
field when configuring for auto-negotiation, etc.
- Use the phy_driver flags field for setting the transceiver type
instead of hardcoding it in the ethtool support.
- Provide an auto-negotiation timeout check
- Clarify the Tx/Rx queue information messages
- Use the new DMA memory barrier operations
- Set the device DMA mask based on what the hardware reports
- Remove the software implementation of Tx coalescing
- Fix the reporting of the Rx coalescing value
- Use napi_alloc_skb when allocating an SKB in softirq
This patch series is based on net-next.
Changes from v2:
- Use jiffies instead of timespec for the auto-negotiation timeout check
- Remove the Rx path SKB allocation re-work patch since we should only
inline the headers and the current code guards better against any
hardware bugs
Changes from v1:
- Default to 32-bit DMA width (minimum supported) if hardware returns
an unexpected DMA width value
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:50:34 +0000 (11:50 -0500)]
amd-xgbe: Fix Rx coalescing reporting
The Rx coalescing value is internally converted from usecs to a value
that the hardware can use. When reporting the Rx coalescing value, this
internal value is converted back to usecs. During the conversion from
and back to usecs some rounding occurs. So, for example, when setting an
Rx usec of 30, it will be reported as 29. Fix this reporting issue by
keeping the original usec value and using that during reporting.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:50:28 +0000 (11:50 -0500)]
amd-xgbe: Remove Tx coalescing
The Tx coalescing support in the driver was a software implementation
for something lacking in the hardware. Using hrtimers, the idea was to
trigger a timer interrupt after having queued a packet for transmit.
Unfortunately, as the timer value was lowered, the timer expired before
the hardware actually did the transmit and so it was racey and resulted
in unnecessary interrupts.
Remove the Tx coalescing support and hrtimer and replace with a Tx timer
that is used as a reclaim timer in case of inactivity.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:50:10 +0000 (11:50 -0500)]
amd-xgbe: Clarify output message about queues
Clarify that the queues referred to in a message when the device is
brought up are hardware queues and not necessarily related to the
Linux network queues.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:50:04 +0000 (11:50 -0500)]
amd-xgbe-phy: Provide support for auto-negotiation timeout
Currently, there is no interrupt code that indicates auto-negotiation
has timed out. If the auto-negotiation has timed out then the start of
a new auto-negotiation will begin again with a new base page being
received. The state machine could be in a state that is not expecting
this interrupt code which results in an error during auto-negotiation.
Update the code to timestamp when the auto-negotiation starts. Should
another page received interrupt code occur before auto-negotiation has
completed but after the auto-negotiation timeout, then reset the state
machine to allow the auto-negotiation to continue.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:49:53 +0000 (11:49 -0500)]
amd-xgbe-phy: Use the phy_driver flags field
Remove the setting of the transceiver type when retrieving the device
settings using ethtool and instead set the transceiver type in the
phy_driver structure flags field. Change the transceiver type to be
internal, also.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:49:42 +0000 (11:49 -0500)]
amd-xgbe-phy: Use phydev advertising field vs supported
With ethtool being able to control what is advertised, the advertising
field is what should be used for priming the auto-negotiation registers
and for various other checks, instead of the supported field.
Also, move the initial setting of the supported and advertising fields
into the probe function so that they are not reset each time the device
is brought up, thus allowing the user to set as desired before bringing
the device up.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Catalin Marinas [Fri, 20 Mar 2015 16:48:13 +0000 (16:48 +0000)]
net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour
Commit db31c55a6fb2 (net: clamp ->msg_namelen instead of returning an
error) introduced the clamping of msg_namelen when the unsigned value
was larger than sizeof(struct sockaddr_storage). This caused a
msg_namelen of -1 to be valid. The native code was subsequently fixed by
commit dbb490b96584 (net: socket: error on a negative msg_namelen).
In addition, the native code sets msg_namelen to 0 when msg_name is
NULL. This was done in commit (6a2a2b3ae075 net:socket: set msg_namelen
to 0 if msg_name is passed as NULL in msghdr struct from userland) and
subsequently updated by 08adb7dabd48 (fold verify_iovec() into
copy_msghdr_from_user()).
This patch brings the get_compat_msghdr() in line with
copy_msghdr_from_user().
Fixes: db31c55a6fb2 (net: clamp ->msg_namelen instead of returning an error) Cc: David S. Miller <davem@davemloft.net> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This series of patches introduces the inlined rhashtable interface.
The idea is to make all the function pointers visible to the compiler
by providing the rhashtable_params structure explicitly to each
inline rhashtable function. For example, instead of doing
obj = rhashtable_lookup(ht, key);
you would now do
obj = rhashtable_lookup_fast(ht, key, params);
Where params is the same data that you would give to rhashtable_init.
In particular, within rhashtable.c itself we would simply supply
ht->p.
So to convert users over, you simply have to make params globally
accessible, e.g., by placing it in a static const variable, which
can then be used at each inlined call site, as well as by the
rhashtable_init call.
The only ticky bit is that some users (i.e., netfilter) has a
dynamic key length. This is dealt with by using params.key_len
in the inline functions when it is non-zero, and otherwise falling
back on ht->p.key_len.
Note that I've only tested this on one compiler, gcc 4.7.2. So
please test this with your compilers as well and make sure that
the code is actually inlined without indirect function calls.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 20 Mar 2015 10:57:02 +0000 (21:57 +1100)]
netfilter: Convert nft_hash to inlined rhashtable
This patch converts nft_hash to the inlined rhashtable interface.
This patch also replaces the call to rhashtable_lookup_compare with
a straight rhashtable_lookup_fast because it's simply doing a memcmp
(in fact nft_hash_lookup already uses memcmp instead of nft_data_cmp).
Furthermore, the compare function is only meant to compare, it is not
supposed to have side-effects. The current side-effect code can
simply be moved into the nft_hash_get.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 20 Mar 2015 10:57:01 +0000 (21:57 +1100)]
netlink: Move namespace into hash key
Currently the name space is a de facto key because it has to match
before we find an object in the hash table. However, it isn't in
the hash value so all objects from different name spaces with the
same port ID hash to the same bucket.
This is bad as the number of name spaces is unbounded.
This patch fixes this by using the namespace when doing the hash.
Because the namespace field doesn't lie next to the portid field
in the netlink socket, this patch switches over to the rhashtable
interface without a fixed key.
This patch also uses the new inlined rhashtable interface where
possible.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 20 Mar 2015 10:57:00 +0000 (21:57 +1100)]
rhashtable: Allow hash/comparison functions to be inlined
This patch deals with the complaint that we make indirect function
calls on the fast paths unnecessarily in rhashtable. We resolve
it by moving the fast paths into inline functions that take struct
rhashtable_param (which obviously must be the same set of parameters
supplied to rhashtable_init) as an argument.
The only remaining indirect call is to obj_hashfn (or key_hashfn it
obj_hashfn is unset) on the rehash as well as the insert-during-
rehash slow path.
This patch also extends the support of vairable-length keys to
include those where the key is fixed but scattered in the object.
For example, in netlink we want to key off the namespace and the
portid but they're not next to each other.
This patch does this by directly using the object hash function
as the indicator of whether the key is accessible or not. It
also adds a new function obj_cmpfn to compare a key against an
object. This means that the caller no longer needs to supply
explicit compare functions.
All this is done in a backwards compatible manner so no existing
users are affected until they convert to the new interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 20 Mar 2015 10:56:59 +0000 (21:56 +1100)]
rhashtable: Make rhashtable_init params argument const
This patch marks the rhashtable_init params argument const as
there is no reason to modify it since we will always make a copy
of it in the rhashtable.
This patch also fixes a bug where we don't actually round up the
value of min_size unless it is less than HASH_MIN_SIZE.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 19 Mar 2015 18:38:27 +0000 (19:38 +0100)]
ebpf, filter: do not convert skb->protocol to host endianess during runtime
Commit c24973957975 ("bpf: allow BPF programs access 'protocol' and 'vlan_tci'
fields") has added support for accessing protocol, vlan_present and vlan_tci
into the skb offset map.
As referenced in the below discussion, accessing skb->protocol from an eBPF
program should be converted without handling endianess.
The reason for this is that an eBPF program could simply do a check more
naturally, by f.e. testing skb->protocol == htons(ETH_P_IP), where the LLVM
compiler resolves htons() against a constant automatically during compilation
time, as opposed to an otherwise needed run time conversion.
After all, the way of programming both from a user perspective differs quite
a lot, i.e. bpf_asm ["ld proto"] versus a C subset/LLVM.
Reference: https://patchwork.ozlabs.org/patch/450819/ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: invert join/leave anycast rtnl/socket locking order
Commit baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket")
missed to update two setsockopt options, IPV6_JOIN_ANYCAST and
IPV6_LEAVE_ANYCAST, causing a lock inverstion regarding to the updated ones.
As ipv6_sock_ac_join and ipv6_sock_ac_leave are only called from
do_ipv6_setsockopt, we are good to just move the rtnl lock upper.
Fixes: baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket") Reported-by: Ying Huang <ying.huang@intel.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
vxlan: fix possible use of uninitialized in vxlan_igmp_{join, leave}
Test robot noticed that we check the return of vxlan_igmp_join and leave
but inside them there was a path that it could be used initialized.
It's not really possible because those if() inside these igmp functions
would always match as we can't have sockets of other type in there, but
this way we keep the compiler happy.
Fixes: 56ef9c909b40 ("vxlan: Move socket initialization to within rtnl scope") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 20 Mar 2015 17:25:56 +0000 (13:25 -0400)]
Merge branch 'be2net'
Sathya Perla says:
====================
be2net: patch set
Hi David, this patch set includes 3 bug fixes to the be2net driver.
Patch 1 fixes a vlan isolation issue with VFs. When a VF is placed in
promiscous mode, it could receive packets belonging to any vlan, as
the PF driver grants vlan promisc capability to VFs. The PF
driver now disables the vlan promisc capability for VFs to fix this
problem.
Patch 2 fixes the call to MODIFY_EQ_DELAY FW cmd to not include more
than 8 EQs per cmd. The FW is not capable of handling more than 8 EQs
per cmd.
Patch 3 fixes an EEH error detection issue. On Power platforms,
when an EEH error occurs, the slot disconnect state is more reliably
detected via an MMIO read compared to a config read. So, the error
register reads that occur every second are now done via MMIO.
Pls apply this patch set to the "net" tree. Thanks!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Suresh Reddy [Fri, 20 Mar 2015 10:28:25 +0000 (06:28 -0400)]
be2net: use PCI MMIO read instead of config read for errors
When an EEH error occurs, the device/slot is disconnected. This condition
is more reliably detected (i.e., returns all ones) with an MMIO read rather
than a config read -- especially on power platforms.
Hence, this patch fixes EEH error detection by replacing config reads with
MMIO reads for reading the error registers. The error registers in
Skyhawk-R/BE2/BE3 are accessible both via the config space and the
PCICFG (BAR0) memory space.
Reported-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Suresh Reddy <Suresh.Reddy@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Suresh Reddy [Fri, 20 Mar 2015 10:28:24 +0000 (06:28 -0400)]
be2net: restrict MODIFY_EQ_DELAY cmd to a max of 8 EQs
Issuing this cmd for more than 8 EQs does not have the intended effect
even on BEx and Skyhawk-R.
This patch fixes this by issuing this cmd for upto 8 EQs at a time. Signed-off-by: Suresh Reddy <Suresh.Reddy@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Fri, 20 Mar 2015 10:28:23 +0000 (06:28 -0400)]
be2net: Prevent VFs from enabling VLAN promiscuous mode
Currently, a PF does not restrict its VF interface from enabling vlan
promiscuous mode. This breaks vlan isolation when a vlan
(transparent tagging) is configured on a VF.
This patch fixes this problem by disabling the vlan promisc capability
for VFs.
Reported-by: Yoann Juet <veilletechno-irts@univ-nantes.fr> Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Josh Hunt [Thu, 19 Mar 2015 23:19:30 +0000 (19:19 -0400)]
tcp: fix tcp fin memory accounting
tcp_send_fin() does not account for the memory it allocates properly, so
sk_forward_alloc can be negative in cases where we've sent a FIN:
ss example output (ss -amn | grep -B1 f4294):
tcp FIN-WAIT-1 0 1 192.168.0.1:45520 192.0.2.1:8080
skmem:(r0,rb87380,t0,tb87380,f4294966016,w1280,o0,bl0) Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Steven Barth [Thu, 19 Mar 2015 15:16:04 +0000 (16:16 +0100)]
ipv6: fix backtracking for throw routes
for throw routes to trigger evaluation of other policy rules
EAGAIN needs to be propagated up to fib_rules_lookup
similar to how its done for IPv4
A simple testcase for verification is:
ip -6 rule add lookup 33333 priority 33333
ip -6 route add throw 2001:db8::1
ip -6 route add 2001:db8::1 via fe80::1 dev wlan0 table 33333
ip route get 2001:db8::1
Signed-off-by: Steven Barth <cyrus@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Markos Chandras [Thu, 19 Mar 2015 10:28:14 +0000 (10:28 +0000)]
net: ethernet: pcnet32: Setup the SRAM and NOUFLO on Am79C97{3, 5}
On a MIPS Malta board, tons of fifo underflow errors have been observed
when using u-boot as bootloader instead of YAMON. The reason for that
is that YAMON used to set the pcnet device to SRAM mode but u-boot does
not. As a result, the default Tx threshold (64 bytes) is now too small to
keep the fifo relatively used and it can result to Tx fifo underflow errors.
As a result of which, it's best to setup the SRAM on supported controllers
so we can always use the NOUFLO bit.
Cc: <netdev@vger.kernel.org> Cc: <stable@vger.kernel.org> Cc: <linux-kernel@vger.kernel.org> Cc: Don Fry <pcnet32@frontier.com> Signed-off-by: Markos Chandras <markos.chandras@imgtec.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Thu, 19 Mar 2015 10:22:32 +0000 (11:22 +0100)]
ipv6: call ipv6_proxy_select_ident instead of ipv6_select_ident in udp6_ufo_fragment
Matt Grant reported frequent crashes in ipv6_select_ident when
udp6_ufo_fragment is called from openvswitch on a skb that doesn't
have a dst_entry set.
ipv6_proxy_select_ident generates the frag_id without using the dst
associated with the skb. This approach was suggested by Vladislav
Yasevich.
Fixes: 0508c07f5e0c ("ipv6: Select fragment id during UFO segmentation if not set.") Cc: Vladislav Yasevich <vyasevic@redhat.com> Reported-by: Matt Grant <matt@mattgrant.net.nz> Tested-by: Matt Grant <matt@mattgrant.net.nz> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Palik, Imre [Thu, 19 Mar 2015 10:05:42 +0000 (11:05 +0100)]
xen-netback: making the bandwidth limiter runtime settable
With the current netback, the bandwidth limiter's parameters are only
settable during vif setup time. This patch register a watch on them, and
thus makes them runtime changeable.
When the watch fires, the timer is reset. The timer's mutex is used for
fencing the change.
Cc: Anthony Liguori <aliguori@amazon.com> Signed-off-by: Imre Palik <imrep@amazon.de> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 20 Mar 2015 02:04:21 +0000 (19:04 -0700)]
net: increase sk_[max_]ack_backlog
sk_ack_backlog & sk_max_ack_backlog were 16bit fields, meaning
listen() backlog was limited to 65535.
It is time to increase the width to allow much bigger backlog,
if admins change /proc/sys/net/core/somaxconn &
/proc/sys/net/ipv4/tcp_max_syn_backlog default values.
Eric Dumazet [Fri, 20 Mar 2015 02:04:20 +0000 (19:04 -0700)]
inet: get rid of central tcp/dccp listener timer
One of the major issue for TCP is the SYNACK rtx handling,
done by inet_csk_reqsk_queue_prune(), fired by the keepalive
timer of a TCP_LISTEN socket.
This function runs for awful long times, with socket lock held,
meaning that other cpus needing this lock have to spin for hundred of ms.
SYNACK are sent in huge bursts, likely to cause severe drops anyway.
This model was OK 15 years ago when memory was very tight.
We now can afford to have a timer per request sock.
Timer invocations no longer need to lock the listener,
and can be run from all cpus in parallel.
With following patch increasing somaxconn width to 32 bits,
I tested a listener with more than 4 million active request sockets,
and a steady SYNFLOOD of ~200,000 SYN per second.
Host was sending ~830,000 SYNACK per second.
This is ~100 times more what we could achieve before this patch.
Later, we will get rid of the listener hash and use ehash instead.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Thu, 19 Mar 2015 22:31:13 +0000 (22:31 +0000)]
rhashtable: Round up/down min/max_size to ensure we respect limit
Round up min_size respectively round down max_size to the next power
of two to make sure we always respect the limit specified by the
user. This is required because we compare the table size against the
limit before we expand or shrink.
Also fixes a minor bug where we modified min_size in the params
provided instead of the copy stored in struct rhashtable.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 19 Mar 2015 23:43:10 +0000 (16:43 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
"An update to Synaptics driver that makes it usable with the 2015
lineup from Lenovo"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Revert "Input: synaptics - use dmax in input_mt_assign_slots"
Input: synaptics - remove X250 from the topbuttonpad list
Input: synaptics - remove X1 Carbon 3rd gen from the topbuttonpad list
Input: synaptics - re-route tracksticks buttons on the Lenovo 2015 series
Input: synaptics - remove TOPBUTTONPAD property for Lenovos 2015
Input: synaptics - retrieve the extended capabilities in query $10
Input: synaptics - do not retrieve the board id on old firmwares
Input: synaptics - handle spurious release of trackstick buttons
Input: synaptics - fix middle button on Lenovo 2015 products
Input: synaptics - skip quirks when post-2013 dimensions
Input: synaptics - support min/max board id in min_max_pnpid_table
Input: synaptics - remove obsolete min/max quirk for X240
Input: synaptics - query min dimensions for fw v8.1
Input: synaptics - log queried and quirked dimension values
Input: synaptics - split synaptics_resolution(), query first
Linus Torvalds [Thu, 19 Mar 2015 23:27:36 +0000 (16:27 -0700)]
Merge branch 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
"This fixes minor issues with the multi-layer update in v4.0"
* 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: upper fs should not be R/O
ovl: check lowerdir amount for non-upper mount
ovl: print error message for invalid mount options
Linus Torvalds [Thu, 19 Mar 2015 22:52:28 +0000 (15:52 -0700)]
Merge tag 'pinctrl-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Here is a slew of pin control fixes I've accumulated for the v4.0
kernel. Nothing special, just driver fixes (mainly embedded Intel it
seems) and a misunderstanding regarding the stub functions was
reverted:
- Fix up consumer return values on pin control stubs.
- Four patches fixing up the interrupt handling and sleep context
save in the Baytrail driver.
- Make default output directions work properly in the Cherryview
driver.
- Fix interrupt locking in the AT91 driver.
- Fix setting interrupt generating lines as input in the sunxi
driver"
* tag 'pinctrl-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: sun4i: GPIOs configured as irq must be set to input before reading
pinctrl: at91: move lock/unlock_as_irq calls into request/release
pinctrl: update direction_output function of cherryview driver
pinctrl: baytrail: Save pin context over system sleep
pinctrl: baytrail: Rework interrupt handling
pinctrl: baytrail: Clear interrupt triggering from pins that are in GPIO mode
pinctrl: baytrail: Relax GPIO request rules
Revert "pinctrl: consumer: use correct retval for placeholder functions"
Linus Torvalds [Thu, 19 Mar 2015 22:24:28 +0000 (15:24 -0700)]
Merge tag 'nios2-fixes-v4.0-rc5' of git://git.rocketboards.org/linux-socfpga-next
Pull two arch/nios2 fixes from Ley Foon Tan:
- Remove ucontext.h from exported arch headers
- nios2: mm: do not invoke OOM killer on kernel fault OOM
* tag 'nios2-fixes-v4.0-rc5' of git://git.rocketboards.org/linux-socfpga-next:
nios2: mm: do not invoke OOM killer on kernel fault OOM
nios2: Remove ucontext.h from exported arch headers
Shannon Nelson [Thu, 19 Mar 2015 21:32:01 +0000 (14:32 -0700)]
i40e: add NVM update events to AQ clean
Quit complaining about a couple of events that we actually expect to see
during an NVM update.
Reported-by: Stefan Assmann <sassmann@redhat.com> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
We can't directly call ipv6_sock_mc_join() but should use the stub
instead and protect it around IS_ENABLED.
Fixes: d0f91938bede ("tipc: add ip/udp media type") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 19 Mar 2015 19:30:25 +0000 (15:30 -0400)]
Merge branch 'cxgb4-next'
Hariprasad Shenai says:
====================
Add Device ID and make device ID table const
This patch series adds new device ID and makes device ID table const
The patches series is created against 'net-next' tree.
And includes patches on cxgb4 and csiostor driver.
We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
cxgb4/cxgb4vf/csiostor: Make PCI Device ID Tables be "const"
Make PCI Device ID Tables be "const" to move them out of the data segment and
remove a redundant check on CH_PCI_DEVICE_ID_TABLE_DEFINE_BEGIN in
t4_pci_id_tbl.h to guard the contents of the include file.
Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This wont the last 4.1 bluetooth-next pull request, but we've piled up
enough patches in less than a week that I wanted to save you from a
single huge "last-minute" pull somewhere closer to the merge window.
The main changes are:
- Simultaneous LE & BR/EDR discovery support for HW that can do it
- Complete LE OOB pairing support
- More fine-grained mgmt-command access control (normal user can now do
harmless read-only operations).
- Added RF power amplifier support in cc2520 ieee802154 driver
- Some cleanups/fixes in ieee802154 code
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Fix packet header offset calculation in _decode_session6(), from
Hajime Tazaki.
2) Fix route leak in error paths of xfrm_lookup(), from Huaibin Wang.
3) Be sure to clear state properly when scans fail in iwlwifi mvm code,
from Luciano Coelho.
4) iwlwifi tries to stop scans that aren't actually running, also from
Luciano Coelho.
5) mac80211 should drop mesh frames that are not encrypted, fix from
Bob Copeland.
6) Add new device ID to b43 wireless driver for BCM432228 chips, from
Rafał Miłecki.
7) Fix accidental addition of members after variable sized array in
struct tc_u_hnode, from WANG Cong.
8) Don't re-enable interrupts until after we call napi_complete() in
ibmveth and WIZnet drivers, frm Yongbae Park.
9) Fix regression in vlan tag handling of fec driver, from Fugang Duan.
10) If a network namespace change fails during rtnl_newlink(), we don't
unwind the device registry properly.
11) Fix two TCP regressions, from Neal Cardwell:
- Don't allow snd_cwnd_cnt to accumulate huge values due to missing
test in tcp_cong_avoid_ai().
- Restore CUBIC back to advancing cwnd by 1.5x packets per RTT.
12) Fix performance regression in xne-netback involving push TX
notifications, from David Vrabel.
13) __skb_tstamp_tx() can be called with a NULL sk pointer, do not
dereference blindly. From Willem de Bruijn.
14) Fix potential stack overflow in RDS protocol stack, from Arnd
Bergmann.
15) VXLAN_VID_MASK used incorrectly in new remote checksum offload
support of VXLAN driver. Fix from Alexey Kodanev.
16) Fix too small netlink SKB allocation in inet_diag layer, from Eric
Dumazet.
17) ieee80211_check_combinations() does not count interfaces correctly,
from Andrei Otcheretianski.
18) Hardware feature determination in bxn2x driver references a piece of
software state that actually isn't initialized yet, fix from Michal
Schmidt.
19) inet_csk_wait_for_connect() needs a sched_annotate_sleep()
annoation, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits)
Revert "net: cx82310_eth: use common match macro"
net/mlx4_en: Set statistics bitmap at port init
IB/mlx4: Saturate RoCE port PMA counters in case of overflow
net/mlx4_en: Fix off-by-one in ethtool statistics display
IB/mlx4: Verify net device validity on port change event
act_bpf: allow non-default TC_ACT opcodes as BPF exec outcome
Revert "smc91x: retrieve IRQ and trigger flags in a modern way"
inet: Clean up inet_csk_wait_for_connect() vs. might_sleep()
ip6_tunnel: fix error code when tunnel exists
netdevice.h: fix ndo_bridge_* comments
bnx2x: fix encapsulation features on 57710/57711
mac80211: ignore CSA to same channel
nl80211: ignore HT/VHT capabilities without QoS/WMM
mac80211: ask for ECSA IE to be considered for beacon parse CRC
mac80211: count interfaces correctly for combination checks
isdn: icn: use strlcpy() when parsing setup options
rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()
caif: fix MSG_OOB test in caif_seqpkt_recvmsg()
bridge: reset bridge mtu after deleting an interface
can: kvaser_usb: Fix tx queue start/stop race conditions
...
Erik Hugne [Thu, 19 Mar 2015 08:02:19 +0000 (09:02 +0100)]
tipc: add support for connect() on dgram/rdm sockets
Following the example of ip4_datagram_connect, we store the
address in the socket structure for dgram/rdm sockets and use
that as the default destination for subsequent send() calls.
It is allowed to connect to any address types, and the behaviour
of send() will be the same as a normal sendto() with this address
provided. Binding to an AF_UNSPEC address clears the association.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Erik Hugne [Thu, 19 Mar 2015 08:02:18 +0000 (09:02 +0100)]
tipc: do not report -EHOSTUNREACH for failed local delivery
Since commit 1186adf7df04 ("tipc: simplify message forwarding and
rejection in socket layer") -EHOSTUNREACH is propagated back to
the sending process if we fail to deliver the message to another
socket local to the node.
This is wrong, host unreachable should only be reported when the
destination port/name does not exist in the cluster, and that
check is always done before sending the message. Also, this
introduces inconsistent sendmsg() behavior for local/remote
destinations. Errors occurring on the receiving side should not
trickle up to the sender. If message delivery fails TIPC should
either discard the packet or reject it back to the sender based
on the destination droppable option.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Erik Hugne [Thu, 19 Mar 2015 08:02:17 +0000 (09:02 +0100)]
tipc: remove redundant call to tipc_node_remove_conn
tipc_node_remove_conn may be called twice if shutdown() is
called on a socket that have messages in the receive queue.
Calling this function twice does no harm, but is unnecessary
and we remove the redundant call.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Van Braeckel [Mon, 12 Jan 2015 04:22:16 +0000 (05:22 +0100)]
fuse: explicitly set /dev/fuse file's private_data
The misc subsystem (which is used for /dev/fuse) initializes private_data to
point to the misc device when a driver has registered a custom open file
operation, and initializes it to NULL when a custom open file operation has
*not* been provided.
This subtle quirk is confusing, to the point where kernel code registers
*empty* file open operations to have private_data point to the misc device
structure. And it leads to bugs, where the addition or removal of a custom open
file operation surprisingly changes the initial contents of a file's
private_data structure.
So to simplify things in the misc subsystem, a patch [1] has been proposed to
*always* set the private_data to point to the misc device, instead of only
doing this when a custom open file operation has been registered.
But before this patch can be applied we need to modify drivers that make the
assumption that a misc device file's private_data is initialized to NULL
because they didn't register a custom open file operation, so they don't rely
on this assumption anymore. FUSE uses private_data to store the fuse_conn and
errors out if this is not initialized to NULL at mount time.
Hence, we now set a file's private_data to NULL explicitly, to be independent
of whatever value the misc subsystem initializes it to by default.
[1] https://lkml.org/lkml/2014/12/4/939
Reported-by: Giedrius Statkevicius <giedriuswork@gmail.com> Reported-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Tom Van Braeckel <tomvanbraeckel@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
NeilBrown [Sat, 21 Feb 2015 04:15:16 +0000 (15:15 +1100)]
mmc: pwrseq_simple: fix error path in mmc_pwrseq_simple_alloc
The current error-path code (when gpiod_get_index() reports
an error) can never free pwrseq->reset_gpios[0], but might
try to tree pwrseq->reset_gpios[-1], which has unfortunate
consequences.
Use jiffies_to_msecs for converting jiffies as it handles all of the corner
cases reliably and also helps readability. The printk format is fixed up
as jiffies_to_msecs returns unsigned int not unsigned long.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 19 Mar 2015 02:52:33 +0000 (22:52 -0400)]
net: Fix high overhead of vlan sub-device teardown.
When a networking device is taken down that has a non-trivial number
of VLAN devices configured under it, we eat a full synchronize_net()
for every such VLAN device.
This is kind of rediculous because we already have infrastructure for
batching doing operation X to a list of net devices so that we only
incur one sync.
So make use of that by exporting dev_close_many() and adjusting it's
interfaace so that the caller can fully manage the batch list. Use
this in vlan_device_event() and all the overhead goes away.
Reported-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The custom USB_DEVICE_CLASS macro matches
bDeviceClass, bDeviceSubClass and bDeviceProtocol
but the common USB_DEVICE_AND_INTERFACE_INFO matches
bInterfaceClass, bInterfaceSubClass and bInterfaceProtocol instead, which are
not specified.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Signed-off-by: David S. Miller <davem@davemloft.net>
where 'sw1' comes from the qemu command line -device rocker,name=sw1, and
'p1' is port 1.
Patch is adapted from Scott's phys_port_id patch.
Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 18 Mar 2015 02:23:15 +0000 (20:23 -0600)]
net: add support for phys_port_name
Similar to port id allow netdevices to specify port names and export
the name via sysfs. Drivers can implement the netdevice operation to
assist udev in having sane default names for the devices using the
rule:
Use of phys_name versus phys_id was suggested-by Jiri Pirko.
Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 19 Mar 2015 02:15:28 +0000 (19:15 -0700)]
sparc: Fix /proc/kcore
/proc/kcore investigates the "System RAM" elements in /proc/iomem to
initialize it's memory tables. Therefore we have to register them
before it tries to do so. kcore uses device_initcall() so let's
use arch_initcall() for the registry.
Also we need ARCH_PROC_KCORE_TEXT to get the virtual addresses of
the kernel image correct.
Reported-by: David Ahern <david.ahern@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
vxlan: Move socket initialization to within rtnl scope
Currently, if a multicast join operation fail, the vxlan interface will
be UP but not functional, without even a log message informing the user.
Now that we can grab socket lock after already having rntl, we don't
need to defer socket creation and multicast operations. By not deferring
we can do proper error reporting to the user through ip exit code.
This patch thus removes all deferred work that vxlan had and put it back
inline. Now the socket will only be created, bound and join multicast
group when one bring the interface up, and will undo all that as soon as
one put the interface down.
As vxlan_sock_hold() is not used after this patch, it was removed too.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv4, ipv6: kill ip_mc_{join, leave}_group and ipv6_sock_mc_{join, drop}
in favor of their inner __ ones, which doesn't grab rtnl.
As these functions need to operate on a locked socket, we can't be
grabbing rtnl by then. It's too late and doing so causes reversed
locking.
So this patch:
- move rtnl handling to callers instead while already fixing some
reversed locking situations, like on vxlan and ipvs code.
- renames __ ones to not have the __ mark:
__ip_mc_{join,leave}_group -> ip_mc_{join,leave}_group
__ipv6_sock_mc_{join,drop} -> ipv6_sock_mc_{join,drop}
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
There are some setsockopt operations in ipv4 and ipv6 that are grabbing
rtnl after having grabbed the socket lock. Yet this makes it impossible
to do operations that have to lock the socket when already within a rtnl
protected scope, like ndo dev_open and dev_stop.
We normally take coarse grained locks first but setsockopt inverted that.
So this patch invert the lock logic for these operations and makes
setsockopt grab rtnl if it will be needed prior to grabbing socket lock.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 19 Mar 2015 02:00:44 +0000 (22:00 -0400)]
Merge branch 'listen_refactor_part_13'
Eric Dumazet says:
====================
inet: tcp listener refactoring, part 13
inet_hash functions are in a bad state : Too much IPv6/IPv4 copy/pasting.
Lets refactor a bit.
Idea is that we do not want to have an equivalent of inet_csk(sk)->icsk_af_ops
for request socks in order to be able to use the right variant.
In this patch series, I started to let IPv6/IPv4 converge to common helpers.
Idea is to use ipv6_addr_set_v4mapped() even for AF_INET sockets, so that
we can test
if (sk->sk_family == AF_INET6 &&
!ipv6_addr_v4mapped(&sk->sk_v6_daddr))
to tell if we deal with an IPv6 socket, or IPv4 one, at least in slow paths.
Ideally, we could save 8 bytes per struct sock_common, if we
alias skc_daddr & skc_rcv_saddr to skc_v6_daddr[3]/skc_v6_rcv_saddr[3].
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Wed, 18 Mar 2015 14:51:38 +0000 (16:51 +0200)]
net/mlx4_en: Set statistics bitmap at port init
Port statistics bitmap will now be initialized at port init. Even before
starting the port, statistics are visible to the user and must be properly masked.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Majd Dibbiny [Wed, 18 Mar 2015 14:51:37 +0000 (16:51 +0200)]
IB/mlx4: Saturate RoCE port PMA counters in case of overflow
For RoCE ports, we set the u32 PMA values based on u64 HCA counters. In case of
overflow, according to the IB spec, we have to saturate a counter to its
max value, do that.
Fixes: c37791349cc7 ('IB/mlx4: Support PMA counters for IBoE') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Wed, 18 Mar 2015 14:51:36 +0000 (16:51 +0200)]
net/mlx4_en: Fix off-by-one in ethtool statistics display
NUM_PORT_STATS was 9 instead of 10, which caused off-by-one bug when
displaying the statistics starting from tx_chksum_offload in ethtool.
Fixes: f8c6455bb04b ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>