Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
aio: block io_destroy() until all context requests are completed
deletes aio context and all resources related to. It makes sense that
no IO operations connected to the context should be running after the context
is destroyed. As we removed io_context we have no chance to
get requests status or call io_getevents().
man page for io_destroy says that this function may block until
all context's requests are completed. Before kernel 3.11 io_destroy()
blocked indeed, but since aio refactoring in 3.11 it is not true anymore.
Here is a pseudo-code that shows a testcase for a race condition discovered
in 3.11:
initialize io_context
io_submit(read to buffer)
io_destroy()
// context is destroyed so we can free the resources
free(buffers);
// if the buffer is allocated by some other user he'll be surprised
// to learn that the buffer still filled by an outstanding operation
// from the destroyed io_context
The fix is straight-forward - add a completion struct and wait on it
in io_destroy, complete() should be called when number of in-fligh requests
reaches zero.
If two or more io_destroy() called for the same context simultaneously then
only the first one waits for IO completion, other calls behaviour is undefined.
Tested: ran http://pastebin.com/LrPsQ4RL testcase for several hours and
do not see the race condition anymore.
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
1) Fix BPF filter validation of netlink attribute accesses, from
Mathias Kruase.
2) Netfilter conntrack generation seqcount not initialized properly,
from Andrey Vagin.
3) Fix comparison mask computation on big-endian in nft_cmp_fast(),
from Patrick McHardy.
4) Properly limit MTU over ipv6, from Eric Dumazet.
5) Fix seccomp system call argument population on 32-bit, from Daniel
Borkmann.
6) skb_network_protocol() should not use hard-coded ETH_HLEN, instead
skb->mac_len needs to be used. From Vlad Yasevich.
7) We have several cases of using socket based communications to
implement a tunnel. For example, some tunnels are encapsulations
over UDP so we use an internal kernel UDP socket to do the
transmits.
These tunnels should behave just like other software devices and
pass the packets on down to the next layer.
Most importantly we want the top-level socket (eg TCP) that created
the traffic to be charged for the SKB memory.
However, once you get into the IP output path, we have code that
assumed that whatever was attached to skb->sk is an IP socket.
To keep the top-level socket being charged for the SKB memory,
whilst satisfying the needs of the IP output path, we now pass in an
explicit 'sk' argument.
From Eric Dumazet.
8) ping_init_sock() leaks group info, from Xiaoming Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
cxgb4: use the correct max size for firmware flash
qlcnic: Fix MSI-X initialization code
ip6_gre: don't allow to remove the fb_tunnel_dev
ipv4: add a sock pointer to dst->output() path.
ipv4: add a sock pointer to ip_queue_xmit()
driver/net: cosa driver uses udelay incorrectly
at86rf230: fix __at86rf230_read_subreg function
at86rf230: remove check if AVDD settled
net: cadence: Add architecture dependencies
net: Start with correct mac_len in skb_network_protocol
Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer"
cxgb4: Save the correct mac addr for hw-loopback connections in the L2T
net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W
seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF
qlcnic: Do not disable SR-IOV when VFs are assigned to VMs
qlcnic: Fix QLogic application/driver interface for virtual NIC configuration
qlcnic: Fix PVID configuration on eSwitch port.
qlcnic: Fix max ring count calculation
qlcnic: Fix to send INIT_NIC_FUNC as first mailbox.
qlcnic: Fix panic due to uninitialzed delayed_work struct in use.
...
Function qlcnic_setup_tss_rss_intr() might enter endless
loop in case pci_enable_msix() contiguously returns a
positive number of MSI-Xs that could have been allocated.
Besides, the function contains 'err = -EIO;' assignment
that never could be reached. This update fixes the
aforementioned issues.
Cc: Shahed Shaikh <shahed.shaikh@qlogic.com> Cc: Dept-HSGLinuxNICDev@qlogic.com Cc: netdev@vger.kernel.org Cc: linux-pci@vger.kernel.org Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Mon, 14 Apr 2014 15:11:38 +0000 (17:11 +0200)]
ip6_gre: don't allow to remove the fb_tunnel_dev
It's possible to remove the FB tunnel with the command 'ip link del ip6gre0' but
this is unsafe, the module always supposes that this device exists. For example,
ip6gre_tunnel_lookup() may use it unconditionally.
Let's add a rtnl handler for dellink, which will never remove the FB tunnel (we
let ip6gre_destroy_tunnels() do the job).
Introduced by commit c12b395a4664 ("gre: Support GRE over IPv6").
CC: Dmitry Kozlov <xeb@mail.ru> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 15 Apr 2014 17:47:15 +0000 (13:47 -0400)]
ipv4: add a sock pointer to dst->output() path.
In the dst->output() path for ipv4, the code assumes the skb it has to
transmit is attached to an inet socket, specifically via
ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the
provider of the packet is an AF_PACKET socket.
The dst->output() method gets an additional 'struct sock *sk'
parameter. This needs a cascade of changes so that this parameter can
be propagated from vxlan to final consumer.
Fixes: 8f646c922d55 ("vxlan: keep original skb ownership") Reported-by: lucien xin <lucien.xin@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 15 Apr 2014 16:58:34 +0000 (12:58 -0400)]
ipv4: add a sock pointer to ip_queue_xmit()
ip_queue_xmit() assumes the skb it has to transmit is attached to an
inet socket. Commit 31c70d5956fc ("l2tp: keep original skb ownership")
changed l2tp to not change skb ownership and thus broke this assumption.
One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(),
so that we do not assume skb->sk points to the socket used by l2tp
tunnel.
Fixes: 31c70d5956fc ("l2tp: keep original skb ownership") Reported-by: Zhan Jianyu <nasa4836@gmail.com> Tested-by: Zhan Jianyu <nasa4836@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Mon, 14 Apr 2014 16:48:02 +0000 (18:48 +0200)]
at86rf230: fix __at86rf230_read_subreg function
The __at86rf230_read_subreg function don't mask and shift register
contents which it should do. This patch adds the necessary masks and
shift operations in this function.
Since we have csma support this can make some trouble on state changes.
Since CSMA support turned on some bits in the TRX_STATUS register that
used to be zero, not masking broke checking of the TRX_STATUS field
after commanding a state change.
Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Mon, 14 Apr 2014 16:48:01 +0000 (18:48 +0200)]
at86rf230: remove check if AVDD settled
The AVDD regulator is only enabled when the RF section is active TX_ON
(PLL_ON) state. Since commit 7dcbd22a97eb0689e6c583ad630ae0e7341e34c1
("ieee802154: ensure that first RF212 state comes from TRX_OFF").
We are in TRX_OFF state at the time at86rf230_hw_init is run.
Note that this test would only fail in case of a severe hardware
malfunction (faulty/shorted power supply, etc.) so it wasn't all that
useful in the first place.
Signed-off-by: Alexander Aring <alex.aring@gmail.com> Reviewed-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Jean Delvare [Mon, 14 Apr 2014 13:38:49 +0000 (15:38 +0200)]
net: cadence: Add architecture dependencies
The Cadence ethernet chipsets are only used on specific ARM
architectures. Add Kconfig dependencies so that drivers for these
chipsets are only buildable on the relevant architectures.
Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pull KVM fixes from Marcelo Tosatti:
- Fix for guest triggerable BUG_ON (CVE-2014-0155)
- CR4.SMAP support
- Spurious WARN_ON() fix
* git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: remove WARN_ON from get_kernel_ns()
KVM: Rename variable smep to cr4_smep
KVM: expose SMAP feature to guest
KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
KVM: Add SMAP support when setting CR4
KVM: Remove SMAP bit from CR4_RESERVED_BITS
KVM: ioapic: try to recover if pending_eoi goes out of range
KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155)
Pull bmc2835 crypto fix from Herbert Xu:
"This fixes a potential boot crash on bcm2835 due to the recent change
that now causes hardware RNGs to be accessed on registration"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
hwrng: bcm2835 - fix oops when rng h/w is accessed during registration
smp_read_barrier_depends() can be used if there is data dependency between
the readers - i.e. if the read operation after the barrier uses address
that was obtained from the read operation before the barrier.
In this file, there is only control dependency, no data dependecy, so the
use of smp_read_barrier_depends() is incorrect. The code could fail in the
following way:
* the cpu predicts that idx < entries is true and starts executing the
body of the for loop
* the cpu fetches map->extent[0].first and map->extent[0].count
* the cpu fetches map->nr_extents
* the cpu verifies that idx < extents is true, so it commits the
instructions in the body of the for loop
The problem is that in this scenario, the cpu read map->extent[0].first
and map->nr_extents in the wrong order. We need a full read memory barrier
to prevent it.
The following patchset contains three Netfilter fixes for your net tree,
they are:
* Fix missing generation sequence initialization which results in a splat
if lockdep is enabled, it was introduced in the recent works to improve
nf_conntrack scalability, from Andrey Vagin.
* Don't flush the GRE keymap list in nf_conntrack when the pptp helper is
disabled otherwise this crashes due to a double release, from Andrey
Vagin.
* Fix nf_tables cmp fast in big endian, from Patrick McHardy.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: Start with correct mac_len in skb_network_protocol
Sometimes, when the packet arrives at skb_mac_gso_segment()
its skb->mac_len already accounts for some of the mac lenght
headers in the packet. This seems to happen when forwarding
through and OpenSSL tunnel.
When we start looking for any vlan headers in skb_network_protocol()
we seem to ignore any of the already known mac headers and start
with an ETH_HLEN. This results in an incorrect offset, dropped
TSO frames and general slowness of the connection.
We can start counting from the known skb->mac_len
and return at least that much if all mac level headers
are known and accounted for.
Fixes: 53d6471cef17262d3ad1c7ce8982a234244f68ec (net: Account for all vlan headers in skb_mac_gso_segment) CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Daniel Borkman <dborkman@redhat.com> Tested-by: Martin Filip <nexus+kernel@smoula.net> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
SMAP is disabled if CPU is in non-paging mode in hardware.
However KVM always uses paging mode to emulate guest non-paging
mode with TDP. To emulate this behavior, SMAP needs to be
manually disabled when guest switches to non-paging mode.
Daniel Borkmann [Mon, 14 Apr 2014 19:45:17 +0000 (21:45 +0200)]
Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer"
This reverts commit ef2820a735f7 ("net: sctp: Fix a_rwnd/rwnd management
to reflect real state of the receiver's buffer") as it introduced a
serious performance regression on SCTP over IPv4 and IPv6, though a not
as dramatic on the latter. Measurements are on 10Gbit/s with ixgbe NICs.
With the reverted patch applied, the SCTP/IPv4 performance is back
to normal on latest upstream for IPv4 and IPv6 and has same throughput
as 3.4.2 test kernel, steady and interval reports are smooth again.
Fixes: ef2820a735f7 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer") Reported-by: Peter Butler <pbutler@sonusnet.com> Reported-by: Dongsheng Song <dongsheng.song@gmail.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Peter Butler <pbutler@sonusnet.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com> Cc: Alexander Sverdlin <alexander.sverdlin@nsn.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 14 Apr 2014 19:20:12 +0000 (21:20 +0200)]
net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W
While reviewing seccomp code, we found that BPF_S_ANC_SECCOMP_LD_W has
been wrongly decoded by commit a8fc927780 ("sk-filter: Add ability to
get socket filter program (v2)") into the opcode BPF_LD|BPF_B|BPF_ABS
although it should have been decoded as BPF_LD|BPF_W|BPF_ABS.
In practice, this should not have much side-effect though, as such
conversion is/was being done through prctl(2) PR_SET_SECCOMP. Reverse
operation PR_GET_SECCOMP will only return the current seccomp mode, but
not the filter itself. Since the transition to the new BPF infrastructure,
it's also not used anymore, so we can simply remove this as it's
unreachable.
Fixes: a8fc927780 ("sk-filter: Add ability to get socket filter program (v2)") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
These audit messages are being triggered via audit_seccomp() through
__secure_computing() in seccomp mode (BPF) filter with seccomp return
codes 0x30000 (== SECCOMP_RET_TRAP) and 0x50000 (== SECCOMP_RET_ERRNO)
during filter runtime. Moreover, Linus reports that x86_64 Chromium
seems fine.
The underlying issue that explains this is that the implementation of
populate_seccomp_data() is wrong. Our seccomp data structure sd that
is being shared with user ABI is:
Therefore, a simple cast to 'unsigned long *' for storing the value of
the syscall argument via syscall_get_arguments() is just wrong as on
32-bit x86 (or any other 32bit arch), it would result in storing a0-a5
at wrong offsets in args[] member, and thus i) could leak stack memory
to user space and ii) tampers with the logic of seccomp BPF programs
that read out and check for syscall arguments:
syscall_get_arguments(task, regs, 0, 1, (unsigned long *) &sd->args[0]);
Tested on 32-bit x86 with Google Chrome, unfortunately only via remote
test machine through slow ssh X forwarding, but it fixes the issue on
my side. So fix it up by storing args in type correct variables, gcc
is clever and optimizes the copy away in other cases, e.g. x86_64.
Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set") Reported-and-bisected-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 Apr 2014 17:43:58 +0000 (13:43 -0400)]
Merge branch 'qlcnic'
Shahed Shaikh says:
====================
qlcnic: Bug fixes
This patch series contains following bug fixes -
* Send INIT_NIC_FUNC mailbox command as first mailbox
* Fix a panic because of uninitialized delayed_work.
* Fix inconsistent calculation of max rings count.
* Fix PVID configuration issue. Driver needs to clear older
PVID before adding new one.
* Fix QLogic application/driver interface by packing vNIC information
array.
* Fix a crash when user tries to disable SR-IOV while VFs are
still assigned to VMs.
Please apply to net.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
qlcnic: Do not disable SR-IOV when VFs are assigned to VMs
o While disabling SR-IOV when VFs are assigned to VMs causes host crash
so return -EPERM when user request to disable SR-IOV using pci sysfs in
case of VFs are assigned to VMs.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qlcnic: Fix QLogic application/driver interface for virtual NIC configuration
o Application expect vNIC number as the array index but driver interface
return configuration in array index form.
o Pack the vNIC information array in the buffer such that application can
access it using vNIC number as the array index.
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Clear older PVID before adding a newer PVID to the eSwicth port
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Do not read max rings count from qlcnic_get_nic_info(). Use driver defined
values for 82xx adapters. In case of 83xx adapters, use minimum of firmware
provided and driver defined values.
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qlcnic: Fix to send INIT_NIC_FUNC as first mailbox.
o INIT_NIC_FUNC should be first mailbox sent. Sending DCB capability and
parameter query commands after that command.
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qlcnic: Fix panic due to uninitialzed delayed_work struct in use.
o AEN event was being received before initializing delayed_work struct
and handlers for it. This was resulting in crash. This patch fixes it.
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 Apr 2014 17:41:42 +0000 (13:41 -0400)]
Merge branch 'be2net'
Sathya Perla says:
====================
be2net: patch set
Patch 1/2 is a v2 of a patch that was submitted earlier (as a part of a
different patch-set). v2 incorporates a suggestion given by David Laight
for how long to poll for pending TX completions while disabling a device.
Patch 2/2 fixes a crash in be_remove()->be_close()
path after be2net has aborted an EEH error recovery
due to a permanant failure.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Mon, 14 Apr 2014 10:42:41 +0000 (16:12 +0530)]
be2net: Fix invocation of be_close() after be_clear()
In the EEH error recovery path, when a permanent failure occurs,
we clean up adapter structure (i.e. destroy queues etc) by calling
be_clear() and return PCI_ERS_RESULT_DISCONNECT.
After this the stack tries to remove device from bus and calls
be_remove() which invokes netdev_unregister()->be_close().
be_close() operating on destroyed queues results in a
NULL dereference.
This patch fixes this problem by introducing a flag to keep track
of the setup state.
Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
be2net: Fix to reap TX compls till HW doesn't respond for some time
be_close() currently waits for a max of 200ms to receive all pending
TX compls. This timeout value was roughly calculated based on 10G
transmission speeds and the TX queue depth. This timeout may not be
enough when the link is operating at lower speeds or in multi-channel/SR-IOV
configs with TX-rate limiting setting.
It is hard to calculate a "proper timeout value" that works in all
configurations. This patch solves this problem by continuing to reap
TX completions till the HW is completely silent for a period of 10ms or
a HW error is detected.
v2: implements the new scheme (as suggested by David Laight) instead of
just waiting longer than 200ms for reaping all completions.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Mon, 14 Apr 2014 08:17:22 +0000 (11:17 +0300)]
net/mlx4_core: Defer VF initialization till PF is fully initialized
Fix in commit [1] is not sufficient since a deferred VF initialization
could happen after pci_enable_sriov() is finished, but before the PF is
fully initialized.
Need to prevent VFs from initializing till the PF is fully ready and
comm channel is operational.
[1] - 9798935 "net/mlx4_core: mlx4_init_slave() shouldn't access comm
channel before PF is ready"
CC: Stuart Hayes <Stuart_Hayes@Dell.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 11 Apr 2014 04:23:36 +0000 (21:23 -0700)]
ipv6: Limit mtu to 65575 bytes
Francois reported that setting big mtu on loopback device could prevent
tcp sessions making progress.
We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets.
We must limit the IPv6 MTU to (65535 + 40) bytes in theory.
Tested:
ifconfig lo mtu 70000
netperf -H ::1
Before patch : Throughput : 0.05 Mbits
After patch : Throughput : 35484 Mbits
Reported-by: Francois WELLENREITER <f.wellenreiter@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sat, 12 Apr 2014 11:17:57 +0000 (13:17 +0200)]
netfilter: nf_tables: fix nft_cmp_fast failure on big endian for size < 4
nft_cmp_fast is used for equality comparisions of size <= 4. For
comparisions of size < 4 byte a mask is calculated that is applied to
both the data from userspace (during initialization) and the register
value (during runtime). Both values are stored using (in effect) memcpy
to a memory area that is then interpreted as u32 by nft_cmp_fast.
This works fine on little endian since smaller types have the same base
address, however on big endian this is not true and the smaller types
are interpreted as a big number with trailing zero bytes.
The mask therefore must not include the lower bytes, but the higher bytes
on big endian. Add a helper function that does a cpu_to_le32 to switch
the bytes on big endian. Since we're dealing with a mask of just consequitive
bits, this works out fine.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
filter: prevent nla extensions to peek beyond the end of the message
The BPF_S_ANC_NLATTR and BPF_S_ANC_NLATTR_NEST extensions fail to check
for a minimal message length before testing the supplied offset to be
within the bounds of the message. This allows the subtraction of the nla
header to underflow and therefore -- as the data type is unsigned --
allowing far to big offset and length values for the search of the
netlink attribute.
The remainder calculation for the BPF_S_ANC_NLATTR_NEST extension is
also wrong. It has the minuend and subtrahend mixed up, therefore
calculates a huge length value, allowing to overrun the end of the
message while looking for the netlink attribute.
The following three BPF snippets will trigger the bugs when attached to
a UNIX datagram socket and parsing a message with length 1, 2 or 3.
,-[ PoC for missing size check in BPF_S_ANC_NLATTR ]--
| ld #0x87654321
| ldx #42
| ld #nla
| ret a
`---
,-[ PoC for the same bug in BPF_S_ANC_NLATTR_NEST ]--
| ld #0x87654321
| ldx #42
| ld #nlan
| ret a
`---
,-[ PoC for wrong remainder calculation in BPF_S_ANC_NLATTR_NEST ]--
| ; (needs a fake netlink header at offset 0)
| ld #0
| ldx #42
| ld #nlan
| ret a
`---
Fix the first issue by ensuring the message length fulfills the minimal
size constrains of a nla header. Fix the second bug by getting the math
for the remainder calculation right.
Fixes: 4738c1db15 ("[SKFILTER]: Add SKF_ADF_NLATTR instruction") Fixes: d214c7537b ("filter: add SKF_AD_NLATTR_NEST to look for nested..") Cc: Patrick McHardy <kaber@trash.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Extend commit 13378cad02afc2adc6c0e07fca03903c7ada0b37
("ipv4: Change rt->rt_iif encoding.") from 3.6 to return valid
RTA_IIF on 'ip route get ... iif DEVICE' instead of rt_iif 0
which is displayed as 'iif *'.
inet_iif is not appropriate to use because skb_iif is not set.
Use the skb->dev->ifindex instead.
Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Petazzoni [Sun, 13 Apr 2014 09:15:33 +0000 (11:15 +0200)]
Revert "net: mvneta: fix usage as a module on RGMII configurations"
This reverts commit e3a8786c10e75903f1269474e21fe8cb49c3a670. While
this commit allows to use the mvneta driver as a module on some
configurations, it breaks other configurations even if mvneta is used
built-in.
This breakage is due to the fact that on some RGMII platforms, the PCS
bit has to be set, and on some other platforms, it has to be
cleared. At the moment, we lack informations to know exactly the
significance of this bit (the datasheet only says "enables PCS"), and
so we can't produce a patch that will work on all platforms at this
point. And since this change is breaking the network completely for
many users, it's much better to revert it for now. We'll come back
later with a proper fix that takes into account all platforms.
Basically:
* Armada XP GP is configured as RGMII-ID, and needs the PCS bit to be
set.
* Armada 370 Mirabox is configured as RGMII-ID, and needs the PCS bit
to be cleared.
And at the moment, we don't know how to make the distinction between
those two cases. One hint is that the Armada XP GP appears in fact to
be using a QSGMII connection with the PHY (Quad-SGMII), but
configuring it as SGMII doesn't work, while RGMII-ID works. This needs
more investigation, but in the mean time, let's unbreak the network
for all those users.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Reported-by: Arnaud Ebalard <arno@natisbad.org> Reported-by: Alexander Reuter <Alexander.Reuter@gmx.net> Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=73401 Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yang [Mon, 14 Apr 2014 01:51:19 +0000 (09:51 +0800)]
net/mlx4_core: Preserve pci_dev_data after __mlx4_remove_one()
pci_match_id() just match the static pci_device_id, which may return NULL if
someone binds the driver to a device manually using
/sys/bus/pci/drivers/.../new_id.
This patch wrap up a helper function __mlx4_remove_one() which does the tear
down function but preserve the drv_data. Functions like
mlx4_pci_err_detected() and mlx4_restart_one() will call this one with out
releasing drvdata.
Fixes: 97a5221 "net/mlx4_core: pass pci_device_id.driver_data to __mlx4_init_one during reset". CC: Bjorn Helgaas <bhelgaas@google.com> CC: Amir Vadai <amirv@mellanox.com> CC: Jack Morgenstein <jackm@dev.mellanox.co.il> CC: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ipv4: current group_info should be put after using.
Plug a group_info refcount leak in ping_init.
group_info is only needed during initialization and
the code failed to release the reference on exit.
While here move grabbing the reference to a place
where it is actually needed.
Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com> Signed-off-by: Zhang Dongxing <dongxing.zhang@intel.com> Signed-off-by: xiaoming wang <xiaoming.wang@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
cifs: Use min_t() when comparing "size_t" and "unsigned long"
On 32 bit, size_t is "unsigned int", not "unsigned long", causing the
following warning when comparing with PAGE_SIZE, which is always "unsigned
long":
fs/cifs/file.c: In function ‘cifs_readdata_to_iov’:
fs/cifs/file.c:2757: warning: comparison of distinct pointer types lacks a cast
Introduced by commit 7f25bba819a3 ("cifs_iovec_read: keep iov_iter
between the calls of cifs_readdata_to_iov()"), which changed the
signedness of "remaining" and the code from min_t() to min().
Merge branch 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
Pull slab changes from Pekka Enberg:
"The biggest change is byte-sized freelist indices which reduces slab
freelist memory usage:
https://lkml.org/lkml/2013/12/2/64"
* 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
mm: slab/slub: use page->list consistently instead of page->lru
mm/slab.c: cleanup outdated comments and unify variables naming
slab: fix wrongly used macro
slub: fix high order page allocation problem with __GFP_NOFAIL
slab: Make allocations with GFP_ZERO slightly more efficient
slab: make more slab management structure off the slab
slab: introduce byte sized index for the freelist of a slab
slab: restrict the number of objects in a slab
slab: introduce helper functions to get/set free object
slab: factor out calculate nr objects in cache_estimate
Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull misc kbuild changes from Michal Marek:
"Here is the non-critical part of kbuild:
- One bogus coccinelle check removed, one check fixed not to suggest
the obsolete PTR_RET macro
- scripts/tags.sh does not index the generated *.mod.c files
- new objdiff tool to list differences between two versions of an
object file
- A fix for scripts/bootgraph.pl"
* 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
scripts/coccinelle: Use PTR_ERR_OR_ZERO
scripts/bootgraph.pl: Add graphic header
scripts: objdiff: detect object code changes between two commits
Coccicheck: Remove memcpy to struct assignment test
scripts/tags.sh: Ignore *.mod.c
sym53c8xx_2: Set DID_REQUEUE return code when aborting squeue
This patch fixes I/O errors with the sym53c8xx_2 driver when the disk
returns QUEUE FULL status.
When the controller encounters an error (including QUEUE FULL or BUSY
status), it aborts all not yet submitted requests in the function
sym_dequeue_from_squeue.
This function aborts them with DID_SOFT_ERROR.
If the disk has full tag queue, the request that caused the overflow is
aborted with QUEUE FULL status (and the scsi midlayer properly retries
it until it is accepted by the disk), but the sym53c8xx_2 driver aborts
the following requests with DID_SOFT_ERROR --- for them, the midlayer
does just a few retries and then signals the error up to sd.
The result is that disk returning QUEUE FULL causes request failures.
The error was reproduced on 53c895 with COMPAQ BD03685A24 disk
(rebranded ST336607LC) with command queue 48 or 64 tags. The disk has
64 tags, but under some access patterns it return QUEUE FULL when there
are less than 64 pending tags. The SCSI specification allows returning
QUEUE FULL anytime and it is up to the host to retry.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Mackerras [Fri, 11 Apr 2014 06:43:35 +0000 (16:43 +1000)]
powerpc: Don't try to set LPCR unless we're in hypervisor mode
Commit 8f619b5429d9 ("powerpc/ppc64: Do not turn AIL (reloc-on
interrupts) too early") added code to set the AIL bit in the LPCR
without checking whether the kernel is running in hypervisor mode. The
result is that when the kernel is running as a guest (i.e., under
PowerKVM or PowerVM), the processor takes a privileged instruction
interrupt at that point, causing a panic. The visible result is that
the kernel hangs after printing "returning from prom_init".
This fixes it by checking for hypervisor mode being available before
setting LPCR. If we are not in hypervisor mode, we enable relocation-on
interrupts later in pSeries_setup_arch using the H_SET_MODE hcall.
Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
futex: update documentation for ordering guarantees
Commits 11d4616bd07f ("futex: revert back to the explicit waiter
counting code") and 69cd9eba3886 ("futex: avoid race between requeue and
wake") changed some of the finer details of how we think about futexes.
One was a late fix and the other a consequence of overlooking the whole
requeuing logic.
The first change caused our documentation to be incorrect, and the
second made us aware that we need to explicitly add more details to it.
Pull yet more networking updates from David Miller:
1) Various fixes to the new Redpine Signals wireless driver, from
Fariya Fatima.
2) L2TP PPP connect code takes PMTU from the wrong socket, fix from
Dmitry Petukhov.
3) UFO and TSO packets differ in whether they include the protocol
header in gso_size, account for that in skb_gso_transport_seglen().
From Florian Westphal.
4) If VLAN untagging fails, we double free the SKB in the bridging
output path. From Toshiaki Makita.
5) Several call sites of sk->sk_data_ready() were referencing an SKB
just added to the socket receive queue in order to calculate the
second argument via skb->len. This is dangerous because the moment
the skb is added to the receive queue it can be consumed in another
context and freed up.
It turns out also that none of the sk->sk_data_ready()
implementations even care about this second argument.
So just kill it off and thus fix all these use-after-free bugs as a
side effect.
6) Fix inverted test in tcp_v6_send_response(), from Lorenzo Colitti.
7) pktgen needs to do locking properly for LLTX devices, from Daniel
Borkmann.
8) xen-netfront driver initializes TX array entries in RX loop :-) From
Vincenzo Maffione.
9) After refactoring, some tunnel drivers allow a tunnel to be
configured on top itself. Fix from Nicolas Dichtel.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
vti: don't allow to add the same tunnel twice
gre: don't allow to add the same tunnel twice
drivers: net: xen-netfront: fix array initialization bug
pktgen: be friendly to LLTX devices
r8152: check RTL8152_UNPLUG
net: sun4i-emac: add promiscuous support
net/apne: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
net: ipv6: Fix oif in TCP SYN+ACK route lookup.
drivers: net: cpsw: enable interrupts after napi enable and clearing previous interrupts
drivers: net: cpsw: discard all packets received when interface is down
net: Fix use after free by removing length arg from sk_data_ready callbacks.
Drivers: net: hyperv: Address UDP checksum issues
Drivers: net: hyperv: Negotiate suitable ndis version for offload support
Drivers: net: hyperv: Allocate memory for all possible per-pecket information
bridge: Fix double free and memory leak around br_allowed_ingress
bonding: Remove debug_fs files when module init fails
i40evf: program RSS LUT correctly
i40evf: remove open-coded skb_cow_head
ixgb: remove open-coded skb_cow_head
igbvf: remove open-coded skb_cow_head
...
Merge tag 'remoteproc-3.15-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc
Pull remoteproc cleanups from Ohad Ben-Cohen:
"Several remoteproc cleanup patches coming from Jingoo Han, Julia
Lawall and Uwe Kleine-König"
* tag 'remoteproc-3.15-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc:
remoteproc/ste_modem: staticize local symbols
remoteproc/davinci: simplify use of devm_ioremap_resource
remoteproc/davinci: drop needless devm_clk_put
Merge tag 'llvmlinux-for-v3.15' of git://git.linuxfoundation.org/llvmlinux/kernel
Pull llvm patches from Behan Webster:
"These are some initial updates to support compiling the kernel with
clang.
These patches have been through the proper reviews to the best of my
ability, and have been soaking in linux-next for a few weeks. These
patches by themselves still do not completely allow clang to be used
with the kernel code, but lay the foundation for other patches which
are still under review.
Several other of the LLVMLinux patches have been already added via
maintainer trees"
* tag 'llvmlinux-for-v3.15' of git://git.linuxfoundation.org/llvmlinux/kernel:
x86: LLVMLinux: Fix "incomplete type const struct x86cpu_device_id"
x86 kbuild: LLVMLinux: More cc-options added for clang
x86, acpi: LLVMLinux: Remove nested functions from Thinkpad ACPI
LLVMLinux: Add support for clang to compiler.h and new compiler-clang.h
LLVMLinux: Remove warning about returning an uninitialized variable
kbuild: LLVMLinux: Fix LINUX_COMPILER definition script for compilation with clang
Documentation: LLVMLinux: Update Documentation/dontdiff
kbuild: LLVMLinux: Adapt warnings for compilation with clang
kbuild: LLVMLinux: Add Kbuild support for building kernel with Clang
Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
"Here are the target pending updates for v3.15-rc1. Apologies in
advance for waiting until the second to last day of the merge window
to send these out.
The highlights this round include:
- iser-target support for T10 PI (DIF) offloads (Sagi + Or)
- Fix Task Aborted Status (TAS) handling in target-core (Alex Leung)
- Pass in transport supported PI at session initialization (Sagi + MKP + nab)
- Add WRITE_INSERT + READ_STRIP T10 PI support in target-core (nab + Sagi)
- Fix iscsi-target ERL=2 ASYNC_EVENT connection pointer bug (nab)
- Fix tcm_fc use-after-free of ft_tpg (Andy Grover)
- Use correct ib_sg_dma primitives in ib_isert (Mike Marciniszyn)
Also, note the virtio-scsi + vhost-scsi changes to expose T10 PI
metadata into KVM guest have been left-out for now, as there where a
few comments from MST + Paolo that where not able to be addressed in
time for v3.15. Please expect this feature for v3.16-rc1"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (43 commits)
ib_srpt: Use correct ib_sg_dma primitives
target/tcm_fc: Rename ft_tport_create to ft_tport_get
target/tcm_fc: Rename ft_{add,del}_lport to {add,del}_wwn
target/tcm_fc: Rename structs and list members for clarity
target/tcm_fc: Limit to 1 TPG per wwn
target/tcm_fc: Don't export ft_lport_list
target/tcm_fc: Fix use-after-free of ft_tpg
target: Add check to prevent Abort Task from aborting itself
target: Enable READ_STRIP emulation in target_complete_ok_work
target/sbc: Add sbc_dif_read_strip software emulation
target: Enable WRITE_INSERT emulation in target_execute_cmd
target/sbc: Add sbc_dif_generate software emulation
target/sbc: Only expose PI read_cap16 bits when supported by fabric
target/spc: Only expose PI mode page bits when supported by fabric
target/spc: Only expose PI inquiry bits when supported by fabric
target: Pass in transport supported PI at session initialization
target/iblock: Fix double bioset_integrity_free bug
Target/sbc: Initialize COMPARE_AND_WRITE write_sg scatterlist
target/rd: T10-Dif: RAM disk is allocating more space than required.
iscsi-target: Fix ERL=2 ASYNC_EVENT connection pointer bug
...
Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"A series of bug fix patches for v3.15-rc1. Most are just driver
fixes. There are some changes at remote controller core level, fixing
some definitions on a new API added for Kernel v3.15.
It also adds the missing include at include/uapi/linux/v4l2-common.h,
to allow its compilation on userspace, as pointed by you"
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (24 commits)
[media] gpsca: remove the risk of a division by zero
[media] stk1160: warrant a NUL terminated string
[media] v4l: ti-vpe: retain v4l2_buffer flags for captured buffers
[media] v4l: ti-vpe: Set correct field parameter for output and capture buffers
[media] v4l: ti-vpe: zero out reserved fields in try_fmt
[media] v4l: ti-vpe: Fix initial configuration queue data
[media] v4l: ti-vpe: Use correct bus_info name for the device in querycap
[media] v4l: ti-vpe: report correct capabilities in querycap
[media] v4l: ti-vpe: Allow usage of smaller images
[media] v4l: ti-vpe: Use video_device_release_empty
[media] v4l: ti-vpe: Make sure in job_ready that we have the needed number of dst_bufs
[media] lgdt3305: include sleep functionality in lgdt3304_ops
[media] drx-j: use customise option correctly
[media] m88rs2000: fix sparse static warnings
[media] r820t: fix size and init values
[media] rc-core: remove generic scancode filter
[media] rc-core: split dev->s_filter
[media] rc-core: do not change 32bit NEC scancode format for now
[media] rtl28xxu: remove duplicate ID 0458:707f Genius TVGo DVB-T03
[media] xc2028: add missing break to switch
...
Merge tag 'ntb-3.15' of git://github.com/jonmason/ntb
Pull PCIe non-transparent bridge fixes and features from Jon Mason:
"NTB driver bug fixes to address issues in list traversal, skb leak in
ntb_netdev, a typo, and a leak of msix entries in the error path.
Clean ups of the event handling logic, as well as a overall style
cleanup. Finally, the driver was converted to use the new
pci_enable_msix_range logic (and the refactoring to go along with it)"
* tag 'ntb-3.15' of git://github.com/jonmason/ntb:
ntb: Use pci_enable_msix_range() instead of pci_enable_msix()
ntb: Split ntb_setup_msix() into separate BWD/SNB routines
ntb: Use pci_msix_vec_count() to obtain number of MSI-Xs
NTB: Code Style Clean-up
NTB: client event cleanup
ntb: Fix leakage of ntb_device::msix_entries[] array
NTB: Fix typo in setting one translation register
ntb_netdev: Fix skb free issue in open
ntb_netdev: Fix list_for_each_entry exit issue
In file included from fs/ceph/super.h:4:0,
from fs/ceph/ioctl.c:3:
include/linux/ceph/ceph_debug.h:4:0: warning: "pr_fmt" redefined [enabled by default]
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
^
In file included from include/linux/kernel.h:13:0,
from include/linux/uio.h:12,
from include/linux/socket.h:7,
from include/uapi/linux/in.h:22,
from include/linux/in.h:23,
from fs/ceph/ioctl.c:1:
include/linux/printk.h:214:0: note: this is the location of the previous definition
#define pr_fmt(fmt) fmt
^
where the reason is that <linux/ceph_debug.h> is included much too late
for the "pr_fmt()" define.
The include of <linux/ceph_debug.h> needs to be the first include in the
file, but fs/ceph/ioctl.c had for some reason missed that, and it wasn't
noticeable until some unrelated header file changes brought in an
indirect earlier include of <linux/kernel.h>.
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
"The first vfs pile, with deep apologies for being very late in this
window.
Assorted cleanups and fixes, plus a large preparatory part of iov_iter
work. There's a lot more of that, but it'll probably go into the next
merge window - it *does* shape up nicely, removes a lot of
boilerplate, gets rid of locking inconsistencie between aio_write and
splice_write and I hope to get Kent's direct-io rewrite merged into
the same queue, but some of the stuff after this point is having
(mostly trivial) conflicts with the things already merged into
mainline and with some I want more testing.
This one passes LTP and xfstests without regressions, in addition to
usual beating. BTW, readahead02 in ltp syscalls testsuite has started
giving failures since "mm/readahead.c: fix readahead failure for
memoryless NUMA nodes and limit readahead pages" - might be a false
positive, might be a real regression..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
missing bits of "splice: fix racy pipe->buffers uses"
cifs: fix the race in cifs_writev()
ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
kill generic_file_buffered_write()
ocfs2_file_aio_write(): switch to generic_perform_write()
ceph_aio_write(): switch to generic_perform_write()
xfs_file_buffered_aio_write(): switch to generic_perform_write()
export generic_perform_write(), start getting rid of generic_file_buffer_write()
generic_file_direct_write(): get rid of ppos argument
btrfs_file_aio_write(): get rid of ppos
kill the 5th argument of generic_file_buffered_write()
kill the 4th argument of __generic_file_aio_write()
lustre: don't open-code kernel_recvmsg()
ocfs2: don't open-code kernel_recvmsg()
drbd: don't open-code kernel_recvmsg()
constify blk_rq_map_user_iov() and friends
lustre: switch to kernel_sendmsg()
ocfs2: don't open-code kernel_sendmsg()
take iov_iter stuff to mm/iov_iter.c
process_vm_access: tidy up a bit
...
David S. Miller [Sat, 12 Apr 2014 21:03:20 +0000 (17:03 -0400)]
Merge branch 'tunnels'
Nicolas Dichtel says:
====================
tunnels: don't allow to add the same tunnel twice
This series fixes the check of an existing tunnel with the same
parameters when a new tunnel is added. I've checked all users of
ip_tunnel_newlink(): gre, gretap, ipip and vti. The bug exists only
for gre and vti.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Fri, 11 Apr 2014 13:51:19 +0000 (15:51 +0200)]
vti: don't allow to add the same tunnel twice
Before the patch, it was possible to add two times the same tunnel:
ip l a vti1 type vti remote 10.16.0.121 local 10.16.0.249 key 41
ip l a vti2 type vti remote 10.16.0.121 local 10.16.0.249 key 41
It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the
argument dev->type, which was set only later (when calling ndo_init handler
in register_netdevice()). Let's set this type in the setup handler, which is
called before newlink handler.
Introduced by commit b9959fd3b0fa ("vti: switch to new ip tunnel code").
CC: Cong Wang <amwang@redhat.com> CC: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Fri, 11 Apr 2014 13:51:18 +0000 (15:51 +0200)]
gre: don't allow to add the same tunnel twice
Before the patch, it was possible to add two times the same tunnel:
ip l a gre1 type gre remote 10.16.0.121 local 10.16.0.249
ip l a gre2 type gre remote 10.16.0.121 local 10.16.0.249
It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the
argument dev->type, which was set only later (when calling ndo_init handler
in register_netdevice()). Let's set this type in the setup handler, which is
called before newlink handler.
Introduced by commit c54419321455 ("GRE: Refactor GRE tunneling code.").
CC: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the initialization of an array used in the TX
datapath that was mistakenly initialized together with the
RX datapath arrays. An out of range array access could happen
when RX and TX rings had different sizes.
Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 12 Apr 2014 20:36:44 +0000 (16:36 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates
This series contains updates to e1000, e1000e, igb, igbvf, ixgb, ixgbe,
ixgbevf and i40evf.
Mark fixes an issue with ixgbe and ixgbevf by adding a bit to indicate
when workqueues have been initialized. This permits the register read
error handling from attempting to use them prior to that, which also
generates warnings. Checking for a detected removal after initializing
the work queues allows the probe function to return an error without
getting the workqueue involved. Further, if the error_detected
callback is entered before the workqueues are initialized, exit without
recovery since the device initialization was so truncated.
Francois Romieu provides several patches to all the drivers to remove
the open coded skb_cow_head.
Jakub Kicinski provides a fix for igb where last_rx_timestamp should be
updated only when Rx time stamp is read.
Mitch provides a fix for i40evf where a recent change broke the RSS LUT
programming causing it to be programmed with all 0's.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'trace-3.15-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull more tracing updates from Steven Rostedt:
"This includes the final patch to clean up and fix the issue with the
design of tracepoints and how a user could register a tracepoint and
have that tracepoint not be activated but no error was shown.
The design was for an out of tree module but broke in tree users. The
clean up was to remove the saving of the hash table of tracepoint
names such that they can be enabled before they exist (enabling a
module tracepoint before that module is loaded). This added more
complexity than needed. The clean up was to remove that code and just
enable tracepoints that exist or fail if they do not.
This removed a lot of code as well as the complexity that it brought.
As a side effect, instead of registering a tracepoint by its name, the
tracepoint needs to be registered with the tracepoint descriptor.
This removes having to duplicate the tracepoint names that are
enabled.
The second patch was added that simplified the way modules were
searched for.
This cleanup required changes that were in the 3.15 queue as well as
some changes that were added late in the 3.14-rc cycle. This final
change waited till the two were merged in upstream and then the change
was added and full tests were run. Unfortunately, the test found some
errors, but after it was already submitted to the for-next branch and
not to be rebased. Sparse errors were detected by Fengguang Wu's bot
tests, and my internal tests discovered that the anonymous union
initialization triggered a bug in older gcc compilers. Luckily, there
was a bugzilla for the gcc bug which gave a work around to the
problem. The third and fourth patch handled the sparse error and the
gcc bug respectively.
A final patch was tagged along to fix a missing documentation for the
README file"
* tag 'trace-3.15-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Add missing function triggers dump and cpudump to README
tracing: Fix anonymous unions in struct ftrace_event_call
tracepoint: Fix sparse warnings in tracepoint.c
tracepoint: Simplify tracepoint module search
tracepoint: Use struct pointer instead of name hash for reg/unreg tracepoints
* git://git.infradead.org/users/eparis/audit: (28 commits)
AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
audit: do not cast audit_rule_data pointers pointlesly
AUDIT: Allow login in non-init namespaces
audit: define audit_is_compat in kernel internal header
kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
sched: declare pid_alive as inline
audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
syscall_get_arch: remove useless function arguments
audit: remove stray newline from audit_log_execve_info() audit_panic() call
audit: remove stray newlines from audit_log_lost messages
audit: include subject in login records
audit: remove superfluous new- prefix in AUDIT_LOGIN messages
audit: allow user processes to log from another PID namespace
audit: anchor all pid references in the initial pid namespace
audit: convert PPIDs to the inital PID namespace.
pid: get pid_t ppid of task in init_pid_ns
audit: rename the misleading audit_get_context() to audit_take_context()
audit: Add generic compat syscall support
audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
...
Al Viro [Thu, 3 Apr 2014 14:27:17 +0000 (10:27 -0400)]
cifs: fix the race in cifs_writev()
O_APPEND handling there hadn't been completely fixed by Pavel's
patch; it checks the right value, but it's racy - we can't really
do that until i_mutex has been taken.
Fix by switching to __generic_file_aio_write() (open-coding
generic_file_aio_write(), actually) and pulling mutex_lock() above
inode_size_read().
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Daniel Borkmann [Fri, 11 Apr 2014 11:22:00 +0000 (13:22 +0200)]
pktgen: be friendly to LLTX devices
Similarly to commit 43279500deca ("packet: respect devices with
LLTX flag in direct xmit"), we can basically apply the very same
to pktgen. This will help testing against LLTX devices such as
dummy driver (or others), which only have a single netdevice txq
and would otherwise require locking their txq from pktgen side
while e.g. in dummy case, we would not need any locking. Fix this
by making use of HARD_TX_{UN,}LOCK API, so that NETIF_F_LLTX will
be respected.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When the device is unplugged, the driver would try to disable the
device. Add checking the flag of RTL8152_UNPLUG to skip setting
the device when it is unplugged. This could shorten the time of
unloading the driver.
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Zyngier [Fri, 11 Apr 2014 09:46:17 +0000 (10:46 +0100)]
net: sun4i-emac: add promiscuous support
The sun4i-emac driver is rather primitive, and doesn't support
promiscuous mode. This makes usage such as bridging impossible,
which is a shame on virtualization capable HW such as the
Allwinner A20.
The fix is fairly simple: move the RX setup code to the ndo_set_rx_mode
vector, and add the required HW configuration when IFF_PROMISC is passed
by the core code.
This has been tested on a generic A20 box running a few virtual
machines hanging off a bridge with the EMAC chip as the link to the
outside world.
Cc: Stefan Roese <sr@denx.de> Cc: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Stefan Roese <sr@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Bolle [Thu, 4 Apr 2013 10:31:06 +0000 (12:31 +0200)]
Blackfin: bf537: rename "CONFIG_ADT75"
In v3.2 the Analog Devices ADT75 temperature sensor driver was removed
as an IIO driver and support for it was added to the LM75 HWMON driver.
But it was apparently overlooked to rename one reference to CONFIG_ADT75
to CONFIG_SENSORS_LM75. Do so now. Use the IS_ENABLED() macro, while
we're at it.
Paul Bolle [Thu, 4 Apr 2013 10:08:25 +0000 (12:08 +0200)]
Blackfin: bf537: rename "CONFIG_AD7314"
In v3.2 the Analog Devices AD7314 temperature sensor driver was removed
as an IIO driver and added as a HWMON driver. But it was apparently
overlooked to rename two references to CONFIG_AD7314 to
CONFIG_SENSORS_AD7314. Do so now. Use the IS_ENABLED() macro, while
we're at it.
Paul Bolle [Thu, 4 Apr 2013 11:02:10 +0000 (13:02 +0200)]
Blackfin: bf537: rename ad2s120x ->ad2s1200
In v3.2 the Analog Devices ad2s1200/ad2s1205 driver was renamed from
ad2s120x to ad2s1200. But it apparently forgot to rename the references
to this driver in the BF537-STAMP code. Rename these now, and use the
IS_ENABLED() macro, while we're at it.
There's a (rather subtle) typo in "CONFIG_SND_SOC_ADV80X_MODULE". Fix it
once and for all by using IS_ENABLED(), which is designed to avoid
issues like this.
Pull md updates from Neil Brown:
"Just a few md patches for the 3.15 merge window.
Not much happening in md/raid at the moment. Just a few bug fixes
(one for -stable) and a couple of performance tweaks"
* tag 'md/3.15' of git://neil.brown.name/md:
raid5: get_active_stripe avoids device_lock
raid5: make_request does less prepare wait
md: avoid oops on unload if some process is in poll or select.
md/raid1: r1buf_pool_alloc: free allocate pages when subsequent allocation fails.
md/bitmap: don't abuse i_writecount for bitmap files.
Pull NVMe driver updates from Matthew Wilcox:
"Various updates to the NVMe driver. The most user-visible change is
that drive hotplugging now works and CPU hotplug while an NVMe drive
is installed should also work better"
* git://git.infradead.org/users/willy/linux-nvme:
NVMe: Retry failed commands with non-fatal errors
NVMe: Add getgeo to block ops
NVMe: Start-stop nvme_thread during device add-remove.
NVMe: Make I/O timeout a module parameter
NVMe: CPU hot plug notification
NVMe: per-cpu io queues
NVMe: Replace DEFINE_PCI_DEVICE_TABLE
NVMe: Fix divide-by-zero in nvme_trans_io_get_num_cmds
NVMe: IOCTL path RCU protect queue access
NVMe: RCU protected access to io queues
NVMe: Initialize device reference count earlier
NVMe: Add CONFIG_PM_SLEEP to suspend/resume functions
Andy Grover [Fri, 4 Apr 2014 23:54:13 +0000 (16:54 -0700)]
target/tcm_fc: Rename structs and list members for clarity
Rename struct ft_lport_acl to ft_lport_wwn. "acl" is associated with
something different in LIO terms. Really, ft_lport_wwn is the
fabric-specific wrapper for the struct se_wwn.
Rename "lacl" local variables to "ft_wwn" as well.
Rename list_heads used as list members to make it clear they're nodes, not
heads.
Rename lport_node to ft_wwn_node.
Rename ft_lport_list to ft_wwn_list
Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Andy Grover [Fri, 4 Apr 2014 23:44:37 +0000 (16:44 -0700)]
target/tcm_fc: Fix use-after-free of ft_tpg
ft_del_tpg checks tpg->tport is set before unlinking the tpg from the
tport when the tpg is being removed. Set this pointer in ft_tport_create,
or the unlinking won't happen in ft_del_tpg and tport->tpg will reference
a deleted object.
This patch sets tpg->tport in ft_tport_create, because that's what
ft_del_tpg checks, and is the only way to get back to the tport to
clear tport->tpg.
The bug was occuring when:
- lport created, tport (our per-lport, per-provider context) is
allocated.
tport->tpg = NULL
- tpg created
- a PRLI is received. ft_tport_create is called, tpg is found and
tport->tpg is set
- tpg removed. ft_tpg is freed in ft_del_tpg. Since tpg->tport was not
set, tport->tpg is not cleared and points at freed memory
- Future calls to ft_tport_create return tport via first conditional,
instead of searching for new tpg by calling ft_lport_find_tpg.
tport->tpg is still invalid, and will access freed memory.
see https://bugzilla.redhat.com/show_bug.cgi?id=1071340
Cc: stable@vger.kernel.org # 3.0+ Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Alex Leung [Fri, 4 Apr 2014 04:38:19 +0000 (04:38 +0000)]
target: Add check to prevent Abort Task from aborting itself
This patch addresses an issue that occurs when an ABTS is received
for an se_cmd that completes just before the sess_cmd_list is searched
in core_tmr_abort_task(). When the sess_cmd_list is searched, since
the ABTS and the FCP_CMND being aborted (that just completed) both
have the same OXID, TFO->get_task_tag(TMR) returns a value that
matches tmr->ref_task_tag (from TFO->get_task_tag(FCP_CMND)), and
the Abort Task tries to abort itself. When this occurs,
transport_wait_for_tasks() hangs forever since the TMR is waiting
for itself to finish.
This patch adds a check to core_tmr_abort_task() to make sure the
TMR does not attempt to abort itself.
Signed-off-by: Alex Leung <alex.leung@emulex.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>