Daniel Borkmann [Wed, 15 Jul 2015 12:21:41 +0000 (14:21 +0200)]
cls_cgroup: factor out classid retrieval
Split out retrieving the cgroups net_cls classid retrieval into its
own function, so that it can be reused later on from other parts of
the traffic control subsystem. If there's no skb->sk, then the small
helper returns 0 as well, which in cls_cgroup terms means 'could not
classify'.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
enic: allow adaptive coalesce setting for msi/legacy intr
* Allow setting of adaptive coalescing setting for all types of interrupt.
* In msi & legacy intr, we use single interrupt for rx & tx. In this case
tx_coalesce_usecs is invalid. We should use only rx_coalesce_usecs.
Do not display tx_coal values for msi/intx. And do not allow user to set
this as well.
* Driver supports only tx/rx_coalesce_usec and adaptive coalesce settings.
For other values, driver does not return error. So ethtool succeeds for
unsupported values. Introduce enic_coalesce_valid() function to validate
the coalescing values.
* If user requests for coalesce value greater than what adaptor supports,
driver uses the max value. We should at least log this.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
enic: add adaptive coalescing intr for intx and msi poll
Adaptive interrupt coalescing is available for msix. This patch adds the support
for msi poll. Interface for adaptive interrupt coalescing is already added in
driver. We just did not enable it for legacy intr & msi.
enic_calc_int_moderation() & enic_set_int_moderation() are defined as static
after enic_poll. Since enic_poll needs it, move both of these function
definitions above enic_poll. No change in functionality.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
User space daemons can detect errors in the network that need to be
notified to the switch device drivers.
Drivers can react to this error state by doing a phy-down on the
switch-port which would result in a carrier-off locally and on the directly
connected switch. Doing that would prevent loops and black-holes in the
network.
One such use case is the multi-chassis LAG application -
1. The MLAG application runs on peer switches (say Switch0 and Switch1)
synchronizing states, forwarding entries etc. between the two
switches over the peer-link (this is a link directly connecting the
two switches).
2. An MLAG election process designates one of the switches as a primary
(for e.g. Switch0 is primary and Switch1 is secondary).
3. The peer link plays a critical role in allowing Switch0-Switch1 to
function as a single LAG partner to the downstream dual-connected
servers. When the peer-link between the switches goes down we have a
split-brain situation. Switch0 and Switch1 are no longer in sync and
are acting independently. This can result in traffic loops and
traffic black-holing in the network.
4. To prevent these problems the MLAG application on the secondary
switch phy-downs the MLAG ports on detecting the peer-link down.
This will be seen as a carrier down on servers that are
dual-connected to Switch0 and Switch1.
5. Specifically a dual-connected server will see a carrier-down on the
port connected to the MLAG secondary, Switch1, and will stop using
that port for traffic TX. So traffic black holing is prevented.
v6 to v7:
Removed some unnecessary code in response to review comments.
v5 to v6:
Replaced proto_flags with a simple proto_down boolean attribute in
response to Dave's comments.
v4 to v5:
Changed the ip link display format for protodown to match the set as
recommended by Stephen.
v3 to v4:
I have moved protodown out of IFF_XXX and introduced a separate
proto_flags field with IF_PROTOF_DOWN bit being used by apps to notify
switch port errors. This is in response to Stephen's comments that
adding a new IFF_XXX may break user space.
I have used rocker as the sample switch driver. And to test this
functionality I used the qemu-rocker patch that Scott sent out in
response to the v3 posting (needed to set link up/down when phy is
enabled/disabled).
v1 to v2:
Based on Dave's suggestion I have moved out aggregating of error bits
across applications to a user space framework. This patch now simply
notifies an aggregated error bit to drivers enabling them to handle
the error gracefully.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
protodown can be set by user space applications like MLAG on detecting
errors on a switch port. This patch provides sample switch driver changes
for handling protodown. Rocker PHYS disables the port in response to
protodown.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
netlink: changes for setting and clearing protodown via netlink.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces the proto_down flag that can be used by user space
applications to notify switch drivers that errors have been detected on the
device.
The switch driver can react to protodown notification by doing a phys down
on the associated switch port.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Tue, 14 Jul 2015 15:51:51 +0000 (10:51 -0500)]
ibmveth: add support for TSO6
This patch adds support for a new method of signalling the firmware
that TSO packets are being sent. The new method removes the need to
alter the ip and tcp checksums and allows TSO6 support.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
hv_netvsc: Add close of RNDIS filter into change mtu call
The current change mtu call only stops tx before removing RNDIS filter.
In case ringbufer is not empty, the rndis_filter_device_remove() may
hang on removing the buffers.
This patch adds close of RNDIS filter before removing it, also a
gradual waiting loop until the ring is empty. The change_mtu hang
issue under heavy traffic is solved by this patch.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: Fix finding best source address in ipv6_dev_get_saddr().
Commit 9131f3de2 ("ipv6: Do not iterate over all interfaces when
finding source address on specific interface.") did not properly
update best source address available. Plus, it introduced
possible NULL pointer dereference.
Bug was reported by Erik Kline <ek@google.com>.
Based on patch proposed by Hajime Tazaki <thehajime@gmail.com>.
Fixes: 9131f3de24db4dc12199aede7d931e6703e97f3b ("ipv6: Do not
iterate over all interfaces when finding source address
on specific interface.") Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Acked-by: Hajime Tazaki <thehajime@gmail.com> Acked-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 16 Jul 2015 00:31:14 +0000 (17:31 -0700)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-07-14
This series contains updates to i40e and i40evf only.
Joe Stringer and Jesse Gross add a ndo_features_check function to ensure
that the i40e driver does not try to offload packets that exceed 80 bytes
in length.
Anjali adds additional stats to track flow director ATR and SB current
state and flow director flush count which will help the need for verbose
debug logs with respect to flow director. Also refines an error message
to avoid confusion, so that it indicates what may have really happened
when the init_shared_code() call possibly fails.
Pawel adds new fields to the capabilities structures to handle Flex-10
device/function capabilities which is needed to support Flex-10 configs.
Jesse improves the transmit performance by added a prefetch for the
next transmit descriptor to be used when we know there are more coming.
Mitch modifies i40evf driver to handle/allow an abundance of vectors.
Currently the driver only maps transmit and receive queues to a single
MSI-X vector per queue if there are exactly enough vectors for this, but
if we have too many vectors, it will fail and allocate queues to vectors
in a suboptimal manner. So change the condition check to allow for an
excess number of vectors and won't use the extras. Also update the
driver to just return success if the user attempts to set a port VLAN on
a VF that already has the same port VLAN configured, instead of going
through unnecessary filter removals & adds. Fix the MAC filters for VFs,
which were being programmed with 0 for the VLAN value when there was no
VLAN assigned. Instead, we must use -1 to indicate that no VLAN is in
use. Fix the VF disable code, which was not properly cleaning up the VF
and would leave the VF in an indeterminate state, so fix this by
notifying the VF and then call the normal VF reset routine. Fix the
logic in the driver so that MAC filters are added and removed correctly
and added a check for the driver's hardware MAC address so that this
filter does not get removed incorrectly.
Carolyn removes incorrect #ifdef's which should not have been added in
the first place and with the #ifdef's removed, make the necessary
changes in the driver to resolve compile errors.
Greg updates the admin queue command header defines.
v2: fix indentation in patch 12 based on feedback from Sergei Shtylyov
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrea Parri [Mon, 13 Jul 2015 22:12:05 +0000 (00:12 +0200)]
pkt_sched: sch_qfq: remove unused member of struct qfq_sched
The member (u32) "num_active_agg" of struct qfq_sched has been unused
since its introduction in 462dbc9101acd38e92eda93c0726857517a24bbd
"pkt_sched: QFQ Plus: fair-queueing service at DRR cost" and (AFAICT)
there is no active plan to use it; this removes the member.
Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Acked-by: Paolo Valente <paolo.valente@unimore.it> Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need to memset memory allocated with vzalloc.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 16 Jul 2015 00:13:24 +0000 (17:13 -0700)]
Merge branch 'gianfar_rx_sg'
Claudiu Manoil says:
====================
gianfar: Add Rx S/G
This patch-set introduces scatter/gather support
on the Rx side, addressing Rx path performance
issues in the driver.
Thanks.
As an example, two boards connected back-to-back
were used to measure the throughput, running the
same kernel 4.1, before and after applying these
patches.
The netperf UDP_STREAM results below show that the
bottleneck lies on the Rx side BEFORE applying the
patches, and that the Rx throughput is even lower
with a larger MTU. AFTER applying the patches the
Rx bottleneck is gone (Rx throughput matches the
Tx one) and the RX throughput is not influenced by
MTU size any longer (as expected).
BEFORE:
1) MTU 1500 (default)
root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512
MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
Socket Message Elapsed Messages CPU Service
Size Size Time Okay Errors Throughput Util Demand
bytes bytes secs # # 10^6bits/sec % SS us/KB
The eTSEC h/w is capable of scatter/gather on the receive side
too if MAXFRM > MRBLR, when the allowed maximum Rx frame size
is set to be greater than the maximum Rx buffer size (MRBLR).
It's about time the driver makes use of this h/w capability,
by supporting fixed buffer sizes and Rx S/G.
The buffer size given to eTSEC for reception is fixed to
1536B (must be multiple of 64), which is the same default
buffer size as before, used to accommodate standard MTU
(1500B) size frames. As before, eTSEC can receive frames of
up to 9600B. Individual Rx buffers are mapped to page halves
(page size for eTSEC systems is 4KB). The skb is built around
the first buffer of a frame (using build_skb()). In case the
frame spans multiple buffers, the trailing buffers are added
as Rx fragments to the skb. The last buffer in frame is marked
by the L status flag. A mechanism is in place to reuse the pages
owned by the driver (for Rx) for subsequent receptions.
Supporting fixed size buffers allows the implementation of Rx S/G,
which in turn removes the memory pressure issues the driver had
before when MTU was set for jumbo frame reception.
Also, in most cases, the Rx path becomes faster due to Rx page
reusal, since the overhead of allocating new rx buffers is removed
from the fast path.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Use "ndev" instead of "dev", as the rx queue back pointer
to a net_device struct, to avoid name clashing with a
"struct device" reference. This prepares the addition of a
"struct device" back pointer to the rx queue structure.
Remove duplicated rxq registration in the process.
Move napi_gro_receive() outside gfar_process_frame().
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
There are several (long standing) problems about how the status
field of the rx buffer descriptor (rxbd) is currently handled on
the error path:
- too many unnecessary 16bit reads of the two halves of the rxbd
status field (32bit), also resulting in overuse of endianness
convesion macros;
- "bdp->status = RXBD_LARGE" makes no sense, since the "large"
flag is read only (only eTSEC can write it), and trying to clear
the other status bits is also error prone in this context
(most of the rx status bits are read only anyway).
This is fixed with a single 32bit read of the "status" field,
and then the appropriate 16bit shifting is applied to access
the various status bits or the rx frame length. Also corrected
the use of the RXBD_LARGE flag.
Additional fix:
"rx_over_errors" stat is incremented instead of "rx_crc_errors"
in case of RXBD_OVERRUN occurrence.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Use a more common consumer/ producer index design to improve
rx buffer allocation. Instead of allocating a single new buffer
(skb) on each iteration, bundle the allocation of several rx
buffers at a time. This also opens the path for further memory
optimizations.
Remove useless check of rxq->rfbptr, since this patch touches
rx pause frame handling code as well. rxq->rfbptr is always
initialized as part of Rx BD ring init.
Remove redundant (and misleading) 'amount_pull' parameter.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
i40e/i40evf: Bump version to 1.3.6 for i40e and 1.3.2 for i40evf
Bump.
Change-ID: I84573d9fa51effc5b29bf5b8c74e3cc8b2673f48 Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Change a warning message to indicate what may have really happened when
the init_shared_code call fails.
Change-ID: I616ace40fed120d0dec86dfc91ab2d7cde466904 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
i40e/i40evf: Add support for pre-allocated pages for PD
The i40e_add_pd_table_entry() routine is being modified to handle both
cases where a backing page is passed and where backing page is allocated
in i40e_add_pd_table_entry().
For PBLE resource management, it is more efficient for it to manage its
backing pages. For VF, PBLE backing page addresses will be send to PF
driver for PBLE resource.
The i40e_remove_pd_bp() is also modified to not free pre-allocated pages and
free only ones which were allocated in i40e_add_pd_table_entry().
Change-ID: Ie673f0403f22979e9406f5a94048dceb91bcf9a8 Signed-off-by: Faisal Latif <faisal.latif@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Mon, 27 Apr 2015 18:57:17 +0000 (14:57 -0400)]
i40evf: add MAC address filter in open, not init
During close, all of the MAC filters are cleared, so the driver would be
unable to receive unicast packets after being closed and reopened.
Add the adapter's "hardware" MAC address filter in open, not init. This
ensures that the correct filter is present each time.
Change-ID: I51a11e9c1200139dab6f66a5353bd38c7d26f875 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Mon, 27 Apr 2015 18:57:16 +0000 (14:57 -0400)]
i40evf: don't delete all the filters
Due to an inverted conditional, the driver was marking all of its MAC
filters for deletion every time set_rx_mode was called. Depending upon
the timing of the calls to set_rx_mode and the processing of the admin
queue, the driver would (accidentally) end up with a varying number of
functional filters.
Correct this logic so that MAC filters are added and removed correctly.
Add a check for the driver's "hardware" MAC address so that this filter
doesn't get removed incorrectly.
Change-ID: Ib3e7c4a5b53df6835f164fe44cb778cb71f8aff8 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Mon, 27 Apr 2015 18:57:15 +0000 (14:57 -0400)]
i40e: un-disable VF after reset
When a VF is disabled, there is no way for it to recover until either
the PF driver is reloaded or SR-IOV is disabled and enabled. To correct
this, enable the VF after a successful reset.
Change-ID: I9e0788476c4d53d5407961b503febdfff2b8a7c6 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Mon, 27 Apr 2015 18:57:14 +0000 (14:57 -0400)]
i40e: do a proper reset when disabling a VF
The VF disable code was just whanging on the reset bit without properly
cleaning up the VF, which would leave the VF in an indeterminate state
from which it could not recover. Fix this by notifying the VF and then
by calling the normal VF reset routine.
Change-ID: I862b9dfa919368773cbdc212b805b520db2f7430 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Mon, 27 Apr 2015 18:57:13 +0000 (14:57 -0400)]
i40e: correctly program filters for VFs
MAC filters for VFs were being programmed with 0 for the VLAN value when
there was no VLAN assigned. This is incorrect and actually assigns the
VF to VLAN 0. Instead, we must use -1 to indicate that no VLAN is in
use. This change programs the filters correctly and gets rid of a bogus
error message when setting a port VLAN on an active VF.
Change-ID: Ica9a9906d768405377ff3308e27f7d0b5b2ea96e Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Mon, 27 Apr 2015 18:57:12 +0000 (14:57 -0400)]
i40e/i40evf: Update the admin queue command header
Make the necessary updates to i40e_adminq_cmd.h.
Change-ID: Ib031c86cc6cab78e5aa44c64d8ce5474be8d7e42 Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch removes some #ifdef's that should not be there. They
were stopping code that is needed from being compiled in.
With these #ifdef's removed, changes are needed in the driver
to fix some compile errors: adding missing parameters to
the definition of ndo_bridge_setlink and a ndo_dflt_brige_getlink call.
Change-ID: I5516614e1bc50b6bca0647cef971bc96161ba2de Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Mon, 27 Apr 2015 18:57:10 +0000 (14:57 -0400)]
i40e: ignore duplicate port VLAN requests
If user attempts to set a port VLAN on a VF that already has the same
port VLAN configured, the driver will go through a completely
unnecessary flurry of filter removals and filter adds. Just check for
this condition and return success instead of doing a bunch of busywork.
Change-ID: Ia1a9e83e6ed48b3f4658bc20dfc6af0cf525d54a Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Mon, 27 Apr 2015 18:57:09 +0000 (14:57 -0400)]
i40evf: Allow for an abundance of vectors
The driver currently only maps TX and RX queues to a single MSI-X vector
per queue pair if there are exactly enough vectors for this.
Unfortunately, if we have too many vectors it will fail and allocate
queues to vectors in a suboptimal manner. Change the condition check to
allow for excess vectors. In this case, the extras just won't be used.
Change-ID: I23e1e2955c64739c86612db88a25583e6a7e0b17 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
i40e/i40evf: improve Tx performance with a small tweak
Add a prefetch for the next Tx descriptor to be used when we know
there are more coming.
Change-ID: Ibb9acab11d508eec2db7da795df74debc16eeacb Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
i40e/i40evf: Update Flex-10 related device/function capabilities
The Flex10 device/function capability has been upgraded to include
information needed to support Flex-10 configurations. This patch adds new
fields to the i40e_hw_capabilities structure and updates
i40e_parse_discover_capabilities functions to extract them from the AQ
response. Naming convention has changed to use flex10 mode instead of
existing mfp_mode_1.
Change-ID: I305dd888866985a30293acb3fb14fa43ca6b79ea Signed-off-by: Pawel Orlowski <pawel.orlowski@intel.com> Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
i40e/i40evf: Add stats to track FD ATR and SB dynamic enable state
Since the driver can dynamically enable/disable FD ATR and SB features,
these stats help keep track of the current state and along with
fd_flush count provide a means to debug what could be going on
with the flow director filters. This will take away the need for
being verbose in our debug logs with respect to FD.
Change-ID: I29224f750fe6602391043655d18996570720377d Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Until now all user mdb entries were added in vlan 0, this patch adds
support to allow the user to specify the vlan for the entry.
About the uapi change a hole in struct br_mdb_entry is used so the size
and offsets are kept the same (verified with pahole and tested with older
iproute2).
Example:
$ bridge mdb
dev br0 port eth1 grp 239.0.0.1 permanent vlan 2000
dev br0 port eth1 grp 239.0.0.1 permanent vlan 200
dev br0 port eth1 grp 239.0.0.1 permanent
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 13 Jul 2015 18:49:32 +0000 (20:49 +0200)]
ebpf: remove self-assignment in interpreter's tail call
ARG1 = BPF_R1 as it stands, evaluates to regs[BPF_REG_1] = regs[BPF_REG_1]
and thus has no effect. Add a comment instead, explaining what happens and
why it's okay to just remove it. Since from user space side, a tail call is
invoked as a pseudo helper function via bpf_tail_call_proto, the verifier
checks the arguments just like with any other helper function and makes
sure that the first argument (regs[BPF_REG_1])'s type is ARG_PTR_TO_CTX.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Mon, 13 Jul 2015 15:48:00 +0000 (08:48 -0700)]
net: Build IPv6 into kernel by default
This patch makes the default to build IPv6 into the kernel. IPv6
now has significant traction and any remaining vestiges of IPv6
not being provided parity with IPv4 should be swept away. IPv6 is now
core to the Internet and kernel.
Points on IPv6 adoption:
- Per Google statistics, IPv6 usage has reached 7% on the Internet
and continues to exhibit an exponential growth rate
https://www.google.com/intl/en/ipv6/statistics.html
- Just a few days ago ARIN officially depleted its IPv4 pool
- IPv6 only data centers are being successfully built
(e.g. at Facebook)
This patch changes the IPv6 Kconfig for IPV6. Default for CONFIG_IPV6
is set to "y" and the text has been updated to reflect the maturity of
IPv6.
Impact:
Under some circumstances building modules in to kernel might have a
performance advantage. In my testing, I did notice a very slight
improvement.
This will obviously increase the size of the kernel image. In my
configuration I see:
Which increases text size by ~270K (2.8% increase in size for me). If
image size is an issue, presumably for a device which does not do IP
networking (IMO we should be discouraging IPv4-only devices), IPV6 can
be disabled or still built as a module.
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
1) Missing list head init in bluetooth hidp session creation, from Tedd
Ho-Jeong An.
2) Don't leak SKB in bridge netfilter error paths, from Florian
Westphal.
3) ipv6 netdevice private leak in netfilter bridging, fixed by Julien
Grall.
4) Fix regression in IP over hamradio bpq encapsulation, from Ralf
Baechle.
5) Fix race between rhashtable resize events and table walks, from Phil
Sutter.
6) Missing validation of IFLA_VF_INFO netlink attributes, fix from
Daniel Borkmann.
7) Missing security layer socket state initialization in tipc code,
from Stephen Smalley.
8) Fix shared IRQ handling in boomerang 3c59x interrupt handler, from
Denys Vlasenko.
9) Missing minor_idr destroy on module unload on macvtap driver, from
Johannes Thumshirn.
10) Various pktgen kernel thread races, from Oleg Nesterov.
11) Fix races that can cause packets to be processed in the backlog even
after a device attached to that SKB has been fully unregistered.
From Julian Anastasov.
12) bcmgenet driver doesn't account packet drops vs. errors properly,
fix from Petri Gynther.
13) Array index validation and off by one fix in DSA layer from Florian
Fainelli
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (66 commits)
can: replace timestamp as unique skb attribute
ARM: dts: dra7x-evm: Prevent glitch on DCAN1 pinmux
can: c_can: Fix default pinmux glitch at init
can: rcar_can: unify error messages
can: rcar_can: print request_irq() error code
can: rcar_can: fix typo in error message
can: rcar_can: print signed IRQ #
can: rcar_can: fix IRQ check
net: dsa: Fix off-by-one in switch address parsing
net: dsa: Test array index before use
net: switchdev: don't abort unsupported operations
net: bcmgenet: fix accounting of packet drops vs errors
cdc_ncm: update specs URL
Doc: z8530book: Fix typo in API-z8530-sync-txdma-open.html
net: inet_diag: always export IPV6_V6ONLY sockopt for listening sockets
bridge: mdb: allow the user to delete mdb entry if there's a querier
net: call rcu_read_lock early in process_backlog
net: do not process device backlog during unregistration
bridge: fix potential crash in __netdev_pick_tx()
net: axienet: Fix devm_ioremap_resource return value check
...
Pull crypto fixes from Herbert Xu:
"This fixes a duplicate dma_unmap_sg call in omap-des and reentrancy
bugs in the powerpc nx driver which may cause bogus output or worse
memory corruption"
David S. Miller [Mon, 13 Jul 2015 05:24:01 +0000 (22:24 -0700)]
Merge tag 'linux-can-fixes-for-4.2-20150712' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2015-07-12
this is a pull request of 8 patchs for net/master.
Sergei Shtylyov contributes 5 patches for the rcar_can driver, fixing the IRQ
check and several info and error messages. There are two patches by J.D.
Schroeder and Roger Quadros for the c_can driver and dra7x-evm device tree,
which precent a glitch in the DCAN1 pinmux. Oliver Hartkopp provides a better
approach to make the CAN skbs unique, the timestamp is replaced by a counter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
So the change to test 'crtc_state->base.active' cannot possibly be
correct as-is.
There may be some other minimal fix (like just checking crtc_state for
NULL), but I'm just reverting it now for the rc2 release, and people
like Daniel Vetter who actually know this code will figure out what the
right solution is in the longer term.
Reported-and-bisected-by: Jörg Otte <jrg.otte@gmail.com> Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> CC: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS fixes from Al Viro:
"Fixes for this cycle regression in overlayfs and a couple of
long-standing (== all the way back to 2.6.12, at least) bugs"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
freeing unlinked file indefinitely delayed
fix a braino in ovl_d_select_inode()
9p: don't leave a half-initialized inode sitting around
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
"A fair number of 4.2 fixes also because Markos opened the flood gates.
- Patch up the math used calculate the location for the page bitmap.
- The FDC (Not what you think, FDC stands for Fast Debug Channel) IRQ
around was causing issues on non-Malta platforms, so move the code
to a Malta specific location.
- A spelling fix replicated through several files.
- Fix to the emulation of an R2 instruction for R6 cores.
- Fix the JR emulation for R6.
- Further patching of mindless 64 bit issues.
- Ensure the kernel won't crash on CPUs with L2 caches with >= 8
ways.
- Use compat_sys_getsockopt for O32 ABI on 64 bit kernels.
- Fix cache flushing for multithreaded cores.
- A build fix"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: O32: Use compat_sys_getsockopt.
MIPS: c-r4k: Extend way_string array
MIPS: Pistachio: Support CDMM & Fast Debug Channel
MIPS: Malta: Make GIC FDC IRQ workaround Malta specific
MIPS: c-r4k: Fix cache flushing for MT cores
Revert "MIPS: Kconfig: Disable SMP/CPS for 64-bit"
MIPS: cps-vec: Use macros for various arithmetics and memory operations
MIPS: kernel: cps-vec: Replace KSEG0 with CKSEG0
MIPS: kernel: cps-vec: Use ta0-ta3 pseudo-registers for 64-bit
MIPS: kernel: cps-vec: Replace mips32r2 ISA level with mips64r2
MIPS: kernel: cps-vec: Replace 'la' macro with PTR_LA
MIPS: kernel: smp-cps: Fix 64-bit compatibility errors due to pointer casting
MIPS: Fix erroneous JR emulation for MIPS R6
MIPS: Fix branch emulation for BLTC and BGEC instructions
MIPS: kernel: traps: Fix broken indentation
MIPS: bootmem: Don't use memory holes for page bitmap
MIPS: O32: Do not handle require 32 bytes from the stack to be readable.
MIPS, CPUFREQ: Fix spelling of Institute.
MIPS: Lemote 2F: Fix build caused by recent mass rename.
Oliver Hartkopp [Fri, 26 Jun 2015 09:58:19 +0000 (11:58 +0200)]
can: replace timestamp as unique skb attribute
Commit 514ac99c64b "can: fix multiple delivery of a single CAN frame for
overlapping CAN filters" requires the skb->tstamp to be set to check for
identical CAN skbs.
Without timestamping to be required by user space applications this timestamp
was not generated which lead to commit 36c01245eb8 "can: fix loss of CAN frames
in raw_rcv" - which forces the timestamp to be set in all CAN related skbuffs
by introducing several __net_timestamp() calls.
This forces e.g. out of tree drivers which are not using alloc_can{,fd}_skb()
to add __net_timestamp() after skbuff creation to prevent the frame loss fixed
in mainline Linux.
This patch removes the timestamp dependency and uses an atomic counter to
create an unique identifier together with the skbuff pointer.
Btw: the new skbcnt element introduced in struct can_skb_priv has to be
initialized with zero in out-of-tree drivers which are not using
alloc_can{,fd}_skb() too.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Roger Quadros [Tue, 7 Jul 2015 14:27:57 +0000 (17:27 +0300)]
ARM: dts: dra7x-evm: Prevent glitch on DCAN1 pinmux
Driver core sets "default" pinmux on on probe and CAN driver
sets "sleep" pinmux during register. This causes a small window
where the CAN pins are in "default" state with the DCAN module
being disabled.
Change the "default" state to be like sleep so this glitch is
avoided. Add a new "active" state that is used by the driver
when CAN is actually active.
Signed-off-by: Roger Quadros <rogerq@ti.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The previous change 3973c526ae9c (net: can: c_can: Disable pins when CAN
interface is down) causes a slight glitch on the pinctrl settings when used.
Since commit ab78029 (drivers/pinctrl: grab default handles from device core),
the device core will automatically set the default pins. This causes the pins
to be momentarily set to the default and then to the sleep state in
register_c_can_dev(). By adding an optional "enable" state, boards can set the
default pin state to be disabled and avoid the glitch when the switch from
default to sleep first occurs. If the "enable" state is not available
c_can_pinctrl_select_state() falls back to using the "default" pinctrl state.
[Roger Q] - Forward port to v4.2 and use pinctrl_get_select().
Signed-off-by: J.D. Schroeder <jay.schroeder@garmin.com> Signed-off-by: Roger Quadros <rogerq@ti.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Sergei Shtylyov [Sat, 20 Jun 2015 00:34:55 +0000 (03:34 +0300)]
can: rcar_can: fix typo in error message
Fix typo in the first error message printed by rcar_can_open().
Based on the original patch by Vladimir Barinov.
Fixes: 862e2b6af941 ("can: rcar_can: support all input clocks") Reported-by: Vladimir Barinov <vladimir.barinov@cogentembedded.com> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Sergei Shtylyov [Sat, 20 Jun 2015 00:32:46 +0000 (03:32 +0300)]
can: rcar_can: fix IRQ check
rcar_can_probe() regards 0 as a wrong IRQ #, despite platform_get_irq() that it
calls returns negative error code in that case. This leads to the following
being printed to the console when attempting to open the device:
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
- the high latency PIT detection fix, which slipped through the cracks
for rc1
- a regression fix for the early printk mechanism
- the x86 part to plug irq/vector related hotplug races
- move the allocation of the espfix pages on cpu hotplug to non atomic
context. The current code triggers a might_sleep() warning.
- a series of KASAN fixes addressing boot crashes and usability
- a trivial typo fix for Kconfig help text
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kconfig: Fix typo in the CONFIG_CMDLINE_BOOL help text
x86/irq: Retrieve irq data after locking irq_desc
x86/irq: Use proper locking in check_irq_vectors_for_cpu_disable()
x86/irq: Plug irq vector hotplug race
x86/earlyprintk: Allow early_printk() to use console style parameters like '115200n8'
x86/espfix: Init espfix on the boot CPU side
x86/espfix: Add 'cpu' parameter to init_espfix_ap()
x86/kasan: Move KASAN_SHADOW_OFFSET to the arch Kconfig
x86/kasan: Add message about KASAN being initialized
x86/kasan: Fix boot crash on AMD processors
x86/kasan: Flush TLBs after switching CR3
x86/kasan: Fix KASAN shadow region page tables
x86/init: Clear 'init_level4_pgt' earlier
x86/tsc: Let high latency PIT fail fast in quick_pit_calibrate()
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"This update from the timer departement contains:
- A series of patches which address a shortcoming in the tick
broadcast code.
If the broadcast device is not available or an hrtimer emulated
broadcast device, some of the original assumptions lead to boot
failures. I rather plugged all of the corner cases instead of only
addressing the issue reported, so the change got a little larger.
Has been extensivly tested on x86 and arm.
- Get rid of the last holdouts using do_posix_clock_monotonic_gettime()
- A regression fix for the imx clocksource driver
- An update to the new state callbacks mechanism for clockevents.
This is required to simplify the conversion, which will take place
in 4.3"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/broadcast: Prevent NULL pointer dereference
time: Get rid of do_posix_clock_monotonic_gettime
cris: Replace do_posix_clock_monotonic_gettime()
tick/broadcast: Unbreak CONFIG_GENERIC_CLOCKEVENTS=n build
tick/broadcast: Handle spurious interrupts gracefully
tick/broadcast: Check for hrtimer broadcast active early
tick/broadcast: Return busy when IPI is pending
tick/broadcast: Return busy if periodic mode and hrtimer broadcast
tick/broadcast: Move the check for periodic mode inside state handling
tick/broadcast: Prevent deep idle if no broadcast device available
tick/broadcast: Make idle check independent from mode and config
tick/broadcast: Sanity check the shutdown of the local clock_event
tick/broadcast: Prevent hrtimer recursion
clockevents: Allow set-state callbacks to be optional
clocksource/imx: Define clocksource for mx27
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single fix for a cpu hotplug race vs. interrupt descriptors:
Prevent irq setup/teardown across the cpu starting/dying parts of cpu
hotplug so that the starting/dying cpu has a stable view of the
descriptor space. This has been an issue for all architectures in the
cpu dying phase, where interrupts are migrated away from the dying
cpu. In the starting phase its mostly a x86 issue vs the vector space
update"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hotplug: Prevent alloc/free of irq descriptors during cpu up/down
Al Viro [Wed, 8 Jul 2015 01:42:38 +0000 (02:42 +0100)]
freeing unlinked file indefinitely delayed
Normally opening a file, unlinking it and then closing will have
the inode freed upon close() (provided that it's not otherwise busy and
has no remaining links, of course). However, there's one case where that
does *not* happen. Namely, if you open it by fhandle with cold dcache,
then unlink() and close().
In normal case you get d_delete() in unlink(2) notice that dentry
is busy and unhash it; on the final dput() it will be forcibly evicted from
dcache, triggering iput() and inode removal. In this case, though, we end
up with *two* dentries - disconnected (created by open-by-fhandle) and
regular one (used by unlink()). The latter will have its reference to inode
dropped just fine, but the former will not - it's considered hashed (it
is on the ->s_anon list), so it will stay around until the memory pressure
will finally do it in. As the result, we have the final iput() delayed
indefinitely. It's trivial to reproduce -
main()
{
int fd;
union {
struct file_handle f;
char buf[MAX_HANDLE_SZ];
} x;
int m;
x.f.handle_bytes = sizeof(x);
chdir("/root");
mkdir("foo", 0700);
fd = open("foo/bar", O_CREAT | O_RDWR, 0600);
close(fd);
name_to_handle_at(AT_FDCWD, "foo/bar", &x.f, &m, 0);
flush_dcache();
fd = open_by_handle_at(AT_FDCWD, &x.f, O_RDWR);
unlink("foo/bar");
write(fd, buf, sizeof(buf));
system("df ."); /* 20Mb eaten */
close(fd);
system("df ."); /* should've freed those 20Mb */
flush_dcache();
system("df ."); /* should be the same as #2 */
}
will spit out something like
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 322023 303843 1131 100% /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 322023 303843 1131 100% /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 322023 283282 21692 93% /
- inode gets freed only when dentry is finally evicted (here we trigger
than by remount; normally it would've happened in response to memory
pressure hell knows when).
Cc: stable@vger.kernel.org # v2.6.38+; earlier ones need s/kill_it/unhash_it/ Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 12 Jul 2015 14:39:45 +0000 (10:39 -0400)]
fix a braino in ovl_d_select_inode()
when opening a directory we want the overlayfs inode, not one from
the topmost layer.
Reported-By: Andrey Jr. Melnikov <temnota.am@gmail.com> Tested-By: Andrey Jr. Melnikov <temnota.am@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
David S. Miller [Sun, 12 Jul 2015 06:25:16 +0000 (23:25 -0700)]
Merge branch 'dsa-of-parsing-fixes'
Florian Fainelli says:
====================
net: dsa: OF parsing fixes
This patch series fixes two small parsing issues, the first one was
reported by Dan, the second came after looking more closely at the
code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: Fix off-by-one in switch address parsing
cd->sw_addr is used as a MDIO bus address, which cannot exceed
PHY_MAX_ADDR (32), our check was off-by-one.
Fixes: 5e95329b701c ("dsa: add device tree bindings to register DSA switches") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
port_index is used an index into an array, and this information comes
from Device Tree, make sure that port_index is not equal to the array
size before using it. Move the check against port_index earlier in the
loop.
Fixes: 5e95329b701c: ("dsa: add device tree bindings to register DSA switches") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need to abort attribute setting or object addition, if the
prepare phase returned operation not supported.
Thus, abort these two transactions only if the error is not -EOPNOTSUPP.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
There is nothing wrong with coalescing during defragmentation, it
reduces truesize overhead and simplifies things for the receiving
socket (no fraglist walk needed).
However, it also destroys geometry of the original fragments.
While that doesn't cause any breakage (we make sure to not exceed largest
original size) ip_do_fragment contains a 'fastpath' that takes advantage
of a present frag list and results in fragments that (in most cases)
match what was received.
In case its needed the coalescing could be done later, when we're sure
the skb is not forwarded. But discussion during NFWS resulted in
'lets just remove this for now'.
Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm
Pull libnvdimm fixes from Dan Williams:
"1) Fixes for a handful of smatch reports (Thanks Dan C.!) and minor
bug fixes (patches 1-6)
2) Correctness fixes to the BLK-mode nvdimm driver (patches 7-10).
Granted these are slightly large for a -rc update. They have been
out for review in one form or another since the end of May and were
deferred from the merge window while we settled on the "PMEM API"
for the PMEM-mode nvdimm driver (ie memremap_pmem, memcpy_to_pmem,
and wmb_pmem).
Now that those apis are merged we implement them in the BLK driver
to guarantee that mmio aperture moves stay ordered with respect to
incoming read/write requests, and that writes are flushed through
those mmio-windows and platform-buffers to be persistent on media.
These pass the sub-system unit tests with the updates to
tools/testing/nvdimm, and have received a successful build-report from
the kbuild robot (468 configs).
With acks from Rafael for the touches to drivers/acpi/"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm:
nfit: add support for NVDIMM "latch" flag
nfit: update block I/O path to use PMEM API
tools/testing/nvdimm: add mock acpi_nfit_flush_address entries to nfit_test
tools/testing/nvdimm: fix return code for unimplemented commands
tools/testing/nvdimm: mock ioremap_wt
pmem: add maintainer for include/linux/pmem.h
nfit: fix smatch "use after null check" report
nvdimm: Fix return value of nvdimm_bus_init() if class_create() fails
libnvdimm: smatch cleanups in __nd_ioctl
sparse: fix misplaced __pmem definition
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Mostly slight adjusments for new drivers, but also one core fix for
which finally the dependencies are now available as well"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: Mark instantiated device nodes with OF_POPULATE
i2c: jz4780: Fix return value if probe fails
i2c: xgene-slimpro: Fix missing mbox_free_channel call in probe error path
i2c: I2C_MT65XX should depend on HAS_DMA
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"A fix (revert) for a recent regression in Synaptics driver and a fix
for Elan i2c touchpad driver"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Revert "Input: synaptics - allocate 3 slots to keep stability in image sensors"
Input: elan_i2c - change the hover event from MT to ST
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A small set of fixes for problems found by smatch in new drivers that
we added this rc and a handful of driver fixes that came in during the
merge window"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
drivers: clk: st: Incorrect register offset used for lock_status
clk: mediatek: mt8173: Fix enabling of critical clocks
drivers: clk: st: Fix mux bit-setting for Cortex A9 clocks
drivers: clk: st: Add CLK_GET_RATE_NOCACHE flag to clocks
drivers: clk: st: Fix flexgen lock init
drivers: clk: st: Fix FSYN channel values
drivers: clk: st: Remove unused code
clk: qcom: Use parent rate when set rate to pixel RCG clock
clk: at91: do not leak resources
clk: stm32: Fix out-by-one error path in the index lookup
clk: iproc: fix bit manipulation arithmetic
clk: iproc: fix memory leak from clock name
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"A bunch of fixes for radeon, intel, omap and one amdkfd fix.
Radeon fixes are all over, but it does fix some cursor corruption
across suspend/resume. i915 should fix the second warn you were
seeing, so let us know if not. omap is a bunch of small fixes"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (28 commits)
drm/radeon: disable vce init on cayman (v2)
drm/amdgpu: fix timeout calculation
drm/radeon: check if BO_VA is set before adding it to the invalidation list
drm/radeon: allways add the VM clear duplicate
Revert "Revert "drm/radeon: dont switch vt on suspend""
drm/radeon: Fold radeon_set_cursor() into radeon_show_cursor()
drm/radeon: unpin cursor BOs on suspend and pin them again on resume (v2)
drm/radeon: Clean up reference counting and pinning of the cursor BOs
drm/amdkfd: validate pdd where it acquired first
Revert "drm/i915: Allocate context objects from stolen"
drm/i915: Declare the swizzling unknown for L-shaped configurations
drm/radeon: fix underflow in r600_cp_dispatch_texture()
drm/radeon: default to 2048 MB GART size on SI+
drm/radeon: fix HDP flushing
drm/radeon: use RCU query for GEM_BUSY syscall
drm/amdgpu: Handle irqs only based on irq ring, not irq status regs.
drm/radeon: Handle irqs only based on irq ring, not irq status regs.
drm/i915: Use crtc_state->active in primary check_plane func
drm/i915: Check crtc->active in intel_crtc_disable_planes
drm/i915: Restore all GGTT VMAs on resume
...
Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull selinux fixes from James Morris.
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
selinux: fix mprotect PROT_EXEC regression caused by mm change
selinux: don't waste ebitmap space when importing NetLabel categories
Merge branch 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"This is an assortment of fixes. Most of the commits are from Filipe
(fsync, the inode allocation cache and a few others). Mark kicked in
a series fixing corners in the extent sharing ioctls, and everyone
else fixed up on assorted other problems"
* 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix wrong check for btrfs_force_chunk_alloc()
Btrfs: fix warning of bytes_may_use
Btrfs: fix hang when failing to submit bio of directIO
Btrfs: fix a comment in inode.c:evict_inode_truncate_pages()
Btrfs: fix memory corruption on failure to submit bio for direct IO
btrfs: don't update mtime/ctime on deduped inodes
btrfs: allow dedupe of same inode
btrfs: fix deadlock with extent-same and readpage
btrfs: pass unaligned length to btrfs_cmp_data()
Btrfs: fix fsync after truncate when no_holes feature is enabled
Btrfs: fix fsync xattr loss in the fast fsync path
Btrfs: fix fsync data loss after append write
Btrfs: fix crash on close_ctree() if cleaner starts new transaction
Btrfs: fix race between caching kthread and returning inode to inode cache
Btrfs: use kmem_cache_free when freeing entry in inode cache
Btrfs: fix race between balance and unused block group deletion
btrfs: add error handling for scrub_workers_get()
btrfs: cleanup noused initialization of dev in btrfs_end_bio()
btrfs: qgroup: allow user to clear the limitation on qgroup
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Kevin Hilman:
"A fairly random colletion of fixes based on -rc1 for OMAP, sunxi and
prima2 as well as a few arm64-specific DT fixes.
This series also includes a late to support a new Allwinner (sunxi)
SoC, but since it's rather simple and isolated to the
platform-specific code, it's included it for this -rc"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
arm64: dts: add device tree for ARM SMM-A53x2 on LogicTile Express 20MG
arm: dts: vexpress: add missing CCI PMU device node to TC2
arm: dts: vexpress: describe all PMUs in TC2 dts
GICv3: Add ITS entry to THUNDER dts
arm64: dts: Add poweroff button device node for APM X-Gene platform
ARM: dts: am4372.dtsi: disable rfbi
ARM: dts: am57xx-beagle-x15: Provide supply for usb2_phy2
ARM: dts: am4372: Add emif node
Revert "ARM: dts: am335x-boneblack: disable RTC-only sleep"
ARM: sunxi: Enable simplefb in the defconfig
ARM: Remove deprecated symbol from defconfig files
ARM: sunxi: Add Machine support for A33
ARM: sunxi: Introduce Allwinner H3 support
Documentation: sunxi: Update Allwinner SoC documentation
ARM: prima2: move to use REGMAP APIs for rtciobrg
ARM: dts: atlas7: add pinctrl and gpio descriptions
ARM: OMAP2+: Remove unnessary return statement from the void function, omap2_show_dma_caps
memory: omap-gpmc: Fix parsing of devices
Thomas Gleixner [Sat, 11 Jul 2015 12:26:34 +0000 (14:26 +0200)]
tick/broadcast: Prevent NULL pointer dereference
Dan reported that the recent changes to the broadcast code introduced
a potential NULL dereference.
Add the proper check.
Fixes: e0454311903d "tick/broadcast: Sanity check the shutdown of the local clock_event" Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Doc: z8530book: Fix typo in API-z8530-sync-txdma-open.html
This patch fix a spelling typo found in API-z8530-sync-txdma-open.html.
It is because this file was generated from comment in source,
I have to fix comment in source.
Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Sutter [Fri, 10 Jul 2015 09:39:57 +0000 (11:39 +0200)]
net: inet_diag: always export IPV6_V6ONLY sockopt for listening sockets
Reconsidering my commit 20462155 "net: inet_diag: export IPV6_V6ONLY
sockopt", I am not happy with the limitations it causes for socket
analysing code in userspace. Exporting the value only if it is set makes
it hard for userspace to decide whether the option is not set or the
kernel does not support exporting the option at all.
>From an auditor's perspective, the interesting question for listening
AF_INET6 sockets is: "Does it NOT have IPV6_V6ONLY set?" Because it is
the unexpected case. This patch allows to answer this question reliably.
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 11 Jul 2015 06:24:31 +0000 (23:24 -0700)]
Merge branch 'be2net-next'
Sathya Perla says:
====================
be2net: patch set
Hi David, the following patch set has code cleanup patches, minor enhancements
and non-critical fixes. Pls consider applying to the net-next tree. Thanks!
Patch 1 removes duplicate code in be_setup_wol() routine making it simpler
and more readable.
Patch 2 fixes the the bridge mode return value for the ndo_bridge_getlink()
call. Instead of just relying on the SRIOV enabled state, the driver now
queries the FW, for the actual mode of bridge.
Patch 3 removes code for setting D0 power state as it's already done
in pci_enable_device()
Patch 4 fixes a bad return value in be_check_ufi_compatibility() routine
introduced by an earlier commit.
Patch 5 fixes a field in udp header being accessed while in network endian
format.
Patch 6 fixes the be_mcc_notify() routine to return an error status when
the FW/HW is in an error state.
Patch 7 fixes the be_cmd_rx_filter() routine to issue the RX_FILTER cmd
and not wait for a completion from the FW. If the FW/adapter
is in an error state, this change helps in not holding up the rtnl_lock
and keeping bottom halves disabled while the driver timesout waiting for
a response from the FW.
Patch 8 fixes the be_cmd_set_loopback() routine to issue the LOOPBACK cmd
and not wait for the FW completion while spin_lock_bh() is held on the
mcc_lock. As the cmd is always issued from ethtool in a process context,
it can sleep till the FW completion is received.
Patch 9 bumps up the driver version to 10.6.0.3
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The SET_LOOPBACK_MODE command is always issued from ethtool only in a
process context. So, while waiting for the cmd to complete, the driver
can sleep instead of holding spin_lock_bh() on the mcc_lock. This is done
by calling be_mcc_notify() instead of be_mcc_notify_wait() (that returns
only after the cmd completes while the MCCQ is locked).
Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This fix makes the RX_FILTER cmd asynchronous, i.e., the caller issues
this cmd and doesn't wait for a completion from the FW. If the FW/adapter
is in an error state, this change helps in not holding up the rtnl_lock
and keeping bottom halves disabled while the driver timesout waiting for
a response from the FW.
Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When the adapter is in error state, return error from be_mcc_notify()
so that the caller routines need not sleep waiting for a response.
Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
be2net: convert dest field in udp-hdr to host-endian
The "dest" field in the UDP-hdr of a TX skb is in network endian format.
Convert it to host endian before accessing it. The os2bmc patch,
mentioned below introduced this code.
Fixes: 760c295e0e8d ("be2net: Support for OS2BMC") Signed-off-by: Venkat Duvvuru <VenkatKumar.Duvvuru@Emulex.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
be2net: fix wrong return value in be_check_ufi_compatibility()
In the commit a6e6ff6eee12f3e
("be2net: simplify UFI compatibility checking"), a return value of "-1"
was incorrectly used in place of "false". This patch fixes it.
Fixes: a6e6ff6eee12f3e ("be2net: simplify UFI compatibility checking") Signed-off-by: Vasundhara Volam <vasundhara.volam@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
pci_enable_device() call sets device power state to D0; there is no need
doing it again.
Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The current code assumes that bridge functionality (EVB) in the adapter
is enabled only when SR-IOV is enabled. This is not always true.
This patch uses the GET_HSW_CONFIG FW cmd to query this from the FW.
Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This change will make be_setup_wol() routine more compact and readable
by removing some duplicate code.
Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: Do not iterate over all interfaces when finding source address on specific interface.
If outgoing interface is specified and the candidate address is
restricted to the outgoing interface, it is enough to iterate
over that given interface only.
Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Acked-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Thomson [Fri, 10 Jul 2015 01:56:54 +0000 (13:56 +1200)]
net: phy: Pass mdix ethtool setting through to phy driver
Pass the mdix setting from ethtool down to the phy driver, to allow
driver specific implementations of manually setting the polarity.
Signed-off-by: David Thomson <david.thomson@alliedtelesis.co.nz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bridge: mdb: allow the user to delete mdb entry if there's a querier
Until now when a querier was present static entries couldn't be deleted.
Fix this and allow the user to manipulate the mdb with or without a
querier.
Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 11 Jul 2015 01:16:37 +0000 (18:16 -0700)]
Merge branch 'netdev_unregister_races'
Julian Anastasov says:
====================
net: fixes for device unregistration
Test script from Eric W. Biederman can catch a problem
where packets from backlog are processed long after the last
synchronize_net call. This can be reproduced after few tests
if commit 381c759d9916 ("ipv4: Avoid crashing in ip_error")
is reverted for the test. Incoming packets do not hold
reference to device but even if they do, subsystems do not
expect packets to fly during and after the NETDEV_UNREGISTER
event.
The first fix has the cost of netif_running check in fast path.
The second fix calls rcu_read_lock while local IRQ is disabled,
I hope this is not against the rules.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Incoming packet should be either in backlog queue or
in RCU read-side section. Otherwise, the final sequence of
flush_backlog() and synchronize_net() may miss packets
that can run without device reference:
CPU 1 CPU 2
skb->dev: no reference
process_backlog:__skb_dequeue
process_backlog:local_irq_enable
on_each_cpu for
flush_backlog => IPI(hardirq): flush_backlog
- packet not found in backlog
CPU delayed ...
synchronize_net
- no ongoing RCU
read-side sections
netdev_run_todo,
rcu_barrier: no
ongoing callbacks
__netif_receive_skb_core:rcu_read_lock
- too late
free dev
process packet for freed dev
Fixes: 6e583ce5242f ("net: eliminate refcounting in backlog queue") Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
net: do not process device backlog during unregistration
commit 381c759d9916 ("ipv4: Avoid crashing in ip_error")
fixes a problem where processed packet comes from device
with destroyed inetdev (dev->ip_ptr). This is not expected
because inetdev_destroy is called in NETDEV_UNREGISTER
phase and packets should not be processed after
dev_close_many() and synchronize_net(). Above fix is still
required because inetdev_destroy can be called for other
reasons. But it shows the real problem: backlog can keep
packets for long time and they do not hold reference to
device. Such packets are then delivered to upper levels
at the same time when device is unregistered.
Calling flush_backlog after NETDEV_UNREGISTER_FINAL still
accounts all packets from backlog but before that some packets
continue to be delivered to upper levels long after the
synchronize_net call which is supposed to wait the last
ones. Also, as Eric pointed out, processed packets, mostly
from other devices, can continue to add new packets to backlog.
Fix the problem by moving flush_backlog early, after the
device driver is stopped and before the synchronize_net() call.
Then use netif_running check to make sure we do not add more
packets to backlog. We have to do it in enqueue_to_backlog
context when the local IRQ is disabled. As result, after the
flush_backlog and synchronize_net sequence all packets
should be accounted.
Thanks to Eric W. Biederman for the test script and his
valuable feedback!
Reported-by: Vittorio Gambaletta <linuxbugs@vittgam.net> Fixes: 6e583ce5242f ("net: eliminate refcounting in backlog queue") Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge branch 'parisc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"We have one important patch from Dave Anglin and myself which fixes
PTE/TLB race conditions which caused random segmentation faults on our
debian buildd servers, and one patch from Alex Ivanov which speeds up
the graphical text console on the STI framebuffer driver"
* 'parisc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix some PTE/TLB race conditions and optimize __flush_tlb_range based on timing results
stifb: Implement hardware accelerated copyarea
Stephen Smalley [Fri, 10 Jul 2015 13:40:59 +0000 (09:40 -0400)]
selinux: fix mprotect PROT_EXEC regression caused by mm change
commit 66fc13039422ba7df2d01a8ee0873e4ef965b50b ("mm: shmem_zero_setup
skip security check and lockdep conflict with XFS") caused a regression
for SELinux by disabling any SELinux checking of mprotect PROT_EXEC on
shared anonymous mappings. However, even before that regression, the
checking on such mprotect PROT_EXEC calls was inconsistent with the
checking on a mmap PROT_EXEC call for a shared anonymous mapping. On a
mmap, the security hook is passed a NULL file and knows it is dealing
with an anonymous mapping and therefore applies an execmem check and no
file checks. On a mprotect, the security hook is passed a vma with a
non-NULL vm_file (as this was set from the internally-created shmem
file during mmap) and therefore applies the file-based execute check
and no execmem check. Since the aforementioned commit now marks the
shmem zero inode with the S_PRIVATE flag, the file checks are disabled
and we have no checking at all on mprotect PROT_EXEC. Add a test to
the mprotect hook logic for such private inodes, and apply an execmem
check in that case. This makes the mmap and mprotect checking
consistent for shared anonymous mappings, as well as for /dev/zero and
ashmem.
Cc: <stable@vger.kernel.org> # 4.1.x Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <pmoore@redhat.com>