Rainer Weikusat [Sun, 6 Dec 2015 21:11:38 +0000 (21:11 +0000)]
af_unix: fix unix_dgram_recvmsg entry locking
The current unix_dgram_recvsmg code acquires the u->readlock mutex in
order to protect access to the peek offset prior to calling
__skb_recv_datagram for actually receiving data. This implies that a
blocking reader will go to sleep with this mutex held if there's
presently no data to return to userspace. Two non-desirable side effects
of this are that a later non-blocking read call on the same socket will
block on the ->readlock mutex until the earlier blocking call releases it
(or the readers is interrupted) and that later blocking read calls
will wait longer than the effective socket read timeout says they
should: The timeout will only start 'ticking' once such a reader hits
the schedule_timeout in wait_for_more_packets (core.c) while the time it
already had to wait until it could acquire the mutex is unaccounted for.
The patch avoids both by using the __skb_try_recv_datagram and
__skb_wait_for_more packets functions created by the first patch to
implement a unix_dgram_recvmsg read loop which releases the readlock
mutex prior to going to sleep and reacquires it as needed
afterwards. Non-blocking readers will thus immediately return with
-EAGAIN if there's no data available regardless of any concurrent
blocking readers and all blocking readers will end up sleeping via
schedule_timeout, thus honouring the configured socket receive timeout.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Rainer Weikusat [Sun, 6 Dec 2015 21:11:34 +0000 (21:11 +0000)]
core: enable more fine-grained datagram reception control
The __skb_recv_datagram routine in core/ datagram.c provides a general
skb reception factility supposed to be utilized by protocol modules
providing datagram sockets. It encompasses both the actual recvmsg code
and a surrounding 'sleep until data is available' loop. This is
inconvenient if a protocol module has to use additional locking in order
to maintain some per-socket state the generic datagram socket code is
unaware of (as the af_unix code does). The patch below moves the recvmsg
proper code into a new __skb_try_recv_datagram routine which doesn't
sleep and renames wait_for_more_packets to
__skb_wait_for_more_packets, both routines being exported interfaces. The
original __skb_recv_datagram routine is reimplemented on top of these
two functions such that its user-visible behaviour remains unchanged.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Mon, 7 Dec 2015 03:38:58 +0000 (04:38 +0100)]
PHY: DP83867: Remove looking in parent device for OF properties
Device tree properties for a phy device are expected to be in the phy
node. The current code for the DP83867 also tries to look in the
parent node. The devices binding documentation does not mention this,
no current device tree file makes use of this, and it is not behaviour
we want. So remove looking in the parent device.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Sun, 6 Dec 2015 21:47:15 +0000 (22:47 +0100)]
net: cdc_ncm: add "ndp_to_end" sysfs attribute
Adding a writable sysfs attribute for the "NDP to end"
quirk flag.
This makes it easier for end users to test new devices for
this firmware bug. We've been lucky so far, but we should
not depend on reporters capable of rebuilding the driver.
Cc: Enrico Mioso <mrkiko.rs@gmail.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Dec 2015 03:40:46 +0000 (22:40 -0500)]
Merge branch 'mlx4-HA-LAG-SRIOV-VF'
Or Gerlitz says:
====================
Add HA and LAG support for mlx4 SRIOV VFs
This series is built upon the code added in commit ce388ff "Merge branch
'mlx4-next'" which added HA and LAG support to mlx4 RoCE and SRIOV services.
We add HA and Link Aggregation support to single ported mlx4 Ethernet VFs.
In this case, the PF Ethernet interfaces are bonded, the VFs see single
port HW devices (already supported) -- however, now this port is highly
available. This means that all VF HW QPs (both VF Ethernet driver and VF
RoCE / RAW QPs) are subject to the V2P (Virtual-To-Physical) mapping which
is managed by the PF driver, and hence resilient across link failures and
such events.
When bonding operates in Dynamic link aggregation (802.3ad) mode, traffic
from each VF will go over the VF "base port" (the port this VF is assigned
to by the admin) as long as this port is up. When the port fails, traffic
from all VFs that are defined on this port will move to the other port, and
be back to their base-port when it recovers.
Moni and Or.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Moni Shoua [Sun, 6 Dec 2015 16:07:43 +0000 (18:07 +0200)]
net/mlx4_core: Support the HA mode for SRIOV VFs too
When the mlx4 driver runs in HA mode, and all VFs are single ported
ones, we make their single port Highly-Available.
This is done by taking advantage of the HA mode properties (following
bonding changes with programming the port V2P map, etc) and adding
the missing parts which are unique to SRIOV such as mirroring VF
steering rules on both ports.
Due to limits on the MAC and VLAN table this mode is enabled only when
number of total VFs is under 64.
Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Sun, 6 Dec 2015 16:07:42 +0000 (18:07 +0200)]
IB/mlx4: Use the VF base-port when demuxing mad from wire
Under HA mode, it's possible that the VF registered its GID
(and expects to get mads through the PV scheme) on a port which is
different from the one this mad arrived on, due to HA fail over.
Therefore, if the gid is not matched on the port that the packet arrived
on, check for a match on the other port if HA mode is active -- and if a
match is found on the other port, continue processing the mad using that
other port.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
Moni Shoua [Sun, 6 Dec 2015 16:07:41 +0000 (18:07 +0200)]
net/mlx4_core: Keep VLAN/MAC tables mirrored in multifunc HA mode
Due to HW limitations, indexes to MAC and VLAN tables are always taken
from the table of the actual port. So, if a resource holds an index to
a table, it may refer to different values during the lifetime of the
resource, unless the tables are mirrored. Also, even when
driver is not in HA mode the policy of allocating an index to these
tables is such to make sure, as much as possible, that when the time
comes the mirroring will be successful. This means that in multifunction
mode the allocation of a free index in a port's table tries to make sure
that the same index in the other's port table is also free.
Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Moni Shoua [Sun, 6 Dec 2015 16:07:40 +0000 (18:07 +0200)]
net/mlx4_core: Support mirroring VF DMFS rules on both ports
Under HA mode, steering rules set by VFs should be mirrored on both
ports of the device so packets will be accepted no matter on which
port they arrived.
Since getting into HA mode is done dynamically when the user bonds mlx4
Ethernet netdevs, we keep hold of the VF DMFS rule mbox with the port
value flipped (1->2,2->1) and execute the mirroring when getting into
HA mode. Later, when going out of HA mode, we unset the mirrored rules.
In that context note that mirrored rules cannot be removed explicitly.
Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Moni Shoua [Sun, 6 Dec 2015 16:07:39 +0000 (18:07 +0200)]
net/mlx4_core: Use both physical ports to dispatch link state events to VF
Under HA mode, the link down event should be sent to VFs only if both
ports are down.
Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Sun, 6 Dec 2015 16:07:38 +0000 (18:07 +0200)]
net/mlx4_core: Use both physical ports to set the VF link state
In HA mode, the link state for VFs for which the policy is "auto"
(i.e. follow the physical link state) should be ORed from both ports.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
CC: Asias He <asias@redhat.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Julia Lawall <julia.lawall@lip6.fr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Sun, 6 Dec 2015 20:25:50 +0000 (21:25 +0100)]
net: qmi_wwan: should hold RTNL while changing netdev type
The notifier calls were thrown in as a last-minute fix for an
imagined "this device could be part of a bridge" problem. That
revealed a certain lack of locking. Not to mention testing...
Fixes: 32f7adf633b9 ("net: qmi_wwan: support "raw IP" mode") Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
Singhai, Anjali [Fri, 4 Dec 2015 07:49:31 +0000 (23:49 -0800)]
Revert "i40e: remove CONFIG_I40E_VXLAN"
This reverts commit 8fe269991aece394a7ed274f525d96c73f94109a.
The case where VXLAN is a module and i40e driver is inbuilt
will not be handled properly with this change since i40e
will have an undefined symbol vxlan_get_rx_port in it.
v2: Add a signed-off-by.
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 6 Dec 2015 16:14:06 +0000 (11:14 -0500)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2015-12-05
This series contains updates to fm10k only.
Jacob provides the remaining fm10k patches in the series. First change
ensures that all the logic regarding the setting of netdev features is
consolidated in one place of the driver. Fixed an issue where an assumption
was being made on how many queues are available, especially when init_hw_vf()
errors out. Fixed up an number of issues with init_hw() where failures
were not being handled properly or at all, so update the driver to check
returned error codes and respond appropriately. Fixed up typecasting
issues found where either the incorrect typecast size was used or
explicitly typecast values. Added additional debugging statistics and
rename statistic to better reflect its true value. Added support for
ITR scaling based on PCIe link speed for fm10k. Fixed up code comment
where "hardware" was misspelled.
v2: Dropped patches #1 and #10 from original submission, patch #1 was from
Nick Krause and due to his past kernel interactions, dropping his patch.
Patch #10 had questions and concerns from Tom Herbert which cannot be
addressed at this time since the author (Jacob Keller) is currently on
sabbatical, so dropping this patch for now until we can properly address
Tom's questions and concerns.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Fri, 16 Oct 2015 17:57:09 +0000 (10:57 -0700)]
fm10k: change default Tx ITR to 25usec
The current default ITR for Tx is overly restrictive. Using a simple
netperf TCP_STREAM test, we top out at about 10Gb/s for a single thread
when running using 1500 byte frames. By reducing the ITR value to 25usec
(up to 40K interrupts a second from 10K), we are able to achieve 36Gb/s
for a single thread TCP stream test.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:08 +0000 (10:57 -0700)]
fm10k: use macro for default Tx and Rx ITR values
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:07 +0000 (10:57 -0700)]
fm10k: Update adaptive ITR algorithm
The existing adaptive ITR algorithm is overly restrictive. It throttles
incorrectly for various traffic rates, and does not produce good
performance. The algorithm now allows for more interrupts per second,
and does some calculation to help improve for smaller packet loads. In
addition, take into account the new itr_scale from the hardware which
indicates how much to scale due to PCIe link speed.
Reported-by: Matthew Vick <matthew.vick@intel.com> Reported-by: Alex Duyck <alexander.duyck@gmail.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:06 +0000 (10:57 -0700)]
fm10k: introduce ITR_IS_ADAPTIVE macro
Define a macro for identifying when the itr value is dynamic or
adaptive. The concept was taken from i40e. This helps make clear what
the check is, and reduces the line length to something more reasonable
in a few places.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:05 +0000 (10:57 -0700)]
fm10k: Add support for ITR scaling based on PCIe link speed
The Intel Ethernet Switch FM10000 Host Interface interrupt throttle
timers are based on the PCIe link speed. Because of this, the value
being programmed into the ITR registers must be scaled accordingly.
For the PF, this is as simple as reading the PCIe link speed and storing
the result. However, in the case of SR-IOV, the VF's interrupt throttle
timers are based on the link speed of the PF. However, the VF is unable
to get the link speed information from its configuration space, so the
PF must inform it of what scale to use.
Rather than pass this scale via mailbox message, take advantage of
unused bits in the TDLEN register to pass the scale. It is the
responsibility of the PF to program this for the VF while setting up the
VF queues and the responsibility of the VF to get the information
accordingly. This is preferable because it allows the VF to set up the
interrupts properly during initialization and matches how the MAC
address is passed in the TDBAL/TDBAH registers.
Since we're modifying fm10k_type.h, we may as well also update the
copyright year.
Reported-by: Matthew Vick <matthew.vick@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:03 +0000 (10:57 -0700)]
fm10k: rename mbx_tx_oversized statistic to mbx_tx_dropped
Originally this statistic was renamed because the method of dropping was
called "drop_oversized_messages", but this logic has changed much, and
this counter does actually represent messages which we failed to
transmit for a number of reasons. Rename the counter back to tx_dropped
since this is when it will increment, and it is less confusing.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:02 +0000 (10:57 -0700)]
fm10k: add statistics for actual DWORD count of mbmem mailbox
A previous bug was uncovered by addition of a debug stat to indicate the
actual number of DWORDS we pulled from the mbmem. It turned out this was
not the same as the tx_dwords counter. While the previous bug fix should
have corrected this in all cases, add some debug stats that count the
number of DWORDs pushed or pulled from the mbmem. A future debugger may
take advantage of this statistic for debugging purposes. Since we're
modifying fm10k_mbx.h, update the copyright year as well.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:00 +0000 (10:57 -0700)]
fm10k: Correct typecast in fm10k_update_xc_addr_pf
Since the resultant data type of the mac_update.mac_upper field is u16,
it does not make sense to typecast u8 variables to u32 first. Since
we're modifying fm10k_pf.c, also update the copyright year.
Reported-by: Matthew Vick <matthew.vick@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:56:59 +0000 (10:56 -0700)]
fm10k: reinitialize queuing scheme after calling init_hw
The init_hw function may fail, and in the case of VFs, it might change
the number of maximum queues available. Thus, for every flow which
checks init_hw, we need to ensure that we clear the queue scheme before,
and initialize it after. The fm10k_io_slot_reset path will end up
triggering a reset so fm10k_reinit needs this change. The
fm10k_io_error_detected and fm10k_io_resume also need to properly clear
and reinitialize the queue scheme.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:56:58 +0000 (10:56 -0700)]
fm10k: always check init_hw for errors
A recent change modified init_hw in some flows the function may fail on
VF devices. For example, if a VF doesn't yet own its own queues.
However, many callers of init_hw didn't bother to check the error code.
Other callers checked but only displayed diagnostic messages without
actually handling the consequences.
Fix this by (a) always returning and preventing the netdevice from going
up, and (b) printing the diagnostic in every flow for consistency. This
should resolve an issue where VF drivers would attempt to come up
before the PF has finished assigning queues.
In addition, change the dmesg output to explicitly show the actual
function that failed, instead of combining reset_hw and init_hw into a
single check, to help for future debugging.
Fixes: 1d568b0f6424 ("fm10k: do not assume VF always has 1 queue") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:56:57 +0000 (10:56 -0700)]
fm10k: reset max_queues on init_hw_vf failure
VF drivers must detect how many queues are available. Previously, the
driver assumed that each VF has at minimum 1 queue. This assumption is
incorrect, since it is possible that the PF has not yet assigned the
queues to the VF by the time the VF checks. To resolve this, we added a
check first to ensure that the first queue is infact owned by the VF at
init_hw_vf time. However, the code flow did not reset hw->mac.max_queues
to 0. In some cases, such as during reinit flows, we call init_hw_vf
without clearing the previous value of hw->mac.max_queues. Due to this,
when init_hw_vf errors out, if its error code is not properly handled
the VF driver may still believe it has queues which no longer belong to
it. Fix this by clearing the hw->mac.max_queues on exit due to errors.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:56:56 +0000 (10:56 -0700)]
fm10k: set netdev features in one location
Don't change netdev hw_features later in fm10k_probe, instead set all
values inside fm10k_alloc_netdev. To do so, we need to know the MAC type
(whether it is PF or VF) in order to determine what to do. This helps
ensure that all logic regarding features is co-located.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sergei Shtylyov [Fri, 4 Dec 2015 21:58:57 +0000 (00:58 +0300)]
sh_eth: read MAC address registers only once
The code reading the MAHR/MALR registers in read_mac_address() is terribly
ineffective -- it reads MAHR 4 times and MALR 2 times, while it's enough to
read each register only once. Use the local variables to achieve that,
somewhat beautifying the code while at it...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Fri, 4 Dec 2015 21:58:07 +0000 (00:58 +0300)]
ravb: read MAC address registers only once
The code reading the MAHR/MALR registers in ravb_read_mac_address() is
terribly ineffective -- it reads MAHR 4 times and MALR 2 times, while
it's enough to read each register only once. Use the local variables to
achieve that, somewhat beautifying the code while at it...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This removes one redundant error message in bnx2x and changes another one to
WARN_ONCE. The third patch is a small simplification in ethtool stats.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Fri, 4 Dec 2015 16:22:36 +0000 (17:22 +0100)]
bnx2x: simplify distinction between port and func stats
The 'flags' field in bnx2x_stats_arr[] serves only one purpose - to tell
us if the statistic is a per-port stat and thus should not be shown for
virtual functions. It's strange that the field can have three different
values. A boolean will do just fine.
Also remove IS_FUNC_STAT(). It was used only once and it's in fact just
a negation of IS_PORT_STAT().
Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Fri, 4 Dec 2015 16:22:35 +0000 (17:22 +0100)]
bnx2x: change FW GRO error message to WARN_ONCE
It's supposed to be impossible for TPA to give us anything else
than IPv4 or IPv6 here. But in case there is a way to reach this error
by some strange received frames, we don't want to flood the kernel log.
WARN_ONCE is better for this purpose.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Fri, 4 Dec 2015 16:22:34 +0000 (17:22 +0100)]
bnx2x: drop redundant error message about allocation failure
alloc_pages() already prints a warning when it fails. No need to emit
another message. Certainly not at KERN_ERR level, because it is no big
deal if this GFP_ATOMIC allocation fails occasionally.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Pieczko [Fri, 4 Dec 2015 08:48:39 +0000 (08:48 +0000)]
sfc: check warm_boot_count after other functions have been reset
A change in MCFW behaviour means that the net driver must update its record
of the warm_boot_count by reading it from the ER_DZ_BIU_MC_SFT_STATUS
register.
On v4.6.x MCFW the global boot count was incremented when some functions
needed to be reset to enable multicast chaining, so all functions saw the
same value. In that case, the driver needed to increment its
warm_boot_count when other functions were reset, to avoid noticing it later
and then trying to reset itself to recover unnecessarily.
With v4.7+ MCFW, the boot count in firmware doesn't change as that is
unnecessary since the PFs that have been reset will each receive an MC
reboot notification. In that case, the driver re-reads the unchanged
value.
Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Fri, 4 Dec 2015 07:43:19 +0000 (08:43 +0100)]
atm: solos-pci: Replace simple_strtol by kstrtoint
The simple_strtol function is obsolete.
This patch replace it by kstrtoint.
This will simplify code, since some error case not handled by
simple_strtol are handled by kstrtoint.
Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 5 Dec 2015 22:41:42 +0000 (17:41 -0500)]
Merge branch 'batman-hdlc'
Andrew Lunn says:
====================
Allow BATMAN to use hdlc-eth interfaces
BATMAN works over Ethernet like interfaces. hdlc-eth provides the need
requirements. However, hdlc devices are often created as raw hdlc
devices, which batman cannot use, and are then be transmuted into
other types using sethdlc(1). Have the HDLC code emit
NETDEV_*_TYPE_CHANGE events when the type changes, and have BATMAN
react on these events.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 3 Dec 2015 20:12:33 +0000 (21:12 +0100)]
batman-adv: Act on NETDEV_*_TYPE_CHANGE events
A network interface can change type. It may change from a type which
batman does not support, e.g. hdlc, to one it does, e.g. hdlc-eth.
When an interface changes type, it sends two notifications. Handle
these notifications.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 3 Dec 2015 20:12:31 +0000 (21:12 +0100)]
WAN: HDLC: Call notifiers before and after changing device type
An HDLC device can change type when the protocol driver is changed.
Calling the notifier change allows potential users of the interface
know about this planned change, and even block it. After the change
has occurred, send a second notification to users can evaluate the new
device type etc.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 3 Dec 2015 20:12:30 +0000 (21:12 +0100)]
WAN: HDLC: Detach protocol before unregistering device
The current code first unregisters the device, and then detaches the
protocol from it. This should be performed the other way around, since
the detach may try to use state which has been freed by the
unregister. Swap the order, so that we first detach and then remove the
netdev.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 4 Dec 2015 21:56:23 +0000 (16:56 -0500)]
Merge branch 'qmi_wwan_MDM9x30'
Bjørn Mork says:
====================
net: qmi_wwan: MDM9x30 support
We add new device IDs all the time, often without any testing on
actual hardware. This is usually OK as long as the device is similar
to already supported devices, using the same chipset and firmware
basis. But the Sierra Wireless MC7455 is an example of a new chipset
generation. Adding it based on assumed similarity with its ancestors
proved too optimistic.
This series adds the missing bits and pieces necessary to support LTE
Advanced modems based on the Qualcomm MDM9x30 chipset. A big thanks to
Sierra Wireless for providing MC7455 samples for testing
The most important change is the "raw-ip" support. The series also
adds a necessary control request, removes an unsupported device ID,
and adds a driver specific entry in MAINTAINERS.
A few random notes about "raw-ip":
"I rather have these all running in raw IP mode. The 802.3 framing is
utterly stupid." - Marcel Holtmann in Jan 2012 [1]
Marcel was right. I should have listened to him. What more can I say?
The 802.3 framing has provided a steady supply of firmware bugs for
many years. We've added driver workarounds for many of these, but
there are still known bugs where the workaround is so yucky that we
have refused to apply it. But all that is over now. The latest
generation Qualcomm chips no longer supports 802.3 framing at all.
I had two open questions regarding the "raw-ip" userspace API:
1) Should we continue faking an ethernet device, even if we don't use
the L2 headers on the USB link anymore?
There was a vote in favour of the "headerless" device. This is the
honest representation of the hardware/firmware interface.
2) What input should the driver base its framing on?
Snooping or directly manipulating QMI is considered out of the
question. We delegated all QMI handling to userspace from the
beginning.
We have so far required userspace to configure the firmware for
"802.3" framing, or fail if that proved impossible. This
requirement is now changed. Userspace must now inform the driver
if it negotiates "raw-ip" framing. Two alternative interfaces were
proposed:
- ethtool private driver flag, or
- sysfs file
The NetworkManager/ModemManager developers were in favour of the
sysfs alternative.
These questions (or any other you migh have :) are of course still
open. This patch set presents the solutions I currently prefer,
considering the above.
All comments are appreciated, even simple '+1' ones.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Thu, 3 Dec 2015 18:24:21 +0000 (19:24 +0100)]
net: qmi_wwan: support "raw IP" mode
QMI wwan devices have traditionally emulated ethernet devices
by default. But they have always had the capability of operating
without any L2 header at all, transmitting and receiving "raw"
IP packets over the USB link. This firmware feature used to be
configurable through the QMI management protocol.
Traditionally there was no way to verify the firmware mode
without attempting to change it. And the firmware would often
disallow changes anyway, i.e. due to a session already being
established. In some cases, this could be a hidden firmware
internal session, completely outside host control. For these
reasons, sticking with the "well known" default mode was safest.
But newer generations of QMI hardware and firmware have moved
towards defaulting to "raw IP" mode instead, followed by an
increasing number of bugs in the already buggy "802.3" firmware
implementation. At the same time, the QMI management protocol
gained the ability to detect the current mode. This has enabled
the userspace QMI management application to verify the current
firmware mode without trying to modify it.
Following this development, the latest QMI hardware and firmware
(the MDM9x30 generation) has dropped support for "802.3" mode
entirely. Support for "raw IP" framing in the driver is therefore
necessary for these devices, and to a certain degree to work
around problems with the previous generation,
This patch adds support for "raw IP" framing for QMI devices,
changing the netdev from an ethernet device to an ARPHRD_NONE
p-t-p device when "raw IP" framing is enabled.
The firmware setup is fully delegated to the QMI userspace
management application, through simple tunneling of the QMI
protocol. The driver will therefore not know which mode has been
"negotiated" between firmware and userspace. Allowing userspace
to inform the driver of the result through a sysfs switch is
considered a better alternative than to change the well established
clean delegation of firmware management to userspace.
Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Thu, 3 Dec 2015 18:24:20 +0000 (19:24 +0100)]
usbnet: allow mini-drivers to consume L2 headers
Assume the minidriver has taken care of all L2 header parsing
if it sets skb->protocol. This allows the minidriver to
support non-ethernet L2 headers, and even operate without
any L2 header at all.
Signed-off-by: Bjørn Mork <bjorn@mork.no> Acked-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Thu, 3 Dec 2015 18:24:18 +0000 (19:24 +0100)]
net: qmi_wwan: MDM9x30 specific power management
MDM9x30 based modems appear to go into a deeper sleep when
suspended without "Remote Wakeup" enabled. The QMI interface
will not respond unless a "set DTR" control request is sent
on resume. The effect is similar to a QMI_CTL SYNC request,
resetting (some of) the firmware state.
We allow userspace sessions to span multiple character device
open/close sequences. This means that userspace can depend
on firmware state while both the netdev and the character
device are closed. We have disabled "needs_remote_wakeup" at
this point to allow devices without remote wakeup support to
be auto-suspended.
To make sure the MDM9x30 keeps firmware state, we need to
keep "needs_remote_wakeup" always set. We also need to
issue a "set DTR" request to enable the QMI interface.
Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 4 Dec 2015 19:36:16 +0000 (14:36 -0500)]
Merge branch 'hip06-soc'
Salil Mehta says:
====================
net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem
This PATCH V7 addresses the TAB formatting comments by
Sergei Shtylyov. Missing TABs at some other palces have
also been corrected.
PATCH V6:
This addresses the review comments provided by
David Miller over the existing use of ENABLE/DISABLE
hash defines with the code. These hash defines are doing
a similar job as implicit type bool would do. So these are
kind of duplicate and are redundant.
PATCH V5:
This PATCH addresses the review comments by Yuval Mintz
<Yuval.Mintz@qlogic.com>. This rework of comments are basically
related to:
1) styling of the code,
2) RSS default Key initiailization related code
3) redundant code removal
PATCH V4:
This addresses the review comment provided by
Sergei Shtylyov. The changelog of every patch has also
been modified.
PATCH V3:
Addresses the review comment floated by David Miller
PATCH V2:
1) Bug Fixes and Clean-up: Internally identified
2) Addresses internal review comments by Kenneth Lee and
by Huang Daode
3) Addresses the review comment from "Yisen.Zhuang(Zhuangyuzeng)"
4) Adds fix from Fengguang Wu for an error generated from
"kbuild test robot" from Intel
5) Ethtool support for TSO set option from Lisheng
PATCH V1:
Adds initial support of Hip06 SoC with below changes:
This patch-set adds support of new Hisilicon Hip06 SoC to the existing
(already part of net-next) HNS ethernet driver for Hip05 SoC. Hip06 is
a multi-core SoC and is a derivative of Hip05 SoC with lots of new
hardware featres supported like RSS, TSO, hardware VLAN assist etc.
The changes in the driver are mainly due to following:
1) changes in the DMA descriptor provided by the Hip06 ethernet
hardware. These changes need to co-exist with already present
Hip05 DMA descriptor and its operating functions. The decision
to choose the correct type of DMA descriptor is taken dynamically
depending upon the version of the hardware (i.e. V1/hip05 or
V2/hip06, see already existing hisilicon-hns-nic.txt binding file
for the detailed description version and naming).
2) To support new features added to the Hip06 ethernet hardware:
a. RSS (Receive Side Scaling)
b. TSO (TCP Segment Offload)
c. Hardware VLAN support (currently we are initializing hardware
to not assist in stripping the vlan tag at hardware level.
Proper support of this feature and ethtool would come after
these patches have been accepted)
Kindly note that, this patchset has been based on latest net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Salil [Thu, 3 Dec 2015 12:17:56 +0000 (12:17 +0000)]
net:hns: Add support of ethtool TSO set option for Hip06 in HNS
This patch adds the support of ethtool TSO option to support
Hip06 SoC to HNS
Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: lisheng <lisheng011@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Salil [Thu, 3 Dec 2015 12:17:55 +0000 (12:17 +0000)]
net:hns: Add Hip06 "TSO(TCP Segment Offload)" support HNS Driver
This patch adds the support of "TSO (TCP Segment Offload)" feature
provided by the Hip06 ethernet hardware to the HNS ethernet
driver.
Enabling this feature would help offload the TCP Segmentation
process to the Hip06 ethernet hardware. This eventually would help
in saving precious cpu cycles.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: lisheng <lisheng011@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Salil [Thu, 3 Dec 2015 12:17:54 +0000 (12:17 +0000)]
net:hns: Add Hip06 "RSS(Receive Side Scaling)" support to HNS Driver
This patch adds the support of "RSS (Receive Side Scaling)" feature
provided by the Hip06 ethernet hardware to the HNS ethernet
driver.
This feature helps in distributing the different flows (mapped as
hash by hardware using Toeplitz Hash) to different Queues asssociated
with the processor cores. The mapping of flow-hash values to the
different queues is stored in indirection table (which is per Packet-
parse-Engine/PPE). This patch also provides the changes to re-program
the (flow-hash<->Qid) mapping using the ethtool.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Reviewed-by: Kenneth Lee <liguozhu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Salil [Thu, 3 Dec 2015 12:17:53 +0000 (12:17 +0000)]
net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem
This patchset adds support of Hisilicon Hip06 SoC to the existing HNS
ethernet driver.
The changes in the driver are mainly due to changes in the DMA
descriptor provided by the Hip06 ethernet hardware. These changes
need to co-exist with already present Hip05 DMA descriptor and its
operating functions. The decision to choose the correct type of DMA
descriptor is taken dynamically depending upon the version of the
hardware (i.e. V1/hip05 or V2/hip06, see already existing
hisilicon-hns-nic.txt binding file for detailed description). other
changes includes in SBM, DSAF and PPE modules as well. Changes
affecting the driver related to the newly added ethernet hardware
features in Hip06 would be added as separate patch over this and
subsequent patches.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: yankejian <yankejian@huawei.com> Signed-off-by: huangdaode <huangdaode@hisilicon.com> Signed-off-by: lipeng <lipeng321@huawei.com> Signed-off-by: lisheng <lisheng011@huawei.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
yzhu1 [Thu, 3 Dec 2015 10:00:55 +0000 (18:00 +0800)]
net: bonding: remove redudant brackets
It is not necessary to use two brackets. As such, the redudant brackets
are removed.
CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 3 Dec 2015 23:45:16 +0000 (15:45 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A collection of fixes from this series. The most important here is a
regression fix for an issue that some folks would hit in blk-merge.c,
and the NVMe queue depth limit for the screwed up Apple "nvme"
controller.
In more detail, this pull request contains:
- a set of fixes for null_blk, including a fix for a few corner cases
where we could hang the device. From Arianna and Paolo.
- lightnvm:
- A build improvement from Keith.
- Update the qemu pci id detection from Matias.
- Error handling fixes for leaks and other little fixes from
Sudip and Wenwei.
- fix from Eric where BLKRRPART would not return EBUSY for whole
device mounts, only when partitions were mounted.
- fix from Jan Kara, where EOF O_DIRECT reads would return
negatively.
- remove check for rq_mergeable() when checking limits for cloned
requests. The check doesn't make any sense. It's assuming that
since NOMERGE is set on the request that we don't have to
recalculate limits since the request didn't change, but that's not
true if the request has been redirected. From Hannes.
- correctly get the bio front segment value set for single segment
bio's, fixing a BUG() in blk-merge. From Ming"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme: temporary fix for Apple controller reset
null_blk: change type of completion_nsec to unsigned long
null_blk: guarantee device restart in all irq modes
null_blk: set a separate timer for each command
blk-merge: fix computing bio->bi_seg_front_size in case of single segment
direct-io: Fix negative return from dio read beyond eof
block: Always check queue limits for cloned requests
lightnvm: missing nvm_lock acquire
lightnvm: unconverted ppa returned in get_bb_tbl
lightnvm: refactor and change vendor id for qemu
lightnvm: do device max sectors boundary check first
lightnvm: fix ioctl memory leaks
lightnvm: free memory when gennvm register fails
lightnvm: Simplify config when disabled
Return EBUSY from BLKRRPART for mounted whole-dev fs
Linus Torvalds [Thu, 3 Dec 2015 23:23:17 +0000 (15:23 -0800)]
Merge tag 'trace-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"During the merge window I added a new file that is used to filter
trace events on pids. It filters all events where only tasks with
their pid in that file exists. It also handles the sched_switch and
sched_wakeup trace events where the current task does not have its pid
in the file, but the task either being switched to or awaken does.
Unfortunately, I forgot about sched_wakeup_new and sched_waking. Both
of these tracepoints use the same class as the sched_wakeup
tracepoint, and they too should be included in what gets filtered by
the set_event_pid file"
* tag 'trace-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Add sched_wakeup_new and sched_waking tracepoints for pid filter
David S. Miller [Thu, 3 Dec 2015 20:56:22 +0000 (15:56 -0500)]
Merge tag 'mac80211-for-davem-2015-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A small set of fixes for 4.4:
* fix scanning in mac80211 to not actively scan radar
channels (from Antonio)
* fix uninitialized variable in remain-on-channel that
could lead to treating frame TX as remain-on-channel
and not sending the frame at all
* remove NL80211_FEATURE_FULL_AP_CLIENT_STATE again, it
was broken and needs more work, we'll enable it later
* fix call_rcu() induced use-after-reset/free in mesh
(that was suddenly causing issues in certain tests)
* always request block-ack window size 64 as we found
some APs will otherwise crash (really ...)
* fix P2P-Device teardown sequence to avoid restarting
with uninitialized data
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jérôme Pouiller [Thu, 3 Dec 2015 09:02:35 +0000 (10:02 +0100)]
net: phy: reset only targeted phy
It is possible to address another chip on same MDIO bus. The case is
correctly handled for media advertising. It is taken into account
only if mii_data->phy_id == phydev->addr. However, this condition
was missing for reset case.
Signed-off-by: Jérôme Pouiller <jezz@sysmic.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Boyd [Thu, 3 Dec 2015 07:55:15 +0000 (23:55 -0800)]
stmmac: ipq806x: Return error values instead of pointers
Typically we return error pointers when we want to use those
pointers in the non-error case, but this function is just
returning error pointers or NULL for success. Change the style to
plain int to follow normal kernel coding styles.
Cc: Joachim Eastwood <manabian@gmail.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Wed, 2 Dec 2015 20:19:37 +0000 (15:19 -0500)]
tipc: fix node reference count bug
Commit 5405ff6e15f40f2f ("tipc: convert node lock to rwlock")
introduced a bug to the node reference counter handling. When a
message is successfully sent in the function tipc_node_xmit(),
we return directly after releasing the node lock, instead of
continuing and decrementing the node reference counter as we
should do.
This commit fixes this bug.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Dec 2015 20:18:10 +0000 (15:18 -0500)]
Merge branch 'mvneta-ethtool-autoneg'
Stas Sergeev says:
====================
mvneta: implement ethtool autonegotiation control
These 2 patches add an ability to control the
autonegotiation via ethtool. For example:
ethtool -s eth0 autoneg off
ethtool -s eth0 autoneg on
This is needed if you want to connect the mvneta's MII
to different switches or PHYs: the ones the do support
the in-band status, and the ones that do not.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stas Sergeev [Wed, 2 Dec 2015 17:35:11 +0000 (20:35 +0300)]
mvneta: implement ethtool autonegotiation control
This patch allows to do
ethtool -s eth0 autoneg off
ethtool -s eth0 autoneg on
to disable or enable autonegotiation at run-time.
Without that functionality, the only way to control the autonegotiation
is to modify the device tree.
This is needed if you plan to use the same kernel with
different ethernet switches, the ones that support the in-band
status and the ones that not.
CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> CC: netdev@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Stas Sergeev [Wed, 2 Dec 2015 17:33:56 +0000 (20:33 +0300)]
mvneta: consolidate autoneg enabling
This moves autoneg-related bit manipulations to the single place.
CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> CC: netdev@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Thierry Reding [Wed, 2 Dec 2015 16:30:29 +0000 (17:30 +0100)]
net: mv643xx: Use platform_register/unregister_drivers()
These new helpers simplify implementing multi-driver modules and
properly handle failure to register one driver by unregistering all
previously registered drivers.
Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thierry Reding [Wed, 2 Dec 2015 16:30:28 +0000 (17:30 +0100)]
net: mpc52xx: Use platform_register/unregister_drivers()
These new helpers simplify implementing multi-driver modules and
properly handle failure to register one driver by unregistering all
previously registered drivers.
Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thierry Reding [Wed, 2 Dec 2015 16:30:27 +0000 (17:30 +0100)]
net: bcm63xx: Use platform_register/unregister_drivers()
These new helpers simplify implementing multi-driver modules and
properly handle failure to register one driver by unregistering all
previously registered drivers.
Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thierry Reding [Wed, 2 Dec 2015 16:30:26 +0000 (17:30 +0100)]
net: bfin_mac: Use platform_register/unregister_drivers()
These new helpers simplify implementing multi-driver modules and
properly handle failure to register one driver by unregistering all
previously registered drivers.
Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 2 Dec 2015 06:54:08 +0000 (01:54 -0500)]
bnxt_en: Setup uc_list mac filters after resetting the chip.
Call bnxt_cfg_rx_mode() in bnxt_init_chip() to setup uc_list and
mc_list mac address filters. Before the patch, uc_list is not
setup again after chip reset (such as ethtool ring size change)
and macvlans don't work any more after that.
Modify bnxt_cfg_rx_mode() to return error codes appropriately so
that the init chip sequence can detect any failures.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jeffrey Huang [Wed, 2 Dec 2015 06:54:07 +0000 (01:54 -0500)]
bnxt_en: enforce proper storing of MAC address
For PF, the bp->pf.mac_addr always holds the permanent MAC
addr assigned by the HW. For VF, the bp->vf.mac_addr always
holds the administrator assigned VF MAC addr. The random
generated VF MAC addr should never get stored to bp->vf.mac_addr.
This way, when the VF wants to change the MAC address, we can tell
if the adminstrator has already set it and disallow the VF from
changing it.
v2: Fix compile error if CONFIG_BNXT_SRIOV is not set.
Signed-off-by: Jeffrey Huang <huangjw@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jeffrey Huang [Wed, 2 Dec 2015 06:54:06 +0000 (01:54 -0500)]
bnxt_en: Fixed incorrect implementation of ndo_set_mac_address
The existing ndo_set_mac_address only copies the new MAC addr
and didn't set the new MAC addr to the HW. The correct way is
to delete the existing default MAC filter from HW and add
the new one. Because of RFS filters are also dependent on the
default mac filter l2 context, the driver must go thru
close_nic() to delete the default MAC and RFS filters, then
open_nic() to set the default MAC address to HW.
Signed-off-by: Jeffrey Huang <huangjw@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Dec 2015 20:05:56 +0000 (15:05 -0500)]
Merge branch 'vsock-virtio'
Stefan Hajnoczi says:
====================
Add virtio transport for AF_VSOCK
v2:
* Rebased onto Linux v4.4-rc2
* vhost: Refuse to assign reserved CIDs
* vhost: Refuse guest CID if already in use
* vhost: Only accept correctly addressed packets (no spoofing!)
* vhost: Support flexible rx/tx descriptor layout
* vhost: Add missing total_tx_buf decrement
* virtio_transport: Fix total_tx_buf accounting
* virtio_transport: Add virtio_transport global mutex to prevent races
* common: Notify other side of SOCK_STREAM disconnect (fixes shutdown
semantics)
* common: Avoid recursive mutex_lock(tx_lock) for write_space (fixes deadlock)
* common: Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants
* common: Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants
* common: Fix peer_buf_alloc inheritance on child socket
This patch series adds a virtio transport for AF_VSOCK (net/vmw_vsock/).
AF_VSOCK is designed for communication between virtual machines and
hypervisors. It is currently only implemented for VMware's VMCI transport.
This series implements the proposed virtio-vsock device specification from
here:
http://comments.gmane.org/gmane.comp.emulators.virtio.devel/855
Most of the work was done by Asias He and Gerd Hoffmann a while back. I have
picked up the series again.
The QEMU userspace changes are here:
https://github.com/stefanha/qemu/commits/vsock
Why virtio-vsock?
-----------------
Guest<->host communication is currently done over the virtio-serial device.
This makes it hard to port sockets API-based applications and is limited to
static ports.
virtio-vsock uses the sockets API so that applications can rely on familiar
SOCK_STREAM and SOCK_DGRAM semantics. Applications on the host can easily
connect to guest agents because the sockets API allows multiple connections to
a listen socket (unlike virtio-serial). This simplifies the guest<->host
communication and eliminates the need for extra processes on the host to
arbitrate virtio-serial ports.
Overview
--------
This series adds 3 pieces:
1. virtio_transport_common.ko - core virtio vsock code that uses vsock.ko
2. virtio_transport.ko - guest driver
3. drivers/vhost/vsock.ko - host driver
Howto
-----
The following kernel options are needed:
CONFIG_VSOCKETS=y
CONFIG_VIRTIO_VSOCKETS=y
CONFIG_VIRTIO_VSOCKETS_COMMON=y
CONFIG_VHOST_VSOCK=m
Launch QEMU as follows:
# qemu ... -device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3
Guest and host can communicate via AF_VSOCK sockets. The host's CID (address)
is 2 and the guest is automatically assigned a CID (use VMADDR_CID_ANY (-1) to
bind to it).
Status
------
There are a few design changes I'd like to make to the virtio-vsock device:
1. The 3-way handshake isn't necessary over a reliable transport (virtqueue).
Spoofing packets is also impossible so the security aspects of the 3-way
handshake (including syn cookie) add nothing. The next version will have a
single operation to establish a connection.
2. Credit-based flow control doesn't work for SOCK_DGRAM since multiple clients
can transmit to the same listen socket. There is no way for the clients to
coordinate buffer space with each other fairly. The next version will drop
credit-based flow control for SOCK_DGRAM and only rely on best-effort
delivery. SOCK_STREAM still has guaranteed delivery.
3. In the next version only the host will be able to establish connections
(i.e. to connect to a guest agent). This is for security reasons since
there is currently no ability to provide host services only to certain
guests. This also matches how AF_VSOCK works on modern VMware hypervisors.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Wed, 2 Dec 2015 06:18:11 +0000 (22:18 -0800)]
mpls: support for dead routes
Adds support for RTNH_F_DEAD and RTNH_F_LINKDOWN flags on mpls
routes due to link events. Also adds code to ignore dead
routes during route selection.
Unlike ip routes, mpls routes are not deleted when the route goes
dead. This is current mpls behaviour and this patch does not change
that. With this patch however, routes will be marked dead.
dead routes are not notified to userspace (this is consistent with ipv4
routes).
dead routes:
-----------
$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2 dev swp1
nexthop as to 700 via inet 10.1.1.6 dev swp2
$ip link set dev swp1 down
$ip link show dev swp1
4: swp1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN mode
DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff
$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2 dev swp1 dead linkdown
nexthop as to 700 via inet 10.1.1.6 dev swp2
linkdown routes:
----------------
$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2 dev swp1
nexthop as to 700 via inet 10.1.1.6 dev swp2
$ip link show dev swp1
4: swp1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff
/* carrier goes down */
$ip link show dev swp1
4: swp1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast
state DOWN mode DEFAULT group default qlen 1000
link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff
$ip -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2 dev swp1 linkdown
nexthop as to 700 via inet 10.1.1.6 dev swp2
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: lpc_eth: remove irq > NR_IRQS check from probe()
If the driver is used on an ARM platform with SPARSE_IRQ defined,
semantics of NR_IRQS is different (minimal value of virtual irqs) and
by default it is set to 16, see arch/arm/include/asm/irq.h.
This value may be less than the actual number of virtual irqs, which
may break the driver initialization. The check removal allows to use
the driver on such a platform, and, if irq controller driver works
correctly, the check is not needed on legacy platforms.
Fixes a runtime problem:
lpc-eth 31060000.ethernet: error getting resources.
lpc_eth: lpc-eth: not found (-6).
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Wed, 2 Dec 2015 05:58:33 +0000 (14:58 +0900)]
ravb: add device tree support for r8a779[123]
Simply document new compatibility strings.
As a previous patch adds a generic R-Car Gen2 compatibility string
there appears to be no need for a driver updates.
Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Wed, 2 Dec 2015 05:58:32 +0000 (14:58 +0900)]
ravb: add fallback compatibility strings
Add fallback compatibility strings for R-Car Gen 2 & 3 SoC Families.
This is in keeping with the fallback scheme being adopted wherever appropriate
for drivers for Renesas SoCs.
Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
mq/mqprio have their own ways to report qlen/drops by folding stats on
all their queues, with appropriate locking.
A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup()
without proper locking : concurrent qdisc updates could corrupt the list
that qdisc_match_from_root() parses to find a qdisc given its handle.
Fix first problem adding a TCQ_F_NOPARENT qdisc flag that
qdisc_tree_decrease_qlen() can use to abort its tree traversal,
as soon as it meets a mq/mqprio qdisc children.
Second problem can be fixed by RCU protection.
Qdisc are already freed after RCU grace period, so qdisc_list_add() and
qdisc_list_del() simply have to use appropriate rcu list variants.
A future patch will add a per struct netdev_queue list anchor, so that
qdisc_tree_decrease_qlen() can have more efficient lookups.
Reported-by: Daniele Fucini <dfucini@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <cwang@twopensource.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Sutter [Tue, 1 Dec 2015 21:45:15 +0000 (22:45 +0100)]
net: ipv6: restrict hop_limit sysctl setting to range [1; 255]
Setting a value bigger than 255 resulted in using only the lower eight
bits of that value as it is assigned to the u8 header field. To avoid
this unexpected result, reject such values.
Setting a value of zero is technically possible, but hosts receiving
such a packet have to treat it like hop_limit was set to one, according
to RFC2460. Therefore I don't see a use-case for that.
Setting a route's hop_limit to zero in iproute2 means to use the sysctl
default, which is not the case here: Setting e.g.
net.conf.eth0.hop_limit=0 will not make the kernel use
net.conf.all.hop_limit for outgoing packets on eth0. To avoid these
kinds of confusion, reject zero.
Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Tue, 1 Dec 2015 17:33:36 +0000 (18:33 +0100)]
openvswitch: fix hangup on vxlan/gre/geneve device deletion
Each openvswitch tunnel vport (vxlan,gre,geneve) holds a reference
to the underlying tunnel device, but never released it when such
device is deleted.
Deleting the underlying device via the ip tool cause the kernel to
hangup in the netdev_wait_allrefs() loop.
This commit ensure that on device unregistration dp_detach_port_notify()
is called for all vports that hold the device reference, properly
releasing it.
Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device") Fixes: b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of vport") Fixes: 6b001e682e90 ("openvswitch: Use Geneve device.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes PTP support active in CONFIG mode on R-Car Gen3.
Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com> Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Dec 2015 19:17:51 +0000 (14:17 -0500)]
Merge branch 'netronome-NFP4000-NFP6000'
Jakub Kicinski says:
====================
Netronome NFP4000/NFP6000 NIC VF driver
This patchset adds support for VFs of Netronome's NFP-4000 and NFP-6000
based NICs. We are currently also preparing the submission for the PF
driver, but it is not quite ready yet. The PF driver can be found on
GitHub:
Jakub Kicinski [Tue, 1 Dec 2015 14:55:22 +0000 (14:55 +0000)]
net: add driver for Netronome NFP4000/NFP6000 NIC VFs
Add driver for Virtual Functions for the Netronome's
NFP-4000 and NFP-6000 based NICs.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Rolf Neugebauer <rolf.neugebauer@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 1 Dec 2015 14:55:21 +0000 (14:55 +0000)]
pci_ids: add Netronome Systems vendor
Add PCI vendor id for Netronome Systems.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Rolf Neugebauer <rolf.neugebauer@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Dec 2015 17:11:00 +0000 (12:11 -0500)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2015-12-03
This series contains updates to i40e and i40evf only.
Mitch updates the i40evf driver by increasing the maximum number of queues,
since future devices will allow for more queue pairs. Cleans up a
duplicate printing of the driver info string done in init, since it is
already done in probe. Cleaned up the several allocations which did
not need to be at atomic level, where GFP_KERNEL would work just fine.
Then makes i40e_sync_vsi_filters() a more mature function, make having
a common exit point so it will properly release the busy lock on the VSI
and propagate errors to the callers. Then does some whitespace
housekeeping in i40evf.
Kiran moves and updates the detection/recovery of transmit queue hang code
to service_task from tx_timeout function. Also fixed memory leak when
users program flow-director filter using ethtool (sideband filter
programming), the cause being the check of 'tx_buffer->skb' was preventing
'raw_buf' from being freed as part of the cleanup.
Jesse enabled the ability to turn off/on packet split using ethtool priv
flags. Then does some housekeeping for both the i40e and i40evf drivers
which includes: remove unused/useless code, correct whitespace, remove
duplicate #include, fix incorrect comment, etc...
Neerav cleans up functions to gather Flow Control Rx XOFF stats, since
the recent change in the driver logic for checking transmit hang has been
moved, so these functions do not do anything meaningful any longer.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset introduces the support of Ethernet SRIOV in ConnectX-4
family of 100G Ethernet NICs.
Some features are still missing, but all the basic SRIOV functionalities
are there already.
Basic Introduction:
ConnectX-4 HW architecture provides two kinds of underlying HW switches.
MPFS (Multi Physical Function Switch) or L2 Table in Software terms:
The HCA has one MPFS switch per physical port, this switch is responsible
of forwarding Unicast traffic to the various overlying Physical Functions (PFs).
Multicast traffic is flooded amongst all the PFs, Each PF can request to
forward a unicast MAC to its E-Switch Uplink vport (which we will cover later)
through SET_L2_TABLE_ENTRY HW command.
MPFS has five ports, four are connected to PFs (one for each) and one is connected
directly to the Physical Port (Physical Link).
E-Switch (Ethernet Switch):
The HCA has one per physical function. The main responsibility of this component is
to forward Unicast/Multicast and vlan tagged/untagged traffic to the various
Virtual Functions (VFs) allocated by the PF. Unlike MPFS, the PF needs to explicitly
create the E-Switch FDB table, Which is a HW flow table managed by the PF driver
whenever vport_group_manager capability bit is set for this PF.
E-Switch has Virtual Ports (vports) entities as its ports, vport0 and uplink vport
are special kind of vports that represents PF vport (vport0) and uplink vport which
is connected to the MPFS switch (if exists) as the PF external link.
vport1..vportN represent VF0..VF(N-1) egress/ingress ports.
E-Switch FDB contains forwarding rules such as:
UC MAC0 -> vport0(PF).
UC MAC1 -> vport1.
UC MAC2 -> vport2.
MC MACX -> vport0, vport2, Uplink.
MC MACY -> vport1, Uplink.
For unmatched traffic FDB has the following default rules:
Unmatched Traffic (src vport != Uplink) -> Uplink.
Unmatched Traffic (src vport == Uplink) -> vport0(PF).
NIC VPort context:
Each NIC (VF/PF) has its own vport context which will be used to store the current
NIC vport context (UC/MC and vlan lists) and other NIC properties such as MTU, promisc
mode, etc.. NIC (VF/PF) driver is responsible of constantly updating this context.
FDB rules population:
Each NIC vport (VF/PF) will notify E-Switch manager of its UC/MC vport
context changes via modify vport context command, which will be
translated to an event that will be handled by E-Switch manager (PF)
which will update FDB table accordingly.
Both PF and VF use the same driver and submit commands directly to the firmware.
The PF sees the vport_group_manager capability bit and as such runs the code
to populate the embedded switches as explained above.
The patch goes as follows:
Patches 1-2 introduces the basic PCI SRIOV functionalities and the support of
Connectx4 to enable specific VFs via enable/disable HCA commands. These two
patches will be also in use later for the IB SRIOV flow.
Patches 3-8 Introduces the basic E-Switch capabilities and commands to be used later by
VF to modify and update its NIC vport context, and by PF (E-Switch Manager) driver to
Query the VF NIC context and acts accordingly.
Patches 9-10 Provide the needed functionality of a NIC driver VF/PF to support SRIOV,
mainly vport context update support.
Patch 11 ("net/mlx5: Introducing E-Switch and l2 table"), Introduces the basic
E-Switch support and infrastructure to read vport context events and to update
MPFS L2 Table of the UC mac addresses request by the PF.
Patches 12-18 Introduces SRIOV enablemenet and E-Switch FDB table management
It adds the Basic E-Swtich public API to set and get sriov properties to be used
in PF netdev sriov ndos.
Patchset was applied ontop of commit 3f8c0f7 "gianfar: use of_property_read_bool()"
Saeed, Eli and Or.
changes from V0, addressed feedback from Alex Duyck:
- patch 09, remove the loop to seek the device address
- patch 09, avoid using array as returned value from helper function
- patch 10, fix possible buffer over-run
changes from V1, addressed feedback from and Julia Lawall and kbuild test robot
- patch 11 check the right variable for allocation failure
- patch 18 eliminated unneeded semicolon
====================
Signed-off-by: David S. Miller <davem@davemloft.net>