Matan Barak [Thu, 13 Nov 2014 12:45:33 +0000 (14:45 +0200)]
net/mlx4_core: Support more than 64 VFs
We now allow up to 126 VFs. Note though that certain firmware
versions only allow up to 80 VFs. Moreover, old HCAs only support 64 VFs.
In these cases, we limit the maximum number of VFs to 64.
Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Matan Barak [Thu, 13 Nov 2014 12:45:32 +0000 (14:45 +0200)]
net/mlx4_core: Flexible (asymmetric) allocation of EQs and MSI-X vectors for PF/VFs
Previously, the driver queried the firmware in order to get the number
of supported EQs. Under SRIOV, since this was done before the driver
notified the firmware how many VFs it actually needs, the firmware had
to take into account a worst case scenario and always allocated four EQs
per VF, where one was used for events while the others were used for completions.
Now, when the firmware supports the asymmetric allocation scheme, denoted
by exposing num_sys_eqs > 0 (--> MLX4_DEV_CAP_FLAG2_SYS_EQS), we use the
QUERY_FUNC command to query the firmware before enabling SRIOV. Thus we
can get more EQs and MSI-X vectors per function.
Moreover, when running in the new firmware/driver mode, the limitation
that the number of EQs should be a power of two is lifted.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Matan Barak [Thu, 13 Nov 2014 12:45:31 +0000 (14:45 +0200)]
net/mlx4_core: Add QUERY_FUNC firmware command
QUERY_FUNC firmware command could be used in order to query the
number of EQs, reserved EQs, etc for a specific function.
Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Matan Barak [Thu, 13 Nov 2014 12:45:30 +0000 (14:45 +0200)]
net/mlx4_core: Refactor mlx4_load_one
Refactor mlx4_load_one, as a preparation step for a new and
more complicated load function. The goal is to support both
newer firmware that required init_hca to be done before
enable_sriov and legacy firmwares that requires things to
be done the other way around.
Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Matan Barak [Thu, 13 Nov 2014 12:45:29 +0000 (14:45 +0200)]
net/mlx4_core: Refactor mlx4_cmd_init and mlx4_cmd_cleanup
Refactoring mlx4_cmd_init and mlx4_cmd_cleanup such that partial init
and cleanup are possible. After this refactoring, calling mlx4_cmd_init
several times is safe.
This is necessary in the VF init flow when mlx4_init_hca returns -EACCESS,
we need to issue cleanup and re-attempt to call it with the slave flag.
Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Matan Barak [Thu, 13 Nov 2014 12:45:28 +0000 (14:45 +0200)]
net/mlx4_core: Use correct variable type for mlx4_slave_cap
We've used an incorrect type for the loop counter and the
mlx4_QUERY_FUNC_CAP function. The current input modifier
is either a port or a boolean.
Since the number of ports is always a positive value < 255,
we should use u8 instead of an integer with casting.
Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Matan Barak [Thu, 13 Nov 2014 12:45:27 +0000 (14:45 +0200)]
net/mlx4_core: Fix wrong reading of reserved_eqs
We mistakenly read the reserved_eqs field as a standard
numeric value rather than a log2 value.
Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 13 Nov 2014 20:13:13 +0000 (15:13 -0500)]
Merge branch 'rhash_prove_locking'
Herbert Xu says:
====================
rhashtable: Allow local locks to be used and tested
This series moves mutex_is_held entirely under PROVE_LOCKING so
there is zero foot print when we're not debugging. More importantly
it adds a parrent argument to mutex_is_held so that we can test
local locks rather than global ones (e.g., per-namespace locks).
====================
Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 13 Nov 2014 10:11:22 +0000 (18:11 +0800)]
rhashtable: Add parent argument to mutex_is_held
Currently mutex_is_held can only test locks in the that are global
since it takes no arguments. This prevents rhashtable from being
used in places where locks are lock, e.g., per-namespace locks.
This patch adds a parent field to mutex_is_held and rhashtable_params
so that local locks can be used (and tested).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 13 Nov 2014 10:11:20 +0000 (18:11 +0800)]
rhashtable: Move mutex_is_held under PROVE_LOCKING
The rhashtable function mutex_is_held is only used when PROVE_LOCKING
is enabled. This patch makes the mutex_is_held field in rhashtable
optional depending on PROVE_LOCKING.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 13 Nov 2014 10:11:19 +0000 (18:11 +0800)]
netfilter: Move mutex_is_held under PROVE_LOCKING
The rhashtable function mutex_is_held is only used when PROVE_LOCKING
is enabled. This patch modifies netfilter so that we can rhashtable.h
itself can later make mutex_is_held optional depending on PROVE_LOCKING.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 13 Nov 2014 10:11:18 +0000 (18:11 +0800)]
netlink: Move mutex_is_held under PROVE_LOCKING
The rhashtable function mutex_is_held is only used when PROVE_LOCKING
is enabled. This patch modifies netlink so that we can rhashtable.h
itself can later make mutex_is_held optional depending on PROVE_LOCKING.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Hisashi Nakamura [Thu, 13 Nov 2014 06:59:07 +0000 (15:59 +0900)]
net: sh_eth: Add r8a7793 support
The device tree probing for R-Car M2N (r8a7793) is added.
Signed-off-by: Hisashi Nakamura <hisashi.nakamura.ak@renesas.com> Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hisashi Nakamura [Thu, 13 Nov 2014 06:54:05 +0000 (15:54 +0900)]
net: sh_eth: Add RMII mode setting in probe
When using RMMI mode, it is necessary to change in probe.
Signed-off-by: Hisashi Nakamura <hisashi.nakamura.ak@renesas.com> Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Large receive offloading is known to cause problems if received packets
are passed to other host. Therefore the kernel disables it by calling
dev_disable_lro() whenever a network device is enslaved in a bridge or
forwarding is enabled for it (or globally). For virtual devices we need
to disable LRO on the underlying physical device (which is actually
receiving the packets).
Current dev_disable_lro() code handles this propagation for a vlan
(including 802.1ad nested vlan), macvlan or a vlan on top of a macvlan.
It doesn't handle other stacked devices and their combinations, in
particular propagation from a bond to its slaves which often causes
problems in virtualization setups.
As we now have generic data structures describing the upper-lower device
relationship, dev_disable_lro() can be generalized to disable LRO also
for all lower devices (if any) once it is disabled for the device
itself.
For bonding and teaming devices, it is necessary to disable LRO not only
on current slaves at the moment when dev_disable_lro() is called but
also on any slave (port) added later.
v2: use lower device links for all devices (including vlan and macvlan)
Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 13 Nov 2014 06:19:06 +0000 (09:19 +0300)]
amd-xgbe: fix ->rss_hash_type
There was a missing break statement so we set everything to
PKT_HASH_TYPE_L3 even when we intended to use PKT_HASH_TYPE_L4.
Fixes: 5b9dfe299e55 ('amd-xgbe: Provide support for receive side scaling') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Anish Bhatt [Thu, 13 Nov 2014 01:15:57 +0000 (17:15 -0800)]
cxgb4i/cxgb4 : Refactor macros to conform to uniform standards
Refactored all macros used in cxgb4i as part of previously started cxgb4 macro
names cleanup. Makes them more uniform and avoids namespace collision.
Minor changes in other drivers where required as some of these macros are used
by multiple drivers, affected drivers are iw_cxgb4, cxgb4(vf) & csiostor
Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Thu, 13 Nov 2014 08:54:14 +0000 (16:54 +0800)]
tun: fix issues of iovec iterators using in tun_put_user()
This patch fixes two issues after using iovec iterators:
- vlan_offset should be initialized to zero, otherwise unexpected offset
will be used in skb_copy_datagram_iter()
- advance iovec iterator when vnet_hdr_sz is greater than sizeof(gso), this
is the case when mergeable rx buffer were enabled for a virt guest.
Fixes e0b46d0ee9c240c7430a47e9b0365674d4a04522 ("tun: Use iovec iterators") Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Thu, 13 Nov 2014 11:48:21 +0000 (12:48 +0100)]
FOU: Fix no return statement warning for !CONFIG_NET_FOU_IP_TUNNELS
net/ipv4/fou.c: In function ‘ip_tunnel_encap_del_fou_ops’:
net/ipv4/fou.c:861:1: warning: no return statement in function returning non-void [-Wreturn-type]
Fixes: a8c5f90fb5 ("ip_tunnel: Ops registration for secondary encap (fou, gue)") Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 12 Nov 2014 23:40:43 +0000 (15:40 -0800)]
net: systemport: fix tx work done in TX napi poll
With commit d75b1ade567 ("net: less interrupt masking in NAPI") napi
repoll is done only when work_done == budget. bcm_sysport_tx_poll()
always returns 0 whether or not we completed the poll quantum.
Fix this by returning either 0 when we did complete the TX ring reclaim,
or budget to trigger a repoll.
Fixes: d75b1ade567 ("net: less interrupt masking in NAPI") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
With the commit d75b1ade567 ("net: less interrupt masking in NAPI") napi repoll
is done only when work_done == budget. In tx napi poll we always return 0.
So tx napi is not called again and we do not clean up the tx ring.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang [Wed, 12 Nov 2014 22:07:44 +0000 (14:07 -0800)]
hyperv: Add processing of MTU reduced by the host
If the host uses packet encapsulation feature, the MTU may be reduced by the
host due to headroom reservation for encapsulation. This patch handles this
new MTU value.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Wed, 12 Nov 2014 16:37:49 +0000 (10:37 -0600)]
amd-xgbe: Fix sparse endian warnings
Change the types of the descriptor entries in the xgbe_ring_desc struct
from u32 to __le32 to fix endian warnings issued by sparse.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jisheng Zhang [Wed, 12 Nov 2014 11:08:47 +0000 (19:08 +0800)]
net: pxa168_eth: move SET_NETDEV_DEV a bit earlier
This is to ensure the net_device's dev.parent is set before we used it
in dma_zalloc_coherent() from init_hash_table().
Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Acked-by: Antoine Tenart <antoine.tenart@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fix static checker warning that got introduced in commit e2ac9628959cc152
("cxgb4: Cleanup macros so they follow the same style and look consistent, part
2") due to accidental checkin of bogus line.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Wed, 12 Nov 2014 19:54:09 +0000 (11:54 -0800)]
ip_tunnel: Ops registration for secondary encap (fou, gue)
Instead of calling fou and gue functions directly from ip_tunnel
use ops for these that were previously registered. This patch adds the
logic to add and remove encapsulation operations for ip_tunnel,
and modified fou (and gue) to register with ip_tunnels.
This patch also addresses a circular dependency between ip_tunnel
and fou that was causing link errors when CONFIG_NET_IP_TUNNEL=y
and CONFIG_NET_FOU=m. References to fou an gue have been removed from
ip_tunnel.c
Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch series contains some updates to the Broadcom BCM7xxx internal
PHY driver, including:
- removing an annonying print that would appear during interface up/down and
suspend/resume cycles
- drop a workaround sequence for a non-production PHY revision
- add new workarounds for the latest and greatest PHY devices found out there
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 11 Nov 2014 22:55:14 +0000 (14:55 -0800)]
net: phy: bcm7xxx: add workaround for PHY revision E0 and F0
PHY revisions E0 and F0 share the same shorter workaround initialization
sequence. Dedicate a special function for these two PHY revisions to
perform the needed workaround sequence.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
PHY revision D0 requires a specific workaround sequence which needs to
be applied to get the HW to behave properly in all corner cases
conditions. Do this based on the revision we just read out of the HW
using a specific function.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This function performs a R/RC calibration reset and will start being
used by more than one function in the next patches, create a helper
function to factor code.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 11 Nov 2014 22:55:11 +0000 (14:55 -0800)]
net: phy: bcm7xxx: drop A0 revision workaround and fix B0 workaround
bcm7445_config_init() was working around non-production version of the
PHY HW block, so just remove it entirely.
bcm7xxx_28nm_afe_config_init() was running for all PHY revisions greater
than B0, but this workaround sequence is really specific to the B0 PHY
revision, so rename the function accordingly and update the GPHY macro
to use the generic config_init callback.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 11 Nov 2014 22:55:10 +0000 (14:55 -0800)]
net: phy: bcm7xxx: only show PHY revision once
bcm7xxx_28nm_config_init() can be called as frequently as needed by the
PHY library upon suspend/resume cycles and interface bring up/down, just
print the PHY revision once and for all in order not to spam kernel
logs.
Fixes: d8ebfed3f11b ("net: phy: bcm7xxx: utilize PHY revision in config_init") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Tue, 11 Nov 2014 22:44:57 +0000 (14:44 -0800)]
irda: Convert IRDA_DEBUG to pr_debug
Use the normal kernel debugging mechanism which also
enables dynamic_debug at the same time.
Other miscellanea:
o Remove sysctl for irda_debug
o Remove function tracing like uses (use ftrace instead)
o Coalesce formats
o Realign arguments
o Remove unnecessary OOM messages
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 12 Nov 2014 18:55:41 +0000 (13:55 -0500)]
Merge branch 'micrel-next'
Johan Hovold says:
====================
net: phy: micrel: refactoring and KSZ8081/KSZ8091 features
This series cleans up and refactors parts of the micrel PHY driver, and
adds support for broadcast-address-disable and led-mode configuration
for KSZ8081 and KSZ8091 PHYs.
Specifically, this enables dual KSZ8081 setups (which are limited to
using address 0 and 3).
A follow up series will add device-type abstraction which will allow for
further refactoring and shared initialisation code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Tue, 11 Nov 2014 19:00:10 +0000 (20:00 +0100)]
net: phy: micrel: refactor broadcast disable
Refactor and clean up broadcast disable.
Some Micrel PHYs have a broadcast-off bit in the Operation Mode Strap
Override register which can be used to disable PHY address 0 as the
broadcast address, so that it can be used as a unique (non-broadcast)
address on a shared bus.
Note that the KSZPHY_OMSO_RMII_OVERRIDE bit is set by default on
KSZ8021/8031.
Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Tue, 11 Nov 2014 19:00:07 +0000 (20:00 +0100)]
dt/bindings: fix documentation of ethernet-phy compatible property
A recent commit extended the documentation of the ethernet-phy
compatible property, but placed the new paragraph under the max-speed
property.
Fixes: f00e756ed12d ("dt: Document a compatible entry for MDIO ethernet
Phys") Cc: devicetree@vger.kernel.org Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Add module_phy_driver macro that can be used by PHY drivers that only
calls phy_driver_register or phy_drivers_register (and the corresponding
unregister functions) in their module init (and exit).
This allows us to eliminate a lot of boilerplate code.
Split in three patches (actual macro and two driver change classes) in
order to facilitate review.
====================
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David L Stevens [Wed, 12 Nov 2014 15:58:47 +0000 (10:58 -0500)]
sunvnet: fix NULL pointer dereference
This patch fixes a NULL pointer dereference when __tx_port_find() doesn't
find a matching port.
Signed-off-by: David L Stevens <david.stevens@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 12 Nov 2014 05:00:20 +0000 (00:00 -0500)]
Merge branch 'skb_alloc_pages'
Alexander Duyck says:
====================
Replace __skb_alloc_pages with simpler function
This patch series replaces __skb_alloc_pages with a much simpler function,
__dev_alloc_pages. The main difference between the two is that
__skb_alloc_pages had an sk_buff pointer that was being passed as NULL in
call places where it was called. In a couple of cases the NULL was passed
by variable and this led to unnecessary code being run.
As such in order to simplify things the __dev_alloc_pages call only takes a
mask and the page order being requested. In addition it takes advantage of
several behaviors already built into the page allocator so that it can just
set GFP flags unconditionally.
v2: Renamed functions to dev_alloc_page(s) instead of netdev_alloc_page(s)
Removed __GFP_COLD flag from usb code as it was redundant
v3: Update patch descriptions and subjects to match changes in v2
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Tue, 11 Nov 2014 17:26:57 +0000 (09:26 -0800)]
fm10k/igb/ixgbe: Replace __skb_alloc_page with dev_alloc_page
The Intel drivers were pretty much just using the plain vanilla GFP flags
in their calls to __skb_alloc_page so this change makes it so that they use
dev_alloc_page which just uses GFP_ATOMIC for the gfp_flags value.
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Matthew Vick <matthew.vick@intel.com> Cc: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Tue, 11 Nov 2014 17:26:50 +0000 (09:26 -0800)]
phonet: Replace calls to __skb_alloc_page with __dev_alloc_page
Replace the calls to __skb_alloc_page that are passed NULL with calls to
__dev_alloc_page.
In addition remove __GFP_COLD flag from allocations as we only want it for
the Rx buffer which is taken care of by __dev_alloc_skb, not for any
secondary allocations such as the queue element transmit descriptors.
Cc: Oliver Neukum <oliver@neukum.org> Cc: Felipe Balbi <balbi@ti.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Tue, 11 Nov 2014 17:26:42 +0000 (09:26 -0800)]
cxgb4/cxgb4vf: Replace __skb_alloc_page with __dev_alloc_page
Drop the bloated use of __skb_alloc_page and replace it with
__dev_alloc_page. In addition update the one other spot that is
allocating a page so that it allocates with the correct flags.
Cc: Hariprasad S <hariprasad@chelsio.com> Cc: Casey Leedom <leedom@chelsio.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Tue, 11 Nov 2014 17:26:34 +0000 (09:26 -0800)]
net: Add device Rx page allocation function
This patch implements __dev_alloc_pages and __dev_alloc_page. These are
meant to replace the __skb_alloc_pages and __skb_alloc_page functions. The
reason for doing this is that it occurred to me that __skb_alloc_page is
supposed to be passed an sk_buff pointer, but it is NULL in all cases where
it is used. Worse is that in the case of ixgbe it is passed NULL via the
sk_buff pointer in the rx_buffer info structure which means the compiler is
not correctly stripping it out.
The naming for these functions is based on dev_alloc_skb and __dev_alloc_skb.
There was originally a netdev_alloc_page, however that was passed a
net_device pointer and this function is not so I thought it best to follow
that naming scheme since that is the same difference between dev_alloc_skb
and netdev_alloc_skb.
In the case of anything greater than order 0 it is assumed that we want a
compound page so __GFP_COMP is set for all allocations as we expect a
compound page when assigning a page frag.
The other change in this patch is to exploit the behaviors of the page
allocator in how it handles flags. So for example we can always set
__GFP_COMP and __GFP_MEMALLOC since they are ignored if they are not
applicable or are overridden by another flag.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Tue, 11 Nov 2014 21:29:42 +0000 (13:29 -0800)]
net: kill netif_copy_real_num_queues()
vlan was the only user of netif_copy_real_num_queues(),
but it no longer calls it after
commit 4af429d29b341bb1735f04c2fb960178 ("vlan: lockless transmit path").
So we can just remove it.
Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 11 Nov 2014 21:26:42 +0000 (16:26 -0500)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2014-11-11
This series contains updates to i40e, i40evf and ixgbe.
Kamil updated the i40e and i40evf driver to poll the firmware slower
since we were polling faster than the firmware could respond.
Shannon updates i40e to add a check to keep the service_task from
running the periodic tasks more than once per second, while still
allowing quick action to service the events.
Jesse cleans up the throttle rate code by fixing the minimum interrupt
throttle rate and removing some unused defines.
Mitch makes the early init admin queue message receive code more robust
by handling messages in a loop and ignoring those that we are not
interested in. This also gets rid of some scary log messages that
really do not indicate a problem.
Don provides several ixgbe patches, first fixes an issue with x540
completion timeout where on topologies including few levels of PCIe
switching for x540 can run into an unexpected completion error. Cleans
up the functionality in ixgbe_ndo_set_vf_vlan() in preparation for
future work. Adds support for x550 MAC's to the driver.
v2:
- Remove code comment in patch 01 of the series, based on feedback from
David Liaght
- Updated the "goto" to "break" statements in patch 06 of the series,
based on feedback from Sergei Shtylyov
- Initialized the variable err due to the possibility of use before
being assigned a value in patch 07 of the series
- Added patch "ixgbe: add helper function for setting RSS key in
preparation of X550" since it is needed for the addition of X550 MAC
support
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudip Mukherjee [Tue, 11 Nov 2014 08:40:47 +0000 (14:10 +0530)]
usbnet: smsc95xx: dereferencing NULL pointer
we were dereferencing dev to initialize pdata. but just after that we
have a BUG_ON(!dev). so we were basically dereferencing the pointer
first and then tesing it for NULL.
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Mon, 10 Nov 2014 23:59:36 +0000 (15:59 -0800)]
neigh: remove dynamic neigh table registration support
Currently there are only three neigh tables in the whole kernel:
arp table, ndisc table and decnet neigh table. What's more,
we don't support registering multiple tables per family.
Therefore we can just make these tables statically built-in.
Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Mon, 10 Nov 2014 10:38:59 +0000 (12:38 +0200)]
stmmac: split to core library and probe drivers
Instead of registering the platform and PCI drivers in one module let's move
necessary bits to where it belongs. During this procedure we convert the module
registration part to use module_*_driver() macros which makes code simplier.
>From now on the driver consists three parts: core library, PCI, and platform
drivers.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Tue, 11 Nov 2014 18:59:17 +0000 (10:59 -0800)]
net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited
Use the more common dynamic_debug capable net_dbg_ratelimited
and remove the LIMIT_NETDEBUG macro.
All messages are still ratelimited.
Some KERN_<LEVEL> uses are changed to KERN_DEBUG.
This may have some negative impact on messages that were
emitted at KERN_INFO that are not not enabled at all unless
DEBUG is defined or dynamic_debug is enabled. Even so,
these messages are now _not_ emitted by default.
This also eliminates the use of the net_msg_warn sysctl
"/proc/sys/net/core/warnings". For backward compatibility,
the sysctl is not removed, but it has no function. The extern
declaration of net_msg_warn is removed from sock.h and made
static in net/core/sysctl_net_core.c
Miscellanea:
o Update the sysctl documentation
o Remove the embedded uses of pr_fmt
o Coalesce format fragments
o Realign arguments
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 11 Nov 2014 18:32:25 +0000 (13:32 -0500)]
Merge branch 'net_next_ovs' of git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch
Pravin B Shelar says:
====================
Open vSwitch
Following batch of patches brings feature parity between upstream
ovs and out of tree ovs module.
Two features are added, first adds support to export egress
tunnel information for a packet. This is used to improve
visibility in network traffic. Second feature allows userspace
vswitchd process to probe ovs module features. Other patches
are optimization and code cleanup.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 11 Nov 2014 18:20:07 +0000 (13:20 -0500)]
Merge branch 'mlx4-next'
Or Gerlitz says:
====================
mlx4: Add CHECKSUM_COMPLETE support
These patches from Shani, Matan and myself add support for
CHECKSUM_COMPLETE reporting on non TCP/UDP packets such as
GRE and ICMP. I'd like to deeply thank Jerry Chu for his
innovation and support in that effort.
Based on the feedback from Eric and Ido Shamay, in V2 we dropped
the patch which removed the calls to napi_gro_frags() and added
a patch which makes the RX code to go through that path
regardless of the checksum status.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shani Michaeli [Sun, 9 Nov 2014 11:51:53 +0000 (13:51 +0200)]
net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE
When processing received traffic, pass CHECKSUM_COMPLETE status to the
stack, with calculated checksum for non TCP/UDP packets (such
as GRE or ICMP).
Although the stack expects checksum which doesn't include the pseudo
header, the HW adds it. To address that, we are subtracting the pseudo
header checksum from the checksum value provided by the HW.
In the IPv6 case, we also compute/add the IP header checksum which
is not added by the HW for such packets.
Cc: Jerry Chu <hkchu@google.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shani Michaeli [Sun, 9 Nov 2014 11:51:52 +0000 (13:51 +0200)]
net/mlx4_en: Extend usage of napi_gro_frags
We can call napi_gro_frags for all the received traffic regardless
of the checksum status. Specifically, received packets whose status
is CHECKSUM_NONE (and soon to be added CHECKSUM_COMPLETE)
are eligible for napi_gro_frags as well.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 11 Nov 2014 13:54:28 +0000 (05:54 -0800)]
net: introduce SO_INCOMING_CPU
Alternative to RPS/RFS is to use hardware support for multiple
queues.
Then split a set of million of sockets into worker threads, each
one using epoll() to manage events on its own socket pool.
Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
know after accept() or connect() on which queue/cpu a socket is managed.
We normally use one cpu per RX queue (IRQ smp_affinity being properly
set), so remembering on socket structure which cpu delivered last packet
is enough to solve the problem.
After accept(), connect(), or even file descriptor passing around
processes, applications can use :
And use this information to put the socket into the right silo
for optimal performance, as all networking stack should run
on the appropriate cpu, without need to send IPI (RPS/RFS).
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 11 Nov 2014 13:54:27 +0000 (05:54 -0800)]
tcp: move sk_mark_napi_id() at the right place
sk_mark_napi_id() is used to record for a flow napi id of incoming
packets for busypoll sake.
We should do this only on established flows, not on listeners.
This was 'working' by virtue of the socket cloning, but doing
this on SYN packets in unecessary cache line dirtying.
Even if we move sk_napi_id in the same cache line than sk_lock,
we are working to make SYN processing lockless, so it is desirable
to set sk_napi_id only for established flows.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Don Skidmore [Fri, 7 Nov 2014 03:53:35 +0000 (03:53 +0000)]
ixgbe: Add new support for X550 MAC's
This patch will add in the new MAC defines and fit it into the switch
cases throughout the driver. New functionality and enablement support will
be added in following patches.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Wed, 5 Nov 2014 04:52:09 +0000 (04:52 +0000)]
ixgbe: cleanup move setting PFQDE.HIDE_VLAN to support function.
Move setting of drop enable to support function. This not only makes the
code more readable but is also prep for following patches that add
additional MAC support.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Wed, 29 Oct 2014 07:23:41 +0000 (07:23 +0000)]
ixgbe: fix X540 Completion timeout
On topologies including few levels of PCIe switching X540 can run into an
unexpected completion error. We get around this by waiting after enabling
loopback a sufficient amount of time until Tx Data Fetch is sent. We then
poll the pending transaction bit to ensure we received the completion. Only
then do we go on to clear the buffers.
Signed-of-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Sat, 25 Oct 2014 03:24:34 +0000 (03:24 +0000)]
i40evf: don't use more queues than CPUs
It's kind of silly to configure and attempt to use a bunch of queue
pairs when you're running on a single (virtual) CPU. Instead of
unconditionally configuring all of the queues that the PF gives us,
clamp the number of queue pairs to the number of CPUs.
Change-ID: I321714c9e15072ee76de8f95ab9a81f86ed347d1 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Patrick Lu <patrick.lu@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Sat, 25 Oct 2014 03:24:33 +0000 (03:24 +0000)]
i40evf: make early init processing more robust
In early init, if we get an unexpected message from the PF (such as link
status), we just kick an error back to the init task, causing it to
restart its state machine and delaying initialization.
Make the early init AQ message receive code more robust by handling
messages in a loop, and ignoring those that we aren't interested in.
This also gets rid of some scary log messages that really didn't
indicate a problem.
Change-ID: I620e8c72e49c49c665ef33eeab2425dd10e721cf Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Patrick Lu <patrick.lu@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Sat, 25 Oct 2014 03:24:32 +0000 (03:24 +0000)]
i40e: clean up throttle rate code
The interrupt throttle rate minimum is actually 2us, so
fix that define and while we are there, remove some unused defines.
Change some strings in the function to be a bit less wrappy, and
express the correct limits.
Change-ID: I96829bbc77935e0b57c6f0fc1439fb4152b2960a Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Patrick Lu <patrick.lu@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Sat, 25 Oct 2014 10:35:25 +0000 (10:35 +0000)]
i40e: don't do link_status or stats collection on every ARQ
The ARQ events cause a service_task execution, and we do a link_status
check and full stats gathering for each service_task. However, when
there are a lot of ARQ events, such as when doing an NVM update, we end up
doing 10's if not 100's of these per second, thereby heavily abusing the
PCI bus and especially the Firmware. This patch adds a check to keep the
service_task from running these periodic tasks more than once per second,
while still allowing quick action to service the events.
Change-ID: Iec7670c37bfae9791c43fec26df48aea7f70b33e Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Patrick Lu <patrick.lu@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Kamil Krawczyk [Sat, 25 Oct 2014 03:24:30 +0000 (03:24 +0000)]
i40e: poll firmware slower
The code was polling the firmware tail register for completion every
10 microseconds, which is way faster than the firmware can respond.
This changes the poll interval to 1ms, which reduces polling CPU
utilization, and the number of times we loop.
The maximum delay is still 100ms.
Change-ID: I4bbfa6b66d802890baf8b4154061e55942b90958 Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com> Acked-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Eric Dumazet [Mon, 10 Nov 2014 22:07:20 +0000 (14:07 -0800)]
mlx4: restore conditional call to napi_complete_done()
After commit 1a28817282 ("mlx4: use napi_complete_done()") we ended up
calling napi_complete_done() in the case NAPI poll consumed all its
budget.
This added extra interrupt pressure, this patch restores proper
behavior.
Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 1a28817282 ("mlx4: use napi_complete_done()") Signed-off-by: David S. Miller <davem@davemloft.net>
This patch series contains fixes for race-conditions in sunvnet,
that can encountered when there is a difference in latency between
producer and consumer.
Patch 1 addresses a case when the STOPPED LDC ack from a peer is
processed before vnet_start_xmit can finish updating the dr->prod
state.
Patch 2 fixes the edge-case when outgoing data and incoming
stopped-ack cross each other in flight.
Patch 3 adds a missing rcu_read_unlock(), found by code-inspection.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
sunvnet: vnet_ack() should check if !start_cons to send a missed trigger
As per comments in vnet_start_xmit, for the edge case
when outgoing vnet_start_xmit() data and an incoming STOPPED
ACK cross each other in flight, we may need to send the missed
START trigger from maybe_tx_wakeup() after checking for a
false value of start_cons
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
__vnet_tx_trigger for some desc X
at this point dr->prod == X
peer sends back a stopped ack for X
we process X, but X == dr->prod
so we bail out in vnet_ack with
!idx_is_pending
update dr->prod
As a result of the fact that we never processed the stopped ack for X,
the Tx path is led to incorrectly believe that the peer is still
"started" and reading, but the peer has stopped reading, which will
ultimately end in flow-control assertions.
The fix is to synchronize the above 2 paths on the netif_tx_lock.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alban Bedel [Sat, 8 Nov 2014 11:48:36 +0000 (12:48 +0100)]
8139too: Allow using the largest possible MTU
This driver allows MTU up to 1518 bytes which is not enought to run
batman-adv. Simply raise the maximum packet size up to the maximum
allowed by the transmit descriptor, 1792 bytes, giving a maximum MTU
of 1774 bytes.
Signed-off-by: Alban Bedel <albeu@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Please pull this batch of updates intended for the 3.19 stream!
For the mac80211 bits, Johannes says:
"This relatively large batch of changes is comprised of the following:
* large mac80211-hwsim changes from Ben, Jukka and a bit myself
* OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical
University in Prague and Volkswagen Group Research
* minstrel VHT work from Karl
* more CSA work from Luca
* WMM admission control support in mac80211 (myself)
* various smaller fixes, spelling corrections, and minor API additions"
For the Bluetooth bits, Johan says:
"Here's the first bluetooth-next pull request for 3.19. The vast majority
of patches are for ieee802154 from Alexander Aring with various fixes
and cleanups. There are also several LE/SMP fixes as well as improved
support for handling LE devices that have lost their pairing information
(the patches from Alfonso). Jukka provides a couple of stability fixes
for 6lowpan and Szymon conformance fixes for RFCOMM. For the HCI drivers
we have one new USB ID for an Acer controller as well as a reset
handling fix for H5."
For the Atheros bits, Kalle says:
"Major changes are:
o ethtool support (Ben)
o print dev string prefix with debug hex buffers dump (Michal)
o debugfs file to read calibration data from the firmware verification
purposes (me)
o fix fw_stats debugfs file, now results are more reliable (Michal)
o firmware crash counters via debugfs (Ben&me)
o various tracing points to debug firmware (Rajkumar)
o make it possible to provide firmware calibration data via a file (me)
And we have quite a lot of smaller fixes and clean up."
For the iwlwifi bits, Emmanuel says:
"The big new thing here is netdetect which allows the
firmware to wake up the platform when a specific network
is detected. Along with that I have fixes for d3 operation.
The usual amount of rate scaling stuff - we now support STBC.
The other commit that stands out is Johannes's work on
devcoredump. He basically starts to use the standard
infrastructure he built."
Along with that are the usual sort of updates and such for ath9k,
brcmfmac, wil6210, and a handful of other bits here and there...
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 10 Nov 2014 19:25:49 +0000 (14:25 -0500)]
Merge branch 'raw_probe_proto_opt'
Herbert Xu says:
====================
ipv4: Simplify raw_probe_proto_opt and avoid reading user iov twice
This series rewrites the function raw_probe_proto_opt in a more
readable fasion, and then fixes the long-standing bug where we
read the probed bytes twice which means that what we're using to
probe may in fact be invalid.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 7 Nov 2014 13:27:09 +0000 (21:27 +0800)]
ipv4: Avoid reading user iov twice after raw_probe_proto_opt
Ever since raw_probe_proto_opt was added it had the problem of
causing the user iov to be read twice, once during the probe for
the protocol header and once again in ip_append_data.
This is a potential security problem since it means that whatever
we're probing may be invalid. This patch plugs the hole by
firstly advancing the iov so we don't read the same spot again,
and secondly saving what we read the first time around for use
by ip_append_data.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 7 Nov 2014 13:27:08 +0000 (21:27 +0800)]
ipv4: Use standard iovec primitive in raw_probe_proto_opt
The function raw_probe_proto_opt tries to extract the first two
bytes from the user input in order to seed the IPsec lookup for
ICMP packets. In doing so it's processing iovec by hand and
overcomplicating things.
This patch replaces the manual iovec processing with a call to
memcpy_fromiovecend.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>