Ben Hutchings [Tue, 10 Jul 2012 10:56:00 +0000 (10:56 +0000)]
drivers/net/ethernet: Fix (nearly-)kernel-doc comments for various functions
Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches. Delete
a few that are content-free.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Jul 2012 05:53:57 +0000 (22:53 -0700)]
Merge branch 'metrics_restructure'
This patch series works towards the goal of minimizing the amount
of things that can change in an ipv4 route.
In a regime where the routing cache is removed, route changes will
lead to cloning in the FIB tables or similar.
The largest trigger of route metrics writes, TCP, now has its own
cache of dynamic metric state. The timewait timestamps are stored
there now as well.
As a result of that, pre-cowing metrics is no longer necessary,
and therefore FLOWI_FLAG_PRECOW_METRICS is removed.
Redirect and PMTU handling is moved back into the ipv4 routes. I'm
sorry for all the headaches that trying to do this in the inetpeer has
caused; it was the wrong approach for sure.
Since metrics become read-only for ipv4 we no longer need the inetpeer
hung off of the ipv4 routes either. So those disappear too.
Also, timewait sockets no longer need to hold onto an inetpeer either.
After this series, we still have some details to resolve wrt. PMTU and
redirects for a route-cache-less system:
1) With just the plain route cache removal, PMTU will continue to
work mostly fine. This is because of how the local route users
call down into the PMTU update code with the route they already
hold.
However, if we wish to cache pre-computed routes in fib_info
nexthops (which we want for performance), then we need to add
route cloning for PMTU events.
2) Redirects require more work. First, redirects must be changed to
be handled like PMTU, wherein we call down into the sockets and
other entities, and then they call back into the routing code with
the route they were using.
So we'll be adding an ->update_nexthop() method alongside
->update_pmtu().
And then, like for PMTU, we'll need cloning support once we start
caching routes in the fib_info nexthops.
But that's it; after that we can completely pull the trigger and remove the
routing cache with minimal disruption.
As it is, this patch series alone helps a lot of things. For one,
routing cache entry creation should be a lot faster, because we no
longer do inetpeer lookups (even to check if an entry exists).
This patch series also opens the door for non-DST_HOST ipv4 routes,
because nothing fundamentally cares about rt->rt_dst any more. It
can be removed with the base routing cache removal patch. In fact,
that was the primary goal of this patch series.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 11:01:57 +0000 (04:01 -0700)]
inet: Kill FLOWI_FLAG_PRECOW_METRICS.
No longer needed. TCP writes metrics, but now in its own special
cache that does not dirty the route metrics. Therefore there is no
longer any reason to pre-cow metrics in this way.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 10 Jul 2012 07:49:14 +0000 (00:49 -0700)]
tcp: Maintain dynamic metrics in local cache.
Maintain a local hash table of TCP dynamic metrics blobs.
Computed TCP metrics are no longer maintained in the route metrics.
The table uses RCU and an extremely simple hash so that it has low
latency and low overhead. A simple hash is legitimate because we only
make metrics blobs for fully established connections.
The default hash table sizes, metric timeouts, and hash chain length
limit could certainly use some tweaking, but the basic design seems
sound.
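As a rough illustration of the kind of structure described above, here is a minimal userspace sketch of a fixed-size hash table of per-destination metrics blobs with a bounded chain length. It omits RCU entirely. The names (`tcpm_entry`, `tcpm_lookup_or_create`, `TCPM_HASH_BITS`, `TCPM_MAX_CHAIN`), the table size, and the hash function are hypothetical and are not the kernel's.

```c
#include <stdint.h>
#include <stdlib.h>

#define TCPM_HASH_BITS 4                      /* hypothetical: 16 buckets */
#define TCPM_HASH_SIZE (1u << TCPM_HASH_BITS)
#define TCPM_MAX_CHAIN 5                      /* hypothetical chain length limit */

/* Hypothetical per-destination metrics blob (IPv4 only). */
struct tcpm_entry {
    uint32_t daddr;          /* destination address, host byte order */
    uint32_t rtt_us;         /* example cached metric */
    struct tcpm_entry *next;
};

static struct tcpm_entry *tcpm_table[TCPM_HASH_SIZE];

/* Deliberately simple hash: fold the address and mask to the table size. */
static unsigned int tcpm_hash(uint32_t daddr)
{
    daddr ^= daddr >> 16;
    daddr ^= daddr >> 8;
    return daddr & (TCPM_HASH_SIZE - 1);
}

/* Return the blob for daddr, creating it if absent. New entries go at the
 * head of the chain; when the chain is already at the limit, the tail
 * (oldest-inserted) entry is unlinked and reused instead of growing. */
static struct tcpm_entry *tcpm_lookup_or_create(uint32_t daddr)
{
    unsigned int h = tcpm_hash(daddr);
    struct tcpm_entry **pp;
    struct tcpm_entry **tailp = NULL;  /* link that points at the last entry */
    struct tcpm_entry *e;
    int depth = 0;

    for (pp = &tcpm_table[h]; *pp; pp = &(*pp)->next) {
        if ((*pp)->daddr == daddr)
            return *pp;
        tailp = pp;
        depth++;
    }

    if (depth >= TCPM_MAX_CHAIN) {
        e = *tailp;      /* chain is full: recycle the last entry */
        *tailp = NULL;
    } else {
        e = malloc(sizeof(*e));
        if (!e)
            return NULL;
    }
    e->daddr = daddr;
    e->rtt_us = 0;
    e->next = tcpm_table[h];
    tcpm_table[h] = e;
    return e;
}
```

A short chain limit keeps every lookup bounded, which fits the low-latency goal stated above.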
With help from Eric Dumazet and Joe Perches.
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Tue, 10 Jul 2012 06:18:44 +0000 (06:18 +0000)]
etherdevice: introduce eth_broadcast_addr
A lot of code has either the memset or an inefficient copy
from a static array that contains the all-ones broadcast
address. Introduce eth_broadcast_addr() to fill an address
with all ones, making the code clearer and allowing us to
get rid of some constant arrays.
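The change replaces a copy from a constant all-ones array with a direct fill. Below is a userspace sketch of the idea. `ETH_ALEN` is defined locally if missing (it is 6, the MAC address length). `my_eth_broadcast_addr()` and `my_is_broadcast()` are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#ifndef ETH_ALEN
#define ETH_ALEN 6  /* length of an Ethernet MAC address in bytes */
#endif

/* Fill addr with the broadcast address ff:ff:ff:ff:ff:ff without
 * needing a static constant array to copy from. */
static void my_eth_broadcast_addr(uint8_t *addr)
{
    memset(addr, 0xff, ETH_ALEN);
}

/* Check helper: true if every byte of addr is 0xff. */
static bool my_is_broadcast(const uint8_t *addr)
{
    for (int i = 0; i < ETH_ALEN; i++)
        if (addr[i] != 0xff)
            return false;
    return true;
}
```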
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Jul 2012 01:05:28 +0000 (18:05 -0700)]
ipv4: Fix crashes in fib_rules_tclass().
All paths assume, when CONFIG_IP_MULTIPLE_TABLES is enabled, that any
successful call to fib_lookup() will initialize the fib_result->r
value to something.
We violated that expectation in the new fib_lookup() fast path.
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Mon, 9 Jul 2012 21:57:36 +0000 (16:57 -0500)]
net/fsl_pq_mdio: use spin_event_timeout() to poll the indicator register
Macro spin_event_timeout() was designed for simple polling of hardware
registers with a timeout, so use it when we poll the MIIMIND register.
This allows us to return an error code instead of polling indefinitely.
Note that PHY_INIT_TIMEOUT is a count of loop iterations, so we can't use
it for spin_event_timeout(), which asks for microseconds.
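In spirit, spin_event_timeout() re-evaluates a condition until it becomes true or a timeout given in microseconds expires. The userspace sketch below mimics that bounded-polling pattern for a "busy" bit. The register accessor type, the `MIIMIND_BUSY` bit value, the fake device, and all function names are hypothetical. Timing uses the standard `clock()` rather than anything the driver actually uses.

```c
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define MIIMIND_BUSY 0x1u   /* hypothetical busy bit in the indicator register */

/* Hypothetical register accessor supplied by the caller. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/* clock() measures processor time, which keeps advancing while we
 * busy-wait; good enough for this userspace sketch. */
static uint64_t now_us(void)
{
    return (uint64_t)clock() * 1000000u / CLOCKS_PER_SEC;
}

/* Poll until the busy bit clears or timeout_us microseconds pass.
 * Returns 0 on success or -ETIMEDOUT instead of spinning forever. */
static int wait_not_busy(reg_read_fn read_reg, void *ctx, uint64_t timeout_us)
{
    uint64_t deadline = now_us() + timeout_us;

    for (;;) {
        if (!(read_reg(ctx) & MIIMIND_BUSY))
            return 0;
        if (now_us() >= deadline)
            /* One last check so a completion right at the deadline counts. */
            return (read_reg(ctx) & MIIMIND_BUSY) ? -ETIMEDOUT : 0;
    }
}

/* Hypothetical fake device: reports busy for the first *ctx reads. */
static uint32_t fake_countdown_read(void *ctx)
{
    int *reads_left = ctx;

    if (*reads_left > 0) {
        (*reads_left)--;
        return MIIMIND_BUSY;
    }
    return 0;
}
```

Expressing the limit as time rather than as a loop count makes the bound independent of how fast each register read is. This is the distinction the note about PHY_INIT_TIMEOUT draws.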
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the documentation for module_pci_driver says, it can be used when
the module's init and exit functions do nothing but call
pci_register_driver and pci_unregister_driver.
Use it for RDC's r6040 driver, since its init and exit paths do
exactly that; this also removes a small amount of code.
Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Acked-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 9 Jul 2012 06:02:24 +0000 (06:02 +0000)]
bnx2x: populate skb->l4_rxhash
skb->l4_rxhash is set when the rxhash was computed over the canonical
4-tuple of transport ports and addresses.
We can set skb->l4_rxhash for free on all incoming TCP packets on bnx2x,
as the CQE status contains hash type information.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang [Mon, 2 Jul 2012 09:23:22 +0000 (17:23 +0800)]
r8169: support RTL8168G
For RTL8111G, the PHY and firmware settings are replaced with
OCP functions. r8168g_mdio_{write,read} redirect the relevant
settings to the suitable OCP functions. A per-device variable is needed
to compute the real address for the OCP functions.
rtl_writephy(tp, 0x1f, xxxx) is dedicated to keeping said variable
up-to-date.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Li RongQing [Wed, 4 Jul 2012 16:05:42 +0000 (16:05 +0000)]
be2net: Fix Endian
ETH_P_IP is in host byte order, while skb->protocol is big endian. When
comparing them, ETH_P_IP must be converted from host to big-endian
order, i.e. with htons(), not ntohs().
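The corrected comparison converts the host-order constant into network order before comparing it with the on-the-wire value. Below is a userspace sketch. `MY_ETH_P_IP` is a local stand-in for ETH_P_IP (the IPv4 EtherType, 0x0800), and `frame_is_ipv4()` is a hypothetical helper.

```c
#include <arpa/inet.h>  /* htons() */
#include <stdbool.h>
#include <stdint.h>

#define MY_ETH_P_IP 0x0800  /* IPv4 EtherType, host byte order */

/* proto_be holds the EtherType as it appears on the wire (big endian),
 * like skb->protocol, so the constant is converted with htons(). */
static bool frame_is_ipv4(uint16_t proto_be)
{
    return proto_be == htons(MY_ETH_P_IP);
}
```

On common platforms htons() and ntohs() perform the same 16-bit swap, so the values match either way. Using htons() makes the direction of the conversion agree with the types, which is what endianness checkers such as sparse look at.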
CC: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Korsgaard [Wed, 4 Jul 2012 00:33:57 +0000 (00:33 +0000)]
bcm87xx: disable autonegotiation by default
The bcm87xx PHYs don't support autonegotiation, so don't use it by
default; otherwise phy_state_machine() will try to enable it (using
c22 requests, which also don't make any sense for the bcm87xx).
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Mon, 2 Jul 2012 09:59:24 +0000 (09:59 +0000)]
sctp: refactor sctp_packet_append_chunk and clean up some memory leaks
While doing some recent work on sctp sack bundling I noted that
sctp_packet_append_chunk was pretty inefficient. Specifically, it was called
recursively while trying to bundle auth and sack chunks. Because of that we
call sctp_packet_bundle_sack and sctp_packet_bundle_auth a total of 4 times for
every call to sctp_packet_append_chunk, knowing that at least 3 of those calls
will do nothing.
So let's refactor sctp_packet_append_chunk to have an outer part that does the
attempted bundling, and an inner part that just does the chunk appends. This
saves us several calls per iteration that we just don't need.
Also, I noticed that the auth and sack bundling fail to free the chunks they
allocate if the append fails, so make sure we add that in.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Sat, 30 Jun 2012 01:49:35 +0000 (01:49 +0000)]
bnx2i: use strlcpy() instead of memcpy() for strings
DRV_MODULE_VERSION here is "2.7.2.2", which is only 8 bytes including the
terminating NUL, but we copy 12 bytes from the stack, so it's a small
information leak.
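The fix relies on strlcpy() semantics: the copy is bounded by the source string's length, not by a fixed byte count. glibc has historically not shipped strlcpy(), so the sketch below defines a hypothetical `my_strlcpy()` with the same contract as the kernel helper. It copies at most `size - 1` bytes, always NUL-terminates when `size > 0`, and returns `strlen(src)`.

```c
#include <string.h>

/* Unlike memcpy(dst, src, 12), this never reads past the end of a
 * shorter source string; a return value >= size signals truncation. */
static size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```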
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Michael Chan <mchan@broadcom.com>
Acked-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 5 Jul 2012 04:31:01 +0000 (04:31 +0000)]
asix: avoid copies in tx path
I noticed excess calls to skb_copy_expand() or memmove() in the asix driver.
This driver needs to push 4 bytes in front of the frame (packet_len)
and maybe add 4 bytes after the end (if padlen is 4).
So it should set needed_headroom and needed_tailroom to avoid
copies. But that's not enough, because many packets are cloned
before entering asix_tx_fixup(), and this driver uses skb_cloned()
as a lazy way to check whether it can push and put additional bytes in the frame.
Avoid the expensive skb_copy_expand() call using the following rules:
- We are allowed to push 4 bytes in headroom if skb_header_cloned()
is false (and if we have 4 bytes of headroom)
- We are allowed to put 4 bytes at tail if skb_cloned()
is false (and if we have 4 bytes of tailroom)
TCP packets, for example, are cloned, but skb_header_release()
was called in the TCP stack, allowing us to use headroom for our needs.
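The two rules can be modelled as simple predicates. In this sketch, `struct fake_skb` and its fields are hypothetical stand-ins for the sk_buff state the rules depend on (skb_header_cloned(), skb_cloned(), headroom, and tailroom). The helper names are also illustrative; this is not the driver's code.

```c
#include <stdbool.h>

/* Hypothetical model of the sk_buff state the rules look at. */
struct fake_skb {
    bool header_cloned;     /* models skb_header_cloned() */
    bool cloned;            /* models skb_cloned() */
    unsigned int headroom;  /* bytes available before the data */
    unsigned int tailroom;  /* bytes available after the data */
};

#define ASIX_HDR_LEN 4  /* packet_len header pushed in front of the frame */
#define ASIX_PAD_LEN 4  /* optional padding put after the frame */

/* Headroom may be written in place if the header is not shared. */
static bool can_push_header(const struct fake_skb *skb)
{
    return !skb->header_cloned && skb->headroom >= ASIX_HDR_LEN;
}

/* Tailroom may be written in place only if the buffer is not shared. */
static bool can_put_padding(const struct fake_skb *skb)
{
    return !skb->cloned && skb->tailroom >= ASIX_PAD_LEN;
}

/* A copy (skb_copy_expand() in the driver) is needed only when a
 * required write cannot be done in place. */
static bool needs_copy(const struct fake_skb *skb, bool need_pad)
{
    return !can_push_header(skb) || (need_pad && !can_put_padding(skb));
}
```

The TCP case from the text corresponds to `cloned = true` with `header_cloned = false`. That packet needs no copy unless padding is also required.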
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Allan Chou <allan@asix.com.tw>
Cc: Trond Wuellner <trond@chromium.org>
Cc: Grant Grundler <grundler@chromium.org>
Cc: Paul Stewart <pstew@chromium.org>
Cc: Ming Lei <tom.leiming@gmail.com>
Tested-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 5 Jul 2012 04:03:50 +0000 (04:03 +0000)]
net/mlx4_en: Add support for drop action through ethtool
The drop action is implemented by allocating a QP and keeping it in a reset state
such that the HW drops any packets which are steered to that QP. When a drop action
is requested, we attach the relevant flow to that QP.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 5 Jul 2012 04:03:49 +0000 (04:03 +0000)]
net/mlx4_en: Manage flow steering rules with ethtool
Implement the ethtool APIs for attaching L2/L3/L4 based flow steering
rules to the netdevice RX rings. Add a set_rxnfc callback and enhance
the existing get_rxnfc callback.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 5 Jul 2012 04:03:48 +0000 (04:03 +0000)]
net/mlx4: Implement promiscuous mode with device managed flow-steering
The device managed flow steering API has three promiscuous modes:
1. Uplink - captures all the packets that arrive at the port.
2. Allmulti - captures all multicast packets arriving at the port.
3. Function port - for future use; this mode is not implemented yet.
Use these modes with the flow_attach and flow_detach firmware commands
according to the promiscuous state of the netdevice.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 5 Jul 2012 04:03:47 +0000 (04:03 +0000)]
net/mlx4_core: Add resource tracking for device managed flow steering rules
As with other device resources, the resource tracker is needed for supporting
device managed flow steering rules under SRIOV: make sure virtual functions
delete only rules created by them, and clean all rules attached by a crashed VF.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 5 Jul 2012 04:03:46 +0000 (04:03 +0000)]
{NET, IB}/mlx4: Add device managed flow steering firmware API
The driver is modified to support three operation modes.
If supported by firmware, use the device managed flow steering
API, which we call device managed steering mode. Else, if
the firmware supports the B0 steering mode, use it; finally,
if neither of the above is supported, use the A0 steering mode.
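The mode selection described above amounts to a fixed priority order that is evaluated once. Below is a minimal sketch, assuming hypothetical capability flags and names (`struct fw_caps`, `enum steering_mode`, `choose_steering_mode()`). The rule that B0 requires both unicast and multicast support comes from the "Set steering mode according to device capabilities" entry in this log.

```c
#include <stdbool.h>

/* Hypothetical capability flags; the real driver learns these from
 * firmware during initialization. */
struct fw_caps {
    bool device_managed_steering;
    bool b0_uc_steering;   /* B0 unicast steering supported */
    bool b0_mc_steering;   /* B0 multicast steering supported */
};

enum steering_mode {
    STEERING_MODE_A0,
    STEERING_MODE_B0,
    STEERING_MODE_DEVICE_MANAGED,
};

/* Pick the most capable mode once, so the rest of the code can read a
 * single stored value instead of re-checking firmware capabilities. */
static enum steering_mode choose_steering_mode(const struct fw_caps *caps)
{
    if (caps->device_managed_steering)
        return STEERING_MODE_DEVICE_MANAGED;
    if (caps->b0_uc_steering && caps->b0_mc_steering)
        return STEERING_MODE_B0;
    return STEERING_MODE_A0;
}
```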
When the steering mode is device managed, the code is modified
such that L2 based rules set by the mlx4_en driver for Ethernet
unicast and multicast, and the IB stack multicast attach calls
done through the mlx4_ib driver are all routed to use the device
managed API.
When attaching a rule using the device managed flow steering API,
the firmware returns a 64-bit registration id, which must be
provided during detach.
Currently the firmware is always programmed during HCA initialization
to use standard L2 hashing. Future work should allow configuring the
flow-steering hash function through common, non-proprietary means.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 5 Jul 2012 04:03:45 +0000 (04:03 +0000)]
net/mlx4_core: Add firmware commands to support device managed flow steering
Add support for firmware commands to attach/detach rules in a new device
managed steering mode. Such network steering rules allow the user to provide
an L2/L3/L4 flow specification to the firmware and have the device steer
traffic that matches that specification to the provided QP.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 5 Jul 2012 04:03:44 +0000 (04:03 +0000)]
net/mlx4: Set steering mode according to device capabilities
Instead of checking the firmware-supported steering mode in various
places in the code, add a dedicated field in the mlx4 device capabilities
structure, which is written once during the initialization flow and read
across the code.
This also lays the groundwork for adding new steering modes. Currently two modes
are supported, and they are named after the ConnectX HW versions A0 and B0.
A0 steering uses mac_index, vlan_index and priority to steer traffic
into a pre-defined range of QPs.
B0 steering uses Ethernet L2 hashing rules and is enabled only
if the firmware supports both unicast and multicast B0 steering.
The current steering modes are relevant for Ethernet traffic only,
so InfiniBand steering remains untouched.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, for every change in the net device multicast list, the driver
detaches all the addresses from the HW device and then attaches the
updated list. This behavior is wrong in two respects: first, it causes
a flood of firmware commands, and second, there is a period of time during
which the correct addresses are not attached, which leads to packet loss.
To improve this, the driver saves a copy of the multicast list. For
every change in the multicast list, the saved copy is used
to find the delta between the two lists and add or remove multicast
addresses as needed.
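The delta computation can be sketched as two membership passes over the saved and current lists. The sketch below is quadratic, which is fine for short multicast lists. `struct mac_addr` and `mcast_delta()` are hypothetical names, not the mlx4_en code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6

struct mac_addr { unsigned char b[MAC_LEN]; };

/* True if addr appears in list[0..n). */
static bool mac_in_list(const struct mac_addr *list, size_t n,
                        const struct mac_addr *addr)
{
    for (size_t i = 0; i < n; i++)
        if (memcmp(list[i].b, addr->b, MAC_LEN) == 0)
            return true;
    return false;
}

/* Compare the saved copy (old) with the current list (cur). Addresses
 * only in cur go to to_add; addresses only in old go to to_del. to_add
 * needs room for n_cur entries and to_del for n_old entries. Addresses
 * present in both lists are untouched, so they stay attached throughout. */
static void mcast_delta(const struct mac_addr *old, size_t n_old,
                        const struct mac_addr *cur, size_t n_cur,
                        struct mac_addr *to_add, size_t *n_add,
                        struct mac_addr *to_del, size_t *n_del)
{
    *n_add = 0;
    *n_del = 0;
    for (size_t i = 0; i < n_cur; i++)
        if (!mac_in_list(old, n_old, &cur[i]))
            to_add[(*n_add)++] = cur[i];
    for (size_t i = 0; i < n_old; i++)
        if (!mac_in_list(cur, n_cur, &old[i]))
            to_del[(*n_del)++] = old[i];
}
```

After applying the delta, the caller would replace the saved copy with the current list, ready for the next change.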
Reported-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 5 Jul 2012 04:03:42 +0000 (04:03 +0000)]
net/mlx4_core: Change resource tracking ID to be 64 bit
Currently the IDs used by the resource tracker are of type u32. So far this was
OK, since all the different resources we were tracking could be encoded in 32 bits.
As a preparation step for tracking resources whose IDs need more than 32 bits, such
as network flow steering rules, which are 64 bits in size, move to 64-bit
resource IDs.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 5 Jul 2012 04:03:41 +0000 (04:03 +0000)]
net/mlx4_core: Change resource tracking mechanism to use red-black tree
Change the data structure used for managing the SRIOV resource tracking
mechanism from a radix tree to a red-black tree. This is a preparation step
for supporting resource IDs that are 64 bits long, such as network flow
steering rules. Such IDs can't be used as radix-tree keys on 32-bit
architectures, hence the change.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: Initialize the neighbour pointer of rt6_info on allocation
git commit 97cac082 (ipv6: Store route neighbour in rt6_info struct)
added a neighbour pointer to rt6_info. Currently we don't initialize
this pointer at allocation time. We assume this pointer to be valid
if it is not a null pointer, so initialize it on allocation.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the IEEE 802.15.4 standard, a device has an 8-byte address,
so this hook loses the last 2 bytes, which may raise compatibility problems
with other IEEE 802.15.4 implementations.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Cc: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
myri10ge: set maximal number of default RSS queues
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Cc: Jon Mason <mason@myri.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Cc: Divy Le Ray <divy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Cc: Divy Le Ray <divy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Cc: Jon Mason <jdmason@kudzu.us> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Most multi-queue networking drivers consider the number of online CPUs when
configuring RSS queues.
This patch adds a wrapper around the number of CPUs, setting an upper limit on the
number of CPUs a driver should consider (by default) when allocating resources
for its queues.
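The wrapper's behaviour reduces to taking the minimum of the online CPU count and a fixed cap. Below is a minimal sketch. The cap of 8 and the names `DEFAULT_MAX_RSS_QUEUES` and `default_rss_queues()` are assumptions for illustration. The CPU count is passed in so the function stays pure.

```c
/* Assumed default cap on RSS queues; the kernel helper uses its own constant. */
#define DEFAULT_MAX_RSS_QUEUES 8

/* Default RSS queue count: one per online CPU, but never more than the
 * cap, so large machines do not allocate per-CPU queue resources by default. */
static int default_rss_queues(long online_cpus)
{
    if (online_cpus < 1)
        online_cpus = 1;   /* guard against a failed CPU-count query (-1) */
    return online_cpus < DEFAULT_MAX_RSS_QUEUES ? (int)online_cpus
                                                : DEFAULT_MAX_RSS_QUEUES;
}
```

In userspace, the argument could come from `sysconf(_SC_NPROCESSORS_ONLN)`. A driver would use the kernel's own online-CPU count instead.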
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 2 Jul 2012 09:15:37 +0000 (02:15 -0700)]
sunrpc: Don't do a dst_confirm() on input routes.
xs_udp_data_ready() is operating on received packets, and tries to
do a dst_confirm() on the dst attached to the SKB.
This isn't right: dst confirmation is for output routes, not input
routes. It's for resetting the timers on the nexthop neighbour entry
for the route, indicating that we've got good evidence that we've
successfully reached it.
Signed-off-by: David S. Miller <davem@davemloft.net>