git.karo-electronics.de Git - linux-beck.git/log
nikolay@redhat.com [Wed, 26 Jun 2013 15:13:37 +0000 (17:13 +0200)]
bonding: remove unnecessary setup_by_slave member

We have a member called setup_by_slave in struct bonding to denote whether the
bond dev has a different type than ARPHRD_ETHER, but that is already denoted
by the bond's netdev type variable if it was set up by the slave, so use that
instead of the member.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira [Fri, 28 Jun 2013 01:04:23 +0000 (03:04 +0200)]
netlink: fix splat in skb_clone with large messages

Since commit c05cdb1 ("netlink: allow large data transfers from user-space"),
netlink splats if it invokes skb_clone() on large netlink skbs because:

* skb_shared_info was not correctly initialized.
* skb->destructor is not set in the cloned skb.

This was spotted by trinity:

[  894.990671] BUG: unable to handle kernel paging request at ffffc9000047b001
[  894.991034] IP: [<ffffffff81a212c4>] skb_clone+0x24/0xc0
[...]
[  894.991034] Call Trace:
[  894.991034]  [<ffffffff81ad299a>] nl_fib_input+0x6a/0x240
[  894.991034]  [<ffffffff81c3b7e6>] ? _raw_read_unlock+0x26/0x40
[  894.991034]  [<ffffffff81a5f189>] netlink_unicast+0x169/0x1e0
[  894.991034]  [<ffffffff81a601e1>] netlink_sendmsg+0x251/0x3d0

Fix it by:

1) introducing a new netlink_skb_clone() function, used in nl_fib_input(),
   that sets our special skb->destructor in the cloned skb. Moreover, handle
   the release of the large cloned skb head area in the destructor path.

2) not allowing large skbuffs in the netlink broadcast path. I cannot find
   any reasonable use of large data transfers over netlink in that path;
   moreover, this avoids extra skb_clone handling.

I found two more netlink clients that clone skbs, but they are not in
the sendmsg path. Therefore, the fib frontend seems to be the only client
that clones skbs in that path.

Thanks to Eric Dumazet for helping to address this issue.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Wed, 26 Jun 2013 14:11:28 +0000 (16:11 +0200)]
sit: add support of x-netns

This patch allows switching the netns when a packet is encapsulated or
decapsulated. In other words, the encapsulated packet is received in one netns,
where the lookup is done to find the tunnel. Once the tunnel is found, the
packet is decapsulated and injected into the corresponding interface, which
resides in another netns.

When one of the two netns is removed, the tunnel is destroyed.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Wed, 26 Jun 2013 14:11:27 +0000 (16:11 +0200)]
dev: introduce skb_scrub_packet()

The goal of this new function is to perform all needed cleanup before sending
an skb into another netns.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Healy [Wed, 26 Jun 2013 06:18:52 +0000 (23:18 -0700)]
fec: Add support for reading RMON registers

Add ethtool operation to read RMON registers.

Tested against net-next on i.MX28.

v2: make conditional on #ifndef CONFIG_M5272

Signed-off-by: Chris Healy <cphealy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Wed, 26 Jun 2013 01:41:49 +0000 (03:41 +0200)]
ipv6: rearm router solicitation timer when setting new tokenized address

When a new tokenized address gets installed, we send out just one
router solicitation. We should send out `rtr_solicits' in case a router
advertisement gets lost.

So, rearm the timer as we do in addrconf_dad_complete.
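A minimal user-space sketch of the retransmit bookkeeping described above, assuming the timer simply counts solicitations against the rtr_solicits limit and stops once a router advertisement arrives. struct demo_rs and the demo_rs_*() helpers are hypothetical; the real handling in net/ipv6/addrconf.c also deals with intervals, locking and reference counts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-interface router solicitation (RS) state. */
struct demo_rs {
	int sent;          /* solicitations sent so far */
	int rtr_solicits;  /* configured total, like the rtr_solicits sysctl */
	bool timer_armed;  /* would a retransmit timer be pending? */
};

/* New tokenized address installed: send one RS now and arm the
 * timer if the configured budget allows more. */
static void demo_rs_begin(struct demo_rs *rs)
{
	assert(rs->rtr_solicits >= 1);
	rs->sent = 1;
	rs->timer_armed = rs->sent < rs->rtr_solicits;
}

/* Timer expiry: stop if an RA arrived or the budget is spent,
 * otherwise send another RS and rearm while budget remains. */
static void demo_rs_timer_fired(struct demo_rs *rs, bool ra_received)
{
	assert(rs->timer_armed);
	if (ra_received || rs->sent >= rs->rtr_solicits) {
		rs->timer_armed = false;
		return;
	}
	rs->sent++;
	rs->timer_armed = rs->sent < rs->rtr_solicits;
}
```

Without the rearm in demo_rs_begin(), only the first solicitation would ever go out, which is the behaviour this patch addresses.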

Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Wed, 26 Jun 2013 15:40:33 +0000 (17:40 +0200)]
sit: fix 4in4 + IPsec scenario

Since commit 32b8a8e59c9c ("sit: add IPv4 over IPv4 support"),
tunnel->parms.iph.protocol is 0 when both 4in4 and 6in4 are set up, but
xfrm_lookup() is called only when proto != 0, so we need to pass the real
value.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 26 Jun 2013 20:23:13 +0000 (13:23 -0700)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
Just one patch this time.

1) Drop packets when the matching SA is in larval state and add a
   statistic counter for that. From Fan Du.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Brodkin [Wed, 26 Jun 2013 07:49:26 +0000 (11:49 +0400)]
arc_emac: fix compile-time errors & warnings on PPC64

As reported by the kbuild test robot, there were some errors and warnings
when building the kernel with "make ARCH=powerpc allmodconfig".

This patch addresses both the errors and the warnings.
Below is a list of the changes:
1. Fix compile-time errors (misspellings of "dma_unmap_single") on PPC.
2. Use the DMA address instead of "skb->data" as the pointer to the data buffer.
This fixes pointer-to-int conversion warnings on 64-bit systems.
3. Re-implement the initial allocation of Rx buffers in "arc_emac_open" in
the same way they're re-allocated during operation (when receiving packets),
so that, again, the DMA address can be used instead of "skb->data".
4. Explicitly use EMAC_BUFFER_SIZE for Rx buffer allocation.

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: netdev@vger.kernel.org
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Cc: Joe Perches <joe@perches.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Mischa Jonker <mjonker@synopsys.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: linux-kernel@vger.kernel.org
Cc: devicetree-discuss@lists.ozlabs.org
Cc: Florian Fainelli <florian@openwrt.org>
Cc: David Laight <david.laight@aculab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Mon, 24 Jun 2013 09:49:34 +0000 (11:49 +0200)]
bonding: add an option to fail when any of arp_ip_target is inaccessible

Currently, we fail only when all of the ips in arp_ip_target are gone.
However, in some situations we might need to fail if even one host from
arp_ip_target becomes unavailable.

All situations, obviously, rely on the idea that we need *completely*
functional network, with all interfaces/addresses working correctly.

One real-world example might be:
vlans on top of a bond (hybrid port). If the bond and vlans have ips assigned
and we have their peers monitored via arp_ip_target, then in case of switch
misconfiguration (trunk/access port), slave driver malfunction or
tagged/untagged traffic dropped on the way, we will be able to switch
to another slave.

The same applies to any other configuration that needs access to all
arp_ip_targets.

This patch adds this possibility by adding a new parameter -
arp_all_targets (both as a module parameter and as a sysfs knob). It can be
set to:

0 or any (the default) - which works exactly as it's working now -
the slave is up if any of the arp_ip_targets are up.

1 or all - the slave is up if all of the arp_ip_targets are up.

This parameter can be changed on the fly (via sysfs), and requires the mode
to be active-backup and arp_validate to be enabled (it obeys the
arp_validate config on which slaves to validate).

Internally it's done through:

1) Add target_last_arp_rx[BOND_MAX_ARP_TARGETS] array to slave struct. It's
   an array of jiffies, meaning that slave->target_last_arp_rx[i] is the
   last time we received an arp from bond->params.arp_targets[i] on this
   slave.

2) If we successfully validate an arp from bond->params.arp_targets[i] in
   bond_validate_arp() - update the slave->target_last_arp_rx[i] with the
   current jiffies value.

3) When getting slave's last_rx via slave_last_rx(), we return the oldest
   time when we've received an arp from any address in
   bond->params.arp_targets[].

If the value of arp_all_targets == 0 - we still work the same way as
before.
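The oldest-timestamp rule in step 3 can be sketched in user-space C. This is a hedged illustration, not the bonding driver's code: struct demo_slave, demo_slave_last_rx() and DEMO_MAX_ARP_TARGETS are hypothetical stand-ins for the slave struct, slave_last_rx() and BOND_MAX_ARP_TARGETS, and jiffies are modelled as a plain unsigned long compared in a wraparound-safe way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_ARP_TARGETS 16  /* hypothetical stand-in for BOND_MAX_ARP_TARGETS */

/* Wraparound-safe "a is earlier than b", in the spirit of time_before(). */
static bool demo_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

struct demo_slave {
	unsigned long last_arp_rx;                              /* any validated arp */
	unsigned long target_last_arp_rx[DEMO_MAX_ARP_TARGETS]; /* per target */
};

/*
 * all_targets == false: behave as before and return last_arp_rx.
 * all_targets == true: return the oldest per-target timestamp, so the
 * slave only looks alive if every target answered recently.
 */
static unsigned long demo_slave_last_rx(const struct demo_slave *s,
					size_t n_targets, bool all_targets)
{
	assert(n_targets <= DEMO_MAX_ARP_TARGETS);
	if (!all_targets || n_targets == 0)
		return s->last_arp_rx;

	unsigned long oldest = s->target_last_arp_rx[0];
	for (size_t i = 1; i < n_targets; i++)
		if (demo_time_before(s->target_last_arp_rx[i], oldest))
			oldest = s->target_last_arp_rx[i];
	return oldest;
}
```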

Also, update the documentation to reflect the new parameter.

v3->v4:
Kill the forgotten rtnl_unlock(), rephrase the documentation part to be
more clear, don't fail setting arp_all_targets if arp_validate is not set -
it has no effect anyway but can be easier to set up. Also, print a warning
if the last arp_ip_target is removed while the arp_interval is on, but not
the arp_validate.

v2->v3:
Use _bh spinlock, remove useless rtnl_lock() and use jiffies for new
arp_ip_target last arp, instead of slave_last_rx(). On bond_enslave(),
use the same initialization value for target_last_arp_rx[] as is used
for the default last_arp_rx, to avoid useless interface flaps.

Also, instead of failing to remove the last arp_ip_target just print a
warning - otherwise it might break existing scripts.

v1->v2:
Correctly handle adding/removing hosts in arp_ip_target - we need to
shift/initialize all slave's target_last_arp_rx. Also, don't fail module
loading on arp_all_targets misconfiguration, just disable it, and some
minor style fixes.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Mon, 24 Jun 2013 09:49:33 +0000 (11:49 +0200)]
bonding: doc: some details on backup slave arp validation

Add some details to bonding documentation on how backup slave arp
validation works.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Mon, 24 Jun 2013 09:49:32 +0000 (11:49 +0200)]
bonding: don't trust arp requests unless active slave really works

Currently, if we receive any arp packet on a backup slave in active-backup
mode with arp_validate enabled, we assume that it's an arp request, swap
source/target ip and try to validate it. This optimization gives us
virtually no downtime in the most common situation (active and backup
slaves are in the same broadcast domain and the active slave failed).

However, if we can't reach the arp_ip_target(s), we end up in an endless
loop of reselecting slaves, because we receive our arp requests, sent by
the active slave, and think that backup slaves are up, thus selecting them
as active and, again, sending arp requests, which fool our backup slaves.

Fix this by not validating the swapped arp packets if the current active
slave didn't receive any arp reply after it was selected as active. This
way we will only accept arp requests if we know that the current active
slave can actually reach arp_ip_target.
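The rule can be expressed as a single wraparound-safe jiffies comparison. A minimal sketch, assuming the bond records when the current active slave was selected and when it last received a validated arp reply; struct demo_active and demo_trust_backup_request() are hypothetical names, not the driver's fields or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Wraparound-safe "a is later than b", modelled on time_after(). */
static bool demo_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

struct demo_active {
	unsigned long selected_at;    /* jiffies when this slave became active */
	unsigned long last_reply_rx;  /* jiffies of its last validated arp reply */
};

/*
 * May an arp request seen on a backup slave be validated with swapped
 * source/target ip? Only if the active slave has proved it can reach
 * arp_ip_target since it was selected; otherwise we may just be hearing
 * our own requests.
 */
static bool demo_trust_backup_request(const struct demo_active *active)
{
	assert(active != NULL);
	return demo_time_after(active->last_reply_rx, active->selected_at);
}
```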

v3->v4:
Obey 80 lines and make checkpatch.pl happy, per Sergei's suggestion.

v1->v3:
No change.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Mon, 24 Jun 2013 09:49:31 +0000 (11:49 +0200)]
bonding: don't validate arp if we don't have to

Currently, we validate all the incoming arps if arp_validate is not 0.
However, we don't have to validate backup slaves if arp_validate == active
and vice versa, so return early in bond_arp_rcv() in these cases.

It works correctly now because we check arp_validate in slave_last_rx(),
but we're doing useless work in bond_arp_rcv().
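The early return amounts to a bitmask test. A minimal sketch, assuming arp_validate is a two-bit mask with one bit covering the active slave and one covering backups (matching its none/active/backup/all settings); the enum and function names are hypothetical, not the driver's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags modelled on arp_validate's none/active/backup/all. */
enum demo_arp_validate {
	DEMO_ARP_VALIDATE_NONE   = 0,
	DEMO_ARP_VALIDATE_ACTIVE = 1 << 0,
	DEMO_ARP_VALIDATE_BACKUP = 1 << 1,
	DEMO_ARP_VALIDATE_ALL    = DEMO_ARP_VALIDATE_ACTIVE | DEMO_ARP_VALIDATE_BACKUP,
};

/* True only if an incoming arp on this slave needs validating at all;
 * otherwise the receive handler can return early. */
static bool demo_needs_arp_validation(unsigned int arp_validate,
				      bool slave_is_backup)
{
	assert((arp_validate & ~(unsigned int)DEMO_ARP_VALIDATE_ALL) == 0);
	unsigned int role = slave_is_backup ? DEMO_ARP_VALIDATE_BACKUP
					    : DEMO_ARP_VALIDATE_ACTIVE;
	return (arp_validate & role) != 0;
}
```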

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Mon, 24 Jun 2013 09:49:30 +0000 (11:49 +0200)]
bonding: don't add duplicate targets to arp_ip_target

Print a warning and skip them.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Mon, 24 Jun 2013 09:49:29 +0000 (11:49 +0200)]
bonding: add helper function bond_get_targets_ip(targets, ip)

Add function bond_get_targets_ip(targets, ip), which searches through the
targets array of ips (arp_targets) and returns the position of the first
match. If ip == 0, it returns the first free slot. If it finds neither the
ip nor a free slot, it returns -1.

Use it to verify whether a received arp is valid, and in sysfs.
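A user-space sketch of the lookup contract described above. It assumes that targets are packed at the front of the array with 0 marking empty slots, which is what makes "ip == 0 returns the first free slot" work; demo_get_targets_ip() mirrors the described behaviour of bond_get_targets_ip() but uses host-order uint32_t instead of __be32, and demo_add_target() is a hypothetical caller showing the duplicate and 0.0.0.0 checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ARP_TARGETS 16  /* hypothetical stand-in for BOND_MAX_ARP_TARGETS */

/* Index of the first slot equal to ip (ip == 0 finds the first free
 * slot), or -1 if there is no match. */
static int demo_get_targets_ip(const uint32_t *targets, uint32_t ip)
{
	assert(targets != NULL);
	for (int i = 0; i < DEMO_MAX_ARP_TARGETS; i++) {
		if (targets[i] == ip)
			return i;
		if (targets[i] == 0)  /* packed array: first empty slot ends the list */
			break;
	}
	return -1;
}

/* Add ip unless it is 0.0.0.0 or already present.
 * Returns the slot used, or -1 if rejected or the array is full. */
static int demo_add_target(uint32_t *targets, uint32_t ip)
{
	if (ip == 0 || demo_get_targets_ip(targets, ip) >= 0)
		return -1;
	int slot = demo_get_targets_ip(targets, 0);
	if (slot >= 0)
		targets[slot] = ip;
	return slot;
}
```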

v1->v2:
Fix "[2/6] bonding: add helper function bond_get_targets_ip(targets, ip)",
per Nikolay's advice, to verify if source ip != 0.0.0.0, otherwise we might
update 'null' arp_ip_targets' last_rx. Also, address style.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lad, Prabhakar [Tue, 25 Jun 2013 15:54:53 +0000 (21:24 +0530)]
net: davinci_mdio: guard the DT code with IS_ENABLED(CONFIG_OF)

Guard the davinci_mdio_of_mtable table and davinci_mdio_probe_dt()
with CONFIG_OF.

Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lad, Prabhakar [Tue, 25 Jun 2013 15:54:52 +0000 (21:24 +0530)]
net: davinci_emac: simplify the OF parser code

This patch cleans up the OF parser code, removes unnecessary checks
on of_property_read_*() and guards davinci_emac_of_match table with
CONFIG_OF.

Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lad, Prabhakar [Tue, 25 Jun 2013 15:54:51 +0000 (21:24 +0530)]
net: davinci: emac: Convert to devm_* api

Use devm_ioremap_resource instead of devm_request_mem_region()/devm_ioremap()
and devm_request_irq() instead of request_irq().

This ensures more consistent error values and simplifies error paths.

Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 24 Jun 2013 11:46:54 +0000 (19:46 +0800)]
doc: fix some syntax errors in netlink mmap sample code

Cc: Patrick McHardy <kaber@trash.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Tue, 25 Jun 2013 20:04:22 +0000 (16:04 -0400)]
macvtap: Perform GSO on forwarding path.

When macvtap forwards an skb to its tap, it needs to check
if GSO needs to be performed.  This is sometimes necessary
when the HW device performed GRO, but the guest reading
from the tap does not support it (e.g. Windows 7).

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Tue, 25 Jun 2013 20:04:21 +0000 (16:04 -0400)]
macvtap: Let TUNSETOFFLOAD actually control offload features.

When the user issues the TUNSETOFFLOAD ioctl, macvtap does not do
anything other than verify the arguments.  This patch adds
functionality to allow users to actually control offload features.
NETIF_F_GSO and NETIF_F_GRO are always on, but the rest of the
features can be controlled.
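The resulting policy can be sketched as a mask operation. This is an assumption-laden illustration: the DEMO_F_* bits and demo_apply_offload() are hypothetical, and the real ioctl takes TUN_F_* flags from linux/if_tun.h that must be translated into NETIF_F_* features, a step skipped here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits; the real NETIF_F_* values differ. */
#define DEMO_F_GSO   (1u << 0)
#define DEMO_F_GRO   (1u << 1)
#define DEMO_F_TSO   (1u << 2)
#define DEMO_F_TSO6  (1u << 3)
#define DEMO_F_UFO   (1u << 4)

#define DEMO_F_ALWAYS_ON     (DEMO_F_GSO | DEMO_F_GRO)
#define DEMO_F_USER_SETTABLE (DEMO_F_TSO | DEMO_F_TSO6 | DEMO_F_UFO)

/* GSO and GRO stay on regardless of the request; only the user-settable
 * bits are taken from it, and anything else is ignored. */
static uint32_t demo_apply_offload(uint32_t requested)
{
	return DEMO_F_ALWAYS_ON | (requested & DEMO_F_USER_SETTABLE);
}
```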

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Tue, 25 Jun 2013 20:04:20 +0000 (16:04 -0400)]
macvtap: Consistently use rcu functions

Currently macvtap uses rcu_bh functions in its
user-facing functions macvtap_get_user() and macvtap_put_user().
However, its packet handlers use normal rcu, as rcu_read_lock()
is taken in netif_receive_skb().  We can safely discontinue
the usage of rcu with bh disabled.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Tue, 25 Jun 2013 20:04:19 +0000 (16:04 -0400)]
macvtap: Convert to using rtnl lock

Macvtap uses a private lock to protect the relationship between
macvtap_queue and macvlan_dev.  The private lock is not needed
since the relationship is managed by user via open(), release(),
and dellink() calls.  dellink() already happens under rtnl, so
we can safely convert open() and release(), and use it in ioctl()
as well.

Suggested by Eric Dumazet.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliezer Tamir [Mon, 24 Jun 2013 07:28:03 +0000 (10:28 +0300)]
net: poll/select low latency socket support

select/poll busy-poll support.

Split the sysctl value into two separate ones, one for read and one for poll.
Update Documentation/sysctl/net.txt.

Add a new poll flag POLL_LL. When this flag is set, sock_poll will call
sk_poll_ll if possible. sock_poll sets this flag in its return value
to indicate to select/poll when a socket that can busy poll is found.

When poll/select have nothing to report, call the low-level
sock_poll again until we are out of time or we find something.

Once the system call finds something, it stops setting POLL_LL, so it can
return the result to the user ASAP.
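The loop described above can be modelled with a toy logical clock. This sketch is not the kernel's implementation: struct demo_sock, demo_poll_pass() and demo_busy_select() are hypothetical, each pass advances time by one tick instead of spinning on a device queue, and sock_poll()/sk_poll_ll() themselves are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy socket: becomes readable once the logical clock reaches ready_at. */
struct demo_sock {
	unsigned long ready_at;
	unsigned int busy_polls;  /* times polled with busy polling requested */
};

/*
 * One pass over all sockets. Busy polling (the POLL_LL idea) is requested
 * only until the first ready socket is found; the rest get a plain poll so
 * the result can go back to the user quickly. Returns the number ready.
 */
static int demo_poll_pass(struct demo_sock *socks, size_t n,
			  unsigned long now, bool *ready)
{
	bool busy = true;
	int found = 0;

	for (size_t i = 0; i < n; i++) {
		if (busy)
			socks[i].busy_polls++;
		ready[i] = now >= socks[i].ready_at;
		if (ready[i]) {
			found++;
			busy = false;
		}
	}
	return found;
}

/* Repeat passes until something is ready or the time budget runs out. */
static int demo_busy_select(struct demo_sock *socks, size_t n, bool *ready,
			    unsigned long start, unsigned long budget)
{
	assert(socks != NULL && ready != NULL);
	for (unsigned long now = start; now - start <= budget; now++) {
		int found = demo_poll_pass(socks, n, now, ready);
		if (found)
			return found;
	}
	return 0;
}
```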

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Brodkin [Mon, 24 Jun 2013 05:54:27 +0000 (09:54 +0400)]
ethernet/arc/arc_emac - Add new driver

Driver for non-standard on-chip ethernet device ARC EMAC 10/100,
instantiated in some legacy ARC (Synopsys) FPGA Boards such as
ARCAngel4/ML50x.

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Cc: Joe Perches <joe@perches.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Mischa Jonker <mjonker@synopsys.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-kernel@vger.kernel.org
Cc: devicetree-discuss@lists.ozlabs.org
Cc: Florian Fainelli <florian@openwrt.org>
Cc: David Laight <david.laight@aculab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 25 Jun 2013 16:17:30 +0000 (18:17 +0200)]
net: sctp: simplify sctp_get_port

No need to have an extra ret variable when we directly can return
the value of sctp_get_port_local().

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 25 Jun 2013 16:17:29 +0000 (18:17 +0200)]
net: sctp: decouple cleaning some socket data from endpoint

Rather than having the endpoint clean up the garbage from the
socket, use an sk_destruct handler, sctp_destruct_sock(), that does
this job when there are no more references to the socket.
At least do this for our crypto transform, which is allocated in the
listening state and released through crypto_free_hash().

Also, perform sctp_put_port() only when sk is valid. At a later
point in time we can still determine if there's an option of
placing this into sk_prot->unhash() or sctp_endpoint_free() without
any races. For now, leave it in sctp_endpoint_destroy() though.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 25 Jun 2013 16:17:28 +0000 (18:17 +0200)]
net: sctp: minor: sctp_seq_dump_local_addrs add missing newline

A trailing newline was missing from the WARN() message.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 25 Jun 2013 16:17:27 +0000 (18:17 +0200)]
net: sctp: migrate cookie life from timeval to ktime

Currently, SCTP code defines its own timeval functions (since timeval
is rarely used inside the kernel by others), namely tv_lt() and
TIMEVAL_ADD() macros, that operate on SCTP cookie expiration.

We might as well remove all those and operate directly on ktime
structures, for a couple of reasons: ktime is available on all archs;
the complexity of ktime calculations, depending on the arch, is either less
than (it reduces to simple arithmetic operations on archs with
BITS_PER_LONG == 64 or CONFIG_KTIME_SCALAR) or equal to that of the timeval
functions (other archs); the code becomes more readable; and the macros can
be thrown out.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 25 Jun 2013 16:17:26 +0000 (18:17 +0200)]
ktime: add ms_to_ktime() and ktime_add_ms() helpers

Add two ktime helper functions that i) convert a given msec value to
a ktime structure and ii) add an msec value to a ktime structure.
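A hedged sketch of what the two helpers compute, assuming a scalar representation of nanoseconds in a signed 64-bit integer (as on 64-bit archs or with CONFIG_KTIME_SCALAR). demo_ktime_t, demo_ms_to_ktime() and demo_ktime_add_ms() are hypothetical stand-ins for ktime_t, ms_to_ktime() and ktime_add_ms(), not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NSEC_PER_MSEC 1000000LL

/* Hypothetical scalar ktime: nanoseconds in a signed 64-bit integer. */
typedef struct { int64_t ns; } demo_ktime_t;

/* i) convert a millisecond count to a ktime value. */
static demo_ktime_t demo_ms_to_ktime(uint64_t ms)
{
	assert(ms <= (uint64_t)(INT64_MAX / DEMO_NSEC_PER_MSEC));
	return (demo_ktime_t){ .ns = (int64_t)ms * DEMO_NSEC_PER_MSEC };
}

/* ii) add a millisecond count to an existing ktime value. */
static demo_ktime_t demo_ktime_add_ms(demo_ktime_t kt, uint64_t ms)
{
	return (demo_ktime_t){ .ns = kt.ns + demo_ms_to_ktime(ms).ns };
}
```

With a scalar representation, a cookie expiry check then becomes a plain comparison of two such values.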

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 25 Jun 2013 16:17:25 +0000 (18:17 +0200)]
net: sctp: remove TEST_FRAME ifdef

We neither ship a test_frame.h, nor would this be compatible with
the 2.5 out-of-tree lksctp kernel test suite anyway. So remove this
artefact.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Tue, 25 Jun 2013 09:09:38 +0000 (12:09 +0300)]
net/mlx4_core: Fail device init if num_vfs is negative

A negative num_vfs should not be allowed.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dotan Barak [Tue, 25 Jun 2013 09:09:37 +0000 (12:09 +0300)]
net/mlx4_core: Add warning in case of command timeouts

Print a warning when a command times out, to help debug future
failures.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dotan Barak [Tue, 25 Jun 2013 09:09:36 +0000 (12:09 +0300)]
net/mlx4_core: Replace sscanf() with kstrtoint()

It is not safe to use sscanf.
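For context, sscanf("%d") does not reject trailing characters unless the caller checks for them, and overflow in a %d conversion is undefined behaviour in C, whereas kstrtoint() is documented to return -EINVAL for malformed input and -ERANGE on overflow, accepting only a single trailing newline. kstrtoint() is kernel-only, so the following is a user-space approximation of that contract; demo_strtoint() is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse the whole string as an int in the given base, allowing one
 * trailing newline. Returns 0 on success, -EINVAL for malformed input,
 * -ERANGE if the value does not fit in an int.
 */
static int demo_strtoint(const char *s, int base, int *res)
{
	assert(s != NULL && res != NULL);
	/* strtol() would silently skip leading whitespace; reject it. */
	if (*s == '\0' || isspace((unsigned char)*s))
		return -EINVAL;

	char *end;
	errno = 0;
	long v = strtol(s, &end, base);
	if (end == s)
		return -EINVAL;       /* no digits at all */
	if (*end == '\n')
		end++;                /* tolerate a single trailing newline */
	if (*end != '\0')
		return -EINVAL;       /* trailing garbage */
	if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
		return -ERANGE;
	*res = (int)v;
	return 0;
}
```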

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dotan Barak [Tue, 25 Jun 2013 09:09:35 +0000 (12:09 +0300)]
net/mlx4_en: Remove an unnecessary test

Since this variable is now part of a structure and not allocated dynamically,
this test is irrelevant now.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yevgeny Petrilin [Tue, 25 Jun 2013 09:09:34 +0000 (12:09 +0300)]
net/mlx4_en: Add prints when TX timeout occurs

Print a warning when a TX timeout is detected

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eugenia Emantayev [Tue, 25 Jun 2013 09:09:33 +0000 (12:09 +0300)]
net/mlx4_en: Fix a race between napi poll function and RX ring cleanup

The RX rings were cleaned up while RX completion handling could still
be in progress.
Change the sequence of events so that the port is closed and the QPs are
stopped before RX cleanup.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eugenia Emantayev [Tue, 25 Jun 2013 09:09:32 +0000 (12:09 +0300)]
net/mlx4_en: Change log level from error to debug for vlan related messages

The port vlan table size is 126 (used for IBoE), so after 126 entries we
will run out of space, and the user needs to see this only as a debug print,
not as an error.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eugenia Emantayev [Tue, 25 Jun 2013 09:09:31 +0000 (12:09 +0300)]
net/mlx4_en: Move register_netdev() to the end of initialization function

To avoid a race between the open function and everything that happens
after register_netdev(), move register_netdev() to be the last operation called.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Tue, 25 Jun 2013 09:09:30 +0000 (12:09 +0300)]
net/mlx4_en: Do not query stats when device port is down

There are no counters allocated to the eth device when the port is down, so
this query is meaningless at that time.

It also leads to querying incorrect counters (since the counter_index is not
valid when the device port is down).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dotan Barak [Tue, 25 Jun 2013 09:09:29 +0000 (12:09 +0300)]
net/mlx4_en: Fix resource leak in error flow

The wrong condition was used when calling iounmap().

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Mon, 24 Jun 2013 20:03:28 +0000 (22:03 +0200)]
ipv6: remove old token ipv6 address as soon as possible

If the tokenized ip address is re-set on an interface, we depend on the
arrival of a new router advertisement to call addrconf_verify to clean
up the old address (whose valid_lft is now set to 0). Old addresses can
linger around for a longer time if, e.g., the source of router advertisements
vanishes.

So, call addrconf_verify immediately after setting the new tokenized
address to get rid of the old tokenized addresses.

Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Sun, 23 Jun 2013 22:22:20 +0000 (00:22 +0200)]
ipv6: don't disable interface if last ipv6 address is removed

The reason behind this change is that as soon as we delete
the last ipv6 address of an interface, we also lose the
/proc/sys/net/ipv6/conf/<interface> directory. This seems like a
usability problem to me.

I don't see any reason why we should shut down ipv6 on that interface in
such cases.

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Sun, 23 Jun 2013 16:39:01 +0000 (18:39 +0200)]
ipv6: split duplicate address detection and router solicitation timer

This patch splits the timers for duplicate address detection and router
solicitations apart. The router solicitations timer goes into inet6_dev
and the dad timer stays in inet6_ifaddr.

The reason behind this patch is to reduce the number of unneeded router
solicitations sent out by the host if additional link-local addresses
are created. Currently we send out an RS for every link-local address on
an interface.

If the RS timer fires we pick a source address with ipv6_get_lladdr. This
change could hurt people adding additional link-local addresses and
specifying these addresses in the radvd clients section because we
no longer guarantee that we use every ll address as source address in
router solicitations.

Cc: Flavio Leitner <fleitner@redhat.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 23 Jun 2013 15:17:56 +0000 (08:17 -0700)]
mlx4: allow order-0 memory allocations in RX path

mlx4 exclusively uses order-2 allocations in RX path, which are
likely to fail under memory pressure.

We therefore drop frames more than needed.

This patch tries order-3, order-2, order-1 and finally order-0
allocations to keep good performance, yet allow allocations if/when
memory gets fragmented.
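The fallback order can be sketched as a simple loop. This is an illustration under stated assumptions, not mlx4 code: a 4 KiB page is assumed, demo_alloc_rx_frag() and the demo_fragmented_alloc() test double are hypothetical, and malloc() stands in for the kernel page allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096UL  /* assumed page size */
#define DEMO_MAX_ORDER 3       /* first order tried, as described above */

/* Allocator callback: returns NULL on failure. */
typedef void *(*demo_alloc_fn)(size_t bytes);

/* Try order-3, order-2, order-1 and finally order-0 blocks; return the
 * first that succeeds and report its order (-1 if all fail). */
static void *demo_alloc_rx_frag(demo_alloc_fn alloc, int *order_out)
{
	assert(alloc != NULL && order_out != NULL);
	for (int order = DEMO_MAX_ORDER; order >= 0; order--) {
		void *p = alloc(DEMO_PAGE_SIZE << order);
		if (p) {
			*order_out = order;
			return p;
		}
	}
	*order_out = -1;
	return NULL;
}

/* Test double: pretend memory is fragmented so that only blocks of up
 * to demo_largest_free bytes can be obtained. */
static size_t demo_largest_free = 2 * 4096;

static void *demo_fragmented_alloc(size_t bytes)
{
	return bytes <= demo_largest_free ? malloc(bytes) : NULL;
}
```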

By using larger pages, and avoiding unnecessary get_page()/put_page()
on compound pages, this patch improves performance as well, lowering
false sharing on struct page.

Also use GFP_KERNEL allocations in initialization path, as allocating 12
MB (390 order-3 pages) can easily fail with GFP_ATOMIC.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 25 Jun 2013 23:11:41 +0000 (16:11 -0700)]
Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next

Ben Hutchings says:

====================
1. Make EEH recovery work when using legacy interrupts, from Alexandre
   Rames.

2. Enable accelerated RFS for VLAN-tagged flows, from Andy Lutomirski.

3. Improve performance for non-TCP (and particularly UDP) traffic, which
   regressed in 3.10 when we switched to always allocating paged RX
   buffers.  Partly by Jon Cooper.

4. Some minor bug fixes to IOMMU detection, timestamping capabilities,
   and IRQ cleanup on the probe failure path.

I've dropped the RX skb cache, which improved some benchmarks but
perhaps needs some reworking to be more generally useful.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqeth: use default napi weight
Sebastian Ott [Mon, 24 Jun 2013 11:21:52 +0000 (13:21 +0200)]
qeth: use default napi weight

Since commit 82dc3c63c692b1e1d59378ecee948ac88e034aad
"net: introduce NAPI_POLL_WEIGHT" network drivers receive a warning
when they use a napi weight higher than NAPI_POLL_WEIGHT. This patch
reduces QETH_NAPI_WEIGHT from 128 to 64 (NAPI_POLL_WEIGHT).

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqeth: Fix crash on initial MTU size change
Stefan Raspl [Mon, 24 Jun 2013 11:21:51 +0000 (13:21 +0200)]
qeth: Fix crash on initial MTU size change

When the initial MTU size is changed prior to any activity on the device
(e.g. by attaching a z/VM vNIC already configured in Linux to a guestLAN),
we call dev_kfree_skb_irq(NULL) which results in a kernel panic.
Add a proper NULL pointer check to address this issue.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqeth: change default standard blkt settings for OSA
Ursula Braun [Mon, 24 Jun 2013 11:21:50 +0000 (13:21 +0200)]
qeth: change default standard blkt settings for OSA

blkt settings (or LAN idle settings) for an OSA Express card
determine when and how often an OSA Express card tells the
operating system about new incoming packets. The semantics of
these settings changed with OSA Express3. Currently
the qeth standard settings apply to OSA Express2 and older
generations of OSA Express cards, while newer generations of OSA
Express cards need extra code to get a reasonable default.

To cover future OSA Express generations, the qeth standard default
blkt setting is now the setting intended for OSA generations
starting with OSA Express3, while the fixed set of older OSA
Express cards receives its blkt settings explicitly.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reviewed-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqeth: Increase default MTU for OSA devices
Stefan Raspl [Mon, 24 Jun 2013 11:21:49 +0000 (13:21 +0200)]
qeth: Increase default MTU for OSA devices

Increase the default MTU for real OSA devices in layer 2 mode
to 1500 bytes for increased compatibility.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonetiucv: remove unused macro
Andy Shevchenko [Mon, 24 Jun 2013 11:21:48 +0000 (13:21 +0200)]
netiucv: remove unused macro

Anyone who wants to dump data can use the print_hex_dump() or
print_hex_dump_bytes() kernel helpers instead.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobnx2x: Remove sparse and coccinelle warnings
Yuval Mintz [Tue, 25 Jun 2013 05:50:11 +0000 (08:50 +0300)]
bnx2x: Remove sparse and coccinelle warnings

This patch solves several sparse issues as well as an unneeded semicolon
found via coccinelle.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoipv6: add include file to suppress sparse warnings
Eric Dumazet [Tue, 25 Jun 2013 08:30:11 +0000 (01:30 -0700)]
ipv6: add include file to suppress sparse warnings

commit f88c91ddba95 ("ipv6: statically link
register_inet6addr_notifier()") added the following sparse warnings:

net/ipv6/addrconf_core.c:83:5: warning: symbol
'register_inet6addr_notifier' was not declared. Should it be static?
net/ipv6/addrconf_core.c:89:5: warning: symbol
'unregister_inet6addr_notifier' was not declared. Should it be static?
net/ipv6/addrconf_core.c:95:5: warning: symbol
'inet6addr_notifier_call_chain' was not declared. Should it be static?

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotcp: remove invalid __rcu annotation
Eric Dumazet [Tue, 25 Jun 2013 08:21:06 +0000 (01:21 -0700)]
tcp: remove invalid __rcu annotation

struct tcp_fastopen_context has a field named tfm, which is a pointer
to a crypto_cipher structure.

It currently has a __rcu annotation, which is not needed at all.

tcp_fastopen_ctx is the pointer fetched by rcu_dereference(), but once
we have a pointer to current tcp_fastopen_context, we do not use/need
rcu_dereference() to access tfm.

This fixes a lot of sparse warnings like the following:

net/ipv4/tcp_fastopen.c:21:31: warning: incorrect type in argument 1 (different address spaces)
net/ipv4/tcp_fastopen.c:21:31:    expected struct crypto_cipher *tfm
net/ipv4/tcp_fastopen.c:21:31:    got struct crypto_cipher [noderef] <asn:4>*tfm

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agopacket: nlmon: virtual netlink monitoring device for packet sockets
Daniel Borkmann [Fri, 21 Jun 2013 17:38:08 +0000 (19:38 +0200)]
packet: nlmon: virtual netlink monitoring device for packet sockets

Currently, there is no good way to debug netlink traffic that
is being exchanged between kernel and user space. Therefore, this patch
implements a netlink virtual device, so that netlink messages will be
made visible to PF_PACKET sockets. There was once an approach with a
similar idea [1], but it was apparently forgotten.

I think it makes the most sense to accept the "overhead" of an extra
netlink net device rather than implementing the functionality of
PF_PACKET sockets once again in netlink sockets. We already have BPF
filters that can easily be applied and that even have netlink
extensions, we have RX_RING zero-copy between kernel and user space
that can be reused, and many more features. So instead of
re-implementing all of this, we simply pass the skb to a given
PF_PACKET socket for further analysis.

Another nice benefit is that no code needs to be changed in user space
packet analyzers (except perhaps adding a dissector), so out of the
box we can already capture pcap files of netlink traffic to debug and
troubleshoot netlink problems.

Thanks also go to Thomas Graf, Flavio Leitner and Jesper Dangaard Brouer.

 [1] http://marc.info/?l=linux-netdev&m=113813401516110

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: netlink: virtual tap device management
Daniel Borkmann [Fri, 21 Jun 2013 17:38:07 +0000 (19:38 +0200)]
net: netlink: virtual tap device management

Similar to the networking receive path with ptype_all taps, we add
the possibility to register ARPHRD_NETLINK netdevices with the
netlink subsystem, so that they can be used by netlink analyzers or
debuggers. We do not offer a direct callback function, as out-of-tree
modules could abuse it. Instead, a netdevice must be registered
properly and only receives a clone, managed by the netlink layer. Symbols
are exported as GPL-only.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: if_arp: add ARPHRD_NETLINK type
Daniel Borkmann [Fri, 21 Jun 2013 17:38:06 +0000 (19:38 +0200)]
net: if_arp: add ARPHRD_NETLINK type

This small patch adds the definition of ARPHRD_NETLINK, which can be
used, for example, as the device type of netlink monitoring devices, so
that sockaddr_ll can pick it up and the correct packet dissector can be
chosen based on it.
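As an illustration, a userspace packet analyzer could switch on sll_hatype to pick a dissector. The sketch is hypothetical (pick_dissector() and enum dissector do not come from any real tool); only the ARPHRD_ETHER and ARPHRD_NETLINK values are taken from include/uapi/linux/if_arp.h, where ARPHRD_NETLINK is 824.

```c
#include <stdint.h>

/* Values as defined in include/uapi/linux/if_arp.h; repeated here so
 * the sketch builds without Linux headers. */
#ifndef ARPHRD_ETHER
#define ARPHRD_ETHER 1
#endif
#ifndef ARPHRD_NETLINK
#define ARPHRD_NETLINK 824
#endif

enum dissector { DISSECT_UNKNOWN, DISSECT_ETHERNET, DISSECT_NETLINK };

/* Choose a dissector from sockaddr_ll.sll_hatype, as an analyzer might
 * do for frames read from a PF_PACKET socket. */
static enum dissector pick_dissector(uint16_t sll_hatype)
{
    switch (sll_hatype) {
    case ARPHRD_ETHER:
        return DISSECT_ETHERNET;
    case ARPHRD_NETLINK:
        return DISSECT_NETLINK;
    default:
        return DISSECT_UNKNOWN;
    }
}
```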

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: Restore unintentional reverts.
David S. Miller [Mon, 24 Jun 2013 19:43:40 +0000 (12:43 -0700)]
net: Restore unintentional reverts.

This restores commits:

c573972c111eb4c6b3f3250ad71e7c75cc799833
1a5904342c7380ceddd61c0b37544d752d0b1433
da2e2c214953f37c2a6be20226537ca5a329724c

which initially accidentally went into 'net', were
reverted there, and then properly placed into 'net-next'.
But the next net --> net-next merge accidentally wiped them
out again.

Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosfc: Improve test for IOMMU in use
Ben Hutchings [Wed, 12 Jun 2013 17:09:08 +0000 (18:09 +0100)]
sfc: Improve test for IOMMU in use

The device::iommu_group field may be set even if no IOMMU is in use.
iommu_present() is still a better indicator, although it doesn't tell
us whether *our* device is affected.

Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Fix IRQ cleanup in case of a probe failure
Ben Hutchings [Wed, 22 May 2013 17:03:35 +0000 (18:03 +0100)]
sfc: Fix IRQ cleanup in case of a probe failure

The lifetime of an irq_cpu_rmap is odd: we have to allocate it before
installing IRQ handlers and free it before removing the IRQ handlers.
As a result of this asymmetry, it was omitted from some failure paths.

On another failure path, we could try to remove IRQ handlers we
had not yet installed.

Move the irq_cpu_rmap allocation and freeing alongside IRQ handler
installation and removal, in efx_nic_{init,fini}_interrupts().
Count the number of IRQ handlers successfully installed and only
remove those on the failure path.
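The unwind pattern described above can be sketched as a userspace model. All names here (model_request_irq(), model_init_interrupts(), irq_fail_at and so on) are hypothetical and do not come from the sfc driver; a boolean stands in for the irq_cpu_rmap, which, following the text, is allocated before the handlers are installed and freed before they are removed.

```c
#include <stdbool.h>

#define MODEL_NUM_IRQS 8

/* Hypothetical model state; none of these names come from sfc. */
static bool irq_installed[MODEL_NUM_IRQS];
static bool rmap_allocated;   /* stands in for the irq_cpu_rmap */
static int irq_fail_at = -1;  /* index whose request fails; -1 = none */
static int bad_frees;         /* removals of handlers never installed */

static int model_request_irq(int i)
{
    if (i == irq_fail_at)
        return -1;
    irq_installed[i] = true;
    return 0;
}

static void model_free_irq(int i)
{
    if (!irq_installed[i])
        bad_frees++;
    irq_installed[i] = false;
}

/* Allocate the rmap, then install n handlers while counting them. On
 * failure, free the rmap first and remove only the handlers that were
 * actually installed. */
static int model_init_interrupts(int n)
{
    int count;
    int rc = 0;

    rmap_allocated = true;
    for (count = 0; count < n; count++) {
        rc = model_request_irq(count);
        if (rc)
            goto fail;
    }
    return 0;

fail:
    rmap_allocated = false;
    while (count--)          /* removes indices count-1 down to 0 */
        model_free_irq(count);
    return rc;
}
```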

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Do not pass non-TCP packets into GRO code
Ben Hutchings [Thu, 16 May 2013 17:38:13 +0000 (18:38 +0100)]
sfc: Do not pass non-TCP packets into GRO code

GRO can handle non-TCP packets and pass them up without coalescing,
but it has to do some extra work to parse the packet which we can
bypass using the hardware parse result.  (This condition yields a
false negative for TCP/IPv6 packets received by Falcon, but its
performance is already poor in that case due to lack of checksum
offload.)
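A minimal sketch of the dispatch decision, assuming the RX path has already recorded the hardware parse result as a flag bit. RX_PKT_TCP, its bit value and choose_rx_path() are hypothetical names for illustration; the real driver's flag and delivery functions may differ.

```c
#include <stdint.h>

/* Hypothetical flag bit recorded by the RX path when the NIC's parser
 * classified the packet as TCP; the bit value is arbitrary here. */
#define RX_PKT_TCP 0x0040u

enum rx_path { RX_PATH_GRO, RX_PATH_STACK };

/* Decide from the hardware parse result alone, without reading packet
 * headers in software, whether a packet should be offered to GRO. */
static enum rx_path choose_rx_path(uint16_t rx_flags)
{
    return (rx_flags & RX_PKT_TCP) ? RX_PATH_GRO : RX_PATH_STACK;
}
```

Per the text, the flag can be a false negative (TCP/IPv6 on Falcon), in which case such packets simply bypass GRO.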

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Define and set RX buffer flag for packets parsed as TCP
Ben Hutchings [Thu, 16 May 2013 17:38:11 +0000 (18:38 +0100)]
sfc: Define and set RX buffer flag for packets parsed as TCP

This will be useful for shortcutting some software packet parsing.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Enable accelerated RFS on vlans
Andy Lutomirski [Fri, 10 May 2013 23:51:33 +0000 (16:51 -0700)]
sfc: Enable accelerated RFS on vlans

As far as I know, the hardware doesn't support matching on both IP
fields and vlan tag, but it can at least match on the IP fields.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Report software timestamping capabilities
Ben Hutchings [Mon, 8 Apr 2013 16:34:58 +0000 (17:34 +0100)]
sfc: Report software timestamping capabilities

The kernel can generate software receive timestamps and we should
report those for all ports regardless of hardware capabilities.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Increase size of RX SKB header area
Jon Cooper [Mon, 8 Apr 2013 11:55:58 +0000 (12:55 +0100)]
sfc: Increase size of RX SKB header area

This allows the SKB to hold the headers without reallocation more often.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Enable RX checksum offload for packets not handled by GRO
Jon Cooper [Mon, 8 Apr 2013 11:49:48 +0000 (12:49 +0100)]
sfc: Enable RX checksum offload for packets not handled by GRO

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Fix EEH with legacy interrupts.
Alexandre Rames [Thu, 21 Mar 2013 16:41:43 +0000 (16:41 +0000)]
sfc: Fix EEH with legacy interrupts.

PCI legacy interrupts are level-triggered, and we cannot mask them up
on an isolated device.  Instead, disable the IRQ at the controller
until we have recovered.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agobnx2x: Fix compilation with no IOV support
Yuval Mintz [Mon, 24 Jun 2013 08:04:10 +0000 (11:04 +0300)]
bnx2x: Fix compilation with no IOV support

This fixes an issue caused by commit 78c3bcc5d1af64f51d9f30b0f5a2d1985bf69734
`bnx2x: Improve PF behaviour toward VF', which made the bnx2x driver fail
to compile when PCI_IOV is not set.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: Unmap fragment page once iterator is done
Wedson Almeida Filho [Mon, 24 Jun 2013 06:33:48 +0000 (23:33 -0700)]
net: Unmap fragment page once iterator is done

Callers of skb_seq_read() are currently forced to call skb_abort_seq_read()
even when consuming all the data because the last call to skb_seq_read (the
one that returns 0 to indicate the end) fails to unmap the last fragment page.

With this patch callers will be allowed to traverse the SKB data by calling
skb_prepare_seq_read() once and repeatedly calling skb_seq_read() as originally
intended (and documented in the original commit 677e90eda), that is, only call
skb_abort_seq_read() if the sequential read is actually aborted.
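The calling convention this enables can be illustrated with a userspace model in which "mapping" a fragment just increments a counter. The names (model_seq_prepare(), model_seq_read(), model_seq_abort(), live_mappings) are hypothetical, and the model ignores the consumed-offset argument that the real skb_seq_read() takes; the point is that the final call returning 0 releases the last mapping itself, so the abort call is only needed when the reader stops early.

```c
#include <stddef.h>

/* A buffer split into fragments (assumed non-empty), standing in for
 * an skb with paged data. */
struct model_frag {
    const char *data;
    size_t len;
};

struct model_seq_state {
    const struct model_frag *frags;
    size_t nfrags;
    size_t idx;   /* fragment currently being read */
    int mapped;   /* 1 while frags[idx] is "mapped" */
};

static int live_mappings;  /* outstanding "mappings" across all readers */

static void model_seq_prepare(struct model_seq_state *st,
                              const struct model_frag *frags, size_t n)
{
    st->frags = frags;
    st->nfrags = n;
    st->idx = 0;
    st->mapped = 0;
}

/* Return the length of the next chunk and point *data at it. Return 0
 * at the end, after releasing the mapping of the last fragment. */
static size_t model_seq_read(struct model_seq_state *st, const char **data)
{
    if (st->mapped) {          /* finished with the previous fragment */
        st->mapped = 0;
        live_mappings--;
        st->idx++;
    }
    if (st->idx >= st->nfrags)
        return 0;              /* nothing is left mapped at this point */

    st->mapped = 1;
    live_mappings++;
    *data = st->frags[st->idx].data;
    return st->frags[st->idx].len;
}

/* Needed only when the caller stops before model_seq_read() returns 0. */
static void model_seq_abort(struct model_seq_state *st)
{
    if (st->mapped) {
        st->mapped = 0;
        live_mappings--;
    }
}
```

A caller can therefore loop until the read returns 0 and skip the abort call entirely.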

Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville...
David S. Miller [Mon, 24 Jun 2013 07:31:02 +0000 (00:31 -0700)]
Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next

John W. Linville says:

====================
I would guess that this is the last big wireless pull request before
the 3.11 merge window...

Regarding the mac80211 bits, Johannes says:

"I have a number of mesh fixes and improvements from Colleen, Jacob,
Ashok and Thomas, powersave fixes in mac80211 from Alex, improved
management-TX from Antonio, and a few various things, including locking
fixes, from others and myself. Overall though, nothing really stands
out."

As for the iwlwifi bits, Johannes says:

"Emmanuel contributed two AP mode fixes, removed an unused field, fixed a
comment and added a warning for something that shouldn't happen in
practice, and I removed the declaration of a function that doesn't even
exist and cleaned up a small include."

"This time I have a number of cleanups, a small fix from Emmanuel and two
performance improvements that combined reduce our driver's CPU
utilisation as much as 75% in high TX-throughput scenarios."

"These two patches fix two issues with using rfkill randomly during
traffic, which would then cause our driver to stop working and not be
able to recover at all."

Regarding the ath6kl bits, Kalle says:

"Here are few simple patches for ath6kl. We have a suspend crash fix for
USB from Shafi, use of mac_pton(), a compiler warning fix and a fix for
module initialisation error path."

Kalle also sends the biggest single item of note, the new ath10k
driver for Qualcomm Atheros 802.11ac QCA98xx devices.

Included is an NFC pull, of which Samuel says:

"These are the pending NFC patches for the 3.11 merge window.

It contains the pending fixes that were on nfc-fixes (nfc-fixes-3.10-2),
along with a few more for the pn544 and pn533 drivers, the LLCP
disconnection path and an LLCP memory leak.

Highlights for this one are:

- An initial secure element API. NFC chipsets can carry an embedded
  secure element or get access to the SIM one. In both cases they
  control the secure elements and this API provides a way to discover,
  enable and disable the available SEs. It also exports that to
  userspace in order for SE focused middleware to actually do something
  with them (e.g. payments).

- NCI over SPI support. SPI is the most complex NCI specified transport
  layer and we now have support for it in the kernel. The next step will
  be to implement drivers for NCI chipsets using this transport like
  e.g. bcm2079x.

- NFC p2p hardware simulation driver. We now have an nfcsim driver that
  is mostly a loopback device between 2 NFC interfaces. It also
  implements the rest of the NFC core API like polling and target
  detection. This driver, with neard running on top of it, allows us to
  completely test the LLCP, SNEP and Handover implementation without
  physical hardware.

- A Firmware update netlink API. Most (All ?) HCI chipsets have a
  special firmware update mode where applications can push a new
  firmware that will be flashed. We now have a netlink API for providing
  that mode to e.g. nfctool."

On top of all that, there are a variety of updates to brcmfmac,
iwlegacy, rtlwifi, wil6210, and the TI wl12xx drivers.  As usual,
the bcma and ssb busses get a little love as well, as do a handful
of others here and there.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoxen-netback: double free on unload
Dan Carpenter [Fri, 21 Jun 2013 06:20:08 +0000 (09:20 +0300)]
xen-netback: double free on unload

There is a typo here, "i" vs "j", so we would crash on module_exit().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoip_tunnel: Protect tunnel functions with CONFIG_INET guard.
Jesse Gross [Fri, 21 Jun 2013 23:17:11 +0000 (16:17 -0700)]
ip_tunnel: Protect tunnel functions with CONFIG_INET guard.

Tunnel constants can be used in generic code but in these cases
the inline functions in ip_tunnels.h cause compilation problems
if CONFIG_INET is not set.

CC: Pravin Shelar <pshelar@nicira.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoopenvswitch: Use correct config guard.
Pravin B Shelar [Thu, 20 Jun 2013 22:08:14 +0000 (15:08 -0700)]
openvswitch: Use correct config guard.

This bug was introduced by commit aa310701e787087
(openvswitch: Add gre tunnel support.)

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobridge: fix a typo in comments
Cong Wang [Fri, 21 Jun 2013 07:37:25 +0000 (15:37 +0800)]
bridge: fix a typo in comments

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: allow large number of tx queues
Eric Dumazet [Thu, 20 Jun 2013 08:15:51 +0000 (01:15 -0700)]
net: allow large number of tx queues

netif_alloc_netdev_queues() uses kcalloc() to allocate memory
for the "struct netdev_queue *_tx" array.

For a large number of tx queues, kcalloc() might fail, so this
patch falls back to vzalloc().

As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT
to kzalloc() flags to do this fallback only when really needed.
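The fallback can be modelled in userspace as follows. This is a sketch under stated assumptions: model_contig_zalloc() and model_fallback_zalloc() are hypothetical stand-ins for kzalloc() with __GFP_REPEAT and vzalloc(), the 64 KiB limit is arbitrary, and both end up calling calloc(). Recording which path succeeded matters because, in the kernel, memory from kzalloc() and vzalloc() must be released with kfree() and vfree() respectively.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical threshold above which the "contiguous" allocator fails
 * in this model, standing in for a large kzalloc() failing. */
static size_t model_contig_limit = 64 * 1024;

struct queue_array {
    void *mem;
    bool from_fallback;  /* true if the vzalloc-like path was used */
};

static void *model_contig_zalloc(size_t size)
{
    return size <= model_contig_limit ? calloc(1, size) : NULL;
}

static void *model_fallback_zalloc(size_t size)
{
    return calloc(1, size);
}

/* Allocate count * elem zeroed bytes, preferring the contiguous path and
 * falling back only when it fails. */
static int alloc_queue_array(struct queue_array *qa, size_t count, size_t elem)
{
    if (elem != 0 && count > SIZE_MAX / elem)
        return -1;  /* size would overflow */

    qa->from_fallback = false;
    qa->mem = model_contig_zalloc(count * elem);
    if (!qa->mem) {
        qa->mem = model_fallback_zalloc(count * elem);
        qa->from_fallback = true;
    }
    return qa->mem ? 0 : -1;
}

/* The kernel would choose kfree() or vfree() based on which allocator
 * succeeded; in this userspace model both paths used calloc(). */
static void free_queue_array(struct queue_array *qa)
{
    free(qa->mem);
    qa->mem = NULL;
}
```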

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'bnx2x'
David S. Miller [Mon, 24 Jun 2013 06:54:23 +0000 (23:54 -0700)]
Merge branch 'bnx2x'

Yuval Mintz says:

====================
This patch series mostly revolves around improving SR-IOV implementation
(Better PF-VF relation, sanity checks and timings), as well as including
a patch correcting the (outward) advertisement of 20G capabilities.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobnx2x: Fix 20G KR2 support claims
Yaniv Rosner [Thu, 20 Jun 2013 14:39:11 +0000 (17:39 +0300)]
bnx2x: Fix 20G KR2 support claims

Don't claim 20G is supported if the speed is unsupported by the phys
(as reflected by various ethtool operations and ndos).

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobnx2x: improve VF timings
Ariel Elior [Thu, 20 Jun 2013 14:39:10 +0000 (17:39 +0300)]
bnx2x: improve VF timings

Wait 100ms for FLR to complete in parallel over all VFs instead of serializing
the waits (which can amount to several seconds with 64 VFs).
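The difference can be seen in a simple model with a simulated clock. flr_serial(), flr_parallel(), model_sleep_ms() and flr_pending[] are hypothetical names; only the 100 ms wait and the count of 64 VFs come from the text above. With 64 VFs the serialized variant takes 64 * 100 ms = 6.4 s, while the parallel one waits 100 ms in total.

```c
#include <stdbool.h>

#define MODEL_MAX_VFS 64
#define FLR_WAIT_MS 100   /* from the text above */

/* Simulated clock; model_sleep_ms() is a hypothetical msleep() stand-in. */
static unsigned long model_now_ms;
static bool flr_pending[MODEL_MAX_VFS];

static void model_sleep_ms(unsigned long ms)
{
    model_now_ms += ms;
}

/* Serialized: trigger FLR on one VF, wait, then move to the next. */
static unsigned long flr_serial(int nvfs)
{
    model_now_ms = 0;
    for (int vf = 0; vf < nvfs; vf++) {
        flr_pending[vf] = true;
        model_sleep_ms(FLR_WAIT_MS);
        flr_pending[vf] = false;
    }
    return model_now_ms;
}

/* Parallel: trigger FLR on every VF, wait once, then complete them all. */
static unsigned long flr_parallel(int nvfs)
{
    model_now_ms = 0;
    for (int vf = 0; vf < nvfs; vf++)
        flr_pending[vf] = true;
    if (nvfs > 0)
        model_sleep_ms(FLR_WAIT_MS);
    for (int vf = 0; vf < nvfs; vf++)
        flr_pending[vf] = false;
    return model_now_ms;
}
```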

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobnx2x: VF ndo sanity
Ariel Elior [Thu, 20 Jun 2013 14:39:09 +0000 (17:39 +0300)]
bnx2x: VF ndo sanity

If iproute2 VF callbacks are invoked before PF is loaded,
abort gracefully.

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobnx2x: Improve PF behaviour toward VF
Ariel Elior [Thu, 20 Jun 2013 14:39:08 +0000 (17:39 +0300)]
bnx2x: Improve PF behaviour toward VF

If PF is unloaded with loaded VFs, signal towards VFs so they can detect
this gracefully.

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
----
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h       |  2 ++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c   |  3 +++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c  | 23 +++++++++++++++++++---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c | 24 ++++++++++++++++++++---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h |  2 ++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c  | 12 +++++++++++-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.h  |  5 ++++-
 7 files changed, 63 insertions(+), 8 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoVSOCK: Fix VSOCK_HASH and VSOCK_CONN_HASH
Asias He [Thu, 20 Jun 2013 09:20:33 +0000 (17:20 +0800)]
VSOCK: Fix VSOCK_HASH and VSOCK_CONN_HASH

If we mod with VSOCK_HASH_SIZE - 1, we get 0, 1, ..., 249. However, we
have vsock_bind_table[0 ... 250] and vsock_connected_table[0 ... 250],
so the last entry is never used.

We should mod with VSOCK_HASH_SIZE instead.
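The off-by-one can be checked with a short program that computes the largest bucket index each variant can produce. The value 251 for VSOCK_HASH_SIZE is inferred from the text above (VSOCK_HASH_SIZE - 1 yields indices up to 249); hash_buggy(), hash_fixed() and max_bucket() are hypothetical helpers.

```c
#include <stdint.h>

/* Inferred from the text: VSOCK_HASH_SIZE - 1 == 250. */
#define VSOCK_HASH_SIZE 251

/* Old bucket selection: indices 0..249, so bucket 250 is never used. */
static unsigned int hash_buggy(uint32_t port)
{
    return port % (VSOCK_HASH_SIZE - 1);
}

/* Fixed bucket selection: indices 0..250, covering every bucket. */
static unsigned int hash_fixed(uint32_t port)
{
    return port % VSOCK_HASH_SIZE;
}

/* Largest bucket index produced for ports 0..nports-1. */
static unsigned int max_bucket(unsigned int (*hash)(uint32_t), uint32_t nports)
{
    unsigned int max = 0;

    for (uint32_t port = 0; port < nports; port++) {
        unsigned int b = hash(port);

        if (b > max)
            max = b;
    }
    return max;
}
```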

Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoVSOCK: Remove unnecessary label
Asias He [Thu, 20 Jun 2013 09:20:32 +0000 (17:20 +0800)]
VSOCK: Remove unnecessary label

Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoVSOCK: Return VMCI_ERROR_NO_MEM when fails to allocate skb
Asias He [Thu, 20 Jun 2013 09:20:31 +0000 (17:20 +0800)]
VSOCK: Return VMCI_ERROR_NO_MEM when fails to allocate skb

vmci_transport_recv_dgram_cb always returns VMCI_SUCCESS even if we fail
to allocate an skb; return VMCI_ERROR_NO_MEM instead.

Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoVSOCK: Introduce vsock_auto_bind helper
Asias He [Thu, 20 Jun 2013 09:20:30 +0000 (17:20 +0800)]
VSOCK: Introduce vsock_auto_bind helper

This piece of code is called three times; let's have a helper for it.

Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoipv6: remove a useless pr_info() in addrconf_gre_config()
Cong Wang [Thu, 20 Jun 2013 08:30:00 +0000 (16:30 +0800)]
ipv6: remove a useless pr_info() in addrconf_gre_config()

This is debug info and should at least be pr_debug(), but given
that this code has been upstream for two years, there is no
need to keep this debugging printk any more, so just remove it.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: Update version to 5.2.44
Jitendra Kalsaria [Sat, 22 Jun 2013 08:12:07 +0000 (04:12 -0400)]
qlcnic: Update version to 5.2.44

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: Add support for 83xx suspend and resume.
Rajesh Borundia [Sat, 22 Jun 2013 08:12:06 +0000 (04:12 -0400)]
qlcnic: Add support for 83xx suspend and resume.

o Implement shutdown and resume handlers for 83xx.
o Refactor 82xx shutdown and resume handlers.

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: Add support for 'set driver version' in 83XX
Pratik Pujar [Sat, 22 Jun 2013 08:12:05 +0000 (04:12 -0400)]
qlcnic: Add support for 'set driver version' in 83XX

Issue 'set driver version' during driver load and after reset recovery
to report the driver version to the firmware.

Signed-off-by: Pratik Pujar <pratik.pujar@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: Cleanup of structure qlcnic_hardware_context
Pratik Pujar [Sat, 22 Jun 2013 08:12:04 +0000 (04:12 -0400)]
qlcnic: Cleanup of structure qlcnic_hardware_context

Signed-off-by: Pratik Pujar <pratik.pujar@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: Add support for PEX DMA method to read memory section of adapter dump
Shahed Shaikh [Sat, 22 Jun 2013 08:12:03 +0000 (04:12 -0400)]
qlcnic: Add support for PEX DMA method to read memory section of adapter dump

This patch adds support for reading the memory section of the
adapter dump using the PEX DMA method. This method significantly
reduces the total adapter dump collection time.

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: Minimize sleep duration within loopback diagnostic test.
Jitendra Kalsaria [Sat, 22 Jun 2013 08:12:02 +0000 (04:12 -0400)]
qlcnic: Minimize sleep duration within loopback diagnostic test.

o Minimize sleep duration and check for adapter status.
o Exit from loopback test if adapter reset is detected.

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: Secondary unicast MAC address support.
Jitendra Kalsaria [Sat, 22 Jun 2013 08:12:01 +0000 (04:12 -0400)]
qlcnic: Secondary unicast MAC address support.

Add support for configuring secondary unicast addresses. The
existing HW filters are used to store all the unicast MAC
addresses, which prevents the device from going into promiscuous mode.
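A minimal sketch of the idea, under assumptions not stated in the commit: the filter count of 4, struct model_nic and model_set_uc_list() are hypothetical, and falling back to promiscuous mode only when the address list does not fit is a common driver convention assumed here rather than taken from qlcnic.

```c
#include <stdbool.h>
#include <string.h>

#define MODEL_NUM_UC_FILTERS 4  /* hypothetical number of HW filter slots */

struct model_nic {
    unsigned char uc_filter[MODEL_NUM_UC_FILTERS][6];
    int uc_count;
    bool promisc;
};

/* Program every unicast address into a HW filter slot. Only if the list
 * does not fit (an assumption of this sketch) does the device fall back
 * to promiscuous mode. */
static void model_set_uc_list(struct model_nic *nic,
                              const unsigned char (*addrs)[6], int n)
{
    if (n > MODEL_NUM_UC_FILTERS) {
        nic->uc_count = 0;
        nic->promisc = true;
        return;
    }
    for (int i = 0; i < n; i++)
        memcpy(nic->uc_filter[i], addrs[i], 6);
    nic->uc_count = n;
    nic->promisc = false;
}
```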

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: Handle qlcnic_alloc_mbx_args() failure
Shahed Shaikh [Sat, 22 Jun 2013 08:12:00 +0000 (04:12 -0400)]
qlcnic: Handle qlcnic_alloc_mbx_args() failure

qlcnic_alloc_mbx_args() may fail due to failure in memory allocation.
This patch checks for failure of qlcnic_alloc_mbx_args() to avoid
potential invalid memory access.

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirel...
John W. Linville [Fri, 21 Jun 2013 19:42:30 +0000 (15:42 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem

Conflicts:
net/wireless/nl80211.c

11 years agondisc: Convert use of typedef ctl_table to struct ctl_table
Joe Perches [Fri, 14 Jun 2013 02:37:54 +0000 (19:37 -0700)]
ndisc: Convert use of typedef ctl_table to struct ctl_table

This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoipv6: Convert use of typedef ctl_table to struct ctl_table
Joe Perches [Fri, 14 Jun 2013 02:37:53 +0000 (19:37 -0700)]
ipv6: Convert use of typedef ctl_table to struct ctl_table

This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoinet: frag , remove an empty ifdef.
Rami Rosen [Sat, 15 Jun 2013 20:04:56 +0000 (23:04 +0300)]
inet: frag , remove an empty ifdef.

This patch removes an empty ifdef from inet_frag_intern()
in net/ipv4/inet_fragment.c.

commit b67bfe0d42cac56c512dd5da4b1b347a23f4b70a
(hlist: drop the node parameter from iterators) removed hlist from
net/ipv4/inet_fragment.c, but did not remove the enclosing ifdef command,
which is now empty.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agohtb: refactor struct htb_sched fields for performance
Eric Dumazet [Sat, 15 Jun 2013 10:30:10 +0000 (03:30 -0700)]
htb: refactor struct htb_sched fields for performance

htb_sched structures are big, and a source of false sharing on SMP.

Every time a packet is queued or dequeued, many cache lines must be
touched because the structures are not laid out properly.

By carefully splitting htb_sched in two parts, and defining substructures
to increase data locality, we can improve performance dramatically on
SMP.

The new htb_prio structure can also be used in htb_class to increase data
locality.
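One common way to express such a split in C is to start the frequently written part on its own cache line. The structures below are hypothetical and much smaller than htb_sched; a 64-byte cache line is assumed, and kernel code would typically use the ____cacheline_aligned_in_smp annotation rather than _Alignas.

```c
#include <stddef.h>

/* Assumed cache line size for this sketch. */
#define MODEL_CACHELINE 64

/* Read-mostly configuration, rarely written after setup. */
struct model_sched_cfg {
    unsigned int defcls;
    int rate2quantum;
};

/* Fields written on every enqueue/dequeue. */
struct model_sched_hot {
    unsigned long long now;
    unsigned int qlen;
    unsigned int row_mask;
};

/* The hot part starts on its own cache line, so writers updating it do
 * not invalidate the line holding the configuration readers use. */
struct model_sched {
    struct model_sched_cfg cfg;
    _Alignas(MODEL_CACHELINE) struct model_sched_hot hot;
};
```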

I got a 26% performance increase on a 24-thread machine, with 200
concurrent netperf instances in TCP_RR mode, using an HTB hierarchy of 4 classes.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotcp: introduce a per-route knob for quick ack
Cong Wang [Sat, 15 Jun 2013 01:39:18 +0000 (09:39 +0800)]
tcp: introduce a per-route knob for quick ack

In previous discussions, I tried to find some reasonable heuristics
for delayed ACK; however, according to Eric, this does not seem possible:

"ACKS might also be delayed because of bidirectional
traffic, and is more controlled by the application
response time. TCP stack can not easily estimate it."

"ACK can be incredibly useful to recover from losses in
a short time.

The vast majority of TCP sessions are small lived, and we
send one ACK per received segment anyway at beginning or
retransmits to let the sender smoothly increase its cwnd,
so an auto-tuning facility wont help them that much."

and according to David:

"ACKs are the only information we have to detect loss.

And, for the same reasons that TCP VEGAS is fundamentally
broken, we cannot measure the pipe or some other
receiver-side-visible piece of information to determine
when it's "safe" to stretch ACK.

And even if it's "safe", we should not do it so that losses are
accurately detected and we don't spuriously retransmit.

The only way to know when the bandwidth increases is to
"test" it, by sending more and more packets until drops happen.
That's why all successful congestion control algorithms must
operate on explicited tested pieces of information.

Similarly, it's not really possible to universally know if
it's safe to stretch ACK or not."

It still makes sense to enable or disable quick ack mode, as
TCP_QUICKACK does.

This knob is similar to the TCP_QUICKACK option, but is meant for
people who can't modify the source code and still want to control
TCP delayed ACK behavior. As David suggested, it belongs to
per-path scope, since different paths may want different
behaviors.
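For comparison, the per-socket control that applications can already use looks roughly like this from user space. set_quickack() is a hypothetical helper; TCP_QUICKACK is the Linux socket option, and per tcp(7) it is not permanent, since later protocol processing may enter or leave quick ACK mode again.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable (1) or disable (0) quick ACK mode on a TCP socket (Linux).
 * The setting is not sticky, so applications that rely on it often
 * re-apply it; a per-route setting avoids touching application code. */
static int set_quickack(int fd, int enable)
{
    int val = enable ? 1 : 0;

    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &val, sizeof(val));
}
```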

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Rick Jones <rick.jones2@hp.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
CC: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosctp: Convert __list_for_each use to list_for_each
Dave Jones [Tue, 18 Jun 2013 02:26:52 +0000 (22:26 -0400)]
sctp: Convert __list_for_each use to list_for_each

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>