]> git.karo-electronics.de Git - linux-beck.git/log
linux-beck.git
10 years agoaf_packet: Add Queue mapping mode to af_packet fanout operation
Neil Horman [Wed, 22 Jan 2014 21:01:44 +0000 (16:01 -0500)]
af_packet: Add Queue mapping mode to af_packet fanout operation

This patch adds a queue mapping mode to the fanout operation of af_packet
sockets.  This allows user space af_packet users to better filter on flows
ingressing and egressing via a specific hardware queue, and avoids the potential
packet reordering that can occur when FANOUT_CPU is being used and irq affinity
varies.

Tested successfully by myself.  applies to net-next

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'bonding_option_api'
David S. Miller [Wed, 22 Jan 2014 23:39:18 +0000 (15:39 -0800)]
Merge branch 'bonding_option_api'

Nikolay Aleksandrov says:

====================
bonding: introduce new option API

This patchset's goal is to introduce a new option API which should be used
to properly describe the bonding options with their mode dependcies and
requirements. With this patchset applied we get centralized option
manipulation, automatic RTNL acquire per option setting, automatic option
range checking, mode dependcy checking and other various flags which are
described in detail in patch 01's commit message and comments.
Also the parameter passing is changed to use a specialized structure which
is initialized to a value depending on the needs.
The main exported functions are:
 __bond_opt_set() - set an option (RTNL should be acquired prior)
 bond_opt_init(val|str) - init a bond_opt_value struct for value or string
                          parameter passing
 bond_opt_tryset_rtnl() - function which tries to acquire rtnl, mainly used
                          for sysfs
 bond_opt_parse - used to parse or check for valid values
 bond_opt_get - retrieve a pointer to bond_option struct for some option
 bond_opt_get_val - retrieve a pointer to a bond_opt_value struct for
                    some value

The same functions are used to set an option via sysfs and netlink, just
the parameter that's passed is usually initialized in a different way.
The converted options have multiple style fixes, there're some longer
lines but they looked either ugly or were strings/pr_warnings, if you
think some line would be better broken just let me know :-) there're
also a few sscanf false-positive warnings.
I decided to keep the "unsuppmodes" way of mode dep checking since it's
straight forward, if we make a more general way for checking dependencies
it'll be easy to change it.

Future plans for this work include:
 - Automatic sysfs generation from the bond_opts[].
 - Use of the API in bond_check_params() and thus cleaning it up (this has
   actually started, I'll take care of the rest in a separate patch)
 - Clean up all option-unrelated files of option definitions and functions

I've tried to leave as much documentation as possible, if there's anything
unclear please let me know. One more thing, I haven't moved all
option-related functions from bonding.h to the new bond_options.h, this
will be done in a separate patch, it's in my todo list.

This patchset has been tested by setting each converted option via sysfs
and netlink to a couple of wrong values, a couple of correct values and
some random values, also for the opts that have flags they have been
tested as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert slaves to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:40 +0000 (14:53 +0100)]
bonding: convert slaves to use the new option API

This patch adds the necessary changes so slaves would use
the new bonding option API. Also move the option to its own set function
in bond_options.c and fix some style errors.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert lp_interval to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:39 +0000 (14:53 +0100)]
bonding: convert lp_interval to use the new option API

This patch adds the necessary changes so lp_interval would use
the new bonding option API.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert resend_igmp to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:38 +0000 (14:53 +0100)]
bonding: convert resend_igmp to use the new option API

This patch adds the necessary changes so resend_igmp would use
the new bonding option API.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert all_slaves_active to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:37 +0000 (14:53 +0100)]
bonding: convert all_slaves_active to use the new option API

This patch adds the necessary changes so all_slaves_active would use
the new bonding option API.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert queue_id to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:36 +0000 (14:53 +0100)]
bonding: convert queue_id to use the new option API

This patch adds the necessary changes so queue_id would use
the new bonding option API. Also move it to its own set function in
bond_options.c and fix some style errors.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert active_slave to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:35 +0000 (14:53 +0100)]
bonding: convert active_slave to use the new option API

This patch adds the necessary changes so active_slave would use
the new bonding option API. Also some trivial/style fixes.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert use_carrier to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:34 +0000 (14:53 +0100)]
bonding: convert use_carrier to use the new option API

This patch adds the necessary changes so use_carrier would use
the new bonding option API.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert primary_reselect to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:33 +0000 (14:53 +0100)]
bonding: convert primary_reselect to use the new option API

This patch adds the necessary changes so primary_reselect would use
the new bonding option API.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert primary to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:32 +0000 (14:53 +0100)]
bonding: convert primary to use the new option API

This patch adds the necessary changes so primary would use
the new bonding option API.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert miimon to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:31 +0000 (14:53 +0100)]
bonding: convert miimon to use the new option API

This patch adds the necessary changes so miimon would use
the new bonding option API. The "default" definition has been removed as
it was 0.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert num_peer_notif to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:30 +0000 (14:53 +0100)]
bonding: convert num_peer_notif to use the new option API

This patch adds the necessary changes so num_peer_notif would use
the new bonding option API.
When the auto-sysfs generation is done an alias should be added for
this option as there're currently 2 entries in sysfs for it.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert ad_select to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:29 +0000 (14:53 +0100)]
bonding: convert ad_select to use the new option API

This patch adds the necessary changes so ad_select would use
the new bonding option API.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert min_links to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:28 +0000 (14:53 +0100)]
bonding: convert min_links to use the new option API

This patch adds the necessary changes so min_links would use
the new bonding option API.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert lacp_rate to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:27 +0000 (14:53 +0100)]
bonding: convert lacp_rate to use the new option API

This patch adds the necessary changes so lacp_rate would use
the new bonding option API. Also some trivial/style error fixes.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert updelay to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:26 +0000 (14:53 +0100)]
bonding: convert updelay to use the new option API

This patch adds the necessary changes so updelay would use
the new bonding option API. Also some trivial style fixes.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert downdelay to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:25 +0000 (14:53 +0100)]
bonding: convert downdelay to use the new option API

This patch adds the necessary changes so downdelay would use
the new bonding option API. Also some trivial style fixes.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert arp_ip_target to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:24 +0000 (14:53 +0100)]
bonding: convert arp_ip_target to use the new option API

This patch adds the necessary changes so arp_ip_target would use
the new bonding option API. This option is an exception because of
the way it's currently implemented that's why its netlink code is
a bit different from the other options to keep the functionality as
before and at the same time to have a single set function.

This patch also fixes a few stylistic errors.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert arp_interval to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:23 +0000 (14:53 +0100)]
bonding: convert arp_interval to use the new option API

This patch adds the necessary changes so arp_interval would use
the new bonding option API. The "default" definition has been removed as
it was 0.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert fail_over_mac to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:22 +0000 (14:53 +0100)]
bonding: convert fail_over_mac to use the new option API

This patch adds the necessary changes so fail_over_mac would use
the new bonding option API. Also fixes a trivial copy/paste error in
bond_check_params where the wrong variable was used for the error msg.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert arp_all_targets to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:21 +0000 (14:53 +0100)]
bonding: convert arp_all_targets to use the new option API

This patch adds the necessary changes so arp_all_targets would use the
new bonding option API.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert arp_validate to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:20 +0000 (14:53 +0100)]
bonding: convert arp_validate to use the new option API

This patch adds the necessary changes so arp_validate would use the
new bonding option API. Also fix some trivial/style errors.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert xmit_hash_policy to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:19 +0000 (14:53 +0100)]
bonding: convert xmit_hash_policy to use the new option API

This patch adds the necessary changes so xmit_hash_policy would use the
new bonding option API. Also fix some trivial/style errors.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert packets_per_slave to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:18 +0000 (14:53 +0100)]
bonding: convert packets_per_slave to use the new option API

This patch adds the necessary changes so packets_per_slave would use the
new bonding option API.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: convert mode setting to use the new option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:17 +0000 (14:53 +0100)]
bonding: convert mode setting to use the new option API

This patch makes the bond's mode setting use the new option API and
adds support for dependency printing which relies on having an entry for
the mode option in the bond_opts[] array.
Also add the ability to print the mode name when mode dependency fails
and fix some trivial/style errors.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobonding: add infrastructure for an option API
Nikolay Aleksandrov [Wed, 22 Jan 2014 13:53:16 +0000 (14:53 +0100)]
bonding: add infrastructure for an option API

This patch adds the necessary basic infrastructure to support
centralized and unified option manipulation API for the bonding. The new
structure bond_option will be used to describe each option with its
dependencies on modes which will be checked automatically thus removing a
lot of duplicated code. Also automatic range checking is added for
some options. Currently the option setting function requires RTNL to
be acquired prior to calling it, since many options already rely on RTNL
it seemed like the best choice to protect all against common race
conditions.
In order to add an option the following steps need to be done:
1. Add an entry BOND_OPT_<option> to bond_options.h so it gets a unique id
   and a bit corresponding to the id
2. Add a bond_option entry to the bond_opts[] array in bond_options.c which
   describes the option, its dependencies and its manipulation function
3. Add code to export the option through sysfs and/or as a module parameter
   (the sysfs export will be made automatically in the future)

The options can have different flags set, currently the following are
supported:
BOND_OPTFLAG_NOSLAVES - require that the bond device has no slaves prior
                        to setting the option
BOND_OPTFLAG_IFDOWN - require that the bond device is down prior to
                      setting the option
BOND_OPTFLAG_RAWVAL - don't parse the value but return it raw for the
                      option to parse

There's a new value structure to describe different types of values
which can have the following flags:
BOND_VALFLAG_DEFAULT - marks the default option (permanent string alias
                       to this option is "default")
BOND_VALFLAG_MIN - the minimum value that this option can have
BOND_VALFLAG_MAX - the maximum value that this option can have

An example would be nice here, so if we have an option which can have
the values "off"(2), "special"(4, default) and supports a range, say
16 - 32, it should be defined as follows:
"off", 2,
"special", 4, BOND_VALFLAG_DEFAULT,
"rangemin", 16, BOND_VALFLAG_MIN,
"rangemax", 32, BOND_VALFLAG_MAX
So we have the valid intervals: [2, 2], [4, 4], [16, 32]
Also the valid strings: "off" = 2, "special" and "default" = 4
                        "rangemin" = 16, "rangemax" = 32

BOND_VALFLAG_(MIN|MAX) can be used to specify a valid range for an
option, if MIN is omitted then 0 is considered as a minimum. If an
exact match is found in the values[] table it will be returned,
otherwise the range is tried (if available).

The option parameter passing is done by using a special structure called
bond_opt_value which can take either a string or a value to parse. One
of the bond_opt_init(val|str) macros should be used depending on which
one does the user want to parse (string or value). Then a call to
__bond_opt_set should be done under RTNL.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'reciprocal'
David S. Miller [Wed, 22 Jan 2014 07:19:26 +0000 (23:19 -0800)]
Merge branch 'reciprocal'

Hannes Frederic Sowa says:

====================
reciprocal_divide update

This patch is on top of aee636c4809fa5 ("bpf: do not use reciprocal
divide") from Eric that sits in net tree. It will not create a merge
conflict, but it depends on this one, so we suggest, if possible, to
merge net into net-next.

We are proposing this change with only small modifications from the
v2 version, namely updating the name of trim to reciprocal_scale
(as commented on by Ben Hutchings and Eric Dumazet, thanks!).

We thought about introducing the reciprocal_divide algorithm in
parallel to the one already used by the kernel but faced organizational
issues, leading us to the conclusion that it is best to just replace
the old one: We could not come up with names for the different
implementations and also with a way to describe the differences to
guide developers which one to choose in which situation. This is
because we cannot specify the correct semantics for the version
which is currently used by the kernel. Altough it seems to not be
causing problems in the kernel, we cannot surely say so in the
case of flex_array for the future. Current usage seems ok, but
future users could run into problems.

Changelog:

v1->v2:
 - changed name to prandom_u32_max in p1
 - changed name to trim in p2
 - reworked code in p3
v2->v3:
 - p1 and p3 stays unchanged, only small update in commit
   message in p3
 - changed name to reciprocal_scale in p2
 - fixed kernel doc format
v3->v4:
 - pseduo -> pseudo (thanks to Tilman Schmidt)
v4->v5:
 - fix pseduo -> pseudo for real now, sorry for the noise
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoreciprocal_divide: update/correction of the algorithm
Hannes Frederic Sowa [Wed, 22 Jan 2014 01:29:41 +0000 (02:29 +0100)]
reciprocal_divide: update/correction of the algorithm

Jakub Zawadzki noticed that some divisions by reciprocal_divide()
were not correct [1][2], which he could also show with BPF code
after divisions are transformed into reciprocal_value() for runtime
invariance which can be passed to reciprocal_divide() later on;
reverse in BPF dump ended up with a different, off-by-one K in
some situations.

This has been fixed by Eric Dumazet in commit aee636c4809fa5
("bpf: do not use reciprocal divide"). This follow-up patch
improves reciprocal_value() and reciprocal_divide() to work in
all cases by using Granlund and Montgomery method, so that also
future use is safe and without any non-obvious side-effects.
Known problems with the old implementation were that division by 1
always returned 0 and some off-by-ones when the dividend and divisor
where very large. This seemed to not be problematic with its
current users, as far as we can tell. Eric Dumazet checked for
the slab usage, we cannot surely say so in the case of flex_array.
Still, in order to fix that, we propose an extension from the
original implementation from commit 6a2d7a955d8d resp. [3][4],
by using the algorithm proposed in "Division by Invariant Integers
Using Multiplication" [5], Torbjörn Granlund and Peter L.
Montgomery, that is, pseudocode for q = n/d where q, n, d is in
u32 universe:

1) Initialization:

  int l = ceil(log_2 d)
  uword m' = floor((1<<32)*((1<<l)-d)/d)+1
  int sh_1 = min(l,1)
  int sh_2 = max(l-1,0)

2) For q = n/d, all uword:

  uword t = (n*m')>>32
  q = (t+((n-t)>>sh_1))>>sh_2

The assembler implementation from Agner Fog [6] also helped a lot
while implementing. We have tested the implementation on x86_64,
ppc64, i686, s390x; on x86_64/haswell we're still half the latency
compared to normal divide.

Joint work with Daniel Borkmann.

  [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
  [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
  [3] https://gmplib.org/~tege/division-paper.pdf
  [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
  [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
  [6] http://www.agner.org/optimize/asmlib.zip

Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: Jesse Gross <jesse@nicira.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: introduce reciprocal_scale helper and convert users
Daniel Borkmann [Wed, 22 Jan 2014 01:29:40 +0000 (02:29 +0100)]
net: introduce reciprocal_scale helper and convert users

As David Laight suggests, we shouldn't necessarily call this
reciprocal_divide() when users didn't requested a reciprocal_value();
lets keep the basic idea and call it reciprocal_scale(). More
background information on this topic can be found in [1].

Joint work with Hannes Frederic Sowa.

  [1] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html

Suggested-by: David Laight <david.laight@aculab.com>
Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agorandom32: add prandom_u32_max and convert open coded users
Daniel Borkmann [Wed, 22 Jan 2014 01:29:39 +0000 (02:29 +0100)]
random32: add prandom_u32_max and convert open coded users

Many functions have open coded a function that returns a random
number in range [0,N-1]. Under the assumption that we have a PRNG
such as taus113 with being well distributed in [0, ~0U] space,
we can implement such a function as uword t = (n*m')>>32, where
m' is a random number obtained from PRNG, n the right open interval
border and t our resulting random number, with n,m',t in u32 universe.

Lets go with Joe and simply call it prandom_u32_max(), although
technically we have an right open interval endpoint, but that we
have documented. Other users can further be migrated to the new
prandom_u32_max() function later on; for now, we need to make sure
to migrate reciprocal_divide() users for the reciprocal_divide()
follow-up fixup since their function signatures are going to change.

Joint work with Hannes Frederic Sowa.

Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet/mlx4_core: Remove unnecessary validation for port number
Moni Shoua [Tue, 21 Jan 2014 08:19:37 +0000 (10:19 +0200)]
net/mlx4_core: Remove unnecessary validation for port number

This is a fix to a regression introduced by commit:
"982290a net/mlx4_core: Check port number for validity
before accessing data"

IPoIB could not attach to multicast group and we get this in dmesg:
[144214.145008] ib0: failed to attach to multicast group, ret = -22
[144214.145016] ib0: couldn't attach QP to multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
[144214.145019] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22

The cause to the problem is because port is extracted from gid[5].
Which is only valid for Ethernet.
Removed this validation in mlx4_qp_attach_common(), which is accessed
from both Ethernet and IB flows.
Error flow for bad port value in Ethernet is already exists in that
function.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobe2net: Fix be_vlan_add/rem_vid() routines
Somnath Kotur [Tue, 21 Jan 2014 10:20:55 +0000 (15:50 +0530)]
be2net: Fix be_vlan_add/rem_vid() routines

The current logic to put interface into VLAN Promiscous mode is not correct.
We should increment "adapter->vlans_added" before calling be_vid_config().
Also removed some unwanted log messages.

Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobnx2x: Fix VF flr flow
Ariel Elior [Tue, 21 Jan 2014 08:31:20 +0000 (10:31 +0200)]
bnx2x: Fix VF flr flow

When a VF originating from a given PF is flr-ed, that PF gets an interrupt
from the chip management and takes a part in the flr process.

This patch fixes several corner cases in which the driver performs its part
of the flr flow out-of-order, causing the FW to assert due to badly timed
messages received from the driver.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: Missing change from the ether_addr_copy() fixups.
David S. Miller [Wed, 22 Jan 2014 06:54:01 +0000 (22:54 -0800)]
net: Missing change from the ether_addr_copy() fixups.

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: filter: let bpf_tell_extensions return SKF_AD_MAX
Daniel Borkmann [Mon, 20 Jan 2014 23:19:37 +0000 (00:19 +0100)]
net: filter: let bpf_tell_extensions return SKF_AD_MAX

Michal Sekletar added in commit ea02f9411d9f ("net: introduce
SO_BPF_EXTENSIONS") a facility where user space can enquire
the BPF ancillary instruction set, which is imho a step into
the right direction for letting user space high-level to BPF
optimizers make an informed decision for possibly using these
extensions.

The original rationale was to return through a getsockopt(2)
a bitfield of which instructions are supported and which
are not, as of right now, we just return 0 to indicate a
base support for SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET.
Limitations of this approach are that this API which we need
to maintain for a long time can only support a maximum of 32
extensions, and needs to be additionally maintained/updated
when each new extension that comes in.

I thought about this a bit more and what we can do here to
overcome this is to just return SKF_AD_MAX. Since we never
remove any extension since we cannot break user space and
always linearly increase SKF_AD_MAX on each newly added
extension, user space can make a decision on what extensions
are supported in the whole set of extensions and which aren't,
by just checking which of them from the whole set have an
offset < SKF_AD_MAX of the underlying kernel.

Since SKF_AD_MAX must be updated each time we add new ones,
we don't need to introduce an additional enum and got
maintenance for free. At some point in time when
SO_BPF_EXTENSIONS becomes ubiquitous for most kernels, then
an application can simply make use of this and easily be run
on newer or older underlying kernels without needing to be
recompiled, of course. Since that is for 3.14, it's not too
late to do this change.

Cc: Michal Sekletar <msekleta@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Michal Sekletar <msekleta@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: Fix some fallout from the etner_addr_copy() changes.
David S. Miller [Wed, 22 Jan 2014 02:55:22 +0000 (18:55 -0800)]
net: Fix some fallout from the etner_addr_copy() changes.

net/appletalk/aarp.c: In function ‘__aarp_send_query’:
net/appletalk/aarp.c:137:2: error: implicit declaration of function ‘ether_addr_copy’ [-Werror=implicit-function-declaration]
 ...
net/atm/lec.c: In function ‘send_to_lecd’:
net/atm/lec.c:524:3: warning: passing argument 1 of ‘ether_addr_copy’ from incompatible pointer type [enabled by default]
In file included from net/atm/lec.c:17:0:
include/linux/etherdevice.h:227:20: note: expected ‘u8 *’ but argument is of type ‘unsigned char (*)[6]’
 ...
net/caif/caif_usb.c: In function ‘cfusbl_create’:
net/caif/caif_usb.c:108:2: error: implicit declaration of function ‘ether_addr_copy’ [-Werror=implicit-function-declaration]

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'sctp'
David S. Miller [Wed, 22 Jan 2014 02:41:46 +0000 (18:41 -0800)]
Merge branch 'sctp'

Wang Weidong says:

====================
sctp: remove some macro locking wrappers

In sctp.h we can find some macro locking wrappers. As Neil point out that:

"Its because in the origional implementation of the sctp protocol, there was a
user space test harness which built the kernel module for userspace execution to
cary our some unit testing on the code.  It did so by redefining some of those
locking macros to user space friendly code.  IIRC we haven't use those unit
tests in years, and so should be removing them, not adding them to other
locations."

So I remove them.
====================

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agosctp: remove macros sctp_bh_[un]lock_sock
wangweidong [Tue, 21 Jan 2014 07:44:12 +0000 (15:44 +0800)]
sctp: remove macros sctp_bh_[un]lock_sock

Redefined bh_[un]lock_sock to sctp_bh[un]lock_sock for user
space friendly code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agosctp: remove macros sctp_{lock|release}_sock
wangweidong [Tue, 21 Jan 2014 07:44:11 +0000 (15:44 +0800)]
sctp: remove macros sctp_{lock|release}_sock

Redefined {lock|release}_sock to sctp_{lock|release}_sock for user space friendly
code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agosctp: remove macros sctp_read_[un]lock
wangweidong [Tue, 21 Jan 2014 07:44:10 +0000 (15:44 +0800)]
sctp: remove macros sctp_read_[un]lock

Redefined read_[un]lock to sctp_read_[un]lock for user space
friendly code which we haven't use in years, and the macros
we never used, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agosctp: remove macros sctp_write_[un]_lock
wangweidong [Tue, 21 Jan 2014 07:44:09 +0000 (15:44 +0800)]
sctp: remove macros sctp_write_[un]_lock

Redefined write_[un]lock to sctp_write_[un]lock for user space
friendly code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agosctp: remove macros sctp_spin_[un]lock
wangweidong [Tue, 21 Jan 2014 07:44:08 +0000 (15:44 +0800)]
sctp: remove macros sctp_spin_[un]lock

Redefined spin_[un]lock to sctp_spin_[un]lock for user space friendly
code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agosctp: remove macros sctp_local_bh_{disable|enable}
wangweidong [Tue, 21 Jan 2014 07:44:07 +0000 (15:44 +0800)]
sctp: remove macros sctp_local_bh_{disable|enable}

Redefined local_bh_{disable|enable} to sctp_local_bh_{disable|enable}
for user space friendly code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agosctp: remove macros sctp_spin_[un]lock_irqrestore
wangweidong [Tue, 21 Jan 2014 07:44:06 +0000 (15:44 +0800)]
sctp: remove macros sctp_spin_[un]lock_irqrestore

Redefined spin_[un]lock_irqstore to sctp_spin_[un]lock_irqrestore for user
space friendly code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agodsa: Use ether_addr_copy
Joe Perches [Mon, 20 Jan 2014 17:52:20 +0000 (09:52 -0800)]
dsa: Use ether_addr_copy

Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
save some cycles on arm and powerpc.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agopktgen: Use ether_addr_copy
Joe Perches [Mon, 20 Jan 2014 17:52:19 +0000 (09:52 -0800)]
pktgen: Use ether_addr_copy

Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
save some cycles on arm and powerpc.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonetpoll: Use ether_addr_copy
Joe Perches [Mon, 20 Jan 2014 17:52:18 +0000 (09:52 -0800)]
netpoll: Use ether_addr_copy

Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
save some cycles on arm and powerpc.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agocaif_usb: Use ether_addr_copy
Joe Perches [Mon, 20 Jan 2014 17:52:17 +0000 (09:52 -0800)]
caif_usb: Use ether_addr_copy

Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
save some cycles on arm and powerpc.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoatm: Use ether_addr_copy
Joe Perches [Mon, 20 Jan 2014 17:52:16 +0000 (09:52 -0800)]
atm: Use ether_addr_copy

Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
save some cycles on arm and powerpc.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoappletalk: Use ether_addr_copy
Joe Perches [Mon, 20 Jan 2014 17:52:15 +0000 (09:52 -0800)]
appletalk: Use ether_addr_copy

Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
save some cycles on arm and powerpc.

Convert struct aarp_entry.hwaddr[6] to hwaddr[ETH_ALEN].

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago8021q: Use ether_addr_copy
Joe Perches [Mon, 20 Jan 2014 17:52:14 +0000 (09:52 -0800)]
8021q: Use ether_addr_copy

Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
save some cycles on arm and powerpc.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'gro_udp_encap'
David S. Miller [Wed, 22 Jan 2014 02:05:12 +0000 (18:05 -0800)]
Merge branch 'gro_udp_encap'

Or Gerlitz says:

====================
net: Add GRO support for UDP encapsulating protocols

This series adds GRO handlers for protocols that do UDP encapsulation, with the
intent of being able to coalesce packets which encapsulate packets belonging to
the same TCP session.

For GRO purposes, the destination UDP port takes the role of the ether type
field in the ethernet header or the next protocol in the IP header.

The UDP GRO handler will only attempt to coalesce packets whose destination
port is registered to have gro handler.

The patches done against net-next 75e4364f67 "net: stmmac: fix NULL pointer
dereference in stmmac_get_tx_hwtstamp"

Or.

v4 --> v5 changes:
  - followed Eric's directives to avoid using atomic get/put ops on the
    udp gro receive and complete callbacks and instead keep the rcu_read_lock
    when calling the next handler on the chain.

v3 --> v4 changes:

  - applied feedback from Tom on some micro-optimizations that save
    branches and goto directives in the udp gro logic

 - applied feedback from Eric on correct RCU programming for the
   add/remove flow of the upper protocols udp gro handlers

v2 --> v3 changes:

 - moved to use linked list to store the udp gro handlers, this solves the
   problem of consuming 512KB of memory for the handlers.

 - use a mark on the skb GRO CB data to disallow running the udp gro_receive twice
   on a packet, this solves the problem of udp encapsulated packets whose inner VM
   packet is udp and happen to carry a port which has registered offloads - and flush it.

 - invoke the udp offload protocol registration and de-registration from the vxlan driver
   in a sleepable context
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: Add GRO support for vxlan traffic
Or Gerlitz [Mon, 20 Jan 2014 11:59:21 +0000 (13:59 +0200)]
net: Add GRO support for vxlan traffic

Add GRO handlers for vxlann, by using the UDP GRO infrastructure.

For single TCP session that goes through vxlan tunneling I got nice
improvement from 6.8Gbs to 11.5Gbs

--> UDP/VXLAN GRO disabled
$ netperf  -H 192.168.52.147 -c -C

$ netperf -t TCP_STREAM -H 192.168.52.147 -c -C
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  65536  65536    10.00      6799.75   12.54    24.79    0.604   1.195

--> UDP/VXLAN GRO enabled

$ netperf -t TCP_STREAM -H 192.168.52.147 -c -C
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  65536  65536    10.00      11562.72   24.90    20.34    0.706   0.577

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: Export gro_find_by_type helpers
Or Gerlitz [Mon, 20 Jan 2014 11:59:20 +0000 (13:59 +0200)]
net: Export gro_find_by_type helpers

Export the gro_find_receive/complete_by_type helpers to they can be invoked
by the gro callbacks of encapsulation protocols such as vxlan.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: Add GRO support for UDP encapsulating protocols
Or Gerlitz [Mon, 20 Jan 2014 11:59:19 +0000 (13:59 +0200)]
net: Add GRO support for UDP encapsulating protocols

Add GRO handlers for protocols that do UDP encapsulation, with the intent of
being able to coalesce packets which encapsulate packets belonging to
the same TCP session.

For GRO purposes, the destination UDP port takes the role of the ether type
field in the ethernet header or the next protocol in the IP header.

The UDP GRO handler will only attempt to coalesce packets whose destination
port is registered to have gro handler.

Use a mark on the skb GRO CB data to disallow (flush) running the udp gro receive
code twice on a packet. This solves the problem of udp encapsulated packets whose
inner VM packet is udp and happen to carry a port which has registered offloads.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agostmmac: Fix kernel crashes for jumbo frames
Vince Bridgers [Mon, 20 Jan 2014 11:39:01 +0000 (05:39 -0600)]
stmmac: Fix kernel crashes for jumbo frames

These changes correct the following issues with jumbo frames on the
stmmac driver:

1) The Synopsys EMAC can be configured to support different FIFO
sizes at core configuration time. There's no way to query the
controller and know the FIFO size, so the driver needs to get this
information from the device tree in order to know how to correctly
handle MTU changes and setting up dma buffers. The default
max-frame-size is as currently used, which is the size of a jumbo
frame.

2) The driver was enabling Jumbo frames by default, but was not allocating
dma buffers of sufficient size to handle the maximum possible packet
size that could be received. This led to memory corruption since DMAs were
occurring beyond the extent of the allocated receive buffers for certain types
of network traffic.

kernel BUG at net/core/skbuff.c:126!
Internal error: Oops - BUG: 0 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 563 Comm: sockperf Not tainted 3.13.0-rc6-01523-gf7111b9 #31
task: ef35e580 ti: ef252000 task.ti: ef252000
PC is at skb_panic+0x60/0x64
LR is at skb_panic+0x60/0x64
pc : [<c03c7c3c>]    lr : [<c03c7c3c>]    psr: 60000113
sp : ef253c18  ip : 60000113  fp : 00000000
r10: ef3a5400  r9 : 00000ebc  r8 : ef3a546c
r7 : ee59f000  r6 : ee59f084  r5 : ee59ff40  r4 : ee59f140
r3 : 000003e2  r2 : 00000007  r1 : c0b9c420  r0 : 0000007d
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c5387d  Table: 2e8ac04a  DAC: 00000015
Process sockperf (pid: 563, stack limit = 0xef252248)
Stack: (0xef253c18 to 0xef254000)
3c00:                                                       00000ebc ee59f000
3c20: ee59f084 ee59ff40 ee59f140 c04a9cd8 ee8c50c0 00000ebc ee59ff40 00000000
3c40: ee59f140 c02d0ef0 00000056 ef1eda80 ee8c50c0 00000ebc 22bbef29 c0318f8c
3c60: 00000056 ef3a547c ffe2c716 c02c9c90 c0ba1298 ef3a5838 ef3a5838 ef3a5400
3c80: 000020c0 ee573840 000055cb ef3f2050 c053f0e0 c0319214 22b9b085 22d92813
3ca0: 00001c80 004b8e00 ef3a5400 ee573840 ef3f2064 22d92813 ef3f2064 000055cb
3cc0: ef3f2050 c031a19c ef252000 00000000 00000000 c0561bc0 00000000 ff00ffff
3ce0: c05621c0 ef3a5400 ef3f2064 ee573840 00000020 ef3f2064 000055cb ef3f2050
3d00: c053f0e0 c031cad0 c053e740 00000e60 00000000 00000000 ee573840 ef3a5400
3d20: ef0a6e00 00000000 ef3f2064 c032507c 00010000 00000020 c0561bc0 c0561bc0
3d40: ee599850 c032799c 00000000 ee573840 c055a380 ef3a5400 00000000 ef3f2064
3d60: ef3f2050 c032799c 0101c7c0 2b6755cb c059a280 c030e4d8 000055cb ffffffff
3d80: ee574fc0 c055a380 ee574000 ee573840 00002b67 ee573840 c03fe9c4 c053fa68
3da0: c055a380 00001f6f 00000000 ee573840 c053f0e0 c0304fdc ef0a6e01 ef3f2050
3dc0: ee573858 ef031000 ee573840 c03055d8 c0ba0c40 ef000f40 00100100 c053f0dc
3de0: c053ffdc c053f0f0 00000008 00000000 ef031000 c02da948 00001140 00000000
3e00: c0563c78 ef253e5f 00000020 ee573840 00000020 c053f0f0 ef313400 ee573840
3e20: c053f0e0 00000000 00000000 c05380c0 ef313400 00001000 00000015 c02df280
3e40: ee574000 ef001e00 00000000 00001080 00000042 005cd980 ef031500 ef031500
3e60: 00000000 c02df824 ef031500 c053e390 c0541084 f00b1e00 c05925e8 c02df864
3e80: 00001f5c ef031440 c053e390 c0278524 00000002 00000000 c0b9eb48 c02df280
3ea0: ee8c7180 00000100 c0542ca8 00000015 00000040 ef031500 ef031500 ef031500
3ec0: c027803c ef252000 00000040 000000ec c05380c0 c0b9eb40 c0b9eb48 c02df940
3ee0: ef060780 ffffa4dd c0564a9c c056343c 002e80a8 00000080 ef031500 00000001
3f00: c053808c ef252000 fffec100 00000003 00000004 002e80a8 0000000c c00258f0
3f20: 002e80a8 c005e704 00000005 00000100 c05634d0 c0538080 c05333e0 00000000
3f40: 0000000a c0565580 c05380c0 ffffa4dc c05434f4 00400100 00000004 c0534cd4
3f60: 00000098 00000000 fffec100 002e80a8 00000004 002e80a8 002a20e0 c0025da8
3f80: c0534cd4 c000f020 fffec10c c053ea60 ef253fb0 c0008530 0000ffe2 b6ef67f4
3fa0: 40000010 ffffffff 00000124 c0012f3c 0000ffe2 002e80f0 0000ffe2 00004000
3fc0: becb6338 becb6334 00000004 00000124 002e80a8 00000004 002e80a8 002a20e0
3fe0: becb6300 becb62f4 002773bb b6ef67f4 40000010 ffffffff 00000000 00000000
[<c03c7c3c>] (skb_panic+0x60/0x64) from [<c02d0ef0>] (skb_put+0x4c/0x50)
[<c02d0ef0>] (skb_put+0x4c/0x50) from [<c0318f8c>] (tcp_collapse+0x314/0x3ec)
[<c0318f8c>] (tcp_collapse+0x314/0x3ec) from [<c0319214>]
(tcp_try_rmem_schedule+0x1b0/0x3c4)
[<c0319214>] (tcp_try_rmem_schedule+0x1b0/0x3c4) from [<c031a19c>]
(tcp_data_queue+0x480/0xe6c)
[<c031a19c>] (tcp_data_queue+0x480/0xe6c) from [<c031cad0>]
(tcp_rcv_established+0x180/0x62c)
[<c031cad0>] (tcp_rcv_established+0x180/0x62c) from [<c032507c>]
(tcp_v4_do_rcv+0x13c/0x31c)
[<c032507c>] (tcp_v4_do_rcv+0x13c/0x31c) from [<c032799c>]
(tcp_v4_rcv+0x718/0x73c)
[<c032799c>] (tcp_v4_rcv+0x718/0x73c) from [<c0304fdc>]
(ip_local_deliver+0x98/0x274)
[<c0304fdc>] (ip_local_deliver+0x98/0x274) from [<c03055d8>]
(ip_rcv+0x420/0x758)
[<c03055d8>] (ip_rcv+0x420/0x758) from [<c02da948>]
(__netif_receive_skb_core+0x44c/0x5bc)
[<c02da948>] (__netif_receive_skb_core+0x44c/0x5bc) from [<c02df280>]
(netif_receive_skb+0x48/0xb4)
[<c02df280>] (netif_receive_skb+0x48/0xb4) from [<c02df824>]
(napi_gro_flush+0x70/0x94)
[<c02df824>] (napi_gro_flush+0x70/0x94) from [<c02df864>]
(napi_complete+0x1c/0x34)
[<c02df864>] (napi_complete+0x1c/0x34) from [<c0278524>]
(stmmac_poll+0x4e8/0x5c8)
[<c0278524>] (stmmac_poll+0x4e8/0x5c8) from [<c02df940>]
(net_rx_action+0xc4/0x1e4)
[<c02df940>] (net_rx_action+0xc4/0x1e4) from [<c00258f0>]
(__do_softirq+0x12c/0x2e8)
[<c00258f0>] (__do_softirq+0x12c/0x2e8) from [<c0025da8>] (irq_exit+0x78/0xac)
[<c0025da8>] (irq_exit+0x78/0xac) from [<c000f020>] (handle_IRQ+0x44/0x90)
[<c000f020>] (handle_IRQ+0x44/0x90) from [<c0008530>]
(gic_handle_irq+0x2c/0x5c)
[<c0008530>] (gic_handle_irq+0x2c/0x5c) from [<c0012f3c>]
(__irq_usr+0x3c/0x60)

3) The driver was setting the dma buffer size after allocating dma buffers,
which caused a system panic when changing the MTU.

BUG: Bad page state in process ifconfig  pfn:2e850
page:c0b72a00 count:0 mapcount:0 mapping:  (null) index:0x0
page flags: 0x200(arch_1)
Modules linked in:
CPU: 0 PID: 566 Comm: ifconfig Not tainted 3.13.0-rc6-01523-gf7111b9 #29
[<c001547c>] (unwind_backtrace+0x0/0xf8) from [<c00122dc>]
(show_stack+0x10/0x14)
[<c00122dc>] (show_stack+0x10/0x14) from [<c03c793c>] (dump_stack+0x70/0x88)
[<c03c793c>] (dump_stack+0x70/0x88) from [<c00b2620>] (bad_page+0xc8/0x118)
[<c00b2620>] (bad_page+0xc8/0x118) from [<c00b302c>]
(get_page_from_freelist+0x744/0x870)
[<c00b302c>] (get_page_from_freelist+0x744/0x870) from [<c00b40f4>]
(__alloc_pages_nodemask+0x118/0x86c)
[<c00b40f4>] (__alloc_pages_nodemask+0x118/0x86c) from [<c00b4858>]
(__get_free_pages+0x10/0x54)
[<c00b4858>] (__get_free_pages+0x10/0x54) from [<c00cba1c>]
(kmalloc_order_trace+0x24/0xa0)
[<c00cba1c>] (kmalloc_order_trace+0x24/0xa0) from [<c02d199c>]
(__kmalloc_reserve.isra.21+0x24/0x70)
[<c02d199c>] (__kmalloc_reserve.isra.21+0x24/0x70) from [<c02d240c>]
(__alloc_skb+0x68/0x13c)
[<c02d240c>] (__alloc_skb+0x68/0x13c) from [<c02d3930>]
(__netdev_alloc_skb+0x3c/0xe8)
[<c02d3930>] (__netdev_alloc_skb+0x3c/0xe8) from [<c0279378>]
(stmmac_open+0x63c/0x1024)
[<c0279378>] (stmmac_open+0x63c/0x1024) from [<c02e18cc>]
(__dev_open+0xa0/0xfc)
[<c02e18cc>] (__dev_open+0xa0/0xfc) from [<c02e1b40>]
(__dev_change_flags+0x94/0x158)
[<c02e1b40>] (__dev_change_flags+0x94/0x158) from [<c02e1c24>]
(dev_change_flags+0x18/0x48)
[<c02e1c24>] (dev_change_flags+0x18/0x48) from [<c0337bc0>]
(devinet_ioctl+0x638/0x700)
[<c0337bc0>] (devinet_ioctl+0x638/0x700) from [<c02c7aec>]
(sock_ioctl+0x64/0x290)
[<c02c7aec>] (sock_ioctl+0x64/0x290) from [<c0100890>]
(do_vfs_ioctl+0x78/0x5b8)
[<c0100890>] (do_vfs_ioctl+0x78/0x5b8) from [<c0100e0c>] (SyS_ioctl+0x3c/0x5c)
[<c0100e0c>] (SyS_ioctl+0x3c/0x5c) from [<c000e760>]

The fixes have been verified using reproducible, automated testing.

Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agodts: Add a binding for Synopsys emac max-frame-size
Vince Bridgers [Mon, 20 Jan 2014 11:39:00 +0000 (05:39 -0600)]
dts: Add a binding for Synopsys emac max-frame-size

This change adds a parameter for the Synopsys 10/100/1000
stmmac Ethernet driver to configure the maximum frame
size supported by the EMAC driver. Synopsys allows the FIFO
sizes to be configured when the cores are built for a particular
device, but do not provide a way for the driver to read
information from the device about the maximum MTU size
supported as limited by the device's FIFO size.

Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agorxrpc: out of bound read in debug code
Dan Carpenter [Mon, 20 Jan 2014 10:28:59 +0000 (13:28 +0300)]
rxrpc: out of bound read in debug code

Smatch complains because we are using an untrusted index into the
rxrpc_acks[] array.  It's just a read and it's only in the debug code,
but it's simple enough to add a check and fix it.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years ago8021q: update description
Yegor Yefremov [Mon, 20 Jan 2014 09:49:39 +0000 (10:49 +0100)]
8021q: update description

Replace deprecated 'vconfig' tool with 'ip' from 'iproute2'. Add
some beautifications like replacing 'ethernet' with 'Ethernet' and
removing unneeded spaces.

Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv6: protect protocols not handling ipv4 from v4 connection/bind attempts
Hannes Frederic Sowa [Mon, 20 Jan 2014 04:16:39 +0000 (05:16 +0100)]
ipv6: protect protocols not handling ipv4 from v4 connection/bind attempts

Some ipv6 protocols cannot handle ipv4 addresses, so we must not allow
connecting and binding to them. sendmsg logic does already check msg->name
for this but must trust already connected sockets which could be set up
for connection to ipv4 address family.

Per-socket flag ipv6only is of no use here, as it is under users control
by setsockopt.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv6: enable anycast addresses as source addresses in ICMPv6 error messages
FX Le Bail [Sun, 19 Jan 2014 16:00:36 +0000 (17:00 +0100)]
ipv6: enable anycast addresses as source addresses in ICMPv6 error messages

- Uses ipv6_anycast_destination() in icmp6_send().

Suggested-by: Bill Fink <billfink@mindspring.com>
Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agotcp: delete redundant calls of tcp_mtup_init()
Peter Pan(潘卫平) [Sun, 19 Jan 2014 12:44:46 +0000 (20:44 +0800)]
tcp: delete redundant calls of tcp_mtup_init()

As tcp_rcv_state_process() has already calls tcp_mtup_init() for non-fastopen
sock, we can delete the redundant calls of tcp_mtup_init() in
tcp_{v4,v6}_syn_recv_sock().

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agopacket: fix a couple of cppcheck warnings
Daniel Borkmann [Sun, 19 Jan 2014 10:46:53 +0000 (11:46 +0100)]
packet: fix a couple of cppcheck warnings

Doesn't bring much, but also doesn't hurt us to fix 'em:

1) In tpacket_rcv() flush dcache page we can restirct the scope
   for start and end and remove one layer of indent.

2) In tpacket_destruct_skb() we can restirct the scope for ph.

3) In alloc_one_pg_vec_page() we can remove the NULL assignment
   and change spacing a bit.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv4: remove the useless argument from ip_tunnel_hash()
Duan Jiong [Sun, 19 Jan 2014 08:43:42 +0000 (16:43 +0800)]
ipv4: remove the useless argument from ip_tunnel_hash()

Since commit c544193214("GRE: Refactor GRE tunneling code")
introduced function ip_tunnel_hash(), the argument itn is no
longer in use, so remove it.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agobond: make slave_sysfs_ops static
stephen hemminger [Sat, 18 Jan 2014 22:54:18 +0000 (14:54 -0800)]
bond: make slave_sysfs_ops static

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: remove unnecessary initializations in net_dev_init
Sabrina Dubroca [Sat, 18 Jan 2014 18:19:27 +0000 (19:19 +0100)]
net: remove unnecessary initializations in net_dev_init

softnet_data is already set to 0, no need to use memset or initialize
specific fields to 0 or NULL afterwards.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: add vxlan description
Jesse Brandeburg [Fri, 17 Jan 2014 19:00:33 +0000 (11:00 -0800)]
net: add vxlan description

Add a description to the vxlan module, helping save the world
from the minions of destruction and confusion.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'sfc'
David S. Miller [Tue, 21 Jan 2014 22:46:56 +0000 (14:46 -0800)]
Merge branch 'sfc'

Shradha Shah says:

====================
Bug Fixes for SFC driver

I am taking over the upstream patch submission work for the
sfc driver from Ben Hutchings.

These patches are bug fixes to the sfc driver involving
replacement of the PORT RESET MC command and fixing transposed
ptp_{under,over}size_sync_window_statistics

The PORT_RESET bug fix is needed for all versions supporting EF10
i.e all versions including and after 3.12.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agosfc: Fix transposed ptp_{under, over}size_sync_windows statistics
Ben Hutchings [Fri, 17 Jan 2014 19:48:17 +0000 (19:48 +0000)]
sfc: Fix transposed ptp_{under, over}size_sync_windows statistics

Somehow I transposed these two while bringing the original statistics
support upstream.

Fixes: 99691c4ac112 ('sfc: Add PTP counters to ethtool stats')
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agosfc: Change efx_mcdi_reset_port to use ENTITY_RESET MC command.
Jon Cooper [Fri, 17 Jan 2014 19:48:06 +0000 (19:48 +0000)]
sfc: Change efx_mcdi_reset_port to use ENTITY_RESET MC command.

PORT_RESET MC command was NOP in the ef10 firmware hence we are using
ENTITY_RESET to make sure all resource allocations are reset.

Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet_sched: act: export tcf_hash_search() instead of tcf_hash_lookup()
WANG Cong [Fri, 17 Jan 2014 19:37:03 +0000 (11:37 -0800)]
net_sched: act: export tcf_hash_search() instead of tcf_hash_lookup()

So that we will not expose struct tcf_common to modules.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet_sched: act: fetch hinfo from a->ops->hinfo
WANG Cong [Fri, 17 Jan 2014 19:37:02 +0000 (11:37 -0800)]
net_sched: act: fetch hinfo from a->ops->hinfo

Every action ops has a pointer to hash info, so we don't need to
hard-code it in each module.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: stmmac: fix NULL pointer dereference in stmmac_get_tx_hwtstamp
damuzi000 [Fri, 17 Jan 2014 15:47:59 +0000 (23:47 +0800)]
net: stmmac: fix NULL pointer dereference in stmmac_get_tx_hwtstamp

When timestamping is enabled, stmmac_tx_clean will call
stmmac_get_tx_hwtstamp to get tx TS.
But the skb can be NULL because the last of its tx_skbuff is NULL
if this packet frame is filled in more than one descriptors.

To fix the issue, change the code:
- Store TX skb to the tx_skbuff[] of frame's last segment.
- Check skb is not NULL in stmmac_get_tx_hwtstamp.

Signed-off-by: Bruce Liu <damuzi000@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: stmmac: sunxi platform extensions for GMAC in Allwinner A20 SoC's
Chen-Yu Tsai [Fri, 17 Jan 2014 13:24:47 +0000 (21:24 +0800)]
net: stmmac: sunxi platform extensions for GMAC in Allwinner A20 SoC's

The Allwinner A20 has an ethernet controller that seems to be
an early version of Synopsys DesignWare MAC 10/100/1000 Universal,
which is supported by the stmmac driver.

Allwinner's GMAC requires setting additional registers in the SoC's
clock control unit.

The exact version of the DWMAC IP that Allwinner uses is unknown,
thus the exact feature set is unknown.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: stmmac: Use driver data and callbacks tied with compatible strings
Chen-Yu Tsai [Fri, 17 Jan 2014 13:24:46 +0000 (21:24 +0800)]
net: stmmac: Use driver data and callbacks tied with compatible strings

The stmmac driver core allows passing feature flags and callbacks via
platform data. Add a similar stmmac_of_data to pass flags and callbacks
tied to compatible strings. This allows us to extend stmmac with glue
layers for different SoCs.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: stmmac: Deprecate snps, phy-addr and auto-detect PHY address
Chen-Yu Tsai [Fri, 17 Jan 2014 13:24:45 +0000 (21:24 +0800)]
net: stmmac: Deprecate snps, phy-addr and auto-detect PHY address

The snps,phy-addr device tree property is non-standard, and should be
removed in favor of proper phy node support. Remove it from the binding
documents and warn if the property is still used.

Most PHYs respond to address 0, but a few don't, so auto-detect PHY
address by default, to make up for the lack of explicit address selection.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: stmmac: Honor DT parameter to force DMA store and forward mode
Chen-Yu Tsai [Fri, 17 Jan 2014 13:24:44 +0000 (21:24 +0800)]
net: stmmac: Honor DT parameter to force DMA store and forward mode

"snps,force_sf_dma_mode" is documented in stmmac device tree bindings,
but is never handled by the driver.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoblackfin: Update stmmac callback signatures
Chen-Yu Tsai [Fri, 17 Jan 2014 13:24:43 +0000 (21:24 +0800)]
blackfin: Update stmmac callback signatures

stmmac callbacks have been extended for better separation.
Update them to avoid breakage.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: stmmac: Allocate and pass soc/board specific data to callbacks
Chen-Yu Tsai [Fri, 17 Jan 2014 13:24:42 +0000 (21:24 +0800)]
net: stmmac: Allocate and pass soc/board specific data to callbacks

The current .init and .exit callbacks requires access to driver
private data structures. This is not a good seperation and abstraction.

Instead, we add a new .setup callback for allocating private data, and
pass the returned pointer to the other callbacks.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: stmmac: Add support for optional reset control
Chen-Yu Tsai [Fri, 17 Jan 2014 13:24:41 +0000 (21:24 +0800)]
net: stmmac: Add support for optional reset control

The DWMAC has a reset assert line, which is used on some SoCs. Add an
optional reset control to stmmac driver core.

To support reset control deferred probing, this patch changes the driver
probe function to return the actual error, instead of just -EINVAL.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: stmmac: Enable stmmac main clock when probing hardware
Chen-Yu Tsai [Fri, 17 Jan 2014 13:24:40 +0000 (21:24 +0800)]
net: stmmac: Enable stmmac main clock when probing hardware

The stmmac driver does not enable the main clock during the probe phase.
If the clock was not enabled by the boot loader or was disabled by the
kernel, hardware features and the MDIO bus would not be probed properly.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoDT: net: davinci_emac: "phy-handle" property is actually optional
Sergei Shtylyov [Thu, 16 Jan 2014 22:32:13 +0000 (01:32 +0300)]
DT: net: davinci_emac: "phy-handle" property is actually optional

Though described as required, the "phy-handle" property for the DaVinci EMAC
binding is actually optional, as the driver will happily function without it,
assuming 100/FULL link; the property is not specified  either in the example
device node,  or in the actual EMAC device nodes for DA850 and AM3517 device
trees.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: fix "queues" uevent between network namespaces
Weilong Chen [Thu, 16 Jan 2014 09:24:31 +0000 (17:24 +0800)]
net: fix "queues" uevent between network namespaces

When I create a new namespace with 'ip netns add net0', or add/remove
new links in a namespace with 'ip link add/delete type veth', rx/tx
queues events can be got in all namespaces. That is because rx/tx queue
ktypes do not have namespace support, and their kobj parents are setted to
NULL. This patch is to fix it.

Reported-by: Libo Chen <chenlibo@huawei.com>
Signed-off-by: Libo Chen <chenlibo@huawei.com>
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet_sched: act: remove capab from struct tc_action_ops
WANG Cong [Wed, 15 Jan 2014 23:49:30 +0000 (15:49 -0800)]
net_sched: act: remove capab from struct tc_action_ops

It is not actually implemented.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: document accel_priv parameter for __dev_queue_xmit()
Jason Wang [Mon, 20 Jan 2014 03:25:13 +0000 (11:25 +0800)]
net: document accel_priv parameter for __dev_queue_xmit()

To silent "make htmldocs" warning.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agosctp: fix missing SCTP mailing list address update
Jean Sacren [Sun, 19 Jan 2014 22:24:53 +0000 (15:24 -0700)]
sctp: fix missing SCTP mailing list address update

The commit 91705c61b5202 ("net: sctp: trivial: update mailing list
address") updated almost all the SCTP mailing list address from

"lksctp-developers@lists.sourceforge.net"
to
"linux-sctp@vger.kernel.org"

except for the one in include/linux/sctp.h file. Fix this way trivial
one so that all is updated.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv6: optimize link local address search
Hannes Frederic Sowa [Sun, 19 Jan 2014 20:58:19 +0000 (21:58 +0100)]
ipv6: optimize link local address search

ipv6_link_dev_addr sorts newly added addresses by scope in
ifp->addr_list. Smaller scope addresses are added to the tail of the
list. Use this fact to iterate in reverse over addr_list and break out
as soon as a higher scoped one showes up, so we can spare some cycles
on machines with lot's of addresses.

The ordering of the addresses is not relevant and we are more likely to
get the eui64 generated address with this change anyway.

Suggested-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agophy: cleanup 10g code
stephen hemminger [Sun, 19 Jan 2014 19:48:20 +0000 (11:48 -0800)]
phy: cleanup 10g code

Code should avoid needless exports, don't export something unless it used.
Make local functions static and remove unused stubs.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoqlcnic: fix sparse warnings
Fengguang Wu [Sun, 19 Jan 2014 19:37:12 +0000 (11:37 -0800)]
qlcnic: fix sparse warnings

Previous patch changed prototypes, but forgot functions.

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv6: make IPV6_RECVPKTINFO work for ipv4 datagrams
Hannes Frederic Sowa [Mon, 20 Jan 2014 02:43:08 +0000 (03:43 +0100)]
ipv6: make IPV6_RECVPKTINFO work for ipv4 datagrams

We currently don't report IPV6_RECVPKTINFO in cmsg access ancillary data
for IPv4 datagrams on IPv6 sockets.

This patch splits the ip6_datagram_recv_ctl into two functions, one
which handles both protocol families, AF_INET and AF_INET6, while the
ip6_datagram_recv_specific_ctl only handles IPv6 cmsg data.

ip6_datagram_recv_*_ctl never reported back any errors, so we can make
them return void. Also provide a helper for protocols which don't offer dual
personality to further use ip6_datagram_recv_ctl, which is exported to
modules.

I needed to shuffle the code for ping around a bit to make it easier to
implement dual personality for ping ipv6 sockets in future.

Reported-by: Gert Doering <gert@space.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agosch_netem: replace magic numbers with enumerate
Yang Yingliang [Sat, 18 Jan 2014 10:13:31 +0000 (18:13 +0800)]
sch_netem: replace magic numbers with enumerate

Replace some magic numbers which describe states of 4-state model
loss generator with enumerate.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv6: add flowlabel_consistency sysctl
Florent Fourcot [Fri, 17 Jan 2014 16:15:05 +0000 (17:15 +0100)]
ipv6: add flowlabel_consistency sysctl

With the introduction of IPV6_FL_F_REFLECT, there is no guarantee of
flow label unicity. This patch introduces a new sysctl to protect the old
behaviour, enable by default.

Changelog of V3:
 * rename ip6_flowlabel_consistency to flowlabel_consistency
 * use net_info_ratelimited()
 * checkpatch cleanups

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv6: add a flag to get the flow label used remotly
Florent Fourcot [Fri, 17 Jan 2014 16:15:04 +0000 (17:15 +0100)]
ipv6: add a flag to get the flow label used remotly

This information is already available via IPV6_FLOWINFO
of IPV6_2292PKTOPTIONS, and them a filtering to get the flow label
information. But it is probably logical and easier for users to add this
here, and to control both sent/received flow label values with the
IPV6_FLOWLABEL_MGR option.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv6: add the IPV6_FL_F_REFLECT flag to IPV6_FL_A_GET
Florent Fourcot [Fri, 17 Jan 2014 16:15:03 +0000 (17:15 +0100)]
ipv6: add the IPV6_FL_F_REFLECT flag to IPV6_FL_A_GET

With this option, the socket will reply with the flow label value read
on received packets.

The goal is to have a connection with the same flow label in both
direction of the communication.

Changelog of V4:
 * Do not erase the flow label on the listening socket. Use pktopts to
 store the received value

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoipv4: be friend with drop monitor
Eric Dumazet [Sun, 19 Jan 2014 02:27:49 +0000 (18:27 -0800)]
ipv4: be friend with drop monitor

Replace some dev_kfree_skb() with kfree_skb() calls when
we drop one skb, this might help bug tracking.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: add build-time checks for msg->msg_name size
Steffen Hurrle [Fri, 17 Jan 2014 21:53:15 +0000 (22:53 +0100)]
net: add build-time checks for msg->msg_name size

This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
handler msg_name and msg_namelen logic").

DECLARE_SOCKADDR validates that the structure we use for writing the
name information to is not larger than the buffer which is reserved
for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
consistently in sendmsg code paths.

Signed-off-by: Steffen Hurrle <steffen@hurrle.net>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: introduce SO_BPF_EXTENSIONS
Michal Sekletar [Fri, 17 Jan 2014 16:09:45 +0000 (17:09 +0100)]
net: introduce SO_BPF_EXTENSIONS

For user space packet capturing libraries such as libpcap, there's
currently only one way to check which BPF extensions are supported
by the kernel, that is, commit aa1113d9f85d ("net: filter: return
-EINVAL if BPF_S_ANC* operation is not supported"). For querying all
extensions at once this might be rather inconvenient.

Therefore, this patch introduces a new option which can be used as
an argument for getsockopt(), and allows one to obtain information
about which BPF extensions are supported by the current kernel.

As David Miller suggests, we do not need to define any bits right
now and status quo can just return 0 in order to state that this
versions supports SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET. Later
additions to BPF extensions need to add their bits to the
bpf_tell_extensions() function, as documented in the comment.

Signed-off-by: Michal Sekletar <msekleta@redhat.com>
Cc: David Miller <davem@davemloft.net>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Sat, 18 Jan 2014 08:55:41 +0000 (00:55 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
net/ipv4/tcp_metrics.c

Overlapping changes between the "don't create two tcp metrics objects
with the same key" race fix in net and the addition of the destination
address in the lookup key in net-next.

Minor overlapping changes in bnx2x driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Sat, 18 Jan 2014 06:19:28 +0000 (22:19 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) The value choosen for the new SO_MAX_PACING_RATE socket option on
    parisc was very poorly choosen, let's fix it while we still can.
    From Eric Dumazet.

 2) Our generic reciprocal divide was found to handle some edge cases
    incorrectly, part of this is encoded into the BPF as deep as the JIT
    engines themselves.  Just use a real divide throughout for now.
    From Eric Dumazet.

 3) Because the initial lookup is lockless, the TCP metrics engine can
    end up creating two entries for the same lookup key.  Fix this by
    doing a second lookup under the lock before we actually create the
    new entry.  From Christoph Paasch.

 4) Fix scatter-gather list init in usbnet driver, from Bjørn Mork.

 5) Fix unintended 32-bit truncation in cxgb4 driver's bit shifting.
    From Dan Carpenter.

 6) Netlink socket dumping uses the wrong socket state for timewait
    sockets.  Fix from Neal Cardwell.

 7) Fix netlink memory leak in ieee802154_add_iface(), from Christian
    Engelmayer.

 8) Multicast forwarding in ipv4 can overflow the per-rule reference
    counts, causing all multicast traffic to cease.  Fix from Hannes
    Frederic Sowa.

 9) via-rhine needs to stop all TX queues when it resets the device,
    from Richard Weinberger.

10) Fix RDS per-cpu accesses broken by the this_cpu_* conversions.  From
    Gerald Schaefer.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  s390/bpf,jit: fix 32 bit divisions, use unsigned divide instructions
  parisc: fix SO_MAX_PACING_RATE typo
  ipv6: simplify detection of first operational link-local address on interface
  tcp: metrics: Avoid duplicate entries with the same destination-IP
  net: rds: fix per-cpu helper usage
  e1000e: Fix compilation warning when !CONFIG_PM_SLEEP
  bpf: do not use reciprocal divide
  be2net: add dma_mapping_error() check for dma_map_page()
  bnx2x: Don't release PCI bars on shutdown
  net,via-rhine: Fix tx_timeout handling
  batman-adv: fix batman-adv header overhead calculation
  qlge: Fix vlan netdev features.
  net: avoid reference counter overflows on fib_rules in multicast forwarding
  dm9601: add USB IDs for new dm96xx variants
  MAINTAINERS: add virtio-dev ML for virtio
  ieee802154: Fix memory leak in ieee802154_add_iface()
  net: usbnet: fix SG initialisation
  inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait sockets
  cxgb4: silence shift wrapping static checker warning