git.karo-electronics.de Git - linux-beck.git/log

]> git.karo-electronics.de Git - linux-beck.git/log

Johan Hedberg [Mon, 16 Sep 2013 10:05:18 +0000 (13:05 +0300)]

Bluetooth: Fix responding to invalid L2CAP signaling commands

When we have an LE link we should not respond to any data on the BR/EDR
L2CAP signaling channel (0x0001) and vice-versa when we have a BR/EDR
link we should not respond to LE L2CAP (CID 0x0005) signaling commands.
This patch fixes this issue by checking for a valid link type and
ignores data if it is wrong.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

commit | commitdiff | tree

Johan Hedberg [Mon, 16 Sep 2013 10:05:17 +0000 (13:05 +0300)]

Bluetooth: Fix sending responses to identified L2CAP response packets

When L2CAP packets return a non-zero error and the value is passed
onwards by l2cap_bredr_sig_cmd this will trigger a command reject packet
to be sent. However, the core specification (page 1416 in core 4.0) says
the following: "Command Reject packets should not be sent in response to
an identified Response packet.".

This patch ensures that a command reject packet is not sent for any
identified response packet by ignoring the error return value from the
response handler functions.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

commit | commitdiff | tree

Johan Hedberg [Mon, 16 Sep 2013 10:05:16 +0000 (13:05 +0300)]

Bluetooth: Fix L2CAP command reject reason

There are several possible reason codes that can be sent in the command
reject L2CAP packet. Before this patch the code has used a hard-coded
single response code ("command not understood"). This patch adds a
helper function to map the return value of an L2CAP handler function to
the correct command reject reason.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

commit | commitdiff | tree

Johan Hedberg [Mon, 16 Sep 2013 10:05:15 +0000 (13:05 +0300)]

Bluetooth: Fix L2CAP Disconnect response for unknown CID

If we receive an L2CAP Disconnect Request for an unknown CID we should
not just silently drop it but reply with a proper Command Reject
response. This patch fixes this by ensuring that the disconnect handler
returns a proper error instead of 0 and will cause the function caller
to send the right response.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

commit | commitdiff | tree

Johan Hedberg [Mon, 16 Sep 2013 10:05:14 +0000 (13:05 +0300)]

Bluetooth: Fix L2CAP error return used for failed channel lookups

The EFAULT error should only be used for memory address related errors
and ENOENT might be needed for other purposes than invalid CID errors.
This patch fixes the l2cap_config_req, l2cap_connect_create_rsp and
l2cap_create_channel_req handlers to use the unique EBADSLT error to
indicate failed lookups on a given CID.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

commit | commitdiff | tree

Johan Hedberg [Mon, 16 Sep 2013 10:05:13 +0000 (13:05 +0300)]

Bluetooth: Fix double error response for l2cap_create_chan_req

When an L2CAP request handler returns non-zero the calling code will
send a command reject response. The l2cap_create_chan_req function will
in some cases send its own response but then still return a -EFAULT
error which would cause two responses to be sent. This patch fixes this
by making the function return 0 after sending its own response.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

commit | commitdiff | tree

Johan Hedberg [Mon, 16 Sep 2013 10:05:12 +0000 (13:05 +0300)]

Bluetooth: Remove unused event mask struct

The struct for HCI_Set_Event_Mask is never used. Instead a local 8-byte
array is used for sending this command. Therefore, remove the
unnecessary struct definition.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

commit | commitdiff | tree

Marcel Holtmann [Wed, 4 Sep 2013 01:08:38 +0000 (18:08 -0700)]

Bluetooth: Only schedule raw queue when user channel is active

When the user channel is set and an user application has full control
over the device, do not bother trying to schedule any queues except
the raw queue.

This is an optimization since with user channel, only the raw queue
is in use.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

commit | commitdiff | tree

Marcel Holtmann [Wed, 4 Sep 2013 01:11:07 +0000 (18:11 -0700)]

Bluetooth: Use GFP_KERNEL when cloning SKB in a workqueue

There is no need to use GFP_ATOMIC with skb_clone() when the code is
executed in a workqueue.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

commit | commitdiff | tree

Marcel Holtmann [Wed, 4 Sep 2013 01:08:37 +0000 (18:08 -0700)]

Bluetooth: Disable upper layer connections when user channel is active

When the device has the user channel flag set, it means it is driven by
an user application. In that case do not allow any connections from
L2CAP or SCO sockets.

This is the same situation as when the device has the raw flag set and
it will then return EHOSTUNREACH.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

commit | commitdiff | tree

Marcel Holtmann [Mon, 2 Sep 2013 17:41:39 +0000 (10:41 -0700)]

Bluetooth: Add support creating virtual AMP controllers

So far the only option to create a virtual AMP controller was by
setting a module parameter for the hci_vhci driver. This patch adds
the functionality to define inline to create either a BR/EDR or an
AMP controller.

In addition the client will be informed which HCI controller index
it got assigned. That is especially useful for automated end-to-end
testing.

To keep backwards compatibility with existing userspace, the command
for creating a controller type needs to be send right after opening
the device node. If the command is not send, it defaults back to
automatically creating a BR/EDR controller.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

commit | commitdiff | tree

Marcel Holtmann [Tue, 27 Aug 2013 05:02:38 +0000 (22:02 -0700)]

Bluetooth: Use devname:vhci module alias for virtual HCI driver

To allow creating /dev/vhci device node, add the proper module alias for
this driver.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

commit | commitdiff | tree

Marcel Holtmann [Tue, 27 Aug 2013 04:40:52 +0000 (21:40 -0700)]

Bluetooth: Introduce new HCI socket channel for user operation

This patch introcuces a new HCI socket channel that allows user
applications to take control over a specific HCI device. The application
gains exclusive access to this device and forces the kernel to stay away
and not manage it. In case of the management interface it will actually
hide the device.

Such operation is useful for security testing tools that need to operate
underneath the Bluetooth stack and need full control over a device. The
advantage here is that the kernel still provides the service of hardware
abstraction and HCI level access. The use of Bluetooth drivers for
hardware access also means that sniffing tools like btmon or hcidump
are still working and the whole set of transaction can be traced with
existing tools.

With the new channel it is possible to send HCI commands, ACL and SCO
data packets and receive HCI events, ACL and SCO packets from the
device. The format follows the well established H:4 protocol.

The new HCI user channel can only be established when a device has been
through its setup routine and is currently powered down. This is
enforced to not cause any problems with current operations. In addition
only one user channel per HCI device is allowed. It is exclusive access
for one user application. Access to this channel is limited to process
with CAP_NET_RAW capability.

Using this new facility does not require any external library or special
ioctl or socket filters. Just create the socket and bind it. After that
the file descriptor is ready to speak H:4 protocol.

        struct sockaddr_hci addr;
        int fd;

        fd = socket(AF_BLUETOOTH, SOCK_RAW, BTPROTO_HCI);

        memset(&addr, 0, sizeof(addr));
        addr.hci_family = AF_BLUETOOTH;
        addr.hci_dev = 0;
        addr.hci_channel = HCI_CHANNEL_USER;

        bind(fd, (struct sockaddr *) &addr, sizeof(addr));

The example shows on how to create a user channel for hci0 device. Error
handling has been left out of the example. However with the limitations
mentioned above it is advised to handle errors. Binding of the user
cahnnel socket can fail for various reasons. Specifically if the device
is currently activated by BlueZ or if the access permissions are not
present.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

commit | commitdiff | tree

Marcel Holtmann [Tue, 27 Aug 2013 04:40:51 +0000 (21:40 -0700)]

Bluetooth: Introduce user channel flag for HCI devices

This patch introduces a new user channel flag that allows to give full
control of a HCI device to a user application. The kernel will stay away
from the device and does not allow any further modifications of the
device states.

The existing raw flag is not used since it has a bit of unclear meaning
due to its legacy. Using a new flag makes the code clearer.

A device with the user channel flag set can still be enumerate using the
legacy API, but it does not longer enumerate using the new management
interface used by BlueZ 5 and beyond. This is intentional to not confuse
users of modern systems.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

commit | commitdiff | tree

Marcel Holtmann [Mon, 26 Aug 2013 16:39:55 +0000 (09:39 -0700)]

Bluetooth: Restrict ioctls to HCI raw channel sockets

The various legacy ioctls used with HCI sockets are limited to raw
channel only. They are not used on the other channels and also have
no meaning there. So return an error if tried to use them.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

commit | commitdiff | tree

Marcel Holtmann [Mon, 26 Aug 2013 16:29:39 +0000 (09:29 -0700)]

Bluetooth: Fix error handling for HCI socket options

The HCI sockets for monitor and control do not support any HCI specific
socket options and if tried, an error will be returned. However the
error used is EINVAL and that is not really descriptive. To make it
clear that these sockets are not handling HCI socket options, return
EBADFD instead.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

commit | commitdiff | tree

Marcel Holtmann [Tue, 27 Aug 2013 03:57:58 +0000 (20:57 -0700)]

Bluetooth: Report error for HCI reset ioctl when device is down

Even if this is legacy API, there is no reason to not report a proper
error when trying to reset a HCI device that is down.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

commit | commitdiff | tree

Marcel Holtmann [Mon, 26 Aug 2013 07:20:37 +0000 (00:20 -0700)]

Bluetooth: Fix handling of getsockname() for HCI sockets

The hci_dev check is not protected and so move it into the socket lock. In
addition return the HCI channel identifier instead of always 0 channel.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

commit | commitdiff | tree

Marcel Holtmann [Mon, 26 Aug 2013 07:06:30 +0000 (00:06 -0700)]

Bluetooth: Fix handling of getpeername() for HCI sockets

The HCI sockets do not have a peer associated with it and so make sure
that getpeername() returns EOPNOTSUPP since this operation is actually
not supported on HCI sockets.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

commit | commitdiff | tree

Marcel Holtmann [Mon, 26 Aug 2013 06:25:15 +0000 (23:25 -0700)]

Bluetooth: Refactor raw socket filter into more readable code

The handling of the raw socket filter is rather obscure code and it gets
in the way of future extensions. Instead of inline filtering in the raw
socket packet routine, refactor it into its own function.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

commit | commitdiff | tree

Daniel Borkmann [Tue, 3 Sep 2013 22:19:44 +0000 (00:19 +0200)]

net: ipv6: mld: document force_mld_version in ip-sysctl.txt

Document force_mld_version parameter in ip-sysctl.txt.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Daniel Borkmann [Tue, 3 Sep 2013 22:19:43 +0000 (00:19 +0200)]

net: ipv6: mld: introduce mld_{gq, ifc, dad}_stop_timer functions

We already have mld_{gq,ifc,dad}_start_timer() functions, so introduce
mld_{gq,ifc,dad}_stop_timer() functions to reduce code size and make it
more readable.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Daniel Borkmann [Tue, 3 Sep 2013 22:19:42 +0000 (00:19 +0200)]

net: ipv6: mld: refactor query processing into v1/v2 functions

Make igmp6_event_query() a bit easier to read by refactoring code
parts into mld_process_v1() and mld_process_v2().

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Daniel Borkmann [Tue, 3 Sep 2013 22:19:41 +0000 (00:19 +0200)]

net: ipv6: mld: similarly to MLDv2 have min max_delay of 1

Similarly as we do in MLDv2 queries, set a forged MLDv1 query with
0 ms mld_maxdelay to minimum timer shot time of 1 jiffies. This is
eventually done in igmp6_group_queried() anyway, so we can simplify
a check there.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Daniel Borkmann [Tue, 3 Sep 2013 22:19:40 +0000 (00:19 +0200)]

net: ipv6: mld: implement RFC3810 MLDv2 mode only

RFC3810, 10. Security Considerations says under subsection 10.1.
Query Message:

  A forged Version 1 Query message will put MLDv2 listeners on that
  link in MLDv1 Host Compatibility Mode. This scenario can be avoided
  by providing MLDv2 hosts with a configuration option to ignore
  Version 1 messages completely.

Hence, implement a MLDv2-only mode that will ignore MLDv1 traffic:

  echo 2 > /proc/sys/net/ipv6/conf/ethX/force_mld_version  or
  echo 2 > /proc/sys/net/ipv6/conf/all/force_mld_version

Note that <all> device has a higher precedence as it was previously
also the case in the macro MLD_V1_SEEN() that would "short-circuit"
if condition on <all> case.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Daniel Borkmann [Tue, 3 Sep 2013 22:19:39 +0000 (00:19 +0200)]

net: ipv6: mld: get rid of MLDV2_MRC and simplify calculation

Get rid of MLDV2_MRC and use our new macros for mantisse and
exponent to calculate Maximum Response Delay out of the Maximum
Response Code.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Daniel Borkmann [Tue, 3 Sep 2013 22:19:38 +0000 (00:19 +0200)]

net: ipv6: mld: clean up MLD_V1_SEEN macro

Replace the macro with a function to make it more readable. GCC will
eventually decide whether to inline this or not (also, that's not
fast-path anyway).

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Daniel Borkmann [Tue, 3 Sep 2013 22:19:37 +0000 (00:19 +0200)]

net: ipv6: mld: fix v1/v2 switchback timeout to rfc3810, 9.12.

i) RFC3810, 9.2. Query Interval [QI] says:

   The Query Interval variable denotes the interval between General
   Queries sent by the Querier. Default value: 125 seconds. [...]

ii) RFC3810, 9.3. Query Response Interval [QRI] says:

  The Maximum Response Delay used to calculate the Maximum Response
  Code inserted into the periodic General Queries. Default value:
  10000 (10 seconds) [...] The number of seconds represented by the
  [Query Response Interval] must be less than the [Query Interval].

iii) RFC3810, 9.12. Older Version Querier Present Timeout [OVQPT] says:

  The Older Version Querier Present Timeout is the time-out for
  transitioning a host back to MLDv2 Host Compatibility Mode. When an
  MLDv1 query is received, MLDv2 hosts set their Older Version Querier
  Present Timer to [Older Version Querier Present Timeout].

  This value MUST be ([Robustness Variable] times (the [Query Interval]
  in the last Query received)) plus ([Query Response Interval]).

Hence, on *default* the timeout results in:

  [RV] = 2, [QI] = 125sec, [QRI] = 10sec
  [OVQPT] = [RV] * [QI] + [QRI] = 260sec

Having that said, we currently calculate [OVQPT] (here given as 'switchback'
variable) as ...

  switchback = (idev->mc_qrv + 1) * max_delay

RFC3810, 9.12. says "the [Query Interval] in the last Query received". In
section "9.14. Configuring timers", it is said:

  This section is meant to provide advice to network administrators on
  how to tune these settings to their network. Ambitious router
  implementations might tune these settings dynamically based upon
  changing characteristics of the network. [...]

iv) RFC38010, 9.14.2. Query Interval:

  The overall level of periodic MLD traffic is inversely proportional
  to the Query Interval. A longer Query Interval results in a lower
  overall level of MLD traffic. The value of the Query Interval MUST
  be equal to or greater than the Maximum Response Delay used to
  calculate the Maximum Response Code inserted in General Query
  messages.

I assume that was why switchback is calculated as is (3 * max_delay), although
this setting seems to be meant for routers only to configure their [QI]
interval for non-default intervals. So usage here like this is clearly wrong.

Concluding, the current behaviour in IPv6's multicast code is not conform
to the RFC as switch back is calculated wrongly. That is, it has a too small
value, so MLDv2 hosts switch back again to MLDv2 way too early, i.e. ~30secs
instead of ~260secs on default.

Hence, introduce necessary helper functions and fix this up properly as it
should be.

Introduced in 06da92283 ("[IPV6]: Add MLDv2 support."). Credits to Hannes
Frederic Sowa who also had a hand in this as well. Also thanks to Hangbin Liu
who did initial testing.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: David Stevens <dlstevens@us.ibm.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yuchung Cheng [Tue, 3 Sep 2013 21:14:35 +0000 (14:14 -0700)]

tcp: better comments for RTO initiallization

Commit 1b7fdd2ab585("tcp: do not use cached RTT for RTT estimation")
removes important comments on how RTO is initialized and updated.
Hopefully this patch puts those information back.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Pravin B Shelar [Tue, 3 Sep 2013 16:44:44 +0000 (09:44 -0700)]

vxlan: Optimize vxlan rcv

vxlan-udp-recv function lookup vxlan_sock struct on every packet
recv by using udp-port number. we can use sk->sk_user_data to
store vxlan_sock and avoid lookup.
I have open coded rcu-api to store and read vxlan_sock from
sk_user_data to avoid sparse warning as sk_user_data is not
__rcu pointer.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Andy Shevchenko [Tue, 3 Sep 2013 12:17:56 +0000 (15:17 +0300)]

atm: he: print MAC via %pM

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Andy Shevchenko [Tue, 3 Sep 2013 12:13:43 +0000 (15:13 +0300)]

atm: nicstar: re-use native mac_pton() helper

There is a nice helper to parse MAC. Let's use it and remove custom
implementation.

Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sonic Zhang [Tue, 3 Sep 2013 05:55:07 +0000 (13:55 +0800)]

driver:stmmac: Adjust time stamp increase for 0.465 ns accurate only when Time stamp binary rollover is set.

The synopsys spec says When TSCRLSSR is cleard, the rollover value of
sub-second register is 0x7FFFFFFF(0.465 ns per clock).

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Alexander Sverdlin [Mon, 2 Sep 2013 13:58:25 +0000 (15:58 +0200)]

net: sctp: Fix data chunk fragmentation for MTU values which are not multiple of 4

net: sctp: Fix data chunk fragmentation for MTU values which are not multiple of 4

Initially the problem was observed with ipsec, but later it became clear that
SCTP data chunk fragmentation algorithm has problems with MTU values which are
not multiple of 4. Test program was used which just transmits 2000 bytes long
packets to other host. tcpdump was used to observe re-fragmentation in IP layer
after SCTP already fragmented data chunks.

With MTU 1500:
12:54:34.082904 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1500)
    10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 2366088589] [SID: 0] [SSEQ 1] [PPID 0x0]
12:54:34.082933 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 596)
    10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 2366088590] [SID: 0] [SSEQ 1] [PPID 0x0]
12:54:34.090576 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
    10.151.24.91.54321 > 10.151.38.153.39303: sctp (1) [SACK] [cum ack 2366088590] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]

With MTU 1499:
13:02:49.955220 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 0, flags [+], proto SCTP (132), length 1492)
    10.151.38.153.39084 > 10.151.24.91.54321: sctp[|sctp]
13:02:49.955249 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 1472, flags [none], proto SCTP (132), length 28)
    10.151.38.153 > 10.151.24.91: ip-proto-132
13:02:49.955262 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600)
    10.151.38.153.39084 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 404355346] [SID: 0] [SSEQ 1] [PPID 0x0]
13:02:49.956770 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
    10.151.24.91.54321 > 10.151.38.153.39084: sctp (1) [SACK] [cum ack 404355346] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]

Here problem in data portion limit calculation leads to re-fragmentation in IP,
which is sub-optimal. The problem is max_data initial value, which doesn't take
into account the fact, that data chunk must be padded to 4-bytes boundary.
It's enough to correct max_data, because all later adjustments are correctly
aligned to 4-bytes boundary.

After the fix is applied, everything is fragmented correctly for uneven MTUs:
15:16:27.083881 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1496)
    10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 3077098183] [SID: 0] [SSEQ 1] [PPID 0x0]
15:16:27.083907 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600)
    10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 3077098184] [SID: 0] [SSEQ 1] [PPID 0x0]
15:16:27.085640 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
    10.151.24.91.54321 > 10.151.38.153.53417: sctp (1) [SACK] [cum ack 3077098184] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]

The bug was there for years already, but
- is a performance issue, the packets are still transmitted
- doesn't show up with default MTU 1500, but possibly with ipsec (MTU 1438)

Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nsn.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Julia Lawall [Mon, 2 Sep 2013 09:54:21 +0000 (11:54 +0200)]

drivers:net: delete premature free_irq

Free_irq is not needed if there has been no request_irq. Free_irq is
removed from both the probe and remove functions. The correct request_irq
and free_irq are found in the open and close functions.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression e;
@@

*e = platform_get_irq(...);
... when != request_irq(e,...)
*free_irq(e,...)
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Carlos O'Donell [Thu, 15 Aug 2013 09:28:10 +0000 (17:28 +0800)]

net: sync some IP headers with glibc

Solution:
=========

- Synchronize linux's `include/uapi/linux/in6.h'
  with glibc's `inet/netinet/in.h'.
- Synchronize glibc's `inet/netinet/in.h with linux's
  `include/uapi/linux/in6.h'.
- Allow including the headers in either other.
- First header included defines the structures and macros.

Details:
========

The kernel promises not to break the UAPI ABI so I don't
see why we can't just have the two userspace headers
coordinate?

If you include the kernel headers first you get those,
and if you include the glibc headers first you get those,
and the following patch arranges a coordination and
synchronization between the two.

Let's handle `include/uapi/linux/in6.h' from linux,
and `inet/netinet/in.h' from glibc and ensure they compile
in any order and preserve the required ABI.

These two patches pass the following compile tests:

cat >> test1.c <<EOF
int main (void) {
  return 0;
}
EOF
gcc -c test1.c

cat >> test2.c <<EOF
int main (void) {
  return 0;
}
EOF
gcc -c test2.c

One wrinkle is that the kernel has a different name for one of
the members in ipv6_mreq. In the kernel patch we create a macro
to cover the uses of the old name, and while that's not entirely
clean it's one of the best solutions (aside from an anonymous
union which has other issues).

I've reviewed the code and it looks to me like the ABI is
assured and everything matches on both sides.

Notes:
- You want netinet/in.h to include bits/in.h as early as possible,
  but it needs in_addr so define in_addr early.
- You want bits/in.h included as early as possible so you can use
  the linux specific code to define __USE_KERNEL_DEFS based on
  the _UAPI_* macro definition and use those to cull in.h.
- glibc was missing IPPROTO_MH, added here.

Compile tested and inspected.

Reported-by: Thomas Backlund <tmb@mageia.org>
Cc: Thomas Backlund <tmb@mageia.org>
Cc: libc-alpha@sourceware.org
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: David S. Miller <davem@davemloft.net>
Tested-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Dan Carpenter [Wed, 4 Sep 2013 15:07:27 +0000 (18:07 +0300)]

sfc: check for allocation failure

It upsets static analyzers when we don't check for allocation failure.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Wed, 4 Sep 2013 16:40:37 +0000 (12:40 -0400)]

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
This series contains updates to igb only.

Todd provides a fix for igb to not look for a PBA in the iNVM on
devices that are flashless.

Akeem provides igb patches to add a new PHY id for i354, as well as
a couple of patches to implement the new PHY id. He also provides
several patches to correctly report the appropriate media type as
well as correctly report advertised/supported link for i354 devices.
Lastly Akeem implements a 1 second delay mechanism for i210 devices
to avoid erroneous link issue with the link partner.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Wed, 4 Sep 2013 16:28:02 +0000 (12:28 -0400)]

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
The following batch contains:

* Three fixes for the new synproxy target available in your
net-next tree, from Jesper D. Brouer and Patrick McHardy.

* One fix for TCPMSS to correctly handling the fragmentation
case, from Phil Oester. I'll pass this one to -stable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Akeem G Abodunrin [Thu, 22 Aug 2013 14:23:10 +0000 (14:23 +0000)]

igb: Update version number

This patch updates igb driver version to 5.0.5

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

commit | commitdiff | tree

Akeem G Abodunrin [Fri, 30 Aug 2013 23:49:36 +0000 (23:49 +0000)]

igb: Implementation to report advertised/supported link on i354 devices

This patch changes the way we report supported/advertised link for i354
devices, especially for 2.5 GB. Instead of reporting 2.5 GB for all i354
devices erroneously, check first, if it is 2.5 GB capable.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

commit | commitdiff | tree

Akeem G Abodunrin [Wed, 28 Aug 2013 02:23:09 +0000 (02:23 +0000)]

igb: Get speed and duplex for 1G non_copper devices

This patch changes how we get speed/duplex for non_copper devices; it
now uses pcs register to get current speed and duplex instead of using
generic status register that we use to detect speed/duplex for copper
devices.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

commit | commitdiff | tree

Phil Oester [Sun, 1 Sep 2013 15:32:21 +0000 (08:32 -0700)]

netfilter: xt_TCPMSS: correct return value in tcpmss_mangle_packet

In commit b396966c4 (netfilter: xt_TCPMSS: Fix missing fragmentation handling),
I attempted to add safe fragment handling to xt_TCPMSS. However, Andy Padavan
of Project N56U correctly points out that returning XT_CONTINUE in this
function does not work. The callers (tcpmss_tg[46]) expect to receive a value
of 0 in order to return XT_CONTINUE.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Akeem G Abodunrin [Wed, 28 Aug 2013 02:23:04 +0000 (02:23 +0000)]

igb: Support to get 2_5G link status for appropriate media type

Since i354 2.5Gb devices are not Copper media type but SerDes, so this
patch changes the way we detect speed/duplex link info for this device.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

commit | commitdiff | tree

Akeem G Abodunrin [Wed, 28 Aug 2013 02:22:53 +0000 (02:22 +0000)]

igb: No PHPM support in i354 devices

PHY Power Management does not exist for i354 device. So, there is no
need to read and write this register or clear go link Disconnect bit,
which could cause a lot of issues.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

commit | commitdiff | tree

Akeem G Abodunrin [Wed, 28 Aug 2013 02:22:48 +0000 (02:22 +0000)]

igb: M88E1543 PHY downshift implementation

This patch implements downshift mechanism for M88E1543 PHY, so that
downshift is disabled first during link setup process, and later enabled
if we are master and downshift link is negotiated. Also cleaned up
return code implementation.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

commit | commitdiff | tree

Akeem G Abodunrin [Wed, 28 Aug 2013 02:22:58 +0000 (02:22 +0000)]

igb: New PHY_ID for i354 device

This patch changes PHY_ID for i354 device, now using M88E1543
instead of M88E1545.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

commit | commitdiff | tree

Akeem G Abodunrin [Wed, 28 Aug 2013 02:22:43 +0000 (02:22 +0000)]

igb: Implementation of 1-sec delay for i210 devices

This patch adds 1 sec delay mechanism to i210 device family, in order
to avoid erroneous link issue with the link partner.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

commit | commitdiff | tree

Todd Fujinaka [Fri, 23 Aug 2013 07:49:00 +0000 (07:49 +0000)]

igb: Don't look for a PBA in the iNVM when flashless

When a part is flashless, do not look for a PBA in the iNVM.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

commit | commitdiff | tree

Jesper Dangaard Brouer [Thu, 29 Aug 2013 10:18:46 +0000 (12:18 +0200)]

netfilter: SYNPROXY: let unrelated packets continue

Packets reaching SYNPROXY were default dropped, as they were most
likely invalid (given the recommended state matching). This
patch, changes SYNPROXY target to let packets, not consumed,
continue being processed by the stack.

This will be more in line other target modules. As it will allow
more flexible configurations of handling, logging or matching on
packets in INVALID states.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Patrick McHardy [Thu, 29 Aug 2013 08:32:09 +0000 (10:32 +0200)]

netfilter: synproxy_core: fix warning in __nf_ct_ext_add_length()

With CONFIG_NETFILTER_DEBUG we get the following warning during SYNPROXY init:

[ 80.558906] WARNING: CPU: 1 PID: 4833 at net/netfilter/nf_conntrack_extend.c:80 __nf_ct_ext_add_length+0x217/0x220 [nf_conntrack]()

The reason is that the conntrack template is set to confirmed before adding
the extension and it is invalid to add extensions to already confirmed
conntracks. Fix by adding the extensions before setting the conntrack to
confirmed.

Reported-by: Jesper Dangaard Brouer <jesper.brouer@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Jesper Dangaard Brouer [Wed, 28 Aug 2013 13:14:38 +0000 (15:14 +0200)]

netfilter: more strict TCP flag matching in SYNPROXY

Its seems Patrick missed to incoorporate some of my requested changes
during review v2 of SYNPROXY netfilter module.

Which were, to avoid SYN+ACK packets to enter the path, meant for the
ACK packet from the client (from the 3WHS).

Further there were a bug in ip6t_SYNPROXY.c, for matching SYN packets
that didn't exclude the ACK flag.

Go a step further with SYN packet/flag matching by excluding flags
ACK+FIN+RST, in both IPv4 and IPv6 modules.

The intented usage of SYNPROXY is as follows:
(gracefully describing usage in commit)

iptables -t raw -A PREROUTING -i eth0 -p tcp --dport 80 --syn -j NOTRACK
iptables -A INPUT -i eth0 -p tcp --dport 80 -m state UNTRACKED,INVALID \
-j SYNPROXY --sack-perm --timestamp --mss 1480 --wscale 7 --ecn

echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

This does filter SYN flags early, for packets in the UNTRACKED state,
but packets in the INVALID state with other TCP flags could still
reach the module, thus this stricter flag matching is still needed.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Vijay Subramanian [Tue, 3 Sep 2013 19:23:22 +0000 (12:23 -0700)]

tcp: Change return value of tcp_rcv_established()

tcp_rcv_established() returns only one value namely 0. We change the return
value to void (as suggested by David Miller).

After commit 0c24604b (tcp: implement RFC 5961 4.2), we no longer send RSTs in
response to SYNs. We can remove the check and processing on the return value of
tcp_rcv_established().

We also fix jtcp_rcv_established() in tcp_probe.c to match that of
tcp_rcv_established().

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Daniel Borkmann [Tue, 3 Sep 2013 16:24:02 +0000 (18:24 +0200)]

net: tcp_probe: adapt tbuf size for recent changes

With recent changes in tcp_probe module (e.g. f925d0a62d ("net: tcp_probe:
add IPv6 support")) we also need to take into account that tbuf needs to
be updated as format string will be further expanded. tbuf sits on the stack
in tcpprobe_read() function that is invoked when user space reads procfs
file /proc/net/tcpprobe, hence not fast path as in jtcp_rcv_established().
Having a size similarly as in sctp_probe module of 256 bytes is fully
sufficient for that, we need theoretical maximum of 252 bytes otherwise we
could get truncated.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Dan Carpenter [Tue, 3 Sep 2013 09:13:47 +0000 (12:13 +0300)]

qlcnic: remove a stray semicolon

Just remove a small semicolon.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sucheta Chakraborty [Tue, 3 Sep 2013 09:07:37 +0000 (05:07 -0400)]

qlcnic: Fix sparse warning.

This patch fixes warning "warning: symbol 'qlcnic_set_dcb_ops' was
not declared. Should it be static?"

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Dan Carpenter [Tue, 3 Sep 2013 09:03:40 +0000 (12:03 +0300)]

x25: add a sanity check parsing X.25 facilities

This was found with a manual audit and I don't have a reproducer. We
limit ->calling_len and ->called_len when we get them from
copy_from_user() in x25_ioctl() so when they come from skb->data then
we should cap them there as well.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Dan Carpenter [Tue, 3 Sep 2013 09:02:32 +0000 (12:02 +0300)]

caif: add a sanity check to the tty name

"tty->name" and "name" are a 64 character buffers. My static checker
complains because we add the "cf" on the front so it look like we are
copying a 66 character string into a 64 character buffer.

Also if the name is larger than IFNAMSIZ (16) it triggers a BUG_ON()
inside the call to alloc_netdev().

This is all under CAP_SYS_ADMIN so it's not a security fix, it just adds
a little robustness.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Anton Blanchard [Mon, 2 Sep 2013 23:55:32 +0000 (09:55 +1000)]

ibmveth: Fix little endian issues

The hypervisor is big endian, so little endian kernel builds need
to byteswap.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jingoo Han [Mon, 2 Sep 2013 23:54:04 +0000 (08:54 +0900)]

net: netx-eth: remove unnecessary casting

Casting from 'void *' is unnecessary, because casting from 'void *'
to any pointer type is automatic.

Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Michael Chan [Mon, 2 Sep 2013 18:42:32 +0000 (11:42 -0700)]

cnic: Update version to 2.5.18.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Michael Chan [Mon, 2 Sep 2013 18:42:31 +0000 (11:42 -0700)]

cnic: Eliminate local copy of pfid.

Use bp->pfid from bnx2x instead to avoid duplication.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Michael Chan [Mon, 2 Sep 2013 18:42:30 +0000 (11:42 -0700)]

cnic: Eliminate CNIC_PORT macro and port_mode in local struct.

Use BP_PORT and chip_port_mode directly from bnx2x.h to avoid duplication.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Michael Chan [Mon, 2 Sep 2013 18:42:29 +0000 (11:42 -0700)]

cnic: Redefine BNX2X_HW_CID using existing bnx2x macros

to avoid duplication of the same logic.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Michael Chan [Mon, 2 Sep 2013 18:42:28 +0000 (11:42 -0700)]

cnic: Use CHIP_NUM macros from bnx2x.h

This eliminates duplication and ensures that all bnx2x chips will be
supported.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Veaceslav Falico [Mon, 2 Sep 2013 14:26:51 +0000 (16:26 +0200)]

net: correctly interlink lower/upper devices

Currently we're linking upper devices to lower ones, which results in
upside-down relationship: upper devices seeing lower devices via its upper
lists.

Fix this by correctly linking lower devices to the upper ones.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Nicolas Dichtel [Mon, 2 Sep 2013 13:34:58 +0000 (15:34 +0200)]

tunnels: harmonize cleanup done on skb on rx path

The goal of this patch is to harmonize cleanup done on a skbuff on rx path.
Before this patch, behaviors were different depending of the tunnel type.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Nicolas Dichtel [Mon, 2 Sep 2013 13:34:57 +0000 (15:34 +0200)]

tunnels: harmonize cleanup done on skb on xmit path

The goal of this patch is to harmonize cleanup done on a skbuff on xmit path.
Before this patch, behaviors were different depending of the tunnel type.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Nicolas Dichtel [Mon, 2 Sep 2013 13:34:56 +0000 (15:34 +0200)]

skb: allow skb_scrub_packet() to be used by tunnels

This function was only used when a packet was sent to another netns. Now, it can
also be used after tunnel encapsulation or decapsulation.

Only skb_orphan() should not be done when a packet is not crossing netns.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Nicolas Dichtel [Mon, 2 Sep 2013 13:34:55 +0000 (15:34 +0200)]

vxlan: remove net arg from vxlan[6]_xmit_skb()

This argument is not used, let's remove it.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Nicolas Dichtel [Mon, 2 Sep 2013 13:34:54 +0000 (15:34 +0200)]

iptunnels: remove net arg from iptunnel_xmit()

This argument is not used, let's remove it.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

nikolay@redhat.com [Mon, 2 Sep 2013 11:51:42 +0000 (13:51 +0200)]

bonding: drop read_lock in bond_compute_features

bond_compute_features is always called with RTNL held, so we can safely
drop the read bond->lock.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

nikolay@redhat.com [Mon, 2 Sep 2013 11:51:41 +0000 (13:51 +0200)]

bonding: drop read_lock in bond_fix_features

We're protected by RTNL so nothing can happen and we can safely drop the
read bond->lock.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

nikolay@redhat.com [Mon, 2 Sep 2013 11:51:40 +0000 (13:51 +0200)]

bonding: simplify bond_3ad_update_lacp_rate and use RTNL for sync

We can drop the use of bond->lock for mutual exclusion in
bond_3ad_update_lacp_rate and use RTNL in the sysfs store function
instead. This way we'll prevent races with mode change and interface
up/down as well as simplify update_lacp_rate by removing the check for
port->slave because it'll always be initialized (done while enslaving
with RTNL). This change will also help in the future removal of reader
bond->lock from bond_enslave.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

nikolay@redhat.com [Mon, 2 Sep 2013 11:51:39 +0000 (13:51 +0200)]

bonding: trivial: remove outdated comment and braces

We don't have to release all slaves when closing the bond dev, so remove
the outdated comment and the braces around the left single statement.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

nikolay@redhat.com [Mon, 2 Sep 2013 11:51:38 +0000 (13:51 +0200)]

bonding: simplify and fix peer notification

This patch aims to remove a use of the bond->lock for mutual exclusion
which will later allow easier migration to RCU of the users of this
functionality. We use RTNL as a synchronizing mechanism since it's
always held when send_peer_notif is set, and when it is decremented from
the notifier function. We can also drop some locking, and fix the
leakage of the send_peer_notif counter.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jason Wang [Mon, 2 Sep 2013 08:41:01 +0000 (16:41 +0800)]

vhost_net: correctly limit the max pending buffers

As Michael point out, We used to limit the max pending DMAs to get better cache
utilization. But it was not done correctly since it was one done when there's no
new buffers submitted from guest. Guest can easily exceeds the limitation by
keeping sending packets.

So this patch moves the check into main loop. Tests shows about 5%-10%
improvement on per cpu throughput for guest tx.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jason Wang [Mon, 2 Sep 2013 08:41:00 +0000 (16:41 +0800)]

vhost_net: poll vhost queue after marking DMA is done

We used to poll vhost queue before making DMA is done, this is racy if vhost
thread were waked up before marking DMA is done which can result the signal to
be missed. Fix this by always polling the vhost thread before DMA is done.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jason Wang [Mon, 2 Sep 2013 08:40:59 +0000 (16:40 +0800)]

vhost_net: determine whether or not to use zerocopy at one time

Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if
upend_idx != done_idx we still set zcopy_used to true and rollback this choice
later. This could be avoided by determining zerocopy once by checking all
conditions at one time before.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jason Wang [Mon, 2 Sep 2013 08:40:58 +0000 (16:40 +0800)]

vhost: switch to use vhost_add_used_n()

Let vhost_add_used() to use vhost_add_used_n() to reduce the code
duplication. To avoid the overhead brought by __copy_to_user(). We will use
put_user() when one used need to be added.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jason Wang [Mon, 2 Sep 2013 08:40:57 +0000 (16:40 +0800)]

vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used()

We tend to batch the used adding and signaling in vhost_zerocopy_callback()
which may result more than 100 used buffers to be updated in
vhost_zerocopy_signal_used() in some cases. So switch to use
vhost_add_used_and_signal_n() to avoid multiple calls to
vhost_add_used_and_signal(). Which means much less times of used index
updating and memory barriers.

2% performance improvement were seen on netperf TCP_RR test.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jason Wang [Mon, 2 Sep 2013 08:40:56 +0000 (16:40 +0800)]

vhost_net: make vhost_zerocopy_signal_used() return void

None of its caller use its return value, so let it return void.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jingoo Han [Mon, 2 Sep 2013 08:12:41 +0000 (17:12 +0900)]

net: sunhme: use pci_{get,set}_drvdata()

Use the wrapper functions for getting and setting the driver data
using pci_dev instead of using dev_{get,set}_drvdata() with
&pdev->dev, so we can directly pass a struct pci_dev. This is
a purely cosmetic change.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jingoo Han [Mon, 2 Sep 2013 08:11:53 +0000 (17:11 +0900)]

net: tulip: use pci_{get,set}_drvdata()

Use the wrapper functions for getting and setting the driver data
using pci_dev instead of using dev_{get,set}_drvdata() with
&pdev->dev, so we can directly pass a struct pci_dev. This is
a purely cosmetic change.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jingoo Han [Mon, 2 Sep 2013 08:10:09 +0000 (17:10 +0900)]

net: mdio-octeon: use platform_{get,set}_drvdata()

Use the wrapper functions for getting and setting the driver data
using platform_device instead of using dev_{get,set}_drvdata()
with &pdev->dev, so we can directly pass a struct platform_device.
This is a purely cosmetic change.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jingoo Han [Mon, 2 Sep 2013 08:08:44 +0000 (17:08 +0900)]

net: sunhme: use platform_{get,set}_drvdata()

Use the wrapper functions for getting and setting the driver data
using platform_device instead of using dev_{get,set}_drvdata()
with &pdev->dev, so we can directly pass a struct platform_device.
This is a purely cosmetic change.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jingoo Han [Mon, 2 Sep 2013 08:06:52 +0000 (17:06 +0900)]

net: emac: use platform_{get,set}_drvdata()

Use the wrapper functions for getting and setting the driver data
using platform_device instead of using dev_{get,set}_drvdata()
with &pdev->dev, so we can directly pass a struct platform_device.
This is a purely cosmetic change.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Joe Perches [Sun, 1 Sep 2013 22:48:27 +0000 (15:48 -0700)]

wireless: scan: Remove comment to compare_ether_addr

This function is being removed, so remove the reference to it.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Joe Perches [Sun, 1 Sep 2013 22:45:08 +0000 (15:45 -0700)]

batman: Remove reference to compare_ether_addr

This function is being removed, rename the reference.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Joe Perches [Sun, 1 Sep 2013 20:11:55 +0000 (13:11 -0700)]

llc: Use normal etherdevice.h tests

Convert the llc_<foo> static inlines to the
equivalents from etherdevice.h and remove
the llc_<foo> static inline functions.

llc_mac_null -> is_zero_ether_addr
llc_mac_multicast -> is_multicast_ether_addr
llc_mac_match -> ether_addr_equal

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Joe Perches [Sun, 1 Sep 2013 18:51:23 +0000 (11:51 -0700)]

drivers/net: Convert uses of compare_ether_addr to ether_addr_equal

Use the new bool function ether_addr_equal to add
some clarity and reduce the likelihood for misuse
of compare_ether_addr for sorting.

Done via cocci script: (and a little typing)

$ cat compare_ether_addr.cocci
@@
expression a,b;
@@
- !compare_ether_addr(a, b)
+ ether_addr_equal(a, b)

@@
expression a,b;
@@
- compare_ether_addr(a, b)
+ !ether_addr_equal(a, b)

@@
expression a,b;
@@
- !ether_addr_equal(a, b) == 0
+ ether_addr_equal(a, b)

@@
expression a,b;
@@
- !ether_addr_equal(a, b) != 0
+ !ether_addr_equal(a, b)

@@
expression a,b;
@@
- ether_addr_equal(a, b) == 0
+ !ether_addr_equal(a, b)

@@
expression a,b;
@@
- ether_addr_equal(a, b) != 0
+ ether_addr_equal(a, b)

@@
expression a,b;
@@
- !!ether_addr_equal(a, b)
+ ether_addr_equal(a, b)

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Liu Junliang [Sun, 1 Sep 2013 11:38:08 +0000 (19:38 +0800)]

USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support

Signed-off-by: Liu Junliang <liujunliang_ljl@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ajit Khaparde [Fri, 30 Aug 2013 20:01:16 +0000 (15:01 -0500)]

be2net: set and query VEB/VEPA mode of the PF interface

SkyHawk-R can support VEB or VEPA mode.
This patch will allow the user to set/query this switch setting.

Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Florian Fainelli [Fri, 30 Aug 2013 14:36:14 +0000 (15:36 +0100)]

net: fix comment typo for __skb_alloc_pages()

The name of the function in the comment is __skb_alloc_page() while we
are actually commenting __skb_alloc_pages(). Fix this typo and make it
a valid kernel doc comment.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Claudiu Manoil [Fri, 30 Aug 2013 12:01:15 +0000 (15:01 +0300)]

gianfar: Fix reported number of sent bytes to BQL

Fix the amount of sent bytes reported to BQL by reporting the
number of bytes on wire in the xmit routine, and recording that
value for each skb in order to be correctly confirmed on Tx
confirmation cleanup.

Reporting skb->len to BQL just before exiting xmit is not correct
due to possible insertions of TOE block and alignment bytes in the
skb->data, which are being stripped off by the controller before
transmission on wire. This led to mismatch of (incorrectly)
reported bytes to BQL b/w xmit and Tx confirmation, resulting in
Tx timeout firing, for the h/w tx timestamping acceleration case.

There's no easy way to obtain the number of bytes on wire in the Tx
confirmation routine, so skb->cb is used to convey that information
from xmit to Tx confirmation, for now (as proposed by Eric). Revived
the currently unused GFAR_CB() construct for that purpose.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Dan Aloni [Fri, 30 Aug 2013 10:59:46 +0000 (13:59 +0300)]

netconsole: avoid a crash with multiple sysfs writers

When my 'ifup eth' script was fired multiple times and ran concurrent on
my laptop, for some obscure /etc scripting reason, it was revealed
that the store_enabled() function in netconsole doesn't handle it nicely,
as recorded by the Oops below (a syslog paste, but not mangled too much
to prevent from discerning the traceback).

On Linux 3.10.4, this patch seeks to remedy the problem, and it has been
running stable on my laptop for a few days.

[52608.609325] BUG: unable to handle kernel NULL pointer dereference at 00000000000003e0
[52608.609331] IP: [<ffffffff81532a17>] __netpoll_cleanup+0x27/0xe0
[52608.609339] PGD 15e51a067 PUD 15433e067 PMD 0
[52608.609343] Oops: 0000 [#1] SMP re firewire_ohci firewire_core crc_itu_t [last unloaded: kvm_intel]
[52608.609347] Modules linked in: kvm_intel tun vfat fat ppdev parport_pc parport fuse ipt_MASQUERADE usb_storage nf_conntrack_netbios_ns nf_conn [..garbled..]
[52608.609433] RAX: 0000000000000000 RBX: ffff880210bbcc68 RCX: 0000000000000000
[52608.609435] RDX: 0000000000000000 RSI: ffff8801ba447da0 RDI: ffff880210bbcc68
[52608.609437] RBP: ffff8801ba447e18 R08: 0000000000000000 R09: 0000000000000001
[52608.609439] R10: 000000000000000a R11: f000000000000000 R12: ffff880210bbcc68
[52608.609441] R13: ffff88020bc41000 R14: 0000000000000002 R15: 000000000000000200000000000
[52608.609443] FS:  00007f38d7bff740(0000) GS:ffff88021dc40000(0000) knlGS:0000000000000000
[52608.609446] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003300000000001427e0
[52608.609448] CR2: 00000000000003e0 CR3: 0000000154103000 CR4: 00000000001427e0
[52608.609450] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[52608.609452] netpoll: netconsole: local port 6665ess 10.0.0.27
[52608.609454] netpoll: netconsole: local IPv4 address 10.0.0.27
[52608.609456] netpoll: netconsole: interface 'em1'
[52608.609457] netpoll: netconsole: remote port 514ress 10.0.0.15
[52608.609459] netpoll: netconsole: remote IPv4 address 10.0.0.15:65:a8:9a:c7
[52608.609461] netpoll: netconsole: remote ethernet address 1c:6f:65:a8:9a:c7
[52608.609463] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[52608.609464] Stack:801ba447e08 ffff880210bbcc68 ffffffffffffffea ffff88020bc41000
[52608.609466]  ffff8801ba447e08 ffff880210bbcc68 ffffffffffffffea ffff88020bc41000
[52608.609471]  0000000000000002 0000000000000002 ffff8801ba447e38 ffffffff81532af4
[52608.609475]  0000000000000000 ffff880210bbcc00 ffff8801ba447e78 ffffffff81420e7c
[52608.609479] Call Trace:
[52608.609484]  [<ffffffff81532af4>] netpoll_cleanup+0x24/0x50
[52608.609489]  [<ffffffff81420e7c>] store_enabled+0x5c/0xe0
[52608.609492]  [<ffffffff81420abe>] netconsole_target_attr_store+0x2e/0x40
[52608.609498]  [<ffffffff811ff2a2>] configfs_write_file+0xd2/0x130
[52608.609503]  [<ffffffff81188f95>] vfs_write+0xc5/0x1f0
[52608.609506]  [<ffffffff81189482>] SyS_write+0x52/0xa0/0x10
[52608.609511]  [<ffffffff81628c2e>] ? do_page_fault+0xe/0x10
[52608.609516]  [<ffffffff8162d402>] system_call_fastpath+0x16/0x1b
[52608.609517] Code: 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 30 4c 89 65 e0 48 89 5d d8 49 89 fc 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8 48 8 [..garbled..]
[52608.609559] RIP  [<ffffffff81532a17>] __netpoll_cleanup+0x27/0xe0
[52608.609563]  RSP <ffff8801ba447de8>
[52608.609564] CR2: 00000000000003e0
[52608.609567] ---[ end trace d25ec343349b61d2 ]---

Signed-off-by: Dan Aloni <alonid@postram.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Kouei Abe [Fri, 30 Aug 2013 03:41:08 +0000 (12:41 +0900)]

sh_eth: Enable Rx descriptor word 0 shift for r8a7790

This corrects an oversight when r8a7790 support was added to sh_eth.

Signed-off-by: Kouei Abe <kouei.abe.cp@renesas.com>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Kouei Abe [Fri, 30 Aug 2013 03:41:07 +0000 (12:41 +0900)]

sh_eth: Fix cache invalidation omission of receive buffer

Signed-off-by: Kouei Abe <kouei.abe.cp@renesas.com>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Veaceslav Falico [Thu, 29 Aug 2013 21:38:57 +0000 (23:38 +0200)]

bonding: use rlb_client_info->vlan_id instead of ->tag

Store VID in ->vlan_id (if any), and remove the useless ->tag.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Veaceslav Falico [Thu, 29 Aug 2013 21:38:56 +0000 (23:38 +0200)]

bonding: remove bond_vlan_used()

We're using it currently to verify if we have vlans before getting the tag
from the skb we're about to send. It's useless because the vlan_get_tag()
verifies if the skb has the tag (and returns an error if not), and we can
receive tagged skbs only if we *already* have vlans.

Plus, the current RCUed implementation is kind of useless anyway - the we
can remove the last vlan in the moment we return from the function.

So remove the only usage of it and the whole function.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Beck SC1x5 Kernel

RSS Atom