git.karo-electronics.de Git - linux-beck.git/log

netfilter: nf_nat: use local variable hdrlen

Use local variable hdrlen instead of ip_hdrlen(skb).

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

ipvs: provide default ip_vs_conn_{in,out}_get_proto

This removes duplicate code by providing a default implementation
which is used by 3 of the 4 modules that provide these calls.

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>

ipvs: remove EXPERIMENTAL tag

IPVS was merged into the kernel quite a long time ago and
has been seeing wide-spread production use for even longer.

It seems appropriate for it to be no longer tagged as EXPERIMENTAL.

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: nf_conntrack_extend: introduce __nf_ct_ext_exist()

Some users of nf_ct_ext_exist() know ct->ext isn't NULL. For these users, the
check for ct->ext isn't necessary, and the function __nf_ct_ext_exist() can be
used instead.

The type of the return value of nf_ct_ext_exist() is changed to bool.
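
The split between a checked and an unchecked accessor can be sketched in
user-space C as follows. This is a minimal illustration only: the structures
and the names conn, ct_ext, ext_exist() and ext_exist_nocheck() are
hypothetical stand-ins, not the kernel definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
#define EXT_NUM 4

struct ct_ext {
    uint8_t offset[EXT_NUM]; /* 0 means "extension not present" */
};

struct conn {
    struct ct_ext *ext; /* may be NULL when no extension was ever added */
};

/* Unchecked variant: the caller guarantees that ext is non-NULL. */
static inline bool ext_exist_nocheck(const struct ct_ext *ext, uint8_t id)
{
    return ext->offset[id] != 0;
}

/* Checked variant: safe to call when ct->ext may still be NULL. */
static inline bool ext_exist(const struct conn *ct, uint8_t id)
{
    return ct->ext != NULL && ext_exist_nocheck(ct->ext, id);
}
```

Callers that have already dereferenced the extension area can use the
unchecked form and skip one branch; everyone else keeps the checked form.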

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: {ip,ip6,arp}_tables: dont block bottom half more than necessary

We currently disable BH for the whole duration of get_counters().

On machines with a lot of cpus and large tables, this might be too long.

We can disable preemption during the whole function, and disable BH only
while fetching counters for the current cpu.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: iptables: use skb->len for accounting

Use skb->len for accounting as xt_quota does.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: ip6tables: use skb->len for accounting

ipv6_hdr(skb)->payload_len is ZERO and can't be used for accounting, if
the payload is a Jumbo Payload specified in RFC2675.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

xt_quota: report initial quota value instead of current value to userspace

We should copy the initial value to userspace for iptables-save and
to allow removal of specific quota rules.
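
One way to picture the change is to keep the configured value and the running
remainder in separate fields. The sketch below is a user-space illustration
under that assumption; quota_rule and its helpers are hypothetical names, not
the xt_quota structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the rule keeps the configured value untouched
 * and tracks the running remainder in separate private state. */
struct quota_rule {
    uint64_t initial;   /* what the user configured; reported back as-is */
    uint64_t remaining; /* private running counter, decremented per packet */
};

static void quota_init(struct quota_rule *r, uint64_t bytes)
{
    r->initial = bytes;
    r->remaining = bytes;
}

/* Returns true while the packet still fits in the quota. */
static bool quota_match(struct quota_rule *r, uint64_t pkt_len)
{
    if (r->remaining >= pkt_len) {
        r->remaining -= pkt_len;
        return true;
    }
    r->remaining = 0; /* exhausted: later packets never match again */
    return false;
}

/* What a dump to userspace would show: always the configured value. */
static uint64_t quota_report(const struct quota_rule *r)
{
    return r->initial;
}
```

Because the reported value never changes, a saved rule compares equal to the
rule that was originally added, which is what deleting a specific rule needs.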

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: xt_quota: use per-rule spin lock

Use per-rule spin lock to improve the scalability.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: arptables: use arp_hdr_len()

use arp_hdr_len().

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: nf_nat_core: merge the same lines

proto->unique_tuple() is always called last if the previous calls fail. This
patch tests the negation of (range->flags & IP_NAT_RANGE_PROTO_RANDOM)
instead, to avoid a duplicate line of code: proto->unique_tuple().

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: add xt_cpu match

In some situations a CPU match permits a better spreading of
connections, or selecting targets only for a given cpu.

With Receive Packet Steering or multiqueue NIC and appropriate IRQ
affinities, we can distribute traffic on available cpus, per session.
(all RX packets for a given flow are handled by a given cpu)

Since some legacy applications are not SMP friendly, one way to scale a
server is to run multiple copies of them.

Instead of randomly choosing an instance, we can use the cpu number as a
key so that softirq handler for a whole instance is running on a single
cpu, maximizing cache effects in TCP/UDP stacks.

Using NAT for example, a four-way machine might run four copies of a
server application, using a separate listening port for each instance,
but still presenting a unique external port:

iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 0 \
        -j REDIRECT --to-port 8080

iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 1 \
        -j REDIRECT --to-port 8081

iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 2 \
        -j REDIRECT --to-port 8082

iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 3 \
        -j REDIRECT --to-port 8083
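
The matching logic behind the four rules above can be sketched in plain C.
This is an illustration under assumptions: cpu_rule, cpu_rule_match() and
redirect_port() are hypothetical names, and the current CPU is passed in as a
parameter rather than read from the kernel.

```c
#include <stdbool.h>

/* Hypothetical rule description: one CPU number plus an invert flag. */
struct cpu_rule {
    unsigned int cpu;
    bool invert;
};

/* current_cpu is passed in explicitly; in the kernel it would be the
 * CPU that is running the softirq for this packet. */
static bool cpu_rule_match(const struct cpu_rule *rule, unsigned int current_cpu)
{
    return (rule->cpu == current_cpu) != rule->invert;
}

/* Mirrors the four REDIRECT rules above: CPU n maps to port 8080 + n. */
static unsigned int redirect_port(unsigned int current_cpu)
{
    static const struct cpu_rule rules[] = {
        { 0, false }, { 1, false }, { 2, false }, { 3, false },
    };

    for (unsigned int i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
        if (cpu_rule_match(&rules[i], current_cpu))
            return 8080 + i;
    return 80; /* no rule matched: the port is left unchanged */
}
```

Since every packet of a flow is handled on the same CPU, every packet of that
flow reaches the same instance.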

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

IPVS: make FTP work with full NAT support

Use nf_conntrack/nf_nat code to do the packet mangling and the TCP
sequence adjusting.  The function 'ip_vs_skb_replace' is now dead
code, so it is removed.

To SNAT FTP, use something like:

% iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 \
    --vport 21 -j SNAT --to-source 192.168.10.10
and for the data connections in passive mode:

% iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 \
    --vportctl 21 -j SNAT --to-source 192.168.10.10
using '-m state --state RELATED' would also work.

Make sure the kernel modules ip_vs_ftp, nf_conntrack_ftp, and
nf_nat_ftp are loaded.

[ up-port and minor fixes by Simon Horman <horms@verge.net.au> ]
Signed-off-by: Hannes Eder <heder@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>

IPVS: make friends with nf_conntrack

Update the nf_conntrack tuple in reply direction, as we will see
traffic from the real server (RIP) to the client (CIP). Once this is
done we can use netfilter's SNAT in POSTROUTING, especially with
xt_ipvs, to do source NAT, e.g.:

% iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 --vport 80 \
-j SNAT --to-source 192.168.10.10

[ minor fixes by Simon Horman <horms@verge.net.au> ]
Signed-off-by: Hannes Eder <heder@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: xt_ipvs (netfilter matcher for IPVS)

This implements the kernel-space side of the netfilter matcher xt_ipvs.

[ minor fixes by Simon Horman <horms@verge.net.au> ]
Signed-off-by: Hannes Eder <heder@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
[ Patrick: added xt_ipvs.h to Kbuild ]
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: correct CHECKSUM header and export it

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: add CHECKSUM target

This adds a `CHECKSUM' target, which can be used in the iptables mangle
table.

You can use this target to compute and fill in the checksum in
a packet that lacks a checksum. This is particularly useful,
if you need to work around old applications such as dhcp clients,
that do not work well with checksum offloads, but don't want to
disable checksum offload in your device.

The problem happens in the field with virtualized applications.
For reference, see Red Hat bz 605555, as well as
http://www.spinics.net/lists/kvm/msg37660.html

Typical expected use (helps old dhclient binary running in a VM):
iptables -A POSTROUTING -t mangle -p udp --dport bootpc \
-j CHECKSUM --checksum-fill
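
For readers unfamiliar with what "filling in" a checksum involves, the
arithmetic is the one's-complement Internet checksum from RFC 1071. The sketch
below shows only that arithmetic over a flat buffer; it is not the target's
implementation, it omits the UDP pseudo-header, and inet_checksum() is a
hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement Internet checksum over a byte buffer (RFC 1071).
 * The 32-bit accumulator is sufficient for packet-sized buffers. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    /* Add the buffer as big-endian 16-bit words. */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8; /* pad the odd byte with zero */

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A receiver verifies by summing the data together with the stored checksum;
a correct packet yields zero from the same function.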

Includes fixes by Jan Engelhardt <jengelh@medozas.de>

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: nf_ct_tcp: fix flow recovery with TCP window tracking enabled

This patch adds the missing bits to support the recovery of TCP flows
without disabling window tracking (aka be_liberal). To ensure a
successful recovery, we have to inject the window scale factor via
ctnetlink.

This patch has been tested with a development snapshot of conntrackd
and the new clause `TCPWindowTracking' that allows strict TCP window
tracking recovery across fail-overs.

With this patch, we don't update the receiver's window until it has
been initiated. We require this to perform a successful recovery. Jozsef
confirmed in a private email that this spotted a real issue since that
should not happen.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>

nfnetlink_log: do not expose NFULNL_COPY_DISABLED to user-space

This patch moves the NFULNL_COPY_DISABLED definition from
linux/netfilter/nfnetlink_log.h to net/netfilter/nfnetlink_log.h
since this copy mode is only for internal use.

I have also changed the value from 0x03 to 0xff. Thus, we avoid
a gap from user-space that may confuse users if we add new
copy modes in the future.

This change was introduced in:
http://www.spinics.net/lists/netfilter-devel/msg13535.html

Since this change is not included in any stable Linux kernel,
I think it's safe to make this change now. Anyway, this copy
mode does not make any sense from user-space, so this patch
should not break any existing setup.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: xt_TPROXY: the length of lines should be within 80

According to Documentation/CodingStyle, lines should be no longer than
80 columns.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

ipvs: lvs sctp protocol handler incorrectly invokes ip_vs_app_pkt_out

The lvs sctp protocol handler incorrectly invokes ip_vs_app_pkt_out.
Since there are no sctp helpers at present, it does the same thing as
ip_vs_app_pkt_in.

Signed-off-by: Xiaoyu Du <tingsrain@gmail.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>

ipvs: Kconfig cleanup

IP_VS_PROTO_AH_ESP should be set iff either of IP_VS_PROTO_{AH,ESP} is
selected. Express this with standard kconfig syntax.

Signed-off-by: Michal Marek <mmarek@suse.cz>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: ipt_REJECT: avoid touching dst ref

We can avoid a pair of atomic ops in ipt_REJECT send_reset()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: ipt_REJECT: postpone the checksum calculation.

Postpone the checksum calculation so that, if the output NIC supports
checksum offloading, we can utilize it. Even if the output NIC doesn't support
checksum offloading, when we mangle this packet we no longer need to update
the checksum, as the checksum calculation occurs later.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: nf_conntrack_reasm: add fast path for in-order fragments

As the fragments are sent in order in most OSes, such as Windows, Darwin and
FreeBSD, it is likely the new fragments are at the end of the inet_frag_queue.
In the fast path, we check if the skb at the end of the inet_frag_queue is the
prev we expect.
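
The idea can be sketched with a plain linked list kept sorted by offset. This
is a simplified illustration, not the kernel code: frag, frag_queue and
frag_insert() are hypothetical names, and overlap handling is left out.

```c
#include <stddef.h>

/* Hypothetical fragment node; the kernel works on sk_buffs instead. */
struct frag {
    unsigned int offset;
    struct frag *next;
};

struct frag_queue {
    struct frag *head;
    struct frag *tail; /* last fragment in offset order */
};

/* Insert keeping the list sorted by offset. Returns 1 if the fast path
 * (append after the tail) was taken, 0 if the list had to be walked. */
static int frag_insert(struct frag_queue *q, struct frag *f)
{
    struct frag *prev = q->tail;
    struct frag *next = NULL;
    int fast = 1;

    /* Fast path: empty queue, or the new fragment starts after the tail. */
    if (prev != NULL && prev->offset >= f->offset) {
        /* Slow path: find the first fragment at or beyond our offset. */
        fast = 0;
        prev = NULL;
        for (next = q->head; next != NULL; next = next->next) {
            if (next->offset >= f->offset)
                break;
            prev = next;
        }
    }

    f->next = next;
    if (next == NULL)
        q->tail = f;
    if (prev != NULL)
        prev->next = f;
    else
        q->head = f;
    return fast;
}
```

With in-order senders every insert after the first is a single comparison
against the tail, instead of a walk over all fragments queued so far.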

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

IB/{nes, ipoib}: Pass supported flags to ethtool_op_set_flags()

Following commit 1437ce3983bcbc0447a0dedcd644c14fe833d266 "ethtool:
Change ethtool_op_set_flags to validate flags", ethtool_op_set_flags
takes a third parameter and cannot be used directly as an
implementation of ethtool_ops::set_flags.

Changes nes and ipoib driver to pass in the appropriate value.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: Update version to 2.0.16.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: Dump some config space registers during TX timeout.

These config register values will be useful when the memory registers
are returning 0xffffffff which has been reported.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: Add support for skb->rxhash.

Add skb->rxhash support for TCP packets only because the bnx2 RSS hash
does not hash UDP ports.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: Always enable MSI-X on 5709.

Minor change to use MSI-X even if there is only one CPU. This allows
the CNIC driver to always have a dedicated MSI-X vector to handle
iSCSI events, instead of sharing the MSI vector.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevice.h: Change netif_<level> macros to call netdev_<level> functions

Reduces text by ~300 bytes (woohoo!) in an x86 defconfig

$ size vmlinux*
   text    data     bss     dec    hex filename
7198526  720112 1366288 9284926 8dad3e vmlinux
7198862  720112 1366288 9285262 8dae8e vmlinux.netdev

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevice.h net/core/dev.c: Convert netdev_<level> logging macros to functions

Reduces an x86 defconfig text and data ~2k.
text is smaller, data is larger.

$ size vmlinux*
   text    data     bss     dec    hex filename
7198862  720112 1366288 9285262 8dae8e vmlinux
7205273  716016 1366288 9287577 8db799 vmlinux.device_h

Uses %pV and struct va_format
Format arguments are verified before printk

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

device.h drivers/base/core.c Convert dev_<level> logging macros to functions

Reduces an x86 defconfig text and data ~55k, .6% smaller.

$ size vmlinux*
   text    data     bss     dec    hex filename
7205273  716016 1366288 9287577 8db799 vmlinux
7258890  719768 1366288 9344946 8e97b2 vmlinux.master

Uses %pV and struct va_format
Format arguments are verified before printk

The dev_info macro is converted to _dev_info because there are
existing uses of variables named dev_info in the kernel tree
like drivers/net/pcmcia/pcnet_cs.c

A dev_info macro is created to call _dev_info

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

vsprintf: Recursive vsnprintf: Add "%pV", struct va_format

Add the ability to print a format and va_list from a structure pointer

Allows __dev_printk to be implemented as a single printk while
minimizing string space duplication.

%pV should not be used without some mechanism to verify the
format and argument use, a la __attribute__((format(printf, ...))).
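
A user-space analogue may help show why the outer wrapper needs the format
attribute. The sketch below is an assumption-laden illustration, not the
kernel code: va_format_sketch, emit_with_prefix() and dev_log() are
hypothetical names, and vsnprintf() stands in for the recursive %pV expansion.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* User-space analogue of the structure described above. */
struct va_format_sketch {
    const char *fmt;
    va_list *va;
};

/* Inner step: expand the nested format into buf, after a prefix. */
static int emit_with_prefix(char *buf, size_t size, const char *prefix,
                            struct va_format_sketch *vaf)
{
    int n = snprintf(buf, size, "%s: ", prefix);

    if (n < 0 || (size_t)n >= size)
        return n;
    return n + vsnprintf(buf + n, size - n, vaf->fmt, *vaf->va);
}

/* Outer wrapper: the format attribute lets the compiler check that the
 * arguments match fmt, which is the safeguard the text asks for. */
__attribute__((format(printf, 4, 5)))
static int dev_log(char *buf, size_t size, const char *dev,
                   const char *fmt, ...)
{
    struct va_format_sketch vaf;
    va_list args;
    int n;

    va_start(args, fmt);
    vaf.fmt = fmt;
    vaf.va = &args;
    n = emit_with_prefix(buf, size, dev, &vaf);
    va_end(args);
    return n;
}
```

The inner function never sees the argument types, so only the annotated
wrapper gives the compiler a chance to reject a mismatched call.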

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6

bridge: add per bridge device controls for invoking iptables

Support more fine grained control of bridge netfilter iptables invocation
by adding separate brnf_call_*tables parameters for each device using the
sysfs interface. Packets are passed to layer 3 netfilter when either the
global parameter or the per bridge parameter is enabled.

Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>

ixgbe: use NETIF_F_LRO

Both ETH_FLAG_LRO and NETIF_F_LRO have the same value, but NETIF_F_LRO
is intended for use with netdev->features.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: Add comment

Add explanatory comment to avoid confusion when a pointer is set
to the second word of an array instead of the customary cast of a
pointer to the beginning of the array.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: correct link test not being run when link is down

The igb online link test was always reporting pass because instead of
checking for if_running it was checking for netif_carrier_ok.

This change corrects the test so that it is run if the interface is running
instead of checking for netif carrier ok.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: Fix Tx hangs seen when loading igb with max_vfs > 7.

Check the value of max_vfs at the time of assignment of vfs_allocated_count.

The previous check in igb_probe_vfs was too late as by that time the rx/tx
rings were initialized with the wrong offset.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: Use only a single Tx queue in SR-IOV mode

The 82576 expects the second rx queue in any pool to receive L2 switch
loop back packets sent from the second tx queue in another pool. The
82576 VF driver does not enable the second rx queue so if the PF driver
sends packets destined to a VF from its second tx queue then the VF
driver will never see them. In SR-IOV mode limit the number of tx queues
used by the PF driver to one. This patch fixes a bug reported in which
the PF cannot communicate with the VF and should be considered for 2.6.34
stable.

CC: stable@kernel.org
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: fix PHY config access on 82580

82580 NICs can have up to 4 functions. This fixes phy accesses
to use the correct locks for functions 2 and 3.

Signed-off-by: Nicholas Nunley <nicholasx.d.nunley@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

x86: Drop CONFIG_MCORE2 check around setting of NET_IP_ALIGN

This patch removes the CONFIG_MCORE2 check from around NET_IP_ALIGN. It is
based on a suggestion from Andi Kleen. The assumption is that there are
not any x86 cores where unaligned access is really slow, and this change
would allow for a performance improvement to still exist on configurations
that are not necessarily optimized for Core 2.

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ll_temac: add error checking to DMA init path

Add error checking to DMA descriptor rings initialization code.

Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: changes to properly provide phy details

be2net driver is currently not showing correct phy details in certain cases.
This patch fixes it.

Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ehea: Allocate stats buffer with GFP_KERNEL

Since ehea_get_stats calls ehea_h_query_ehea_port, which
can sleep, we can also sleep when allocating a page in
this function. This fixes some memory allocation failure
warnings seen under low memory conditions.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6

Conflicts:
drivers/net/wireless/libertas/host.h

x86: Align skb w/ start of cacheline on newer core 2/Xeon Arch

x86 architectures can handle unaligned accesses in hardware, and it has
been shown that unaligned DMA accesses can be expensive on Nehalem
architectures. As such we should overwrite NET_IP_ALIGN to resolve
this issue.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: add 1g PHY support for 82599

Add support for 1G SFP+ PHYs to 82599.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Add support for RX flow hash control

Allow ethtool to query the number of RX rings, the fields used in RX
flow hashing and the hash indirection table.

Allow ethtool to update the RX flow hash indirection table.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: Add support for control of RX flow hash indirection

Many NICs use an indirection table to map an RX flow hash value to one
of an arbitrary number of queues (not necessarily a power of 2). It
can be useful to remove some queues from this indirection table so
that they are only used for flows that are specifically filtered
there. It may also be useful to weight the mapping to account for
user processes with the same CPU-affinity as the RX interrupts.
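
The mapping itself is a small table lookup. The sketch below illustrates it
under assumptions: the table size of 8 and the helper names
rx_queue_for_hash() and indir_fill_even() are hypothetical, and real hardware
chooses its own table size and hash bits.

```c
#include <stddef.h>
#include <stdint.h>

#define INDIR_SIZE 8 /* hypothetical table size; real NICs vary */

/* Look up the RX queue for a flow hash through the indirection table. */
static unsigned int rx_queue_for_hash(const uint32_t table[INDIR_SIZE],
                                      uint32_t hash)
{
    return table[hash % INDIR_SIZE];
}

/* Fill the table round-robin over the first nqueues queues, so queues
 * at or above nqueues receive no hashed traffic. */
static void indir_fill_even(uint32_t table[INDIR_SIZE], unsigned int nqueues)
{
    for (size_t i = 0; i < INDIR_SIZE; i++)
        table[i] = (uint32_t)(i % nqueues);
}
```

Leaving a queue out of the table reserves it for explicitly filtered flows,
and repeating a queue number in more slots gives that queue a larger share.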

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vmxnet3: Remove incorrect implementation of ethtool_ops::get_flags()

Only some netdev feature flags correspond directly to ethtool feature
flags. ethtool_op_get_flags() does the right thing.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Bhavesh Davda <bhavesh@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdev: Make ethtool_ops::set_flags() return -EINVAL for unsupported flags

The documented error code for attempts to set unsupported flags (or
to clear flags that cannot be disabled) is EINVAL, not EOPNOTSUPP.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: Change ethtool_op_set_flags to validate flags

ethtool_op_set_flags() does not check for unsupported flags, and has
no way of doing so. This means it is not suitable for use as a
default implementation of ethtool_ops::set_flags.

Add a 'supported' parameter specifying the flags that the driver and
hardware support, validate the requested flags against this, and
change all current callers to pass this parameter.

Change some other trivial implementations of ethtool_ops::set_flags to
call ethtool_op_set_flags().
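
The validation step can be sketched as below. This is a simplified user-space
illustration, not the kernel function: net_dev_sketch, set_flags(),
lro_only_set_flags() and the FLAG_* bits are hypothetical stand-ins for the
real structures and ETH_FLAG_* values.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits standing in for the ETH_FLAG_* values. */
#define FLAG_LRO    (1u << 0)
#define FLAG_NTUPLE (1u << 1)
#define FLAG_RXHASH (1u << 2)

struct net_dev_sketch {
    uint32_t features;
};

/* Reject any requested bit outside 'supported'; otherwise apply it. */
static int set_flags(struct net_dev_sketch *dev, uint32_t data,
                     uint32_t supported)
{
    if (data & ~supported)
        return -EINVAL;

    dev->features = (dev->features & ~supported) | data;
    return 0;
}

/* A driver that only supports LRO wraps the helper with its own mask. */
static int lro_only_set_flags(struct net_dev_sketch *dev, uint32_t data)
{
    return set_flags(dev, data, FLAG_LRO);
}
```

Because of the extra parameter, a driver supplies a thin wrapper like the
second function instead of pointing its set_flags hook at the helper directly.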

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4vf: Use correct shift factor for extracting the SGE DMA Ingress Padding Boundary

Use correct shift factor for extracting the SGE DMA Ingress Padding
Boundary. Was accidentally using the register field's shift which was
close enough (4 instead of the proper value of 5) that it actually
sort of worked for various packet sizes ...

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4vf: Remove obsolete comment about the lack of a TX Timer Callback

Remove obsolete comment about the lack of a TX Timer Callback -- which
we now _do_ have ...

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fragment: add fast path for in-order fragments

As the fragments are sent in order in most OSes, such as Windows, Darwin and
FreeBSD, it is likely the new fragments are at the end of the inet_frag_queue.
In the fast path, we check if the skb at the end of the inet_frag_queue is the
prev we expect.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
include/net/inet_frag.h |    1 +
net/ipv4/ip_fragment.c  |   12 ++++++++++++
net/ipv6/reassembly.c   |   11 +++++++++++
3 files changed, 24 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>

snmp: 64bit ipstats_mib for all arches

/proc/net/snmp and /proc/net/netstat expose SNMP counters.

Width of these counters is either 32 or 64 bits, depending on the size
of "unsigned long" in kernel.

This means user program parsing these files must already be prepared to
deal with 64bit values, regardless of user program being 32 or 64 bit.

This patch introduces 64bit snmp values for IPSTAT mib, where some
counters can wrap pretty fast if they are 32bit wide.

# netstat -s|egrep "InOctets|OutOctets"
InOctets: 244068329096
OutOctets: 244069348848
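
To put "wrap pretty fast" into numbers, the arithmetic can be sketched as
below. The helper names are hypothetical and the line rate is an assumed
input; at a sustained 1 Gbit/s a 32-bit octet counter wraps in roughly 34
seconds.

```c
#include <stdint.h>

/* Seconds until an octet counter of the given width wraps at a
 * sustained line rate given in bits per second. */
static double seconds_to_wrap(unsigned int counter_bits, double bits_per_second)
{
    double bytes_per_second = bits_per_second / 8.0;
    double capacity = (counter_bits >= 64)
                          ? 18446744073709551616.0 /* 2^64 */
                          : (double)(UINT64_C(1) << counter_bits);

    return capacity / bytes_per_second;
}

/* The InOctets value quoted above no longer fits in 32 bits. */
static int fits_in_32_bits(uint64_t value)
{
    return value <= UINT32_MAX;
}
```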

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: memory barrier fixes on IBM p7 platform

The IBM p7 architecture seems to reorder memory accesses more
aggressively than previous ppc64 architectures. This requires memory
barriers to ensure that rx/tx doorbells are pressed only after
memory to be DMAed is written.

Signed-off-by: Sathya Perla <sathyap@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cpmac: use resource_size()

The original code is off by one because we should start counting at
zero. So the size of the resource is end - start + 1. I switched it to
use resource_size() to do the calculation.
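
The calculation for an inclusive range can be sketched as follows; struct res
and res_size() are hypothetical user-space stand-ins for the kernel's
struct resource and resource_size().

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct resource: both ends
 * of the range are inclusive. */
struct res {
    uint64_t start;
    uint64_t end;
};

/* Inclusive range, so the byte count is end - start + 1. */
static uint64_t res_size(const struct res *r)
{
    return r->end - r->start + 1;
}
```

A region from 0x1000 to 0x1fff therefore spans 0x1000 bytes, one more than
the plain difference.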

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

act_nat: use stack variable

Structure tc_nat isn't too big for the stack, so we can put it on the stack.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
net/sched/act_nat.c | 31 ++++++++++---------------------
1 file changed, 10 insertions(+), 21 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>

act_mirred: combine duplicate code

tcf_bstats is updated in any case, so we can do it earlier to reduce the size
of the code.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
----
net/sched/act_mirred.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>

net/neighbour.h: fix typo

'Shoul' must be 'should'.

Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ath9k_htc: Add LED support for AR7010

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

mac80211: Allow selection of minstrel_ht as default rc algorithm

Allow selection of minstrel_ht as default rate control algorithm. At
the moment minstrel_ht can only be requested by the driver code but
not selected as default in make menuconfig. Fix this by using
minstrel_ht when minstrel was selected as default and minstrel_ht
is available.

This change won't affect legacy devices as minstrel_ht falls back to
minstrel in that case.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

ath9k: fix TSF after reset on AR913x

When issuing a reset, the TSF value is lost in the hardware because of
the 913x specific cold reset. As with some AR9280 cards, the TSF needs
to be preserved in software here.

Additionally, there's an issue that frequently prevents a successful
TSF write directly after the chip reset. In this case, repeating the
TSF write after the initval-writes usually works.

This patch detects failed TSF writes and recovers from them, taking
into account the delay caused by the initval writes.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Reported-by: Björn Smedman <bjorn.smedman@venatech.se>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Fix compile warning when debug disabled

CC [M] drivers/net/wireless/rt2x00/rt2800lib.o
drivers/net/wireless/rt2x00/rt2800lib.c: In function 'rt2800_ampdu_action':
drivers/net/wireless/rt2x00/rt2800lib.c:2821: warning: unused variable 'rt2x00dev'

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Acked-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Always set BBP_CSR_CFG_BBP_RW_MODE to 1

Latest rt2870 legacy driver also sets BBP_CSR_CFG_BBP_RW_MODE to 1
when reading or writing the EEPROM. This means we can make the
BBP reading and writing completely equal on all platforms.

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Fix antenna initialization

Legacy driver indicates that BBP1_TX_ANTENNA must be set
to 0 for TXPATH values of 1 and 3. So the previous statement
that nothing should be done for TXPATH = 3, is false.

Furthermore, remove the false BBP3_RX_ANTENNA initialization
when TXPATH is 1 for PCI and SOC devices. This field will always
be overridden in the next switch statement, making this initialization
bogus. History of this line indicates it was there from the beginning,
and was once caught as typo. Instead of replacing the line with the
correct line, the correct line was added...

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Acked-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Fix IEEE80211_HT_CAP_RX_STBC assignment

IEEE80211_HT_CAP_RX_STBC is a 2 bit flag, and should thus
never be set as normal flag. Instead we must read the number
of RX paths from the EEPROM and set the IEEE80211_HT_CAP_RX_STBC
with the correct value (using the same logic as the number of TX
streams).
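
Setting a multi-bit field differs from setting a single flag, as the sketch
below shows. The mask 0x0300 and shift 8 are the values of
IEEE80211_HT_CAP_RX_STBC and IEEE80211_HT_CAP_RX_STBC_SHIFT in the kernel's
ieee80211.h; the shortened macro names and ht_cap_set_rx_stbc() are
hypothetical, and the EEPROM read is replaced by a plain parameter.

```c
#include <stdint.h>

/* Mask and shift of the two-bit RX STBC field in the HT capabilities. */
#define HT_CAP_RX_STBC       0x0300
#define HT_CAP_RX_STBC_SHIFT 8

/* Encode a stream count (0..3) into the two-bit RX STBC field,
 * leaving every other capability bit untouched. */
static uint16_t ht_cap_set_rx_stbc(uint16_t cap, unsigned int streams)
{
    cap &= (uint16_t)~HT_CAP_RX_STBC;
    cap |= (uint16_t)((streams << HT_CAP_RX_STBC_SHIFT) & HT_CAP_RX_STBC);
    return cap;
}
```

OR-ing the mask itself into the capabilities would always advertise three
streams, whatever the hardware actually supports.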

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: fix beacon reset on rt2800

When an interface is removed the corresponding beacon entry should be reset.
The current approach to only clear the first word is not enough to stop
the device from sending out the beacon, hence resulting in beacons being
sent out for already removed interfaces.

Fix this by invalidating the entire TXWI in front of the beacon instead
of only the first word.

Also clear all beacons during startup in the same way.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Disable link tuning in AP mode

Since the link tuning is based on average RSSI values taken from all received
frames it doesn't make sense to enable it in AP mode where every associated
station provides independent RSSI values. Furthermore the legacy drivers
don't enable link tuning in AP mode either.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Fix beacon updates in rt61pci

Fix rt61pci beacon updates in the same way as rt2800pci. rt61pci didn't
update the beacon template after each beacon interval, resulting in the
DTIM count being incorrect (if DTIM period > 1). Fix this by calling
rt2x00lib_beacondone after the current beacon was sent out.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Fix beacon updates in rt2800pci

rt2800pci didn't update the beacon template after each beacon interval,
resulting in the DTIM count being incorrect (if DTIM period > 1). Fix this
by calling rt2x00lib_beacondone after the current beacon was sent out.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Enable multiBSS in rt2800

MAC_BSSID_DW1_BSS_ID_MASK must be set to 3 to enable 8 BSSIDs.
MAC_BSSID_DW1_BSS_BCN_NUM is initialized to 7 to enable the 8 beacons.

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Tested-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Align rt2800 EEPROM validation to Ralink vendor driver.

Align with the latest versions of the Ralink legacy driver(s).

Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Correctly detect 93C86 EEPROMs in rt2800pci.

Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

eeprom_93cx6: Add support for 93c86 EEPROMs.

Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Split off TXWI writing to write_tx_data callback in rt2800usb.

Align with the way PCI devices are handled, even though it is not
strictly necessary.

Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Rename driver write_tx_datadesc callback function.

Now that the {usb,pci} specific write_tx_data functions are no longer
present we can rename the write_tx_datadesc callback function back to
its old name.

Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Move common txdone handling to rt2x00lib_txdone.

Now that the write_tx_data functions are merged, also merge the relevant
parts of the txdone handling into common code, rather than {usb,pci}
specific code.

Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Merge PCI and USB versions of write_tx_data into single function.

Now that rt2x00pci_write_tx_data and rt2x00usb_write_tx_data are similar,
we can merge them into a single function in rt2x00queue.c.

Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Move filling of TX URB to rt2x00usb_kick_tx_entry function.

There is no need to fill the TX URB this early, and moving it to the
rt2x00usb_kick_tx_entry function allows us to merge the PCI and USB
variants of the write_tx_data function.

Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Fix frame dumping for USB devices.

We forgot to clear the SKBDESC_DESC_IN_SKB when the descriptor was removed
from the front of the skb.

Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Remove unneeded variable

update_bssid is set only when BSS_CHANGED_BSSID is set, and the later
check of that variable is also done only when BSS_CHANGED_BSSID is set.
This makes the variable useless, as the check can never fail.

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Acked-by: Helmut Schaa <helmut.schaa@googlemail.com>
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Write the BSSID to register when interface is added

For the Master mode case, we initialized the BSSID as the MAC
address, but never wrote it into the registers. This causes
Hardware crypto to break in Master mode when receiving frames
which require the BSSID to be filled in.

This is safe for STA mode since the BSSID will be initialized
to 00:00:00:00:00:00 at this point, but will be set to the correct
value later when the device associates.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rt2x00: Implement tx mpdu aggregation

In order to implement tx mpdu aggregation we only have to implement
the ampdu_action callback such that mac80211 allows negotiation of
blockack sessions.

The hardware will handle everything on its own as long as the ampdu
flag in the TXWI struct is set up correctly and we translate the tx
status correctly.

For now, refuse requests to start rx aggregation.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

gianfar: Implement workaround for eTSEC-A002 erratum

MPC8313ECE says:

"If the controller receives a 1- or 2-byte frame (such as an illegal
runt packet or a packet with RX_ER asserted) before GRS is asserted
and does not receive any other frames, the controller may fail to set
GRSC even when the receive logic is completely idle. Any subsequent
receive frame that is larger than two bytes will reset the state so
the graceful stop can complete. A MAC receiver (Rx) reset will also
reset the state."

This patch implements the proposed workaround:

"If IEVENT[GRSC] is still not set after the timeout, read the eTSEC
register at offset 0xD1C. If bits 7-14 are the same as bits 23-30,
the eTSEC Rx is assumed to be idle and the Rx can be safely reset.
If the register fields are not equal, wait for another timeout
period and check again."

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
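
The idle check from the quoted workaround can be sketched as a pure
function on the value read from offset 0xD1C. This is an illustration
under a stated assumption: bits are numbered from the least significant
bit (bit 0 = LSB). Freescale manuals often number bits from the most
significant end, so the shifts should be checked against the errata
document. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if the two 8-bit fields of the register value match,
 * which the workaround takes to mean that the eTSEC Rx is idle.
 * Assumption: bit 0 is the least significant bit.
 */
static bool etsec_rx_looks_idle(uint32_t reg)
{
	uint32_t low_field = (reg >> 7) & 0xff;   /* bits 7-14 */
	uint32_t high_field = (reg >> 23) & 0xff; /* bits 23-30 */

	return low_field == high_field;
}
```

A caller would wait for the timeout, read the register, and reset the Rx
only when this returns true; otherwise it waits for another timeout period
and checks again.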

gianfar: Implement workaround for eTSEC76 erratum

MPC8313ECE says:

"For TOE=1 huge or jumbo frames, the data required to generate the
checksum may exceed the 2500-byte threshold beyond which the controller
constrains itself to one memory fetch every 256 eTSEC system clocks.

This throttling threshold is supposed to trigger only when the
controller has sufficient data to keep transmit active for the duration
of the memory fetches. The state machine handling this threshold,
however, fails to take large TOE frames into account. As a result,
TOE=1 frames larger than 2500 bytes often see excess delays before start
of transmission."

This patch implements the workaround as suggested by the errata
document, i.e.:

"Limit TOE=1 frames to less than 2500 bytes to avoid excess delays due to
memory throttling.
When using packets larger than 2700 bytes, it is recommended to turn TOE
off."

To be sure, we limit the TOE frames to 2500 bytes, and do software
checksumming instead.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
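
The decision described above can be sketched as follows. This is a
minimal, self-contained illustration rather than the driver code: the
names are hypothetical, and the software path is shown as a generic
16-bit ones' complement Internet checksum, whereas the kernel has its own
helpers for this.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Largest frame still handed to the TOE checksum engine (from the text). */
#define TOE_MAX_FRAME_LEN 2500

/* True if the frame is small enough for hardware checksumming. */
static bool use_hw_checksum(size_t frame_len)
{
	return frame_len <= TOE_MAX_FRAME_LEN;
}

/* Software fallback: ones' complement sum over big-endian 16-bit words. */
static uint16_t sw_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8; /* pad odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16); /* fold carries back in */

	return (uint16_t)~sum;
}
```

A transmit path would call use_hw_checksum() with the frame length and
compute the checksum in software whenever it returns false.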

gianfar: Implement workaround for eTSEC74 erratum

MPC8313ECE says:

"If MACCFG2[Huge Frame]=0 and the Ethernet controller receives frames
which are larger than MAXFRM, the controller truncates the frames to
length MAXFRM and marks RxBD[TR]=1 to indicate the error. The controller
also erroneously marks RxBD[TR]=1 if the received frame length is MAXFRM
or MAXFRM-1, even though those frames are not truncated.
No truncation or truncation error occurs if MACCFG2[Huge Frame]=1."

There are two options to work around the issue:

"1. Set MACCFG2[Huge Frame]=1, so no truncation occurs for invalid large
frames. Software can determine if a frame is larger than MAXFRM by
reading RxBD[LG] or RxBD[Data Length].

2. Set MAXFRM to 1538 (0x602) instead of the default 1536 (0x600), so
normal-length frames are not marked as truncated. Software can examine
RxBD[Data Length] to determine if the frame was larger than MAXFRM-2."

This patch implements the first workaround option by setting the HUGEFRAME
bit; gfar_clean_rx_ring() already checks RxBD[Data Length].

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/core: use ntohs for skb->protocol

This is only noticed by people who are not doing everything correctly in
the first place.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Use interface max_desync_factor instead of static default

max_desync_factor can be configured per-interface, but nothing is
using the value.

Reported-by: Piotr Lewandowski <piotr.lewandowski@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Clamp reported valid_lft to a minimum of 0

Since addresses are only revalidated every 2 minutes, the reported
valid_lft can underflow shortly before the address is deleted.
Clamp it to a minimum of 0, as for prefered_lft.

Reported-by: Piotr Lewandowski <piotr.lewandowski@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
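
The clamping can be sketched as below. This is an illustration, not the
patch itself: remaining_lft() is a hypothetical helper, lifetimes are
unsigned 32-bit seconds, and INFINITY_LIFE_TIME is taken to be 0xFFFFFFFF.

```c
#include <stdint.h>

/* Lifetime value meaning "never expires" (assumed to be all ones). */
#define INFINITY_LIFE_TIME 0xFFFFFFFFu

/*
 * Lifetime left after `elapsed` seconds. A plain `lft - elapsed` would
 * wrap around to a huge value once elapsed exceeds lft, so clamp to 0.
 */
static uint32_t remaining_lft(uint32_t lft, uint32_t elapsed)
{
	if (lft == INFINITY_LIFE_TIME)
		return lft;

	return lft > elapsed ? lft - elapsed : 0;
}
```

Without the clamp, an address that expired 30 seconds ago but has not
been revalidated yet would be reported with a lifetime of about 136 years.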

usb: pegasus: fixed coding style issues

Fixed brace, static initialization, comment, whitespace and spacing
coding style issues.

Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c59x: Use fine-grained locks for MII and windowed register access

This avoids scheduling in atomic context and also means that IRQs
will only be deferred for relatively short periods of time.

Previously discussed in:
http://article.gmane.org/gmane.linux.network/155024

Reported-by: Arne Nordmark <nordmark@mech.kth.se>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: disable EEE support by default

Based on community feedback, EEE should be disabled by default until the
IEEE802.3az specification has been finalized.

Cc: bhutchings@solarflare.com
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: remove EEE module parameter

As requested by Dave Miller. A follow-on set of patches will allow for
ethtool to enable/disable the feature instead.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: suppress compile warnings on certain archs

Commit 84f4ee902ad3ee964b7b3a13d5b7cf9c086e9916 causes compile warnings on
architectures that have unsigned long long's that are not 64-bit, e.g.
ia64.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: don't inadvertently re-set INTX_DISABLE

Should e1000_test_msi() fail to see an MSI interrupt, it attempts to
fall back to legacy INTx interrupts. But an error in the code may prevent
this from happening correctly.

Before calling e1000_test_msi_interrupt(), e1000_test_msi() disables SERR
by clearing the SERR bit from the just read PCI_COMMAND bits as it writes
them back out.

Upon return from calling e1000_test_msi_interrupt(), it re-enables SERR
by writing out the version of PCI_COMMAND it had previously read.

The problem with this is that e1000_test_msi_interrupt() calls
pci_disable_msi(), which eventually ends up in pci_intx(). And because
pci_intx() was called with enable set to 1, the INTX_DISABLE bit gets
cleared from PCI_COMMAND, which is what we want. But when we get back to
e1000_test_msi(), the INTX_DISABLE bit gets inadvertently re-set because
of the attempt by e1000_test_msi() to re-enable SERR.

The solution is to have e1000_test_msi() re-read the PCI_COMMAND bits as
part of its attempt to re-enable SERR.

During debugging/testing of this issue I found that not all the systems
I ran on had the SERR bit set to begin with. And on some of the systems
the same could be said for the INTX_DISABLE bit. Needless to say these
latter systems didn't have a problem falling back to legacy INTx
interrupts with the code as is.

Signed-off-by: Dean Nelson <dnelson@redhat.com>
CC: stable@kernel.org
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
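
The stale write-back and its fix can be reproduced in a small simulation.
This is a sketch, not the driver code: the PCI_COMMAND register is a plain
variable, fake_test_msi_interrupt() is a hypothetical stand-in for a failed
e1000_test_msi_interrupt() run that ends in pci_intx() clearing
INTX_DISABLE, and the other function names are hypothetical as well.

```c
#include <stdint.h>

#define PCI_COMMAND_SERR         0x0100 /* SERR enable bit */
#define PCI_COMMAND_INTX_DISABLE 0x0400 /* INTx emulation disable bit */

static uint16_t pci_command; /* simulated PCI_COMMAND register */

static uint16_t read_cmd(void) { return pci_command; }
static void write_cmd(uint16_t val) { pci_command = val; }

/* Stand-in for the failed MSI test: INTX_DISABLE ends up cleared. */
static void fake_test_msi_interrupt(void)
{
	write_cmd((uint16_t)(read_cmd() & ~PCI_COMMAND_INTX_DISABLE));
}

/* Old flow: write back the value read before the test. */
static void test_msi_stale(void)
{
	uint16_t saved = read_cmd();

	write_cmd((uint16_t)(saved & ~PCI_COMMAND_SERR));
	fake_test_msi_interrupt();
	write_cmd(saved); /* also restores the old INTX_DISABLE bit */
}

/* Fixed flow: re-read the register before re-enabling SERR. */
static void test_msi_reread(void)
{
	uint16_t saved = read_cmd();

	write_cmd((uint16_t)(saved & ~PCI_COMMAND_SERR));
	fake_test_msi_interrupt();
	if (saved & PCI_COMMAND_SERR)
		write_cmd((uint16_t)(read_cmd() | PCI_COMMAND_SERR));
}
```

Starting from a register with both SERR and INTX_DISABLE set, the stale
variant ends with INTX_DISABLE set again, while the re-reading variant
ends with SERR set and INTX_DISABLE clear.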

drivers/net/Makefile: conditionally descend to wireless

Don't descend to wireless unless it is actually used.

Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: David S. Miller <davem@davemloft.net>