git.karo-electronics.de Git - karo-tx-linux.git/log

macvtap: zerocopy: fix offset calculation when building skb

This patch fixes the offset calculation when building skb:

- offset1 were used as skb data offset not vector offset
- reset offset to zero only when we advance to next vector

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio/tools: add delayed interupt mode

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

netem: add ECN capability

Add ECN (Explicit Congestion Notification) marking capability to netem

tc qdisc add dev eth0 root netem drop 0.5 ecn

Instead of dropping packets, try to ECN mark them.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skb_peek()/skb_peek_tail() cleanups

remove useless casts and rename variables for less confusion.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add a prefetch in socket backlog processing

TCP or UDP stacks have big enough latencies that prefetching next
pointer is worth it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: let iproute2 create L2TPv3 IP tunnels using IPv6

The netlink API lets users create unmanaged L2TPv3 tunnels using
iproute2. Until now, a request to create an unmanaged L2TPv3 IP
encapsulation tunnel over IPv6 would be rejected with
EPROTONOSUPPORT. Now that l2tp_ip6 implements sockets for L2TP IP
encapsulation over IPv6, we can add support for that tunnel type.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: introduce L2TPv3 IP encapsulation support for IPv6

L2TPv3 defines an IP encapsulation packet format where data is carried
directly over IP (no UDP). The kernel already has support for L2TP IP
encapsulation over IPv4 (l2tp_ip). This patch introduces support for
L2TP IP encapsulation over IPv6.

The implementation is derived from ipv6/raw and ipv4/l2tp_ip.

Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Export ipv6 functions for use by other protocols

For implementing other protocols on top of IPv6, such as L2TPv3's IP
encapsulation over ipv6, we'd like to call some IPv6 functions which
are not currently exported. This patch exports them.

Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: netlink api for l2tpv3 ipv6 unmanaged tunnels

This patch adds support for unmanaged L2TPv3 tunnels over IPv6 using
the netlink API. We already support unmanaged L2TPv3 tunnels over
IPv4. A patch to iproute2 to make use of this feature will be
submitted separately.

Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: show IPv6 addresses in l2tp debugfs file

If an L2TP tunnel uses IPv6, make sure the l2tp debugfs file shows the
IPv6 address correctly.

Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: pppol2tp_connect() handles ipv6 sockaddr variants

Userspace uses connect() to associate a pppol2tp socket with a tunnel
socket. This needs to allow the caller to supply the new IPv6
sockaddr_pppol2tp structures if IPv6 is used.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pppox: Replace __attribute__((packed)) in if_pppox.h

Checkpatch warns about the use of __attribute__((packed)). So use the
recommended __packed syntax instead.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: remove unused stats from l2tp_ip socket

The l2tp_ip socket currently maintains packet/byte stats in its
private socket structure. But these counters aren't exposed to
userspace and so serve no purpose. The counters were also
smp-unsafe. So this patch just gets rid of the stats.

While here, change a couple of internal __u32 variables to u32.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: Use ip4_datagram_connect() in l2tp_ip_connect()

Cleanup the l2tp_ip code to make use of an existing ipv4 support function.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: fix locking of 64-bit counters for smp

L2TP uses 64-bit counters but since these are not updated atomically,
we need to make them safe for smp. This patch addresses that.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: remove PHY polling from atl1c_change_mtu

PHY polling code for FPGA is considered in every MDIO R/W API.
no need to add additional code to atl1c_change_mtu.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: David Liu <dwliu@qca.qaulcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: Disable L0S when no cable link

L0S might be unstable if no cable link, only enable it when link up.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: do MAC-reset when PHY link down

There may be tx-skbs still pending in HW when PHY link down.
Reset MAC will make the DMA engine go to the start point.
and release all pending skbs.
Note: Reset MAC will clear any interrupt status and mask.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: cancel task when interface closed

common_task might be running while close routine is called,
wait/cancel it.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: enlarge L1 response waiting timer

The hardware incorrectly process L0S/L1 entrance if the chipset/root
response after specific/shorter timer and cause system hang.
Enlarge the timeout value to avoid this issue.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: refine mac address related code

On some platform with EEPROM/OTP existing, the BIOS could overwrite
a new MAC address for the NIC. so, the permanent mac address should
be from BIOS. the address is restored when driver removing.
Voltage raising isn't applicable for l1d.
Replace swab32 with htonl for big/little endian platform.
related Registers are refined as well.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: remove code of closing register writable attribution

The Close-action is done by atl1c_reset_pcie, remove it from
atl1c_get_permanent_address.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: clear WoL status when reset pcie

WoL status is read-clear and should be cleared when in S0
status.
putting it in atl1c_reset_pcie is more suitable than
in atl1c_get_permanent_address.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: add PHY link event(up/down) patch

On some platforms the PHY settings need to change depending on the
cable link status to get better stability.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: add workaround for issue of bit INTX-disable for MSI interrupt

All supported devices have one issue that msi interrupt doesn't assert
if pci command register bit (PCI_COMMAND_INTX_DISABLE) is set.
Add workaround in drivers/pci/quirks.c

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tipc_net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux

bnx2x: remove some bloat

Before doing skb->head_frag work on bnx2x driver, I found too much stuff
was inlined in bnx2x/bnx2x_cmn.h for no good reason and made my work not
very easy.

Move some big functions out of this include file to the respective .c
file.

A lot of inline keywords are not needed at all in this huge driver.

   text    data     bss     dec     hex filename
490083    1270      56 491409   77f91 bnx2x/bnx2x.ko.before
484206    1270      56 485532   7689c bnx2x/bnx2x.ko

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pch_gbe: reprogram multicast address register on reset

The reset logic after a Rx FIFO overrun will clear the programmed
multicast addresses. This patch fixes the issue by reprogramming the
registers after the reset.

The commit eefc48b ("pch_gbe: reprogram multicast address register on
reset") tried to fix this problem, but it introduces unnecessary
codes. In fact, all multicast addresses have been saved in netdev->mc,
So we can call pch_gbe_set_multi() directly after reset_hw and
reset_rx.

This commit kills 50+ line codes

Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Takahiro Shimizu <tshimizu818@gmail.com>
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: makes skb_splice_bits() aware of skb->head_frag

__skb_splice_bits() can check if skb to be spliced has its skb->head
mapped to a page fragment, instead of a kmalloc() area.

If so we can avoid a copy of the skb head and get a reference on
underlying page.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: makes tcp_try_coalesce aware of skb->head_frag

TCP coalesce can check if skb to be merged has its skb->head mapped to a
page fragment, instead of a kmalloc() area.

We had to disable coalescing in this case, for performance reasons.

We 'upgrade' skb->head as a fragment in itself.

This reduces number of cache misses when user makes its copies, since a
less sk_buff are fetched.

This makes receive and ofo queues shorter and thus reduce cache line
misses in TCP stack.

This is a followup of patch "net: allow skb->head to be a page fragment"

Tested with tg3 nic, with GRO on or off. We can see "TCPRcvCoalesce"
counter being incremented.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: make GRO aware of skb->head_frag

GRO can check if skb to be merged has its skb->head mapped to a page
fragment, instead of a kmalloc() area.

We 'upgrade' skb->head as a fragment in itself

This avoids the frag_list fallback, and permits to build true GRO skb
(one sk_buff and up to 16 fragments), using less memory.

This reduces number of cache misses when user makes its copy, since a
single sk_buff is fetched.

This is a followup of patch "net: allow skb->head to be a page fragment"

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: provide frags as skb head

This patch converts tg3 driver, one of our reference drivers, to use new
build_skb() api in frag mode.

Instead of using kmalloc() to allocate the memory block that will be
used by build_skb() as skb->head, we use a page fragment.

This is a followup of patch "net: allow skb->head to be a page fragment"

This allows GRO, TCP coalescing, and splice() to be more efficient.

Incidentally, this also removes SLUB slow path contention in kfree()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: allow skb->head to be a page fragment

skb->head is currently allocated from kmalloc(). This is convenient but
has the drawback the data cannot be converted to a page fragment if
needed.

We have three spots were it hurts :

1) GRO aggregation

When a linear skb must be appended to another skb, GRO uses the
frag_list fallback, very inefficient since we keep all struct sk_buff
around. So drivers enabling GRO but delivering linear skbs to network
stack aren't enabling full GRO power.

2) splice(socket -> pipe).

We must copy the linear part to a page fragment.
This kind of defeats splice() purpose (zero copy claim)

3) TCP coalescing.

Recently introduced, this permits to group several contiguous segments
into a single skb. This shortens queue lengths and save kernel memory,
and greatly reduce probabilities of TCP collapses. This coalescing
doesnt work on linear skbs (or we would need to copy data, this would be
too slow)

Given all these issues, the following patch introduces the possibility
of having skb->head be a fragment in itself. We use a new skb flag,
skb->head_frag to carry this information.

build_skb() is changed to accept a frag_size argument. Drivers willing
to provide a page fragment instead of kmalloc() data will set a non zero
value, set to the fragment size.

Then, on situations we need to convert the skb head to a frag in itself,
we can check if skb->head_frag is set and avoid the copies or various
fallbacks we have.

This means drivers currently using frags could be updated to avoid the
current skb->head allocation and reduce their memory footprint (aka skb
truesize). (thats 512 or 1024 bytes saved per skb). This also makes
bpf/netfilter faster since the 'first frag' will be part of skb linear
part, no need to copy data.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

forcedeth: add transmit timestamping support

Insert an skb_tx_timestamp call in both ndo_start_xmit routines
Tested to work for the nv_start_xmit_optimized case

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: add transmit timestamping support

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: add transmit timestamping support

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000: add transmit timestamping support

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: compress out gratuitous extra carriage returns

Some of the comment blocks are floating in limbo between two
functions, or between blocks of code.  Delete the extra line
feeds between any comment and its associated following block
of code, to be consistent with the majority of the rest of
the kernel.  Also delete trailing newlines at EOF and fix
a couple trivial typos in existing comments.

This is a 100% cosmetic change with no runtime impact.  We get
rid of over 500 lines of non-code, and being blank line deletes,
they won't even show up as noise in git blame.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

bridge: Fix fatal typo in setup of multicast_querier_expired

Unfortunately it seems that I didn't properly test the case of
an expired external querier in the recent multicast bridge series.

The setup of the timer in that case is completely broken and leads
to a NULL-pointer dereference. This patch fixes it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: Add missing net/net/ip6_checksum.h include.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/l2tp: add support for L2TP over IPv6 UDP

Now that encap_rcv() works on IPv6 UDP sockets, wire L2TP up to IPv6.
Support has been tested with and without hardware offloading. This
version fixes the L2TP over localhost issue with incorrect checksums
being reported.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6/udp: UDP encapsulation: introduce encap_rcv hook into IPv6

Now that the sematics of udpv6_queue_rcv_skb() match IPv4's
udp_queue_rcv_skb(), introduce the UDP encap_rcv() hook for IPv6.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6/udp: UDP encapsulation: move socket locking into udpv6_queue_rcv_skb()

In order to make sure that when the encap_rcv() hook is introduced it is
not called with the socket lock held, move socket locking from callers into
udpv6_queue_rcv_skb(), matching what happens in IPv4.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb

This is the first step in reworking the IPv6 UDP code to be structured more
like the IPv4 UDP code. This patch creates __udpv6_queue_rcv_skb() with
the equivalent sematics to __udp_queue_rcv_skb(), and wires it up to the
backlog_rcv method.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

drivers/net/oki-semi: Donot recompute IP header checksum

If I understand correct, NETIF_F_IP_CSUM only means the hardware
will compute the TCP/UDP checksum, IP checksum is always computed
in software

So as a workround of hardware unable to compute small packages
checksum, do not need to compute IP header checksum.

Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/oki-semi: Remove the definition of PCH_GBE_ETH_ALEN

PCH_GBE_ETH_ALEN is equal to ETH_ALEN, so we can replace it with
ETH_ALEN.

If they are not equal, it must be a bug, since this is ethernet,
and the address has been already stored to mc_addr_list as ETH_ALEN
bytes when call pch_gbe_mac_mc_addr_list_update.

Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/at91_ether: use gpio_to_irq for phy IRQ line

Use the gpio_to_irq() function to retrieve the phy IRQ line
from the GPIO pin specification.
This fix is needed now that we have moved to irqdomains on AT91.

Reported-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Andrew Victor <avictor.za@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

AT91: Remove fixed mapping for AT91RM9200 ethernet

The AT91RM9200 Ethernet controller still has a fixed IO mapping.
So:
* Remove the fixed IO mapping and AT91_VA_BASE_EMAC definition.
* Pass the physical base-address via platform-resources to the driver.
* Convert at91_ether.c driver to perform an ioremap().
* Ethernet PHY detection needs to be performed during the driver
initialization process, it can no longer be done first.

Signed-off-by: Andrew Victor <linux@maxim.org.za>
Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fixed a coding style issue related to spaces.

Fixed a coding style issue relating to spaces
in net/core/sock.c

Signed-off-by: Jeffrin Jose <ahiliation@yahoo.co.in>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: Reject payload messages with invalid message type

Adds check to ensure TIPC sockets reject incoming payload messages
that have an unrecognized message type.

Remove the old open question about whether TIPC_ERR_NO_PORT is
the proper return value. It is appropriate here since there are
valid instances where another node can make use of the reply,
and at this point in time the host is already broadcasting TIPC
data, so there are no real security concerns.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

ixgbe: check for WoL support in single function

This patch consolidates the case logic for checking whether a device supports
WoL into a single place. Previously ethtool and probe used similar logic that
was copied and maintained separately. This patch encapsulates the core logic
into a function so that a user only has to update one place.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: Force flow control off during reset when forcing speed.

During igb_reset(), we initiate a hardware reset which will clear our
flow control settings. For auto-negotiation, we re-negotiate them when
linking up again, but we need to force them off properly for the forced
speed case.

Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: 82579 potential system hang on stress when ME enabled

Previously, a workaround was added to address a hardware bug in the
PCIm2PCI arbiter where a write by the driver of the Transmit/Receive
Descriptor Tail register could happen concurrently with a write of any
MAC CSR register by the Manageability Engine (ME) which could cause the
Tail register to have an incorrect value.  The arbiter is supposed to
prevent the concurrent writes but there is a bug that can cause the Host
(driver) access to be acknowledged later than it should.
After further investigation, it was discovered that a driver write access
of any MAC CSR register after being idle for some time can be lost when
ME is accessing a MAC CSR register.  When this happens, no further target
access is claimed by the MAC which could hang the system.
The workaround to check bit 24 in the FWSM register (set only when ME is
accessing a MAC CSR register) and delay for a limited amount of time until
it is cleared is now done for all driver writes of MAC CSR registers on
82579 with ME enabled.  In the rare case when the driver is writing the
Tail register and ME is accessing any MAC CSR register for a duration
longer than the maximum delay, write the register and verify it has the
correct value before continuing, otherwise reset the device.

This patch also moves some pre-existing macros from the hardware-specific
header file to the more appropriate generic driver header file.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: 82579 packet drop workaround

In K1 mode (a MAC/PHY interconnect power mode), the 82579 device shuts down
the Phase Lock Loop (PLL) of the interconnect to save power. When the PLL
starts working, the 82579 device may start to transfer the packet through
the interconnect before it is fully functional causing packet drops. This
workaround disables shutting down the PLL in K1 mode for 1G link speed.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Enable DMA Burst Mode on 82574 by default.

Performance testing has shown that enabling DMA burst on 82574
improves performance on small packets, so enable it by default.

Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Disable Far-End LoopBack following reset on 80003ES2LAN.

80003ES2LAN has an errata such that far-end loopback may be activated by
bit errors producing a reserved symbol. In order to disable far-end
loopback quickly enough, disable it immediately following a reset.

Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

net: cleanups in sock_setsockopt()

Use min_t()/max_t() macros, reformat two comments, use !!test_bit() to
match !!sock_flag()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: doc: merge /proc/sys/net/core/* documents into one place

All parameter descriptions in /proc/sys/net/core/* now is separated
two places. So, merge them into Documentation/sysctl/net.txt.

Signed-off-by: Shan Wei <davidshan@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: update the driver version

Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: fix speed displayed by ethtool on certain SKUs

logical speed returned by link_status_query needs to be multiplied by 10.

Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Ignore status of some ioctls during driver load

Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

NET: smsc-ircc2: mark non-experimental

This has been used by me and others for ages, let's stop calling it
experimental.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: bond_update_speed_duplex() can return void since no callers check its return

As none of the callers of bond_update_speed_duplex (need to) check its
return value, there is little point in it returning anything.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Allow a predefined set of capture masks for FW dump

o 0x3, 0x7, 0xF, 0x1F, 0x3F, 0x7F and 0xFF are the allowed capture masks.
o Updated driver version to 5.0.28

Signed-off-by: Manish chopra <manish.chopra@qlogic.com>
Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Adding mac statistics to ethtool.

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Register device in FAILED state.

o Without failing probe, register netdevice when device is in FAILED state.
o Device will come up with minimum functionality.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

crush: include header for global symbols

Include the header to pickup the definitions of the global symbols.

Quiets the following sparse warnings:

warning: symbol 'crush_find_rule' was not declared. Should it be static?
warning: symbol 'crush_do_rule' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Sage Weil <sage@newdream.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn/eicon: use standard __init,__exit function markup

Remove the custom DIVA_{INIT,EXIT}_FUNCTION defines and use
the standard __init,__exit markup.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Armin Schindler <mac@melware.de>
Cc: Karsten Keil <isdn@linux-pingi.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: RTAX_FEATURE_ALLFRAG causes inefficient TCP segment sizing

Quoting Tore Anderson from :
https://bugzilla.kernel.org/show_bug.cgi?id=42572

When RTAX_FEATURE_ALLFRAG is set on a route, the effective TCP segment
size does not take into account the size of the IPv6 Fragmentation
header that needs to be included in outbound packets, causing every
transmitted TCP segment to be fragmented across two IPv6 packets, the
latter of which will only contain 8 bytes of actual payload.

RTAX_FEATURE_ALLFRAG is typically set on a route in response to
receving a ICMPv6 Packet Too Big message indicating a Path MTU of less
than 1280 bytes. 1280 bytes is the minimum IPv6 MTU, however ICMPv6
PTBs with MTU < 1280 are still valid, in particular when an IPv6
packet is sent to an IPv4 destination through a stateless translator.
Any ICMPv4 Need To Fragment packets originated from the IPv4 part of
the path will be translated to ICMPv6 PTB which may then indicate an
MTU of less than 1280.

The Linux kernel refuses to reduce the effective MTU to anything below
1280 bytes, instead it sets it to exactly 1280 bytes, and
RTAX_FEATURE_ALLFRAG is also set. However, the TCP segment size appears
to be set to 1240 bytes (1280 Path MTU - 40 bytes of IPv6 header),
instead of 1232 (additionally taking into account the 8 bytes required
by the IPv6 Fragmentation extension header).

This in turn results in rather inefficient transmission, as every
transmitted TCP segment now is split in two fragments containing
1232+8 bytes of payload.

After this patch, all the outgoing packets that includes a
Fragmentation header all are "atomic" or "non-fragmented" fragments,
i.e., they both have Offset=0 and More Fragments=0.

With help from David S. Miller

Reported-by: Tore Anderson <tore@fud.no>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Tom Herbert <therbert@google.com>
Tested-by: Tore Anderson <tore@fud.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: Enhance error checking of published names

Consolidates validation of scope and name sequence range values into
a single routine where it applies both to local name publications
and to name publications issued by other nodes in the network. This
change means that the scope value for non-local publications is now
validated and the name sequence range for local publications is now
validated only once. Additionally, a publication attempt that fails
validation now creates an entry in the system log file only if debugging
capabilities have been enabled; this prevents the system log from being
cluttered up with messages caused by a defective application or network
node.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: Create helper routine to delete unused name sequence structure

Replaces two identical chunks of code that delete an unused name
sequence structure from TIPC's name table with calls to a new routine
that performs this operation.

This change is cosmetic and doesn't impact the operation of TIPC.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: remove redundant memset and stale comment from subscr.c

Eliminate code to zero-out the main topology service structure,
which is already zeroed-out.

Get rid of a comment documenting a field of the main topology
service structure that no longer exists.

Both are cosmetic changes with no impact on runtime behaviour.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: Optimize initialization of network topology service

Initialization now occurs in the calling thread of control,
rather than being deferred to the TIPC tasklet. With the
current codebase, the deferral is no longer necessary.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: Enhance re-initialization of network topology service

Streamlines the job of re-initializing TIPC's network topology service
when a node's network address is first assigned. Rather than destroying
the topology server port and breaking its connections to existing
subscribers, TIPC now simply lets the service continue running (since
the change to the port identifier of each port used by the topology
service no longer impacts the flow of messages between the service and
its subscribers).

This enhancement means that applications that utilize the topology
service prior to the assignment of TIPC's network address no longer need
to re-establish their subscriptions when the address is finally assigned.

However, it is worth noting that any subsequent events for existing
subscriptions report the new port identifier of the publishing port,
rather than the original port identifier. (For example, a name that was
previously reported as being published by <0.0.0:ref> may be subsequently
withdrawn by <Z.C.N:ref>.)

This doesn't impact any of the existing known userspace in tipc-utils,
since (a) TIPC continues to treat references to the original port ID
correctly and (b) normal use cases assign an address before active use.

However if there does happen to be some rare/custom application out
there that was relying on this, they can simply bypass the enhancement
by issuing a subscription to {0,0} and break its connection to the
topology service, if an associated withdrawal event occurs.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: Optimize termination of configuration service

Termination no longer tests to see if the configuration service
port was successfully created or not. In the unlikely event that the
port was not created, attempting to delete the non-existent port is
detected gracefully and causes no harm.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: Optimize initialization of configuration service

Initialization now occurs in the calling thread of control,
rather than being deferred to the TIPC tasklet. With the
current codebase, the deferral is no longer necessary.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: Optimize re-initialization of configuration service

Streamlines the job of re-initializing TIPC's configuration service
when a node's network address is first assigned. Rather than destroying
the configuration server port and then recreating it, TIPC now simply
withdraws the existing {0,<0.0.0>} name publication and creates a new
{0,<Z.C.N>} name publication that identifies the node's network address
to interested subscribers.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem

Conflicts:
drivers/net/wireless/iwlwifi/iwl-testmode.c

tcp repair: Fix unaligned access when repairing options (v2)

Don't pick __u8/__u16 values directly from raw pointers, but instead use
an array of structures of code:value pairs. This is OK, since the buffer
we take options from is not an skb memory, but a user-to-kernel one.

For those options which don't require any value now, require this to be
zero (for potential future extension of this API).

v2: Changed tcp_repair_opt to use two __u32-s as spotted by David Laight.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6lowpan: duplicate definition of IEEE802154_ALEN

The same macros is defined in 'include/net/af_ieee802154.h' and is called
IEEE802154_ADDR_LEN. No need another one, so remove it.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6lowpan: move frame allocation code to a separate function

Separate frame allocation routine from data processing function.
This makes code more human readable and easier for understanding.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: Added support for fragmentation of E1 interfaces of hfcmulti driver.

Fragmentation is usefull if multiple devices are connected to an E1
interface. Each fragment will have a subset of the available timeslots.
These devices require a cascde connection or a multiplexer.

Signed-off-by: Andreas Eversberg <jolly@eversberg.eu>
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: Rework of LED status display for HFC-4S/8S/E1 cards.

LEDs will show RED if layer 1 is disabled or fails.
LEDs will show GREEN if layer 1 is active.
LEDs will blink if traffic on D-channel.

Signed-off-by: Andreas Eversberg <jolly@eversberg.eu>
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: Using FLG_ACTIVE flag to determine if layer 1 is active or not.

We already have the flag for L1 active, so we should use it.
L2 will be solved in a later patch.

Signed-off-by: Andreas Eversberg <andreas@eversberg.eu>
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: Fixed false interruption of audio during bridging change.

Transmitted audio data was interrupted if a bridge was enabled or disabled.
Now transmission seamlessly continues during that action.
Fix in hfcmulti.ko

Signed-off-by: Andreas Eversberg <jolly@eversberg.eu>
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: refine start/enable code for MAC module

merge TXQ/RXQ/MAC start/enable code to one function as they
are started/enabled at the same time, just like stop/disable them
in the function of atl1c_stop_mac.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: add function atl1c_power_saving

This function is used for suspend of S1/S3/S4 and driver remove.
It sets MAC/PHY based on the WoL configuation to get lower power
consumption.
atl1c_phy_power_saving is renamed to atl1c_phy_to_ps_link, this
function is just make PHY enter a link/speed mode to eat less
power.
REG_MAC_CTRL register is refined as well.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: remove PHY reset/init for link down event

it's unnecessary to reset/init phy when link down.
Only L1/L2 chip (supported by atlx) need such action.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: update PHY reset related routine

Many magic data are re-configured for PHY during its reset operation
based on chip type to get better compability and stability.
REG_PHY_CTRL register may be configured by BIOS before enter OS.
so, the driver can't directly write to it without any Read-Op.
this change also affect suspend and phy_disable routines.
PHY debug ports and extension registers are refined as well.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: remove PHY polling from atl1c_open

PHY polling code for FPGA is considered in every MDIO R/W API.
no need to add additional code to atl1c_open.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: refine SERDES-clock related code

bit 17/18 of reg1424 must be clear for l2cb 1.x, or it will cause
the write-reg operation fail without cable connected.
so, please do connect the cable when apply this patch to the driver
to make sure these 2bits are cleared by new driver.
The revised code is move to al1c_reset_mac.
SERDES register definition is refined as well.

when do reset MAC, speed/duplex control right should be transferred
to software before do PHY auto-neg -- by bit MASTER_CTRL_SPEED_MODE_SW.
SERDES register definition is refined as well.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: remove PHY contrl in atl1c_reset_pcie

atl1c_reset_phy follows atl1c_reset_pcie in the whole driver,
so, it's unnecessary to add PHY control code in atl1c_reset_pcie.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: refine phy-register read/write function

phy register is read/write via MDIO control module ---
that module will be affected by the hibernate status,
to access phy regs in hib stutus, slow frequency clk must
be selected.
To access phy extension register, the MDIO related
registers are refined/updated, a _core function is
re-wroted for both regular PHY regs and extension regs.
existing PHY r/w function is revised based on the _core.
PHY extension registers will be used for the comming
patches.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: remove REG_PHY_STATUS

this register is used for l1e(dev=1026)
l1c/l1d/l2cb don't use it.

Signed-off-by: xiong <xiong@qca.qualcomm.com>
Tested-by: Liu David <dwliu@qca.qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Fix FW download for BE

Skip flashing a FW component if that component is not present in a
particular FW UFI image.

Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Fix wrong status getting returned for MCC commands

MCC Response CQEs are processed as part of NAPI poll routine and
also synchronously. If MCC completions are consumed by NAPI poll
routine, wrong status is returned to synchronously waiting routine.
Fix this by getting status of MCC command from command response
instead of response CQEs.

Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Fix Lancer statistics

Fix port num sent in command to get stats. Also skip unnecessary
parsing of stats for Lancer.

Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Fix traffic stall INTx mode

EQ is getting armed wrongly in INTx mode as INTx interrupt is taking
some time to deassert. This can cause another interrupt while NAPI is
scheduled and scheduling a NAPI in interrupt does not take effect.
This causes interrupt to be missed and traffic stalls. Fixing this by
preventing wrong arming of EQ.

Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>