Richard Cochran [Fri, 21 Nov 2014 13:16:20 +0000 (14:16 +0100)]
vlan: Pass ethtool get_ts_info queries to real device.
Commit a6111d3c "vlan: Pass SIOC[SG]HWTSTAMP ioctls to real device"
intended to enable hardware time stamping on VLAN interfaces, but
passing SIOCSHWTSTAMP is only half of the story. This patch adds
the second half, by letting user space find out the time stamping
capabilities of the device backing a VLAN interface.
Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 Nov 2014 20:23:02 +0000 (15:23 -0500)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2014-11-20
This series contains updates to ixgbevf, i40e and i40evf.
Emil updates ixgbevf with much of the work that Alex Duyck did while at
Intel. First updates the driver to clear the status bits on allocation
instead of in the cleanup routine, this way we can leave the recieve
descriptor rings as a read only memory block until we actually have
buffers to give back to the hardware. Clean up ixgbevf_clean_rx_irq()
by creating ixgbevf_process_skb_field() to merge several similar
operations into this new function. Cleanup temporary variables within
the receive hot-path and reducing the scope of variables that do not
need to exist outside the main loop. Save on stack space by just
storing our updated values back in next_to_clean instead of using
a stack variable, which also collapses the size the function. Improve
performace on IOMMU enabled systems and reduce cache misses by changing
the basic receive patch for ixgbevf so that instead of receiving the
data into an skb, it is received into a double buffered page. Add
netpoll support by creating ixgbevf_netpoll(), which is a callback for
.ndo_poll_controller to allow for the VF interface to be used with
netconsole.
Mitch provides several cleanups and trivial fixes for i40e and i40evf.
First is a fix the overloading of the msg_size field in the
arq_event_info struct by splitting the field into two and renaming to
indicate the actual function of each field. Updates code comments
to match the actual function. Cleanup several checkpatch.pl warnings
by adding or removing blank lines, aligning function parameters, and
correcting over-long lines (which makes the code more readable).
Shannon provides a patch for i40e to write the extra bits that will
turn off the ITR wait for the interrupt, since we want the SW INT to
go off as soon as possible.
v2: updated patch 07 based on feedback from Alex Duyck by
- adding pfmemalloc check to a new function for reusable page
- moved atomic_inc outside of #if/else in ixgbevf_add_rx_frag()
- reverted the removal of the API check in ixgbevf_change_mtu()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The following series of patches includes functional updates to the
driver as well as some trivial changes.
- Add a read memory barrier in the Tx and Rx path after checking the
descriptor ownership bit
- Wait for the Tx engine to stop/suspend before issuing a stop command
- Implement a smatch tool suggestion to simplify an if statement
- Separate out Tx and Rx ring data fields into their own structures
- Add BQL support
- Remove an unused variable
- Change Tx coalescing support to operate on packet basis instead of
a descriptor basis
- Add support for the skb->xmit_more flag
This patch series is based on net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 20 Nov 2014 17:04:02 +0000 (11:04 -0600)]
amd-xgbe: Perform Tx coalescing on a packet basis
The current form of Tx coalescing works on a descriptor basis instead
of on a packet basis and doesn't take into account TSO packets. Update
the Tx coalescing support to work on a packet basis, taking into
account the number of packets associated with a TSO transmit. Also,
only activate the Tx timer if a timer value is set.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 20 Nov 2014 17:03:50 +0000 (11:03 -0600)]
amd-xgbe: Add BQL support
Call the appropriate BQL functions to track the number of bytes queued
during Tx processing and to track the number of packets and bytes
that have been transmitted during Tx complete processing.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 20 Nov 2014 17:03:44 +0000 (11:03 -0600)]
amd-xgbe: Separate Tx/Rx ring data fields into new structs
Move the Tx and Rx related fields within the xgbe_ring_data struct into
their own structs in order to more easily see what fields are used for
each operation.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 20 Nov 2014 17:03:38 +0000 (11:03 -0600)]
amd-xgbe: Incorporate Smatch coding suggestion
The Smatch tool indicated that one of the if statements in xgbe-dev.c
could be rewritten to remove a redundant check for the 'err' variable
in an if statement.
Change the statement as suggested and add a comment to help clarify.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 20 Nov 2014 17:03:32 +0000 (11:03 -0600)]
amd-xgbe: Tx engine must not be active before stopping it
If the Tx engine is told to stop while it is actively processing Tx
descriptors it is possible that the Tx descriptor(s) will not be closed
out properly. When the Tx engine is restarted this could result in the
driver being stuck on the improperly closed descriptor.
Update the driver to wait for the Tx engine to be in a stopped or
suspended state before issuing the stop command.
This has not been an issue to date, but it's a good safe-guard to have.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 20 Nov 2014 17:03:26 +0000 (11:03 -0600)]
amd-xgbe: Add a read memory barrier to Tx/Rx path
Add a read memory barrier to the Tx and Rx paths where the ownership
bit is checked to be sure that all descriptor fields are read after
having read the ownership bit for the descriptor.
This has not been an issue to date, but it's a good safe-guard to have.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Markus Elfring [Thu, 20 Nov 2014 13:47:12 +0000 (14:47 +0100)]
net: Xilinx: Deletion of unnecessary checks before two function calls
The functions kfree() and of_node_put() test whether their argument is NULL
and then return immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Soren Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Thu, 20 Nov 2014 12:19:44 +0000 (14:19 +0200)]
net/mlx4_en: mlx4_en_set_settings() always fails when autoneg is set
Fix ethtool set settings to not check AUTONEG_ENABLE
mlx4_en_set_settings should not check if cmd->autoneg == AUTONEG_ENABLE,
cmd->autoneg can be enabled by default and this check will fail other settings requests.
mlx4_en driver doesn't support changing autoneg value, but shouldn't fail the request
in case cmd->autoneg was set.
Fixes: d48b3ab ("net/mlx4_en: Use PTYS register to set ethtool settings (Speed)") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Added a pci_dma_mapping_error() call to check for mapping errors before
further using the dma handle. In case of error, control goes to a new label
where the incoming skb is freed. Unchecked dma handles were found using
Coccinelle:
@rule1@
expression e1;
identifier x;
@@
*x = pci_map_single(...);
... when != pci_dma_mapping_error(e1,x)
Signed-off-by: Tina Johnson <tinajohnson.1234@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 Nov 2014 20:01:35 +0000 (15:01 -0500)]
Merge branch 'tipc-next'
Richard Alpe says:
====================
tipc: new netlink API
v3
The old API is not removed.
The new API is separated from the old because of a bug in the old
tipc-config utility using it. When adding commands to the existing
genl_ops struct the get-family response message grows to a point where
it overflows the small receive buffer in tipc-config, subsequently
breaking the tool. Hence the two genl_family and genl_ops structs.
The new headers are placed in a new file called tipc_netlink.h rather
than added to tipc_config.h as they where in previous versions of this
patchset.
/v3
v2
Redesigned "socket list command" to address David Millers comments in
net-next v1 of this patchset.
Simply put the problem is that we can have an arbitrary amount of
sockets with an arbitrary amount of associated publications. In the
previous patchset this was solved by nesting as many publications as
possible into a socket. If all didn't fit it sent the same socket again
with the remaining publications. As David Miller pointed out this makes
each message malformed as the receiver cannot by the data itself know if
it has received a complete set or not. This was flagged outside of the
data and the client did the reassembly.
o socket 1
o publ 1
o publ 2
o socket 1
o publ 3
o publ 4
In this patchset this is divided into socket listing and publication
listing to avoid having nested data of arbitrary size.
TIPC_NL_SOCK_GET now dumps all sockets with any nested connection
information. However, it no longer include publication information,
only a HAS_PUBL flag to indicate whether the socket has publications or
not. To compliment this there is a new command TIPC_NL_PUBL_GET which
takes a socket as argument and dumps all associated publications.
This means that on "top-level" the data is always complete. In the case
of "tipc socket list" (new tipc-config -p) it first queries all sockets
with TIPC_NL_SOCK_GET and if the socket is published it fetches the
publications using TIPC_NL_PUBL_GET. This is slow for large amount of
sockets with a low publication count (worst case). However, the
integrity is preserved and there is no malformed messages.
/v2
This is a new netlink API for TIPC. It's intended to replace the
existing ASCII API. It utilizes many of the standard netlink
functionalities in the kernel, such as attribute nesting and
input polices.
There are a couple of reasons for this rewrite. The main and most
easily justifiable is that the existing API doesn't scale. Meaning
that a TIPC cluster with a larger amount of nodes, publications or
ports will rapidly exceed what the exiting API can handle. Resulting
in truncated or corrupt responses. In addition to this, the existing
ASCII API rarely uses "standard" kernel functions and has several
tipc specific functions for sanity checking and string formating.
The new API utilizes standard function for pushing data to socket
buffers and netlink attribute nesting to logically group data.
The new API can handle an arbitrary amount of data for things that
are likely to scale up as the TIPC usage and/or cluster size
increases.
A new user-space tool has been developed to work with this new API.
It is called "tipc" and is part of the "tipc-utils" package that
comes with many Linux distributions. The new "tipc" tool utilizes
standard functions from libnl to format, send, receive and process
messages. The tool has borrowed design philosophies from git and the
ip tool. Making the syntax resemble that of ip whiles its strong
modularity resembles that of git.
The existing tool for managing TIPC, "tipc-config" remains in the
package, but when built for kernels that has this new API it is
replaced by a script-based wrapper that maps the old syntax to the
new tool. This way, backwards compatibility is mostly preserved.
MORE ABOUT THE CODE
The main challenge here is to handle the case where the data is of
arbitrary size. This was largely neglected in the old API design.
For example when there is a lot of sockets that has a large amount of
associated publications. In this specific case we can't assume that
all ports nor for that matter all the publications can fit inside a
single netlink message. Sending everything in one batch isn't an
option as we need to yield for the socket layer to cope.
This is solved by using the standard netlink callback for dumping
data and releasing the locks when the netlink message is full. The
dumping mechanism gets us back and we keep a reference (logical) to
where we where when the message became full. This means that we are
not "atomic", what is retrieved by user-space isn't a snapshot at a
certain time but rather a continuously updated data set. In the case
where we can't find our way back i.e. our logical reference are gone
we set a standard flag (NLM_F_DUMP_INTR) to tell user-space that the
dump was interrupted.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Alpe [Thu, 20 Nov 2014 09:29:20 +0000 (10:29 +0100)]
tipc: add name table dump to new netlink api
Add TIPC_NL_NAME_TABLE_GET command to the new tipc netlink API.
This command supports dumping the name table of all nodes.
Netlink logical layout of name table response message:
-> name table
-> publication
-> type
-> lower
-> upper
-> scope
-> node
-> ref
-> key
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Alpe [Thu, 20 Nov 2014 09:29:19 +0000 (10:29 +0100)]
tipc: add net set to new netlink api
Add TIPC_NL_NET_SET command to the new tipc netlink API.
This command can set the network id and network (tipc) address.
Netlink logical layout of network set message:
-> net
[ -> id ]
[ -> address ]
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Alpe [Thu, 20 Nov 2014 09:29:18 +0000 (10:29 +0100)]
tipc: add net dump to new netlink api
Add TIPC_NL_NET_GET command to the new tipc netlink API.
This command dumps the network id of the node.
Netlink logical layout of returned network data:
-> net
-> id
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Alpe [Thu, 20 Nov 2014 09:29:17 +0000 (10:29 +0100)]
tipc: add node get/dump to new netlink api
Add TIPC_NL_NODE_GET to the new tipc netlink API.
This command can dump the address and node status of all nodes in the
tipc cluster.
Netlink logical layout of returned node/address data:
-> node
-> address
-> up flag
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Alpe [Thu, 20 Nov 2014 09:29:16 +0000 (10:29 +0100)]
tipc: add media set to new netlink api
Add TIPC_NL_MEDIA_SET command to the new tipc netlink API.
This command can set one or more link properties for a particular
media.
Netlink logical layout of bearer set message:
-> media
-> name
-> link properties
[ -> tolerance ]
[ -> priority ]
[ -> window ]
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Alpe [Thu, 20 Nov 2014 09:29:15 +0000 (10:29 +0100)]
tipc: add media get/dump to new netlink api
Add TIPC_NL_MEDIA_GET command to the new tipc netlink API.
This command supports dumping all information about all defined
media as well as getting all information about a specific media.
The information about a media includes name and link properties.
Netlink logical layout of media get response message:
-> media
-> name
-> link properties
-> tolerance
-> priority
-> window
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Alpe [Thu, 20 Nov 2014 09:29:14 +0000 (10:29 +0100)]
tipc: add link stat reset to new netlink api
Add TIPC_NL_LINK_RESET_STATS command to the new netlink API.
This command resets the link statistics for a particular link.
Netlink logical layout of link reset message:
-> link
-> name
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Alpe [Thu, 20 Nov 2014 09:29:13 +0000 (10:29 +0100)]
tipc: add link set to new netlink api
Add TIPC_NL_LINK_SET to the new tipc netlink API.
This command can set one or more link properties for a particular
link.
Netlink logical layout of link set message:
-> link
-> name
-> properties
[ -> tolerance ]
[ -> priority ]
[ -> window ]
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Alpe [Thu, 20 Nov 2014 09:29:12 +0000 (10:29 +0100)]
tipc: add link get/dump to new netlink api
Add TIPC_NL_LINK_GET command to the new tipc netlink API.
This command supports dumping all information about all links
(including the broadcast link) or getting all information about a
specific link (not the broadcast link).
The information about a link includes name, transmission info,
properties and link statistics.
As the tipc broadcast link is special we unfortunately have to treat
it specially. It is a deliberate decision not to abstract the
broadcast link on this (API) level.
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Alpe [Thu, 20 Nov 2014 09:29:11 +0000 (10:29 +0100)]
tipc: add publication dump to new netlink api
Add TIPC_NL_PUBL_GET command to the new tipc netlink API.
This command supports dumping of all publications for a specific
socket.
Netlink logical layout of request message:
-> socket
-> reference
Netlink logical layout of response message:
-> publication
-> type
-> lower
-> upper
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Alpe [Thu, 20 Nov 2014 09:29:10 +0000 (10:29 +0100)]
tipc: add sock dump to new netlink api
Add TIPC_NL_SOCK_GET command to the new tipc netlink API.
This command supports dumping of all available sockets with their
associated connection or publication(s). It could be extended to reply
with a single socket if the NLM_F_DUMP isn't set.
The information about a socket includes reference, address, connection
information / publication information.
Netlink logical layout of response message:
-> socket
-> reference
-> address
[
-> connection
-> node
-> socket
[
-> connected flag
-> type
-> instance
]
]
[
-> publication flag
]
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Alpe [Thu, 20 Nov 2014 09:29:09 +0000 (10:29 +0100)]
tipc: add bearer set to new netlink api
Add TIPC_NL_BEARER_SET command to the new tipc netlink API.
This command can set one or more link properties for a particular
bearer.
Netlink logical layout of bearer set message:
-> bearer
-> name
-> link properties
[ -> tolerance ]
[ -> priority ]
[ -> window ]
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Alpe [Thu, 20 Nov 2014 09:29:08 +0000 (10:29 +0100)]
tipc: add bearer get/dump to new netlink api
Add TIPC_NL_BEARER_GET command to the new tipc netlink API.
This command supports dumping all data about all bearers or getting
all information about a specific bearer.
The information about a bearer includes name, link priorities and
domain.
Netlink logical layout of bearer get message:
-> bearer
-> name
Netlink logical layout of returned bearer information:
-> bearer
-> name
-> link properties
-> priority
-> tolerance
-> window
-> domain
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Alpe [Thu, 20 Nov 2014 09:29:07 +0000 (10:29 +0100)]
tipc: add bearer disable/enable to new netlink api
A new netlink API for tipc that can disable or enable a tipc bearer.
The new API is separated from the old API because of a bug in the
user space client (tipc-config). The problem is that older versions
of tipc-config has a very low receive limit and adding commands to
the legacy genl_opts struct causes the ctrl_getfamily() response
message to grow, subsequently breaking the tool.
The new API utilizes netlink policies for input validation. Where the
top-level netlink attributes are tipc-logical entities, like bearer.
The top level entities then contain nested attributes. In this case
a name, nested link properties and a domain.
Netlink commands implemented in this patch:
TIPC_NL_BEARER_ENABLE
TIPC_NL_BEARER_DISABLE
Netlink logical layout of bearer disable message:
-> bearer
-> name
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Thu, 20 Nov 2014 08:31:05 +0000 (16:31 +0800)]
macvtap: advance iov iterator when needed in macvtap_put_user()
When mergeable buffer is used, vnet_hdr_sz is greater than sizeof struct
virtio_net_hdr. So we need advance the iov iterators in this case.
Fixes 6c36d2e26cda1ad3e2c4b90dd843825fc62fe5b4 ("macvtap: Use iovec iterators") Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Thu, 20 Nov 2014 02:29:05 +0000 (10:29 +0800)]
r8152: adjust r8152_submit_rx
The behavior of handling the returned status from r8152_submit_rx()
is almost same, so let r8152_submit_rx() deal with the error
directly. This could avoid the duplicate code.
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 20 Nov 2014 00:54:48 +0000 (01:54 +0100)]
net: sctp: keep owned chunk in destructor_arg instead of skb->cb
It's just silly to hold the skb destructor argument around inside
skb->cb[] as we currently do in SCTP.
Nowadays, we're sort of cheating on data accounting in the sense
that due to commit 4c3a5bdae293 ("sctp: Don't charge for data in
sndbuf again when transmitting packet"), we orphan the skb already
in the SCTP output path, i.e. giving back charged data memory, and
use a different destructor only to make sure the sk doesn't vanish
on skb destruction time. Thus, cb[] is still valid here as we
operate within the SCTP layer. (It's generally actually a big
candidate for future rework, imho.)
However, storing the destructor in the cb[] can easily cause issues
should an non sctp_packet_set_owner_w()'ed skb ever escape the SCTP
layer, since cb[] may get overwritten by lower layers and thus can
corrupt the chunk pointer. There are no such issues at present,
but lets keep the chunk in destructor_arg, as this is the actual
purpose for it.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 19 Nov 2014 18:29:56 +0000 (10:29 -0800)]
net: bcmgenet: log RX buffer allocation and RX/TX dma failures
To help troubleshoot heavy memory pressure conditions, add a bunch of
statistics counter to log RX buffer allocation and RX/TX DMA mapping
failures. These are reported like any other counters through the ethtool
stats interface.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 19 Nov 2014 18:29:55 +0000 (10:29 -0800)]
net: systemport: log RX buffer allocation and RX/TX DMA failures
To help troubleshoot heavy memory pressure conditions, add a bunch of
statistics counter to log RX buffer allocation and RX/TX DMA mapping
failures. These are reported like any other counters through the ethtool
stats interface.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Wed, 19 Nov 2014 18:10:16 +0000 (13:10 -0500)]
packet: make packet_snd fail on len smaller than l2 header
When sending packets out with PF_PACKET, SOCK_RAW, ensure that the
packet is at least as long as the device's expected link layer header.
This check already exists in tpacket_snd, but not in packet_snd.
Also rate limit the warning in tpacket_snd.
Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 19 Nov 2014 13:05:01 +0000 (14:05 +0100)]
net: move make_writable helper into common code
note that skb_make_writable already exists in net/netfilter/core.c
but does something slightly different.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 19 Nov 2014 13:05:00 +0000 (14:05 +0100)]
vlan: introduce __vlan_insert_tag helper which does not free skb
There's a need for helper which inserts vlan tag but does not free the
skb in case of an error.
Suggested-by: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 19 Nov 2014 13:04:55 +0000 (14:04 +0100)]
openvswitch: actions: use skb_postpull_rcsum when possible
Replace duplicated code by calling skb_postpull_rcsum
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This series adds device and device-type abstractions to the micrel
driver, and enables support for RMII-reference clock selection for
KSZ8081 and KSZ8091 devices.
While adding support for more features for the Micrel PHYs mentioned
above, it became apparent that the configuration space is much too large
and that adding type-specific callbacks will simply not scale. Instead I
added a driver_data field to struct phy_device, which can be used to
store static device type data that can be parsed and acted on in
generic driver callbacks. This allows a lot of duplicated code to be
removed, and should make it much easier to add new features or deal with
device-type quirks in the future.
The series has been tested on a dual KSZ8081 setup. Further testing on
other Micrel PHYs would be much appreciated.
The recent commit a95a18afe4c8 ("phy/micrel: KSZ8031RNL RMII clock
reconfiguration bug") currently prevents KSZ8031 PHYs from using the
generic config-init. Bruno, who is the author of that patch, has agreed
to test this series and some follow-up diagnostic patches to determine
how best to incorporate these devices as well. I intend to send a
follow-up patch that removes the custom 8031 config-init and documents
this quirk, but the current series can be applied meanwhile.
These patches are against net-next which contains some already merged
prerequisite patches to the driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Wed, 19 Nov 2014 11:59:21 +0000 (12:59 +0100)]
dt/bindings: add clock-select function property to micrel phy binding
Add "micrel,rmii-reference-clock-select-25-mhz" to Micrel ethernet PHY
binding documentation.
This property is needed to properly describe some revisions of Micrel
PHYs which has the function of this configuration bit inverted so that
setting it enables 25 MHz rather than 50 MHz clock mode.
Note that a clock reference ("rmii-ref") is still needed to actually
select either mode.
Cc: devicetree@vger.kernel.org Signed-off-by: Johan Hovold <johan@kernel.org> Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Wed, 19 Nov 2014 11:59:19 +0000 (12:59 +0100)]
net: phy: micrel: add support for clock-mode select to KSZ8081/KSZ8091
Micrel KSZ8081 and KSZ8091 PHYs have the RMII Reference Clock Select
bit, which is used to select 25 or 50 MHz clock mode.
Note that on some revisions of the PHY (e.g. KSZ8081RND) the function of
this bit is inverted so that setting it enables 25 rather than 50 MHz
mode. Add a new device-tree property
"micrel,rmii-reference-clock-select-25-mhz" to describe this.
Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Wed, 19 Nov 2014 11:59:18 +0000 (12:59 +0100)]
net: phy: micrel: add generic clock-mode-select support
Add generic RMII-Reference-Clock-Select support.
Several Micrel PHY have an RMII-Reference-Clock-Select bit to select
25 MHz or 50 MHz clock mode. Recently, support for configuring this
through device tree for KSZ8021 and KSZ8031 was added.
Generalise this support so that it can be configured for other PHY types
as well.
Note that some PHY revisions (of the same type) has this bit inverted.
This should be either configurable through a new device-tree property,
or preferably, determined based on PHY ID if possible.
Also note that this removes support for setting 25 MHz mode from board
files which was also added by the above mentioned commit 45f56cb82e45
("net/phy: micrel: Add clock support for KSZ8021/KSZ8031").
Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ian Morris <ipm@chirality.org.uk> Cc: Mirko Lindner <mlindner@marvell.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Check and update posted_index only when skb->xmit_more is 0 or tx queue is full.
v2:
use txq_map instead of skb_get_queue_mapping(skb)
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 11 Nov 2014 20:04:35 +0000 (20:04 +0000)]
i40e: trigger SW INT with no ITR wait
Since we want the SW INT to go off as soon as possible, write the
extra bits that will turn off the ITR wait for the interrupt.
Change-ID: I6d5382ba60840fa32abb7dea17c839eb4b5f68f7 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Tue, 11 Nov 2014 20:03:13 +0000 (20:03 +0000)]
i40evf: remove unnecessary else
Since the if part of this statement contains a break, there's no reason
for the else. Clean up the code and make it more obvious that the delay
happens each time through the loop.
Change-ID: I9292eaf7dd687688bdc401b8bd8d1d14f6944460 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Tue, 11 Nov 2014 20:02:52 +0000 (20:02 +0000)]
i40evf: make comparisons consistent
Most of the null-checking in this driver is of the style if (!foo),
except these few. Make these checks consistent with the rest of the
code.
Change-ID: I991924f34072fa607a1b626a8b3f1fa5195d43e9 Reported-by: Joe Perches <joe@perches.com> Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Tue, 11 Nov 2014 20:02:42 +0000 (20:02 +0000)]
i40evf: make checkpatch happy
This patch is the result of running checkpatch on the i40evf driver with
the --strict option. The vast majority of changes are adding/removing
blank lines, aligning function parameters, and correcting over-long
lines.
The only possible functional change is changing the flags member of the
adapter structure to be non-volatile. However, according to the kernel
documentation, this is not necessary and the volatile should be removed.
Change-ID: Ie8c6414800924f529bef831e8845292b970fe2ed Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Tue, 11 Nov 2014 20:02:31 +0000 (20:02 +0000)]
i40evf: update header comments
No code changes. Update comments to match actual function declarations.
Change-ID: Ib830d2f154ee917a104955c0914267fc98f3d2c8 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Tue, 11 Nov 2014 20:02:19 +0000 (20:02 +0000)]
i40e: don't overload fields
Overloading the msg_size field in the arq_event_info struct is just a
bad idea. It leads to repeated bugs when the structure is used in a
loop, since the input value (buffer size) is overwritten by the output
value (actual message length).
Fix this by splitting the field into two and renaming to indicate the
actual function of each field.
Since the arq_event struct has now changed, we need to change the drivers
to support this. Note that we no longer need to initialize the buffer size
each time we go through a loop as this value is no longer destroyed by
arq processing.
In the process, we also fix a bug in i40evf_verify_api_ver where the
buffer size was not correctly reinitialized each time through the loop.
Change-ID: Ic7f9633cdd6f871f93e698dfb095e29c696f5581 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Acked-by: Shannon Nelson <shannon.nelson@intel.com> Acked-by: Ashish Shah <ashish.n.shah@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Sat, 8 Nov 2014 01:39:56 +0000 (01:39 +0000)]
ixgbevf: add netpoll support
This patch adds ixgbevf_netpoll() a callback for .ndo_poll_controller to
allow for the VF interface to be used with netconsole.
CC: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Sat, 8 Nov 2014 01:39:51 +0000 (01:39 +0000)]
ixgbevf: compare total_rx_packets and budget in ixgbevf_clean_rx_irq
total_rx_packets is the number of packets we had cleaned, and budget is
the total number of packets that we could clean per poll. Instead of
altering both of these values we can save ourselves one write to memory by
just comparing total_rx_packets to the budget and as long as we are less
than budget we continue cleaning.
Also change the do{}while logic to while{} in order to avoid processing
packets when budget is 0.
CC: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Fri, 21 Nov 2014 02:57:15 +0000 (02:57 +0000)]
ixgbevf: Change receive model to use double buffered page based receives
This patch changes the basic receive path for ixgbevf so that instead of
receiving the data into an skb it is received into a double buffered page.
The main change is that the receives will be done in pages only and then
pull the header out of the page and copy it into the sk_buff data.
This has the advantages of reduced cache misses and improved performance on
IOMMU enabled systems.
v2:
- added pfmemalloc check to a new function for reusable page
- moved atomic_inc outside of #if/else in ixgbevf_add_rx_frag()
- reverted the removal of the api check in ixgbevf_change_mtu()
CC: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Sat, 8 Nov 2014 01:39:41 +0000 (01:39 +0000)]
ixgbevf: Update Rx next to clean in real time
Since the next_to_clean value is only accessed by the Rx interrupt handler
we can save on stack space by just storing our updated values back in
next_to_clean instead of using the stack variable i. This should help to
reduce stack space and we can further collapse the size of the function.
Also removed non_eop_descs counter as it was never shown in the stats.
CC: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Sat, 8 Nov 2014 01:39:35 +0000 (01:39 +0000)]
ixgbevf: reorder main loop in ixgbe_clean_rx_irq to allow for do/while/continue
This change allows us to go from a loop based on the descriptor to one
primarily based on the budget. The advantage to this is that we can avoid
carrying too many values from one iteration to the next.
CC: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change is meant to help cleanup the usage of temporary variables
within the Rx hot-path by removing unnecessary variables and reducing
the scope of variables that do not need to exist outside the main loop.
CC: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Sat, 8 Nov 2014 01:39:25 +0000 (01:39 +0000)]
ixgbevf: Combine the logic for post Rx processing into single function
This patch cleans up ixgbevf_clean_rx_irq() by merging several similar
operations into a new function - ixgbevf_process_skb_fields().
CC: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Sat, 8 Nov 2014 01:39:20 +0000 (01:39 +0000)]
ixgbevf: Test Rx status bits directly out of the descriptor
Instead of keeping a local copy of the status bits from the descriptor
we can just read them directly - this is accomplished with the addition
of ixgbevf_test_staterr().
In addition instead of doing a byteswap on the status bits value, we
can byteswap the constant values we are testing since that can be done
at compile time which should help to improve performance on big-endian
systems.
CC: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Sat, 8 Nov 2014 01:39:15 +0000 (01:39 +0000)]
ixgbevf: Update ixgbevf_alloc_rx_buffers to handle clearing of status bits
Instead of clearing the status bits in the cleanup it makes more sense to
just clear the status bits on allocation. This way we can leave the Rx
descriptor rings as a read only memory block until we actually have buffers
to give back to the hardware.
CC: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Thu, 20 Nov 2014 00:10:17 +0000 (19:10 -0500)]
Merge branch 'bonding_4ad'
Xie Jianhua says:
====================
bonding: Introduce 4 AD link speed
The speed field of AD Port Key was based on bitmask, it supported 5
kinds of link speed at most, as there were only 5 bits in the speed
field of the AD Port Key. This patches series change the speed type
(AD_LINK_SPEED_BITMASK) from bitmask to enum type in order to enhance
speed type from 5 to 32, and then introduce 4 AD link speed to fix
agg_bandwidth.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jianhua Xie [Wed, 19 Nov 2014 08:48:59 +0000 (16:48 +0800)]
bonding: Introduce 4 AD link speed to fix agg_bandwidth
This patch adds [2.5|20|40|56] Gbps enum definition, and fixes
aggregated bandwidth calculation based on above slave links.
CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: David S. Miller <davem@davemloft.net> Signed-off-by: Jianhua Xie <jianhua.xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jianhua Xie [Wed, 19 Nov 2014 08:48:58 +0000 (16:48 +0800)]
bonding: change AD_LINK_SPEED_BITMASK to enum to suport more speed
Port Key was determined as 16 bits according to the link speed,
duplex and user key (which is yet not supported). In the old
speed field, 5 bits are for speed [1|10|100|1000|10000]Mbps as
below:
--------------------------------------------------------------
Port key :| User key | Speed | Duplex|
--------------------------------------------------------------
16 6 1 0
This patch keeps the old layout, but changes AD_LINK_SPEED_BITMASK
from bit type to an enum type. In this way, the speed field can
expand speed type from 5 to 32.
CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: David S. Miller <davem@davemloft.net> Signed-off-by: Jianhua Xie <jianhua.xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Mon, 10 Nov 2014 03:33:45 +0000 (22:33 -0500)]
{compat_,}verify_iovec(): switch to generic copying of iovecs
use {compat_,}rw_copy_check_uvector(). As the result, we are
guaranteed that all iovecs seen in ->msg_iov by ->sendmsg()
and ->recvmsg() will pass access_ok().
Al Viro [Sun, 6 Apr 2014 18:03:05 +0000 (14:03 -0400)]
separate kernel- and userland-side msghdr
Kernel-side struct msghdr is (currently) using the same layout as
userland one, but it's not a one-to-one copy - even without considering
32bit compat issues, we have msg_iov, msg_name and msg_control copied
to kernel[1]. It's fairly localized, so we get away with a few functions
where that knowledge is needed (and we could shrink that set even
more). Pretty much everything deals with the kernel-side variant and
the few places that want userland one just use a bunch of force-casts
to paper over the differences.
The thing is, kernel-side definition of struct msghdr is *not* exposed
in include/uapi - libc doesn't see it, etc. So we can add struct user_msghdr,
with proper annotations and let the few places that ever deal with those
beasts use it for userland pointers. Saner typechecking aside, that will
allow to change the layout of kernel-side msghdr - e.g. replace
msg_iov/msg_iovlen there with struct iov_iter, getting rid of the need
to modify the iovec as we copy data to/from it, etc.
We could introduce kernel_msghdr instead, but that would create much more
noise - the absolute majority of the instances would need to have the
type switched to kernel_msghdr and definition of struct msghdr in
include/linux/socket.h is not going to be seen by userland anyway.
This commit just introduces user_msghdr and switches the few places that
are dealing with userland-side msghdr to it.
[1] actually, it's even trickier than that - we copy msg_control for
sendmsg, but keep the userland address on recvmsg.
bpf: fix arraymap NULL deref and missing overflow and zero size checks
- fix NULL pointer dereference:
kernel/bpf/arraymap.c:41 array_map_alloc() error: potential null dereference 'array'. (kzalloc returns null)
kernel/bpf/arraymap.c:41 array_map_alloc() error: we previously assumed 'array' could be null (see line 40)
- integer overflow check was missing in arraymap
(hashmap checks for overflow via kmalloc_array())
- arraymap can round_up(value_size, 8) to zero. check was missing.
- hashmap was missing zero size check as well, since roundup_pow_of_two() can
truncate into zero
- found a typo in the arraymap comment and unnecessary empty line
Fix all of these issues and make both overflow checks explicit U32 in size.
Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Recently many changes have been done inside the driver
so this patch updates the driver's doc for example reviewing
information for the rx and tx processes that are managed
by napi method, adding new information for missing glue-logic files
etc.
Also this reviews and fixes what is reported when run kernel-doc script.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Recently many changes have been done inside the driver
so this patch updates the driver's doc for example reviewing
information for the rx and tx processes that are managed
by napi method, adding new information for missing glue-logic files
etc.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 18 Nov 2014 07:06:20 +0000 (23:06 -0800)]
tcp: make connect() mem charging friendly
While working on sk_forward_alloc problems reported by Denys
Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
sk_forward_alloc is negative while connect is in progress.
We can fix this by calling regular sk_stream_alloc_skb() both for the
SYN packet (in tcp_connect()) and the syn_data packet in
tcp_send_syn_data()
Then, tcp_send_syn_data() can avoid copying syn_data as we simply
can manipulate syn_data->cb[] to remove SYN flag (and increment seq)
Instead of open coding memcpy_fromiovecend(), simply use this helper.
This leaves in socket write queue clean fast clone skbs.
This was tested against our fastopen packetdrill tests.
Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Tue, 18 Nov 2014 05:20:41 +0000 (13:20 +0800)]
tun: return NET_XMIT_DROP for dropped packets
After commit 5d097109257c03a71845729f8db6b5770c4bbedc
("tun: only queue packets on device"), NETDEV_TX_OK was returned for
dropped packets. This will confuse pktgen since dropped packets were
counted as sent ones.
Fixing this by returning NET_XMIT_DROP to let pktgen count it as error
packet.
Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Rick Jones [Mon, 17 Nov 2014 22:04:29 +0000 (14:04 -0800)]
icmp: Remove some spurious dropped packet profile hits from the ICMP path
If icmp_rcv() has successfully processed the incoming ICMP datagram, we
should use consume_skb() rather than kfree_skb() because a hit on the likes
of perf -e skb:kfree_skb is not called-for.
Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>