Scott Feldman [Mon, 16 Mar 2015 04:07:14 +0000 (21:07 -0700)]
switchdev: add swdev ops
As discussed at netconf, introduce swdev_ops as first step to move switchdev
ops from ndo to swdev. This will keep switchdev from cluttering up ndo ops
space.
Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Sun, 15 Mar 2015 10:12:05 +0000 (21:12 +1100)]
rhashtable: Fix rhashtable_remove failures
The commit 9d901bc05153bbf33b5da2cd6266865e531f0545 ("rhashtable:
Free bucket tables asynchronously after rehash") causes gratuitous
failures in rhashtable_remove.
The reason is that it inadvertently introduced multiple rehashing
from the perspective of readers. IOW it is now possible to see
more than two tables during a single RCU critical section.
Fortunately the other reader rhashtable_lookup already deals with
this correctly thanks to c4db8848af6af92f90462258603be844baeab44d
("rhashtable: rhashtable: Move future_tbl into struct bucket_table")
so only rhashtable_remove is broken by this change.
This patch fixes this by looping over every table from the first
one to the last or until we find the element that we were trying
to delete.
Incidentally the simple test for detecting rehashing to prevent
starting another shrinking no longer works. Since it isn't needed
anyway (the work queue and the mutex serves as a natural barrier
to unnecessary rehashes) I've simply killed the test.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Sun, 15 Mar 2015 10:12:04 +0000 (21:12 +1100)]
rhashtable: Fix use-after-free in rhashtable_walk_stop
The commit c4db8848af6af92f90462258603be844baeab44d ("rhashtable:
Move future_tbl into struct bucket_table") introduced a use-after-
free bug in rhashtable_walk_stop because it dereferences tbl after
droping the RCU read lock.
This patch fixes it by moving the RCU read unlock down to the bottom
of rhashtable_walk_stop. In fact this was how I had it originally
but it got dropped while rearranging patches because this one
depended on the async freeing of bucket_table.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
V1->V2:
- refactored field access converter into common helper convert_skb_access()
used in both classic and extended BPF
- added missing build_bug_on for field 'len'
- added comment to uapi/linux/bpf.h as suggested by Daniel
- dropped exposing 'ifindex' field for now
classic BPF has a way to access skb fields, whereas extended BPF didn't.
This patch introduces this ability.
Classic BPF can access fields via negative SKF_AD_OFF offset.
Positive bpf_ld_abs N is treated as load from packet, whereas
bpf_ld_abs -0x1000 + N is treated as skb fields access.
Many offsets were hard coded over years: SKF_AD_PROTOCOL, SKF_AD_PKTTYPE, etc.
The problem with this approach was that for every new field classic bpf
assembler had to be tweaked.
I've considered doing the same for extended, but for every new field LLVM
compiler would have to be modifed. Since it would need to add a new intrinsic.
It could be done with single intrinsic and magic offset or use of inline
assembler, but neither are clean from compiler backend point of view, since
they look like calls but shouldn't scratch caller-saved registers.
Another approach was to introduce a new helper functions like bpf_get_pkt_type()
for every field that we want to access, but that is equally ugly for kernel
and slow, since helpers are calls and they are slower then just loads.
In theory helper calls can be 'inlined' inside kernel into direct loads, but
since they were calls for user space, compiler would have to spill registers
around such calls anyway. Teaching compiler to treat such helpers differently
is even uglier.
They were few other ideas considered. At the end the best seems to be to
introduce a user accessible mirror of in-kernel sk_buff structure:
No new instructions added. LLVM doesn't need to be modified.
JITs don't change and verifier already knows when it accesses 'ctx' pointer.
The only thing needed was to convert user visible offset within __sk_buff
to kernel internal offset within sk_buff.
For 'len' and other fields conversion is trivial.
Converting 'pkt_type' takes 2 or 3 instructions depending on endianness.
More fields can be exposed by adding to the end of the 'struct __sk_buff'.
Like vlan_tci and others can be added later.
When pkt_type field is moved around, goes into different structure, removed or
its size changes, the function convert_skb_access() would need to updated and
it will cover both classic and extended.
Patch 2 updates examples to demonstrates how fields are accessed and
adds new tests for verifier, since it needs to detect a corner case when
attacker is using single bpf instruction in two branches with different
register types.
The 4 fields of __sk_buff are already exposed to user space via classic bpf and
I believe they're useful in extended as well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
- modify sockex1 example to count number of bytes in outgoing packets
- modify sockex2 example to count number of bytes and packets per flow
- add 4 stress tests that exercise 'skb->field' code path of verifier
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 14 Mar 2015 01:27:17 +0000 (02:27 +0100)]
ebpf: add helper for obtaining current processor id
This patch adds the possibility to obtain raw_smp_processor_id() in
eBPF. Currently, this is only possible in classic BPF where commit da2033c28226 ("filter: add SKF_AD_RXHASH and SKF_AD_CPU") has added
facilities for this.
Perhaps most importantly, this would also allow us to track per CPU
statistics with eBPF maps, or to implement a poor-man's per CPU data
structure through eBPF maps.
David S. Miller [Sun, 15 Mar 2015 23:56:52 +0000 (19:56 -0400)]
Merge branch 'gianfar-next'
Claudiu Manoil says:
====================
gianfar: ARM port driver updates (2/2)
The 2nd round of driver updates to make gianfar portable on ARM,
for the ARM based SoC that integrates eTSEC - "ls1021a".
The patches address the bulk of remaining endianess issues -
handling DMA fields (BD and FCB), and device tree properties.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jingchang Lu [Fri, 13 Mar 2015 08:52:32 +0000 (10:52 +0200)]
gianfar: Consider dts property endianess on handling
Use of_property_read*() to get arch endian consistent
property values. Do some refactoring in the process.
Signed-off-by: Jingchang Lu <jingchang.lu@freescale.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Sat, 14 Mar 2015 02:57:24 +0000 (13:57 +1100)]
rhashtable: Add rehash counter to bucket_table
This patch adds a rehash counter to bucket_table to indicate
the last bucket that has been rehashed. This serves two purposes:
1. Any bucket that has been rehashed can never gain a new object.
2. If the rehash counter reaches the size of the table, the table
will forever remain empty.
This patch also downsizes bucket_table->size to an unsigned int
since we do not support sizes greater than 32 bits yet.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Sat, 14 Mar 2015 02:57:23 +0000 (13:57 +1100)]
rhashtable: Free bucket tables asynchronously after rehash
There is in fact no need to wait for an RCU grace period in the
rehash function, since all insertions are guaranteed to go into
the new table through spin locks.
This patch uses call_rcu to free the old/rehashed table at our
leisure.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Sat, 14 Mar 2015 02:57:20 +0000 (13:57 +1100)]
rhashtable: Fix walker behaviour during rehash
Previously whenever the walker encountered a resize it simply
snaps back to the beginning and starts again. However, this only
works if the rehash started and completed while the walker was
idle.
If the walker attempts to restart while the rehash is still ongoing,
we may miss objects that we shouldn't have.
This patch fixes this by making the walker walk the old table
followed by the new table just like all other readers. If a
rehash is detected we will still signal our caller of the fact
so they can prepare for duplicates but we will simply continue
the walk onto the new table after the old one is finished either
by us or by the rehasher.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Sat, 14 Mar 2015 20:21:59 +0000 (13:21 -0700)]
net: dsa: do not use slave MII bus for fixed PHYs
Commit cd28a1a9baee7 ("net: dsa: fully divert PHY reads/writes if
requested") introduced a check for particular PHYs that need to be
accessed using the slave MII bus created by DSA, but this check was too
inclusive. This would prevent fixed PHYs from being successfully
registered because those should not go through the slave MII bus created
by DSA.
Make sure we check that the PHY is not a fixed PHY to prevent that from
happening.
Fixes: cd28a1a9baee7 ("net: dsa: fully divert PHY reads/writes if requested") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 14 Mar 2015 19:08:02 +0000 (15:08 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-03-13
This series contains updates to ixgbe and ixgbevf.
Don adds additional support for X550 MAC types, which require additional
steps around enabling and disabling Rx. Also cleans up variable type
inconsistency.
I provide a patch to allow relaxed ordering to be enabled on SPARC
architectures. Also cleans up ixgbevf whitespace and code comments to
align the driver with networking coding standard. Lastly cleaned up
uses of memcpy() where ether_addr_copy() could have been used.
Alex removes some dead code in the ixgbe cleanup patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 13 Mar 2015 22:51:10 +0000 (15:51 -0700)]
inet: fill request sock ir_iif for IPv4
Once request socks will be in ehash table, they will need to have
a valid ir_iff field.
This is currently true only for IPv6. This patch extends support
for IPv4 as well.
This means inet_diag_fill_req() can now properly use ir_iif,
which is better for IPv6 link locals anyway, as request sockets
and established sockets will propagate consistent netlink idiag_if.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 14 Mar 2015 18:38:41 +0000 (14:38 -0400)]
Merge branch 'tipc-next'
Jon Maloy says:
====================
tipc: some optimizations and impovements
The commits in this series contain some relatively simple changes that
lead to better throughput across TIPC connections. We also make changes
to the implementation of link transmission queueing and priority
handling, in order to make the code more comprehensible and maintainable.
v2: Commit #2: Redesigned tipc_msg_validate() to use pskb_may_pull(),
as per feedback from David Miller.
Commit #3: Some cosmetic changes to tipc_msg_extract(). I tried to
replace the unconditional skb_linearize() with calls to
pskb_may_pull() at selected locations, but I gave up.
First, skb_trim() requires a fully linearized buffer.
Second, it doesn't make much sense; the whole buffer
will end up linearized, one way or another.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Fri, 13 Mar 2015 20:08:11 +0000 (16:08 -0400)]
tipc: clean up handling of message priorities
Messages transferred by TIPC are assigned an "importance priority", -an
integer value indicating how to treat the message when there is link or
destination socket congestion.
There is no separate header field for this value. Instead, the message
user values have been chosen in ascending order according to perceived
importance, so that the message user field can be used for this.
This is not a good solution. First, we have many more users than the
needed priority levels, so we end up with treating more priority
levels than necessary. Second, the user field cannot always
accurately reflect the priority of the message. E.g., a message
fragment packet should really have the priority of the enveloped
user data message, and not the priority of the MSG_FRAGMENTER user.
Until now, we have been working around this problem in different ways,
but it is now time to implement a consistent way of handling such
priorities, although still within the constraint that we cannot
allocate any more bits in the regular data message header for this.
In this commit, we define a new priority level, TIPC_SYSTEM_IMPORTANCE,
that will be the only one used apart from the four (lower) user data
levels. All non-data messages map down to this priority. Furthermore,
we take some free bits from the MSG_FRAGMENTER header and allocate
them to store the priority of the enveloped message. We then adjust
the functions msg_importance()/msg_set_importance() so that they
read/set the correct header fields depending on user type.
This small protocol change is fully compatible, because the code at
the receiving end of a link currently reads the importance level
only from user data messages, where there is no change.
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Fri, 13 Mar 2015 20:08:10 +0000 (16:08 -0400)]
tipc: split link outqueue
struct tipc_link contains one single queue for outgoing packets,
where both transmitted and waiting packets are queued.
This infrastructure is hard to maintain, because we need
to keep a number of fields to keep track of which packets are
sent or unsent, and the number of packets in each category.
A lot of code becomes simpler if we split this queue into a transmission
queue, where sent/unacknowledged packets are kept, and a backlog queue,
where we keep the not yet sent packets.
In this commit we do this separation.
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Fri, 13 Mar 2015 20:08:09 +0000 (16:08 -0400)]
tipc: eliminate unnecessary call to broadcast ack function
The unicast packet header contains a broadcast acknowledge sequence
number, that may need to be conveyed to the broadcast link for proper
treatment. Currently, the function tipc_rcv(), which is on the most
critical data path, calls the function tipc_bclink_acknowledge() to
have this done. This call is made for each received packet, and results
in the unconditional grabbing of the broadcast link spinlock.
This is unnecessary, since we can see directly from tipc_rcv() if
the acknowledged number differs from what has been previously acked
from the node in question. In the vast majority of cases the numbers
won't differ, and there is nothing to update.
We now make the call to tipc_bclink_acknowledge() conditional
to that the two ack values differ.
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Fri, 13 Mar 2015 20:08:08 +0000 (16:08 -0400)]
tipc: extract bundled buffers by cloning instead of copying
When we currently extract a bundled buffer from a message bundle in
the function tipc_msg_extract(), we allocate a new buffer and explicitly
copy the linear data area.
This is unnecessary, since we can just clone the buffer and do
skb_pull() on the clone to move the data pointer to the correct
position.
This is what we do in this commit.
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Fri, 13 Mar 2015 20:08:07 +0000 (16:08 -0400)]
tipc: eliminate unnecessary linearization of incoming buffers
Currently, TIPC linearizes all incoming buffers directly at reception
before passing them upwards in the stack. This is clearly a waste of
CPU resources, and must be avoided.
In this commit, we eliminate this unnecessary linearization. We still
ensure that at least the message header is linear, and that the buffer
is linearized where this is still needed, i.e. when unbundling and when
reversing messages.
In addition, we ensure that fragmented messages are validated after
reassembly before delivering them upwards in the stack.
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Fri, 13 Mar 2015 20:08:06 +0000 (16:08 -0400)]
tipc: move message validation function to msg.c
The function link_buf_validate() is in reality re-entrant and context
independent, and will in later commits be called from several locations.
Therefore, we move it to msg.c, make it outline and rename the it to
tipc_msg_validate().
We also redesign the function to make proper use of pskb_may_pull()
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Fri, 13 Mar 2015 20:08:05 +0000 (16:08 -0400)]
tipc: add framework for node capabilities exchange
The TIPC protocol spec has defined a 13 bit capability bitmap in
the neighbor discovery header, as a means to maintain compatibility
between different code and protocol generations. Until now this field
has been unused.
We now introduce the basic framework for exchanging capabilities
between nodes at first contact. After exchange, a peer node's
capabilities are stored as a 16 bit bitmap in struct tipc_node.
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 14 Mar 2015 18:29:45 +0000 (14:29 -0400)]
Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:
====================
Here's another set of Bluetooth & ieee802154 patches intended for 4.1:
- Added support for QCA ROME chipset family in the btusb driver
- at86rf230 driver fixes & cleanups
- ieee802154 cleanups
- Refactoring of Bluetooth mgmt API to allow new users
- New setting for static Bluetooth address exposed to user space
- Refactoring of hci_dev flags to remove limit of 32
- Remove unnecessary fast-connectable setting usage restrictions
- Fix behavior to be consistent when trying to pair already paired device
- Service discovery corner-case fixes
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Mon, 9 Mar 2015 12:56:11 +0000 (13:56 +0100)]
at86rf230: add support for calibration timeout
This patch adds a handling for calibration if we are 5 minutes in PLL
state. I first tried to implement the calibration functionality in
TX_ON state via register values CF_START and DCU_START, but this occurs
a one second delay at each calibration time.
An another solution to start a calibration is to switch from TRX_OFF
state into TX_ON, then a calibration is done automatically by
transceiver. This method will be used in this patch, after each transmit
of a frame we check with jiffies if the PLL is set 5 minutes without
doing a TRX_OFF->(TX_ON || RX_AACK_ON) or channel switch. The worst case
would be a transceiver in receiving mode only, but this is under normal
operation very unlikely.
Signed-off-by: Alexander Aring <alex.aring@gmail.com> Cc: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Cc: Werner Almesberger <werner@almesberger.net> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Alexander Aring [Mon, 9 Mar 2015 12:56:10 +0000 (13:56 +0100)]
at86rf230: replace state change sleeps with hrtimer
This patch replace the state change timing relevant sleeps with
hrtimers. Currently the sleeps are done in the complete handler of
spi_async. The relation of doing the state change timing sleep with a
timer will get the sleep functionality out of spi_async complete handler
context.
Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Alexander Aring [Sat, 7 Mar 2015 21:07:07 +0000 (22:07 +0100)]
at86rf230: init xtal_trim with zero
This patch initialize xtal_trim value to zero. The xtal_trim property is
an optional device tree value. Currently if no xtal_trim property is
given the xtal_trim value can be contain random data, because it's a
stack variable. This patch init the xtal_trim value to zero which is
also the default value after reset for at86rf230 transceivers.
Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Alexander Aring [Wed, 4 Mar 2015 20:19:59 +0000 (21:19 +0100)]
mac802154: correct max sifs size handling
This patch fix the max sifs size correction when the
IEEE802154_HW_TX_OMIT_CKSUM flag is set. With this flag the sk_buff
doesn't contain the CRC, because the transceiver will add the CRC
while transmit.
Also add some defines for the max sifs frame size value and frame check
sequence according to 802.15.4 standard.
Signed-off-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Alexander Aring [Mon, 2 Mar 2015 14:10:04 +0000 (15:10 +0100)]
ieee802154: change wpan-phy name to phy
Currently the wpan_phy under /sys/class/ieee802154/ is named as
"wpan-phy#", this patch will change the name to phy. This will
introduce the same naming convention like wireless.
Note: wpan-tools users will not type "wpan-phy#" anymore, just a simple
"phy#" is enough.
Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Alexander Aring [Mon, 2 Mar 2015 14:10:03 +0000 (15:10 +0100)]
ieee802154: 6lowpan: fix ARPHRD to ARPHRD_6LOWPAN
Currently there exists two interface types with ARPHRD_IEEE802154. These
are the 802.15.4 interfaces and 802.15.4 6LoWPAN interfaces. This is
more a bug because some userspace applications checks on this value like
wireshark. This occurs that wireshark will always try to parse a lowpan
interface as 802.15.4 frames. With ARPHRD_6LOWPAN wireshark will parse
it as IPv6 frames which is correct.
Much applications checks on this value to readout the EUI64 mac address
which should be the same for ARPHRD_6LOWPAN. BTLE 6LoWPAN and ieee802154
6LoWPAN will share now the same ARPHRD.
Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Jeff Kirsher [Tue, 10 Feb 2015 11:42:33 +0000 (11:42 +0000)]
ixgbevf: Fix code comments and whitespace
Fix the code comments to align with drivers/net/ code commenting style,
as well as whitespace issues. The whitespace issues resolve checkpatch
errors, like lines exceeding 80 chars (except for strings) and the use
of tabs where possible.
CC: <kernel-team@fb.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Alexander Duyck [Wed, 25 Feb 2015 17:45:54 +0000 (17:45 +0000)]
ixgbe: Remove IXGBE_FLAG_IN_NETPOLL since it doesn't do anything
This patch removes some dead code from the cleanup path for ixgbe.
Setting and clearing the flag doesn't do anything since all we are
doing is setting the flag, scheduling NAPI, clearing the flag and
then letting netpoll do the polling cleanup. As such it doesn't
make much sense to have it there.
This patch also removes one minor white-space error.
CC: <kernel-team@fb.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Fri, 13 Mar 2015 21:03:25 +0000 (14:03 -0700)]
ixgbe: cleanup make ixgbe_set_ethertype_anti_spoofing_X550 static
Correcting a mistake when I initial created this function. I should
have made this static since it is only referenced where the function
pointer is assigned.
CC: <kernel-team@fb.com> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Fri, 13 Mar 2015 21:01:34 +0000 (14:01 -0700)]
ixgbe: Clean up type inconsistency
Missed this when I created commit 6a14ee0cfb197 ("ixgbe: Add X550 support
function pointers"). Use a the __be* type to be consistent with how the
value is assigned.
CC: <kernel-team@fb.com> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Fri, 13 Mar 2015 20:54:30 +0000 (13:54 -0700)]
ixgbe: add new wrapper for X550 support
For the X550 mac type we have to do additional steps around
enabling/disabling Rx. This patch will add a layer of indirection
around these support functions to enable this.
CC: <kernel-team@fb.com> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Florian Fainelli [Fri, 13 Mar 2015 19:11:07 +0000 (12:11 -0700)]
net: bcmgenet: add support for xmit_more
Delay the update of the TDMA producer index unless this is the last SKB
in a batch, or the queue is already stopped. Move the check for whether
the queue should be stopped before the xmit_more check to avoid locking
the transmit queue in case there was a SKB submitted which has xmit_more
set.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 13 Mar 2015 19:11:06 +0000 (12:11 -0700)]
net: bcmgenet: update ring producer index and buffer count in xmit
There is no need to have both bcmgenet_xmit_single() and
bcmgenet_xmit_frag() perform a free_bds decrement and a prod_index
increment by one. In case one of these functions fails to map a SKB or
fragment for transmit, we will return and exit bcmgenet_xmit() with an
error.
We can therefore safely use our local copy of nr_frags to know by how
much we should decrement the number of free buffers available, and by
how much the producer count must be incremented and do this in the tail
of bcmgenet_xmit().
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Petri Gynther <pgynther@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petri Gynther [Thu, 12 Mar 2015 22:48:00 +0000 (15:48 -0700)]
net: bcmgenet: rewrite bcmgenet_rx_refill()
Currently, bcmgenet_desc_rx() calls bcmgenet_rx_refill() at the end of
Rx packet processing loop, after the current Rx packet has already been
passed to napi_gro_receive(). However, bcmgenet_rx_refill() might fail
to allocate a new Rx skb, thus leaving a hole on the Rx queue where no
valid Rx buffer exists.
To eliminate this situation:
1. Rewrite bcmgenet_rx_refill() to retain the current Rx skb on the Rx
queue if a new replacement Rx skb can't be allocated and DMA-mapped.
In this case, the data on the current Rx skb is effectively dropped.
2. Modify bcmgenet_desc_rx() to call bcmgenet_rx_refill() at the top of
Rx packet processing loop, so that the new replacement Rx skb is
already in place before the current Rx skb is processed.
Signed-off-by: Petri Gynther <pgynther@google.com> Tested-by: Jaedon Shin <jaedon.shin@gmail.com>-- Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marcel Holtmann [Fri, 13 Mar 2015 17:20:35 +0000 (10:20 -0700)]
Bluetooth: Merge hdev->dbg_flags fields into hdev->dev_flags
With the extension of hdev->dev_flags utilizing a bitmap now, the space
is no longer restricted. Merge the hdev->dbg_flags into hdev->dev_flags
to save space on 64-bit architectures. On 32-bit architectures no size
reduction happens.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Daniel Drake [Tue, 24 Feb 2015 00:16:47 +0000 (18:16 -0600)]
Bluetooth: btusb: Add helper for READ_LOCAL_VERSION command
Multiple codepaths duplicate some simple code to read and
sanity-check local version information. Before I add a couple more
such codepaths, add a helper to reduce duplication.
Signed-off-by: Daniel Drake <drake@endlessm.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Marcel Holtmann [Fri, 13 Mar 2015 05:30:58 +0000 (22:30 -0700)]
Bluetooth: Add support connectable advertising setting
The patch adds a second advertising setting that allows switching of the
controller into connectable mode independent of the global connectable
setting.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
This is a small pile of patches that convert tcp_metrics from using a
hash table per network namespace to using a single hash table for all
network namespaces.
This is broken up into several patches so that each small step along
the way could be carefully scrutinized as I wrote it, and equally so
that each small step can be reviewed.
There are several cleanups included in this series. The addition of
panic calls during boot where we can not handle failure, and not trying
simplifies the code. The removal of the return code from
tcp_metrics_flush_all.
The motivation for this change is that the tcp_metrics hash table at
128KiB is one of the largest components of a freshly allocated network
namespace.
I am resending the the previous version I sent has suffered bitrot, so I
have respun the patches so that they apply. I believe I have addressed
all of the review concerns except optimal behavior on little machines
with 32-byte cache lines, which is beyond me as even the current code
has bad behavior in that case.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_metrics: Use a single hash table for all network namespaces.
Now that all of the operations are safe on a single hash table
accross network namespaces, allocate a single global hash table
and update the code to use it.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_metrics: Add a field tcpm_net and verify it matches on lookup
In preparation for using one tcp metrics hash table for all network
namespaces add a field tcpm_net to struct tcp_metrics_block, and
verify that field on all hash table lookups.
Make the field tcpm_net of type possible_net_t so it takes no space
when network namespaces are disabled.
Further add a function tm_net to read that field so we can be
efficient when network namespaces are disabled and concise
the rest of the time.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Fri, 13 Mar 2015 02:00:58 +0000 (11:00 +0900)]
vxlan: Don't set s_addr in vxlan_create_sock
In the case of AF_INET s_addr was set to INADDR_ANY (0) which which both
symmetric with the AF_INET6 case, where s_addr is not set, and unnecessary
as udp_conf is zeroed out earlier in the same function.
I suspect this change does not have any run-time effect due to compiler
optimisations. But it does make the code a little easier on the/my eyes.
Cc: Tom Herbert <therbert@google.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Reobert Shearman noticed that mpls_egress is failing to verify that
the bytes to be examined are in fact present in the packet before
mpls_egress reads those bytes.
As suggested by David Miller reduce this to a single pskb_may_pull
call so that we don't do unnecessary work in the fast path.
Reported-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jaeden Amero [Thu, 12 Mar 2015 23:07:54 +0000 (18:07 -0500)]
net/macb: Only adjust tx_clk on link change
The PHY state machine (in drivers/net/phy/phy.c) will unconditionally
call phydev->adjust_link (macb_handle_link_change) when polling in the
PHY_CHANGELINK state. As currently written, macb always ends up
requesting a new tx_clk frequency in macb_handle_link_change. It is a
waste of time to request a new tx_clk frequency if the link state hasn't
changed, as the tx_clk will already be configured properly.
Let's only request a new tx_clk clock frequency when necessary.
Signed-off-by: Jaeden Amero <jaeden.amero@ni.com> Cc: Josh Cartwright <joshc@ni.com> Cc: Soren Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 13 Mar 2015 01:54:10 +0000 (12:54 +1100)]
rhashtable: Fix read-side crash during rehash
This patch fixes a typo rhashtable_lookup_compare where we fail
to recompute the hash when looking up the new table. This causes
elements to be missed and potentially a crash during a resize.
Reported-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 12 Mar 2015 14:28:40 +0000 (15:28 +0100)]
rhashtable: kill ht->shift atomic operations
Commit c0c09bfdc415 ("rhashtable: avoid unnecessary wakeup for worker
queue") changed ht->shift to be atomic, which is actually unnecessary.
Instead of leaving the current shift in the core rhashtable structure,
it can be cached inside the individual bucket tables.
There, it will only be initialized once during a new table allocation
in the shrink/expansion slow path, and from then onward it stays immutable
for the rest of the bucket table liftime.
That allows shift to be non-atomic. The patch also moves hash_rnd
management into the table setup. The rhashtable structure now consumes
3 instead of 4 cachelines.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Ying Xue <ying.xue@windriver.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 12 Mar 2015 11:07:49 +0000 (22:07 +1100)]
rhashtable: Fix reader/rehash race
There is a potential race condition between readers and the rehasher.
In particular, the rehasher could have started a rehash while the
reader finishes a scan of the old table but fails to see the new
table pointer.
This patch closes this window by adding smp_wmb/smp_rmb.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 12 Mar 2015 23:44:04 +0000 (16:44 -0700)]
inet: add TCP_NEW_SYN_RECV state
TCP_SYN_RECV state is currently used by fast open sockets.
Initial TCP requests (the pseudo sockets created when a SYN is received)
are not yet associated to a state. They are attached to their parent,
and the parent is in TCP_LISTEN state.
This commit adds TCP_NEW_SYN_RECV state, so that we can convert
TCP stack to a different schem gradually.
This state is not exported to user space.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
I forgot to update dccp_v6_conn_request() & cookie_v6_check().
They both need to set ireq->ireq_net and ireq->ir_cookie
Lets clear ireq->ir_cookie in inet_reqsk_alloc()
Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 33cf7c90fe2f ("net: add real socket cookies") Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 12 Mar 2015 19:03:12 +0000 (20:03 +0100)]
cls_bpf: do eBPF invocation under non-bh RCU lock variant for maps
Currently, it is possible in cls_bpf to access eBPF maps only under
rcu_read_lock_bh() variants: while on ingress side, that is, handle_ing(),
the classifier would be called from __netif_receive_skb_core() under
rcu_read_lock(); on egress side, however, it's rcu_read_lock_bh() via
__dev_queue_xmit().
This rcu/rcu_bh mix doesn't work together with eBPF maps as they require
soley to be called under rcu_read_lock(). eBPF maps could also be shared
among various other eBPF programs (possibly even with other eBPF program
types, f.e. tracing) and user space processes, so any context is assumed.
Therefore, a possible fix for cls_bpf is to wrap/nest eBPF program
invocation under non-bh RCU lock variant.
Fixes: e2e9b6541dd4 ("cls_bpf: add initial eBPF support for programmable classifiers") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 12 Mar 2015 22:26:58 +0000 (18:26 -0400)]
Merge branch 'fib_trie_table_merge_fixes'
Alexander Duyck says:
====================
fib_trie: Minor fixes for table merge
This patch set addresses two issues reported with the tables merged, the
first is a NULL pointer dereference, and the other is to remove a WARN_ON
and set the ordering for aliases from different tables with the same slen
values.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Thu, 12 Mar 2015 21:46:29 +0000 (14:46 -0700)]
fib_trie: Provide a deterministic order for fib_alias w/ tables merged
This change makes it so that we should always have a deterministic ordering
for the main and local aliases within the merged table when two leaves
overlap.
So for example if we have a leaf with a key of 192.168.254.0. If we
previously added two aliases with a prefix length of 24 from both local and
main the first entry would be first and the second would be second. When I
was coding this I had added a WARN_ON should such a situation occur as I
wasn't sure how likely it would be. However this WARN_ON has been
triggered so this is something that should be addressed.
With this patch the ordering of the aliases is as follows. First they are
sorted on prefix length, then on their table ID, then tos, and finally
priority. This way what we end up doing is essentially interleaving the
two tables on what used to be leaf_info structure boundaries.
Fixes: 0ddcf43d5 ("ipv4: FIB Local/MAIN table collapse") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Thu, 12 Mar 2015 21:46:23 +0000 (14:46 -0700)]
fib_trie: Avoid NULL pointer if local table is not allocated
The function fib_unmerge assumed the local table had already been
allocated. If that is not the case however when custom rules are applied
then this can result in a NULL pointer dereference.
In order to prevent this we must check the value of the local table pointer
and if it is NULL simply return 0 as there is no local table to separate
from the main.
Fixes: 0ddcf43d5 ("ipv4: FIB Local/MAIN table collapse") Reported-by: Madhu Challa <challa@noironetworks.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 12 Mar 2015 16:21:42 +0000 (17:21 +0100)]
ebpf: verifier: check that call reg with ARG_ANYTHING is initialized
I noticed that a helper function with argument type ARG_ANYTHING does
not need to have an initialized value (register).
This can worst case lead to unintented stack memory leakage in future
helper functions if they are not carefully designed, or unintended
application behaviour in case the application developer was not careful
enough to match a correct helper function signature in the API.
The underlying issue is that ARG_ANYTHING should actually be split
into two different semantics:
1) ARG_DONTCARE for function arguments that the helper function
does not care about (in other words: the default for unused
function arguments), and
2) ARG_ANYTHING that is an argument actually being used by a
helper function and *guaranteed* to be an initialized register.
The current risk is low: ARG_ANYTHING is only used for the 'flags'
argument (r4) in bpf_map_update_elem() that internally does strict
checking.
Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>