Gao Feng [Wed, 7 Dec 2016 04:23:18 +0000 (12:23 +0800)]
driver: macvlan: Remove the rcu member of macvlan_port
When free macvlan_port in macvlan_port_destroy, it is safe to free
directly because netdev_rx_handler_unregister could enforce one
grace period.
So it is unnecessary to use kfree_rcu for macvlan_port.
Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gao Feng [Wed, 7 Dec 2016 00:44:47 +0000 (08:44 +0800)]
driver: ipvlan: Free ipvl_port directly with kfree instead of kfree_rcu
There are two functions which would free the ipvl_port now. The first
is ipvlan_port_create. It frees the ipvl_port in the error handler,
so it could kfree it directly. The second is ipvlan_port_destroy. It
invokes netdev_rx_handler_unregister which enforces one grace period
by synchronize_net firstly, so it also could kfree the ipvl_port
directly and safely.
So it is unnecessary to use kfree_rcu to free ipvl_port.
Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 7 Dec 2016 00:15:44 +0000 (01:15 +0100)]
bpf: fix loading of BPF_MAXINSNS sized programs
General assumption is that single program can hold up to BPF_MAXINSNS,
that is, 4096 number of instructions. It is the case with cBPF and
that limit was carried over to eBPF. When recently testing digest, I
noticed that it's actually not possible to feed 4096 instructions
via bpf(2).
The check for > BPF_MAXINSNS was added back then to bpf_check() in cbd357008604 ("bpf: verifier (add ability to receive verification log)").
However, 09756af46893 ("bpf: expand BPF syscall with program load/unload")
added yet another check that comes before that into bpf_prog_load(),
but this time bails out already in case of >= BPF_MAXINSNS.
Fix it up and perform the check early in bpf_prog_load(), so we can drop
the second one in bpf_check(). It makes sense, because also a 0 insn
program is useless and we don't want to waste any resources doing work
up to bpf_check() point. The existing bpf(2) man page documents E2BIG
as the official error for such cases, so just stick with it as well.
Fixes: 09756af46893 ("bpf: expand BPF syscall with program load/unload") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 7 Dec 2016 16:13:49 +0000 (11:13 -0500)]
Merge branch 'ti-cpts-update-and-fixes'
Grygorii Strashko says:
====================
net: ethernet: ti: cpts: update and fixes
It is preparation series intended to clean up and optimize TI CPTS driver to
facilitate further integration with other TI's SoCs like Keystone 2.
Changes in v5:
- fixed copy paste error in cpts_release
- reworked cc.mult/shift and cc_mult initialization
Changes in v4:
- fixed build error in patch
"net: ethernet: ti: cpts: clean up event list if event pool is empty"
- rebased on top of net-next
Changes in v3:
- patches reordered: fixes and small updates moved first
- added comments in code about cpts->cc_mult
- conversation range (maxsec) limited to 10sec
Changes in v2:
- patch "net: ethernet: ti: cpts: rework initialization/deinitialization"
was split on 4 patches
- applied comments from Richard Cochran
- dropped patch
"net: ethernet: ti: cpts: add return value to tx and rx timestamp funcitons"
- new patches added:
"net: ethernet: ti: cpts: drop excessive writes to CTRL and INT_EN regs"
and "clocksource: export the clocks_calc_mult_shift to use by timestamp code"
net: ethernet: ti: cpts: fix overflow check period
The CPTS drivers uses 8sec period for overflow checking with
assumption that CPTS retclk will not exceed 500MHz. But that's not
true on some TI platforms (Kesytone 2). As result, it is possible that
CPTS counter will overflow more than once between two readings.
Hence, fix it by selecting overflow check period dynamically as
max_sec_before_overflow/2, where
max_sec_before_overflow = max_counter_val / rftclk_freq.
Cc: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ethernet: ti: cpts: calc mult and shift from refclk freq
The cyclecounter mult and shift values can be calculated based on the
CPTS rfclk frequency and timekeepnig framework provides required algos
and API's.
Hence, calc mult and shift basing on CPTS rfclk frequency if both
cpts_clock_shift and cpts_clock_mult properties are not provided in DT (the
basis of calculation algorithm is borrowed from
__clocksource_update_freq_scale() commit 7d2f944a2b83 ("clocksource:
Provide a generic mult/shift factor calculation")). After this change
cpts_clock_shift and cpts_clock_mult DT properties will become optional.
Cc: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
clocksource: export the clocks_calc_mult_shift to use by timestamp code
The CPSW CPTS driver is capable of doing timestamping on tx/rx packets and
requires to know mult and shift factors for timestamp conversion from raw
value to nanoseconds (ptp clock). Now these mult and shift factors are
calculated manually and provided through DT, which makes very hard to
support of a lot number of platforms, especially if CPTS refclk is not the
same for some kind of boards and depends on efuse settings (Keystone 2
platforms). Hence, export clocks_calc_mult_shift() to allow drivers like
CPSW CPTS (and other ptp drivesr) to benefit from automaitc calculation of
mult and shift factors.
Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Move DT properties parsing into CPTS driver to simplify CPSW
code and CPTS driver porting on other SoC in the future
(like Keystone 2) - with this change it will not be required
to add the same DT parsing code in Keystone 2 NETCP driver.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The current implementation CPTS initialization and deinitialization
(represented by cpts_register/unregister()) does too many static
initialization from .ndo_open(), which is reasonable to do once at probe
time instead, and also require caller to allocate memory for struct cpts,
which is internal for CPTS driver in general.
This patch splits CPTS initialization and deinitialization on two parts:
- static initializtion cpts_create()/cpts_release() which expected to be
executed when parent driver is probed/removed;
- dynamic part cpts_register/unregister() which expected to be executed
when network device is opened/closed.
As result, current code of CPTS parent driver - CPSW - will be simplified
(and it also will allow simplify adding support for Keystone 2 devices in
the future), plus more initialization errors will be catched earlier. In
addition, this change allows to clean up cpts.h for the case when CPTS is
disabled.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ethernet: ti: cpts: drop excessive writes to CTRL and INT_EN regs
CPTS module and IRQs are always enabled when CPTS is registered,
before starting overflow check work, and disabled during
deregistration, when overflow check work has been canceled already.
So, It doesn't require to (re)enable CPTS module and IRQs in
cpts_overflow_check().
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WingMan Kwok [Wed, 7 Dec 2016 00:00:39 +0000 (18:00 -0600)]
net: ethernet: ti: cpts: clean up event list if event pool is empty
When a CPTS user does not exit gracefully by disabling cpts
timestamping and leaving a joined multicast group, the system
continues to receive and timestamps the ptp packets which eventually
occupy all the event list entries. When this happns, the added code
tries to remove some list entries which are expired.
Signed-off-by: WingMan Kwok <w-kwok2@ti.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ethernet: ti: cpts: disable cpts when unregistered
The cpts now is left enabled after unregistration.
Hence, disable it in cpts_unregister().
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The ptp clock registered before spinlock, which is protecting it, and
before timecounter and cyclecounter initialization in cpts_register().
So, ensure that ptp clock is registered the last, after everything
else is done.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ethernet: ti: cpts: fix unbalanced clk api usage in cpts_register/unregister
There are two issues with TI CPTS code which are reproducible when TI
CPSW ethX device passes few up/down iterations:
- cpts refclk prepare counter continuously incremented after each
up/down iteration;
- devm_clk_get(dev, "cpts") is called many times.
Hence, fix these issues by using clk_disable_unprepare() in
cpts_clk_release() and skipping devm_clk_get() if cpts refclk has been
acquired already.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ethernet: ti: allow cpts to be built separately
TI CPTS IP is used as part of TI OMAP CPSW driver, but it's also
present as part of NETCP on TI Keystone 2 SoCs. So, It's required
to enable build of CPTS for both this drivers and this can be
achieved by allowing CPTS to be built separately.
Hence, allow cpts to be built separately and convert it to be
a module as both CPSW and NETCP drives can be built as modules.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ethernet: ti: cpts: switch to readl/writel_relaxed()
Switch to readl/writel_relaxed() APIs, because this is recommended
API and the CPTS IP is reused on Keystone 2 SoCs
where LE/BE modes are supported.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 7 Dec 2016 15:59:27 +0000 (10:59 -0500)]
Merge branch 'bnxt_en-RDMA'
Michael Chan says:
====================
bnxt_en: Add interface to support RDMA driver.
This series adds an interface to support a brand new RDMA driver bnxt_re.
The first step is to re-arrange some code so that pci_enable_msix() can
be called during pci probe. The purpose is to allow the RDMA driver to
initialize and stay initialized whether the netdev is up or down.
Then we make some changes to VF resource allocation so that there is
enough resources to support RDMA.
Finally the last patch adds a simple interface to allow the RDMA driver to
probe and register itself with any bnxt_en devices that support RDMA.
Once registered, the RDMA driver can request MSIX, send fw messages, and
receive some notifications.
v2: Fixed kbuild test robot warnings.
David, please consider this series for net-next. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 7 Dec 2016 05:26:21 +0000 (00:26 -0500)]
bnxt_en: Add interface to support RDMA driver.
Since the network driver and RDMA driver operate on the same PCI function,
we need to create an interface to allow the RDMA driver to share resources
with the network driver.
1. Create a new bnxt_en_dev struct which will be returned by
bnxt_ulp_probe() upon success. After that, all calls from the RDMA driver
to bnxt_en will pass a pointer to this struct.
2. This struct contains additional function pointers to register, request
msix, send fw messages, register for async events.
3. If the RDMA driver wants to enable RDMA on the function, it needs to
call the function pointer bnxt_register_device(). A ulp_ops structure
is passed for RCU protected upcalls from bnxt_en to the RDMA driver.
4. The RDMA driver can call firmware APIs using the bnxt_send_fw_msg()
function pointer.
5. 1 stats context is reserved when the RDMA driver registers. MSIX
and completion rings are reserved when the RDMA driver calls
bnxt_request_msix() function pointer.
6. When the RDMA driver calls bnxt_unregister_device(), all RDMA resources
will be cleaned up.
v2: Fixed 2 uninitialized variable warnings.
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 7 Dec 2016 05:26:20 +0000 (00:26 -0500)]
bnxt_en: Refactor the driver registration function with firmware.
The driver register function with firmware consists of passing version
information and registering for async events. To support the RDMA driver,
the async events that we need to register may change. Separate the
driver register function into 2 parts so that we can just update the
async events for the RDMA driver.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 7 Dec 2016 05:26:19 +0000 (00:26 -0500)]
bnxt_en: Reserve RDMA resources by default.
If the device supports RDMA, we'll setup network default rings so that
there are enough minimum resources for RDMA, if possible. However, the
user can still increase network rings to the max if he wants. The actual
RDMA resources won't be reserved until the RDMA driver registers.
v2: Fix compile warning when BNXT_CONFIG_SRIOV is not set.
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 7 Dec 2016 05:26:18 +0000 (00:26 -0500)]
bnxt_en: Improve completion ring allocation for VFs.
All available remaining completion rings not used by the PF should be
made available for the VFs so that there are enough rings in the VF to
support RDMA. The earlier workaround code of capping the rings by the
statistics context is removed.
When SRIOV is disabled, call a new function bnxt_restore_pf_fw_resources()
to restore FW resources. Later on we need to add some logic to account
for RDMA resources.
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 7 Dec 2016 05:26:17 +0000 (00:26 -0500)]
bnxt_en: Move function reset to bnxt_init_one().
Now that MSIX is enabled in bnxt_init_one(), resources may be allocated by
the RDMA driver before the network device is opened. So we cannot do
function reset in bnxt_open() which will clear all the resources.
The proper place to do function reset now is in bnxt_init_one().
If we get AER, we'll do function reset as well.
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 7 Dec 2016 05:26:16 +0000 (00:26 -0500)]
bnxt_en: Enable MSIX early in bnxt_init_one().
To better support the new RDMA driver, we need to move pci_enable_msix()
from bnxt_open() to bnxt_init_one(). This way, MSIX vectors are available
to the RDMA driver whether the network device is up or down.
Part of the existing bnxt_setup_int_mode() function is now refactored into
a new bnxt_init_int_mode(). bnxt_init_int_mode() is called during
bnxt_init_one() to enable MSIX. The remaining logic in
bnxt_setup_int_mode() to map the IRQs to the completion rings is called
during bnxt_open().
v2: Fixed compile warning when CONFIG_BNXT_SRIOV is not set.
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 7 Dec 2016 03:32:50 +0000 (19:32 -0800)]
net: sock_rps_record_flow() is for connected sockets
Paolo noticed a cache line miss in UDP recvmsg() to access
sk_rxhash, sharing a cache line with sk_drops.
sk_drops might be heavily incremented by cpus handling a flood targeting
this socket.
We might place sk_drops on a separate cache line, but lets try
to avoid wasting 64 bytes per socket just for this, since we have
other bottlenecks to take care of.
sock_rps_record_flow() should only access sk_rxhash for connected
flows.
Testing sk_state for TCP_ESTABLISHED covers most of the cases for
connected sockets, for a zero cost, since system calls using
sock_rps_record_flow() also access sk->sk_prot which is on the
same cache line.
A follow up patch will provide a static_key (Jump Label) since most
hosts do not even use RFS.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes tun.c to call netif_receive_skb instead of netif_rx
when a packet is received (if CONFIG_4KSTACKS is not enabled to avoid
stack exhaustion). The difference between the two is that netif_rx queues
the packet into the backlog, and netif_receive_skb proccesses the packet
in the current context.
This patch is required for syzkaller [1] to collect coverage from packet
receive paths, when a packet being received through tun (syzkaller collects
coverage per process in the process context).
As mentioned by Eric this change also speeds up tun/tap. As measured by
Peter it speeds up his closed-loop single-stream tap/OVS benchmark by
about 23%, from 700k packets/second to 867k packets/second.
A similar patch was introduced back in 2010 [2, 3], but the author found
out that the patch doesn't help with the task he had in mind (for cgroups
to shape network traffic based on the original process) and decided not to
go further with it. The main concern back then was about possible stack
exhaustion with 4K stacks.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 7 Dec 2016 02:46:58 +0000 (21:46 -0500)]
Merge branch 'w83977af_ir-neatening'
Joe Perches says:
====================
irda: w83977af_ir: Neatening
Originally on top of Arnd's overly long udelay patches because I
noticed a misindented block. That's now already fixed along with some
other whitespace problems. These patches are the remainder style
issues from my original series.
Even though I haven't turned on the netwinder in a box in the
garage in who knows how long, if this device is still used somewhere,
might as well neaten the code too.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 1 Dec 2016 05:48:30 +0000 (08:48 +0300)]
dbri: move dereference after check for NULL
We accidentally introduced a dereference before the NULL check in
xmit_descs() as part of silencing a GCC warning.
Fixes: 16f46050e709 ("dbri: Fix compiler warning") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
1) When dcbnl_cee_fill() fails to be able to push a new netlink
attribute, it return 0 instead of an error code. From Pan Bian.
2) Two suffix handling fixes to FIB trie code, from Alexander Duyck.
3) bnxt_hwrm_stat_ctx_alloc() goes through all the trouble of setting
and maintaining a return code 'rc' but fails to actually return it.
Also from Pan Bian.
4) ping socket ICMP handler needs to validate ICMP header length, from
Kees Cook.
5) caif_sktinit_module() has this interesting logic:
int err = sock_register(...);
if (!err)
return err;
return 0;
Just return sock_register()'s return value directly which is the
only possible correct thing to do.
6) Two bnx2x driver fixes from Yuval Mintz, return a reasonable
estimate from get_ringparam() ethtool op when interface is down and
avoid trying to use UDP port based tunneling on 577xx chips.
7) Fix ep93xx_eth crash on module unload from Florian Fainelli.
8) Missing uapi exports, from Stephen Hemminger.
9) Don't schedule work from sk_destruct(), because the socket will be
freed upon return from that function. From Herbert Xu.
10) Buggy drivers, of which we know there is at least one, can send a
huge packet into the TCP stack but forget to set the gso_size in the
SKB, which causes all kinds of problems.
Correct this when it happens, and emit a one-time warning with the
device name included so that it can be diagnosed more easily.
From Marcelo Ricardo Leitner.
11) virtio-net does DMA off the stack causes hiccups with VMAP_STACK,
fix from Andy Lutomirski.
12) Fix fec driver compilation with CONFIG_M5272, from Nikita
Yushchenko.
13) mlx5 fixes from Kamal Heib, Saeed Mahameed, and Mohamad Haj Yahia.
(erroneously flushing queues on error, module parameter validation,
etc)
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (34 commits)
net/mlx5e: Change the SQ/RQ operational state to positive logic
net/mlx5e: Don't flush SQ on error
net/mlx5e: Don't notify HW when filling the edge of ICO SQ
net/mlx5: Fix query ISSI flow
net/mlx5: Remove duplicate pci dev name print
net/mlx5: Verify module parameters
net: fec: fix compile with CONFIG_M5272
be2net: Add DEVSEC privilege to SET_HSW_CONFIG command.
virtio-net: Fix DMA-from-the-stack in virtnet_set_mac_address()
tcp: warn on bogus MSS and try to amend it
uapi glibc compat: fix outer guard of net device flags enum
net: stmmac: clear reset value of snps, wr_osr_lmt/snps, rd_osr_lmt before writing
netlink: Do not schedule work from sk_destruct
uapi: export nf_log.h
uapi: export tc_skbmod.h
net: ep93xx_eth: Do not crash unloading module
bnx2x: Prevent tunnel config for 577xx
bnx2x: Correct ringparam estimate when DOWN
isdn: hisax: set error code on failure
net: bnx2x: fix improper return value
...
Linus Torvalds [Mon, 5 Dec 2016 20:10:29 +0000 (12:10 -0800)]
shmem: fix shm fallocate() list corruption
The shmem hole punching with fallocate(FALLOC_FL_PUNCH_HOLE) does not
want to race with generating new pages by faulting them in.
However, the wait-queue used to delay the page faulting has a serious
problem: the wait queue head (in shmem_fallocate()) is allocated on the
stack, and the code expects that "wake_up_all()" will make sure that all
the queue entries are gone before the stack frame is de-allocated.
And that is not at all necessarily the case.
Yes, a normal wake-up sequence will remove the wait-queue entry that
caused the wakeup (see "autoremove_wake_function()"), but the key
wording there is "that caused the wakeup". When there are multiple
possible wakeup sources, the wait queue entry may well stay around.
And _particularly_ in a page fault path, we may be faulting in new pages
from user space while we also have other things going on, and there may
well be other pending wakeups.
So despite the "wake_up_all()", it's not at all guaranteed that all list
entries are removed from the wait queue head on the stack.
Fix this by introducing a new wakeup function that removes the list
entry unconditionally, even if the target process had already woken up
for other reasons. Use that "synchronous" function to set up the
waiters in shmem_fault().
This problem has never been seen in the wild afaik, but Dave Jones has
reported it on and off while running trinity. We thought we fixed the
stack corruption with the blk-mq rq_list locking fix (commit 7fe311302f7d: "blk-mq: update hardware and software queues for sleeping
alloc"), but it turns out there was _another_ stack corruptor hiding
in the trinity runs.
Vegard Nossum (also running trinity) was able to trigger this one fairly
consistently, and made us look once again at the shmem code due to the
faults often being in that area.
Reported-and-tested-by: Vegard Nossum <vegard.nossum@oracle.com>. Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
net/mlx5e: Change the SQ/RQ operational state to positive logic
When using the negative logic (i.e. FLUSH state), after the RQ/SQ reopen
we will have a time interval that the RQ/SQ is not really ready and the
state indicates that its not in FLUSH state because the initial SQ/RQ struct
memory starts as zeros.
Now we changed the state to indicate if the SQ/RQ is opened and we will
set the READY state after finishing preparing all the SQ/RQ resources.
Fixes: 6e8dd6d6f4bd ("net/mlx5e: Don't wait for SQ completions on close") Fixes: f2fde18c52a7 ("net/mlx5e: Don't wait for RQ completions on close") Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Tue, 6 Dec 2016 15:32:47 +0000 (17:32 +0200)]
net/mlx5e: Don't flush SQ on error
We are doing SQ descriptors cleanup in driver.
Fixes: 6e8dd6d6f4bd ("net/mlx5e: Don't wait for SQ completions on close") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Tue, 6 Dec 2016 15:32:46 +0000 (17:32 +0200)]
net/mlx5e: Don't notify HW when filling the edge of ICO SQ
We are going to do this a couple of steps ahead anyway.
Fixes: d3c9bc2743dc ("net/mlx5e: Added ICO SQs") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kamal Heib [Tue, 6 Dec 2016 15:32:45 +0000 (17:32 +0200)]
net/mlx5: Fix query ISSI flow
In old FWs query ISSI command is not supported and for some of those FWs
it might fail with status other than "MLX5_CMD_STAT_BAD_OP_ERR".
In such case instead of failing the driver load, we will treat any FW
status other than 0 for Query ISSI FW command as ISSI not supported and
assume ISSI=0 (most basic driver/FW interface).
In case of driver syndrom (query ISSI failure by driver) we will fail
driver load.
Fixes: f62b8bb8f2d3 ('net/mlx5: Extend mlx5_core to support ConnectX-4
Ethernet functionality') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kamal Heib [Tue, 6 Dec 2016 15:32:44 +0000 (17:32 +0200)]
net/mlx5: Remove duplicate pci dev name print
Remove duplicate pci dev name printing from mlx5_core_warn/dbg.
Fixes: 5a7883989b1c ('net/mlx5_core: Improve mlx5 messages') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kamal Heib [Tue, 6 Dec 2016 15:32:43 +0000 (17:32 +0200)]
net/mlx5: Verify module parameters
Verify the mlx5_core module parameters by making sure that they are in
the expected range and if they aren't restore them to their default
values.
Fixes: 9603b61de1ee ('mlx5: Move pci device handling from mlx5_ib to mlx5_core') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Salil [Tue, 6 Dec 2016 11:09:46 +0000 (11:09 +0000)]
net: hns: Fix to conditionally convey RX checksum flag to stack
This patch introduces the RX checksum function to check the
status of the hardware calculated checksum and its error and
appropriately convey status to the upper stack in skb->ip_summed
field.
In hardware, we only support checksum for the following
protocols:
1) IPv4,
2) TCP(over IPv4 or IPv6),
3) UDP(over IPv4 or IPv6),
4) SCTP(over IPv4 or IPv6)
but we support many L3(IPv4, IPv6, MPLS, PPPoE etc) and
L4(TCP, UDP, GRE, SCTP, IGMP, ICMP etc.) protocols.
Hardware limitation:
Our present hardware RX Descriptor lacks L3/L4 checksum
"Status & Error" bit (which usually can be used to indicate whether
checksum was calculated by the hardware and if there was any error
encountered during checksum calculation).
Software workaround:
We do get info within the RX descriptor about the kind of
L3/L4 protocol coming in the packet and the error status. These
errors might not just be checksum errors but could be related to
version, length of IPv4, UDP, TCP etc.
Because there is no-way of knowing if it is a L3/L4 error due
to bad checksum or any other L3/L4 error, we will not (cannot)
convey hardware checksum status(CHECKSUM_UNNECESSARY) for such
cases to upper stack and will not maintain the RX L3/L4 checksum
counters as well.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 80cca775cdc4 ("net: fec: cache statistics while device is down")
introduced unconditional statistics-related actions.
However, when driver is compiled with CONFIG_M5272, staticsics-related
definitions do not exist, which results into build errors.
Fix that by adding explicit handling of !defined(CONFIG_M5272) case.
Fixes: 80cca775cdc4 ("net: fec: cache statistics while device is down") Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Venkat Duvvuru [Tue, 6 Dec 2016 05:33:50 +0000 (00:33 -0500)]
be2net: Add DEVSEC privilege to SET_HSW_CONFIG command.
OPCODE_COMMON_GET_FN_PRIVILEGES is returning only DEVSEC
privilege (Unrestricted Administrative Privilege) for Lancer NIC functions.
So, driver is failing SET_HSW_CONFIG command, as DEVSEC privilege was not
set in the privilege bitmap. This patch fixes the problem by setting DEVSEC
privilege in SET_HSW_CONFIG’s privilege bitmap.
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Lutomirski [Tue, 6 Dec 2016 02:10:58 +0000 (18:10 -0800)]
virtio-net: Fix DMA-from-the-stack in virtnet_set_mac_address()
With CONFIG_VMAP_STACK=y, virtnet_set_mac_address() can be passed a
pointer to the stack and it will OOPS. Copy the address to the heap
to prevent the crash.
Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Laura Abbott <labbott@redhat.com> Reported-by: zbyszek@in.waw.pl Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Khoronzhuk [Tue, 6 Dec 2016 01:45:00 +0000 (03:45 +0200)]
net: ethernet: ti: cpsw: fix early budget split
The budget split function requires the phy speed to be known.
While ndo open a phy speed identification is postponed till the
moment link is up. Hence, move it to appropriate callback, when link
is up.
Reported-by: Grygorii Strashko <grygorii.strashko@ti.com> Fixes: 8feb0a196507 ("net: ethernet: ti: cpsw: split tx budget according between channels") Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell says:
If I am reading the code correctly, then I would have two concerns:
1) Has that been tested? That seems like an extremely dramatic
decrease in cwnd. For example, if the cwnd is 80, and there are 40
ACKs, and half the ACKs are ECE marked, then my back-of-the-envelope
calculations seem to suggest that after just 11 ACKs the cwnd would be
down to a minimal value of 2 [..]
2) That seems to contradict another passage in the draft [..] where it
sazs:
Just as specified in [RFC3168], DCTCP does not react to congestion
indications more than once for every window of data.
Neal is right. Fortunately we don't have to complicate this by testing
vs. current rtt estimate, we can just revert the patch.
Normal stack already handles this for us: receiving ACKs with ECE
set causes a call to tcp_enter_cwr(), from there on the ssthresh gets
adjusted and prr will take care of cwnd adjustment.
Fixes: 4780566784b396 ("dctcp: update cwnd on congestion event") Cc: Neal Cardwell <ncardwell@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net: dsa: mv88e6xxx: rework reset and PPU code
Old Marvell chips (like 88E6060) don't have a PHY Polling Unit (PPU).
Next chips (like 88E6185) have a PPU, which has exclusive access to the
PHY registers, thus must be disabled before access.
Newer chips (like 88E6352) have an indirect mechanism to access the PHY
registers whenever, thus loose control over the PPU (always enabled).
Here's a summary:
Model | PPU? | Has PPU ctrl? | PPU state readable? | PHY access
----- | ---- | -------------- | ------------------- | ----------
6060 | no | no | no | direct
6185 | yes | yes, PPUEn bit | yes, PPUState 2-bit | direct w/ PPU dis.
6352 | yes | no | yes, PPUState 1-bit | indirect
6390 | yes | no | yes, InitState bit | indirect
Depending on the PPU control, a switch may have to restart the PPU when
resetting the switch. Once the switch is reset, we must wait for the PPU
state to be active polling again before accessing the registers.
For that purpose, add new operations to the chips to enable/disable the
PPU, and execute software reset. With these new ops in place, rework the
switch reset code and finally get rid of the MV88E6XXX_FLAG_PPU* flags.
Changes in v3:
- consider 6097 as 6352 (no PPU ops and use mv88e6352_g1_reset).
Changes in v2:
- wait in ppu/reset ops so that ppu_polling is not needed anymore.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 5 Dec 2016 22:30:26 +0000 (17:30 -0500)]
net: dsa: mv88e6xxx: add helper to hardware reset
Add an helper to toggle the eventual GPIO connected to the reset pin.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 5 Dec 2016 22:30:25 +0000 (17:30 -0500)]
net: dsa: mv88e6xxx: add helper to disable ports
Before resetting a switch, the ports should be set to the Disabled state
and the transmit queues should be drained.
Add an helper to explicit that.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 6 Dec 2016 16:24:29 +0000 (11:24 -0500)]
Merge branch 'Alacritech-SLIC-driver'
Lino Sanfilippo says:
====================
Gigabit ethernet driver for Alacritechs SLIC devices (v4)
this is the forth version of the slicoss gigabit ethernet driver (which is a
rework of the driver from Alacritech which can currently be found under
drivers/staging/slicoss). The driver is supposed to support Mojave, Oasis and
Kalahari cards, for both copper and fiber.
If this code is accepted the staging version can be removed.
The driver has been tested on a SEN2104ET adapter (4 Port PCIe copper).
v4:
- fix wrong driver name in Kconfig file (reported by Rami Rosen)
- remove unused variable from driver struct (reported by Rami Rosen)
- return "err" instead of 0 in slic_load_rcvseq_firmware() (reported by Rami Rosen)
- Fix typos in constants, comments and error message (reported by Markus Böhme)
- fix various warnings concerning signedness (reported by Markus Böhme)
- improve line formatting (reported by Markus Böhme)
- add comment describing the need for SLIC_MAX_TX_COMPLETIONS (suggested by Florian Fainelli)
- do not zero out complete rx descriptor (suggested by Florian Fainelli)
- add missing write barrier (reported by Florian Fainelli)
- remove unneeded assignment of net_device to skb (reported by Florian Fainelli)
- use napi_complete_done() instead of napi_complete (suggested by Florian Fainelli)
- use napi_schedule_irqoff() instead of napi_schedule (suggested by Florian Fainelli)
- do not map error returned by slic_init() to -ENOMEM
- do proper dma syncs before and after rx descriptor status is set to 0
- if after dma sync for CPU rx descriptor is not used return it to HW by means of dma sync for device
v3:
- dont add defines to pci_ids.h but instead put it into the drivers header file
(requested by Greg Kroah-Hartman)
v2:
- remove unusual padding in statistic strings (suggested by Andrew Lunn)
- for mdio register and bit names use defines from mii.h instead of own ones
(suggested by Andrew Lunn)
- remove unused defines
- ensure PCI flush at two more places
- use mmiowb before lock to prevent mmio writes leaking out of lock
- fix some typos in comments
- add copyright and GPL header
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add driver for Alacritech gigabit ethernet cards with SLIC (session-layer
interface control) technology. The driver provides basic support without
SLIC for the following devices:
- Mojave cards (single port PCI Gigabit) both copper and fiber
- Oasis cards (single and dual port PCI-x Gigabit) copper and fiber
- Kalahari cards (dual and quad port PCI-e Gigabit) copper and fiber
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
There have been some reports lately about TCP connection stalls caused
by NIC drivers that aren't setting gso_size on aggregated packets on rx
path. This causes TCP to assume that the MSS is actually the size of the
aggregated packet, which is invalid.
Although the proper fix is to be done at each driver, it's often hard
and cumbersome for one to debug, come to such root cause and report/fix
it.
This patch amends this situation in two ways. First, it adds a warning
on when this situation occurs, so it gives a hint to those trying to
debug this. It also limit the maximum probed MSS to the adverised MSS,
as it should never be any higher than that.
The result is that the connection may not have the best performance ever
but it shouldn't stall, and the admin will have a hint on what to look
for.
Tested with virtio by forcing gso_size to 0.
v2: updated msg per David's suggestion
v3: use skb_iif to find the interface and also log its name, per Eric
Dumazet's suggestion. As the skb may be backlogged and the interface
gone by then, we need to check if the number still has a meaning.
v4: use helper tcp_gro_dev_warn() and avoid pr_warn_once inside __once, per
David's suggestion
Cc: Jonathan Maxwell <jmaxwell37@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jonas Gorski [Sat, 3 Dec 2016 16:31:45 +0000 (17:31 +0100)]
uapi glibc compat: fix outer guard of net device flags enum
Fix a wrong condition preventing the higher net device flags
IFF_LOWER_UP etc to be defined if net/if.h is included before
linux/if.h.
The comment makes it clear the intention was to allow partial
definition with either parts.
This fixes compilation of userspace programs trying to use
IFF_LOWER_UP, IFF_DORMANT or IFF_ECHO.
Fixes: 4a91cb61bb99 ("uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h") Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Mikko Rapeli <mikko.rapeli@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 5 Dec 2016 17:57:19 +0000 (09:57 -0800)]
net/udp: do not touch skb->peeked unless really needed
In UDP recvmsg() path we currently access 3 cache lines from an skb
while holding receive queue lock, plus another one if packet is
dequeued, since we need to change skb->next->prev
1st cache line (contains ->next/prev pointers, offsets 0x00 and 0x08)
2nd cache line (skb->len & skb->peeked, offsets 0x80 and 0x8e)
3rd cache line (skb->truesize/users, offsets 0xe0 and 0xe4)
skb->peeked is only needed to make sure 0-length packets are properly
handled while MSG_PEEK is operated.
I had first the intent to remove skb->peeked but the "MSG_PEEK at
non-zero offset" support added by Sam Kumar makes this not possible.
This patch avoids one cache line miss during the locked section, when
skb->len and skb->peeked do not have to be read.
It also avoids the skb_set_peeked() cost for non empty UDP datagrams.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Niklas Cassel [Mon, 5 Dec 2016 17:12:54 +0000 (18:12 +0100)]
net: stmmac: clear reset value of snps, wr_osr_lmt/snps, rd_osr_lmt before writing
WR_OSR_LMT and RD_OSR_LMT have a reset value of 1.
Since the reset value wasn't cleared before writing, the value in the
register would be incorrect if specifying an uneven value for
snps,wr_osr_lmt/snps,rd_osr_lmt.
Zero is a valid value for the properties, since the databook specifies:
maximum outstanding requests = WR_OSR_LMT + 1.
We do not want to change the behavior for existing users when the
property is missing. Therefore, default to 1 if the property is missing,
since that is the same as the reset value.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net: hix5hd2_gmac: add tx sg feature and reset/clock control signals
The "hix5hd2" is SoC name, add the generic ethernet driver compatible string.
The "hisi-gemac-v1" is the basic version and "hisi-gemac-v2" adds
the SG/TXCSUM/TSO/UFO features.
This patch set only adds the SG(scatter-gather) driver for transmitting,
the drivers of other features will be submitted later.
Add the MAC reset control signals and clock signals.
We make these signals optional to be backward compatible with
the hix5hd2 SoC.
Changes in v2:
- Make the compatible string changes be a separate patch and
the most specific string come first than the generic string
as advised by Rob.
- Make the MAC reset control signals and clock signals optional
to be backward compatible with the hix5hd2 SoC.
- Change the compatible string and give the clock a specific name
in hix5hd2 dts file.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The "mac_core_rst" represents "mac core reset signal", it resets
the mac core including packet processing unit, descriptor processing unit,
tx engine, rx engine, control unit.
The "mac_ifc_rst" represents "mac interface reset signal", it resets
the mac interface. The mac interface unit connects mac core and
data interface like MII/RMII/RGMII. After we set a new value of
interface mode, we must reset mac interface to reload the new mode value.
The "mac_core_rst" and "mac_ifc_rst" are both optional to be
backward compatible with the hix5hd2 SoC.
The "phy_rst" represents "phy reset signal", it does a hardware reset
on the PHY chip. This reset signal is optional if the PHY can work well
without the hardware reset.
Add one more clock signal, the existing is MAC core clock,
and the new one is MAC interface clock.
The MAC interface clock is optional to be backward compatible with
the hix5hd2 SoC.
Signed-off-by: Dongpo Li <lidongpo@hisilicon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dongpo Li [Mon, 5 Dec 2016 13:27:59 +0000 (21:27 +0800)]
net: hix5hd2_gmac: add tx scatter-gather feature
"hisi-gemac-v2" adds the SG/TXCSUM/TSO/UFO features.
This patch only adds the SG(scatter-gather) driver for transmitting,
the drivers of other features will be submitted later.
Signed-off-by: Dongpo Li <lidongpo@hisilicon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dongpo Li [Mon, 5 Dec 2016 13:27:58 +0000 (21:27 +0800)]
net: hix5hd2_gmac: add generic compatible string
The "hix5hd2" is SoC name, add the generic ethernet driver name.
The "hisi-gemac-v1" is the basic version and "hisi-gemac-v2" adds
the SG/TXCSUM/TSO/UFO features.
Signed-off-by: Dongpo Li <lidongpo@hisilicon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Use DSA_TAG_PROTO_EDSA as tag_protocol for the mv88e6097. The
initialisation was missing before.
Fixes: a1f482aa8c33 ("net: dsa: mv88e6xxx: Move the tagging protocol into info") Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Mon, 5 Dec 2016 09:30:52 +0000 (10:30 +0100)]
bpf: add additional verifier tests for BPF_PROG_TYPE_LWT_*
- direct packet read is allowed for LWT_*
- direct packet write for LWT_IN/LWT_OUT is prohibited
- direct packet write for LWT_XMIT is allowed
- access to skb->tc_classid is prohibited for LWT_*
Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang [Fri, 2 Dec 2016 23:55:38 +0000 (15:55 -0800)]
tools: hv: Enable network manager for bonding scripts on RHEL
We found network manager is necessary on RHEL to make the synthetic
NIC, VF NIC bonding operations handled automatically. So, enabling
network manager here.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Mon, 5 Dec 2016 07:28:21 +0000 (15:28 +0800)]
netlink: Do not schedule work from sk_destruct
It is wrong to schedule a work from sk_destruct using the socket
as the memory reserve because the socket will be freed immediately
after the return from sk_destruct.
Instead we should do the deferral prior to sk_free.
This patch does just that.
Fixes: 707693c8a498 ("netlink: Call cb->done from a worker thread") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When we unload the ep93xx_eth, whether we have opened the network
interface or not, we will either hit a kernel paging request error, or a
simple NULL pointer de-reference because:
- if ep93xx_open has been called, we have created a valid DMA mapping
for ep->descs, when we call ep93xx_stop, we also call
ep93xx_free_buffers, ep->descs now has a stale value
- if ep93xx_open has not been called, we have a NULL pointer for
ep->descs, so performing any operation against that address just won't
work
Fix this by adding a NULL pointer check for ep->descs which means that
ep93xx_free_buffers() was able to successfully tear down the descriptors
and free the DMA cookie as well.
Fixes: 1d22e05df818 ("[PATCH] Cirrus Logic ep93xx ethernet driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 5 Dec 2016 20:33:11 +0000 (15:33 -0500)]
Merge branch 'bpf-prog-digest'
Daniel Borkmann says:
====================
Minor BPF cleanups and digest
First two patches are minor cleanups, and the third one adds
a prog digest. For details, please see individual patches.
After this one, I have a set with tracepoint support that makes
use of this facility as well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sun, 4 Dec 2016 22:19:41 +0000 (23:19 +0100)]
bpf: add prog_digest and expose it via fdinfo/netlink
When loading a BPF program via bpf(2), calculate the digest over
the program's instruction stream and store it in struct bpf_prog's
digest member. This is done at a point in time before any instructions
are rewritten by the verifier. Any unstable map file descriptor
number part of the imm field will be zeroed for the hash.
When programs are pinned and retrieved by an ELF loader, the loader
can check the program's digest through fdinfo and compare it against
one that was generated over the ELF file's program section to see
if the program needs to be reloaded. Furthermore, this can also be
exposed through other means such as netlink in case of a tc cls/act
dump (or xdp in future), but also through tracepoints or other
facilities to identify the program. Other than that, the digest can
also serve as a base name for the work in progress kallsyms support
of programs. The digest doesn't depend/select the crypto layer, since
we need to keep dependencies to a minimum. iproute2 will get support
for this facility.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sun, 4 Dec 2016 22:19:40 +0000 (23:19 +0100)]
bpf, cls: consolidate prog deletion path
Commit 18cdb37ebf4c ("net: sched: do not use tcf_proto 'tp' argument from
call_rcu") removed the last usage of tp from cls_bpf_delete_prog(), so also
remove it from the function as argument to not give a wrong impression. tp
is illegal to access from this callback, since it could already have been
freed.
Refactor the deletion code a bit, so that cls_bpf_destroy() can call into
the same code for prog deletion as cls_bpf_delete() op, instead of having
it unnecessarily duplicated.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sun, 4 Dec 2016 22:19:39 +0000 (23:19 +0100)]
bpf: remove type arg from __is_valid_{,xdp_}access
Commit d691f9e8d440 ("bpf: allow programs to write to certain skb
fields") pushed access type check outside of __is_valid_access()
to have different restrictions for socket filters and tc programs.
type is thus not used anymore within __is_valid_access() and should
be removed as a function argument. Same for __is_valid_xdp_access()
introduced by 6a773a15a1e8 ("bpf: add XDP prog type for early driver
filter").
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
- implement a proper PHYLIB adjust_link callback to set the duplex mode
accordingly
- do not open code the fetching of a MAC address in OF/DT environments
- demote an error message that occurs more frequently than expected in low
CPU/memory/bandwidth environments
Tested on a Cirrus Logic EP93xx / TS7300 board.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: ethoc: Demote packet dropped error message to debug
Spamming the console with: net eth1: packet dropped can happen
fairly frequently if the adapter is busy transmitting, demote the
message to a debug print.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Do not open code getting the MAC address exclusively from the
"local-mac-address" property, but instead use of_get_mac_address() which
looks up the MAC address using the 3 typical property names.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ethoc_mdio_poll() which is our PHYLIB adjust_link callback does nothing,
we should at least react to duplex changes and change MODER accordingly.
Speed changes is not a problem, since the OpenCores Ethernet core seems
to be reacting okay without us telling it.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Sun, 4 Dec 2016 13:30:18 +0000 (15:30 +0200)]
bnx2x: Prevent tunnel config for 577xx
Only the 578xx adapters are capable of configuring UDP ports for
the purpose of tunnelling - doing the same on 577xx might lead to
a firmware assertion.
We're already not claiming support for any related feature for such
devices, but we also need to prevent the configuration of the UDP
ports to the device in this case.
Fixes: f34fa14cc033 ("bnx2x: Add vxlan RSS support") Reported-by: Anikina Anna <anikina@gmail.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Sun, 4 Dec 2016 13:30:17 +0000 (15:30 +0200)]
bnx2x: Correct ringparam estimate when DOWN
Until interface is up [and assuming ringparams weren't explicitly
configured] when queried for the size of its rings bnx2x would
claim they're the maximal size by default.
That is incorrect as by default the maximal number of buffers would
be equally divided between the various rx rings.
This prevents the user from actually setting the number of elements
on each rx ring to be of maximal size prior to transitioning the
interface into up state.
To fix this, make a rough estimation about the number of buffers.
It wouldn't always be accurate, but it would be much better than
current estimation and would allow users to increase number of
buffers during early initialization of the interface.
Reported-by: Seymour, Shane <shane.seymour@hpe.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Sun, 4 Dec 2016 13:25:19 +0000 (15:25 +0200)]
net/sched: cls_flower: Set the filter Hardware device for all use-cases
Check if the returned device from tcf_exts_get_dev function supports tc
offload and in case the rule can't be offloaded, set the filter hw_dev
parameter to the original device given by the user.
The filter hw_device parameter should always be set by fl_hw_replace_filter
function, since this pointer is used by dump stats and destroy
filter for each flower rule (offloaded or not).
Fixes: 7091d8c7055d ('net/sched: cls_flower: Add offload support using egress Hardware device') Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reported-by: Simon Horman <horms@verge.net.au> Tested-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pan Bian [Sun, 4 Dec 2016 10:43:31 +0000 (18:43 +0800)]
isdn: hisax: set error code on failure
In function hfc4s8s_probe(), the value of return variable err should be
negative on failures. However, when the call to request_region() returns
NULL, the value of err is 0. This patch fixes the bug, assigning
"-EBUSY" to err on the path that request_region() fails.
Pan Bian [Sun, 4 Dec 2016 10:46:03 +0000 (18:46 +0800)]
net: bnx2x: fix improper return value
Macro BNX2X_ALLOC_AND_SET(arr, lbl, func) calls kmalloc() to allocate
memory, and jumps to label "lbl" if the allocation fails. Label "lbl"
first cleans memory and then returns variable rc. Before calling the
macro, the value of variable rc is 0. Because 0 means no error, the
callers of bnx2x_init_firmware() may be misled. This patch fixes the bug,
assigning "-ENOMEM" to rc before calling macro NX2X_ALLOC_AND_SET().
Pan Bian [Sun, 4 Dec 2016 05:53:53 +0000 (13:53 +0800)]
net: ethernet: qlogic: set error code on failure
When calling dma_mapping_error(), the value of return variable rc is 0.
And when the call returns an unexpected value, rc is not set to a
negative errno. Thus, it will return 0 on the error path, and its
callers cannot detect the bug. This patch fixes the bug, assigning
"-ENOMEM" to err.
Pan Bian [Sun, 4 Dec 2016 05:45:15 +0000 (13:45 +0800)]
atm: fix improper return value
It returns variable "error" when ioremap_nocache() returns a NULL
pointer. The value of "error" is 0 then, which will mislead the callers
to believe that there is no error. This patch fixes the bug, returning
"-ENOMEM".
Pan Bian [Sun, 4 Dec 2016 05:27:40 +0000 (13:27 +0800)]
net: irda: set error code on failures
When the calls to kzalloc() fail, the value of return variable ret may
be 0. 0 means success in this context. This patch fixes the bug,
assigning "-ENOMEM" to ret before calling kzalloc().
Erik Nordmark [Sun, 4 Dec 2016 04:57:09 +0000 (20:57 -0800)]
ipv6: Allow IPv4-mapped address as next-hop
Made kernel accept IPv6 routes with IPv4-mapped address as next-hop.
It is possible to configure IP interfaces with IPv4-mapped addresses, and
one can add IPv6 routes for IPv4-mapped destinations/prefixes, yet prior
to this fix the kernel returned an EINVAL when attempting to add an IPv6
route with an IPv4-mapped address as a nexthop/gateway.
RFC 4798 (a proposed standard RFC) uses IPv4-mapped addresses as nexthops,
thus in order to support that type of address configuration the kernel
needs to allow IPv4-mapped addresses as nexthops.
Signed-off-by: Erik Nordmark <nordmark@arista.com> Signed-off-by: Bob Gilligan <gilligan@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>