David S. Miller [Fri, 2 Aug 2013 01:08:34 +0000 (18:08 -0700)]
esp_scsi: Fix tag state corruption when autosensing.
Meelis Roos reports a crash in esp_free_lun_tag() in the presense
of a disk which has died.
The issue is that when we issue an autosense command, we do so by
hijacking the original command that caused the check-condition.
When we do so we clear out the ent->tag[] array when we issue it via
find_and_prep_issuable_command(). This is so that the autosense
command is forced to be issued non-tagged.
That is problematic, because it is the value of ent->tag[] which
determines whether we issued the original scsi command as tagged
vs. non-tagged (see esp_alloc_lun_tag()).
And that, in turn, is what trips up the sanity checks in
esp_free_lun_tag(). That function needs the original ->tag[] values
in order to free up the tag slot properly.
Fix this by remembering the original command's tag values, and
having esp_alloc_lun_tag() and esp_free_lun_tag() use them.
Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 1 Aug 2013 23:42:08 +0000 (16:42 -0700)]
Merge branch 'bond_rcu'
Nikolay Aleksandrov says:
====================
This patchset aims to lay the groundwork, and do the initial conversion to
RCUism. I decided that it'll be much better to make the bonding RCU
conversion gradual, so patches can be reviewed and tested better rather
than having one huge patch (which I did in the beginning, before this).
The first patch is straightforward and it converts the bonding to the
standard list API, simplifying a lot of code, removing unnecessary local
variables and allowing to use the nice rculist API later. It also takes
care of some minor styling issues (re-arranging local variables longest ->
shortest, removing brackets for single statement if/else, leaving new line
before return statement etc.).
The second patch simplifies the conversion by removing unnecessary
read_lock(&bond->curr_slave_lock) in xmit paths that are to be converted
later, because we only care if the pointer is NULL or a slave there, since
we already have bond->lock the slave can't go away.
The third patch simplifies the broadcast xmit function by removing
the use of curr_active_slave and converting to standard list API. Also this
design of the broadcast xmit function avoids a subtle double packet tx race
when converted to RCU.
The fourth patch factors out the code that transmits skb through a slave
with given id (i.e. rr_tx_counter in rr mode, hashed value in xor mode) and
simplifies the active-backup xmit path because bond_dev_queue_xmit always
consumes the skb. The new bond_xmit_slave_id function is used in rr and xor
modes currently, but the plans are to use it in 3ad mode as well thus it's
made global. I've left the function prototype to be 81 chars so I wouldn't
break it, if this is an issue I can always break it in more lines.
The fifth patch introduces RCU by converting attach/detach and release to
RCU. It also converts dereferencing of curr_active_slave to rcu_dereference
although it's not fully converted to RCU, that is needed for the converted
xmit paths. And it converts roundrobin, broadcast, xor and active-backup
xmit paths to RCU. The 3ad and ALB/TLB modes acquire read_lock(&bond->lock)
to make sure that no slave will be removed and to sync properly with
enslave and release as before.
This way for the price of a little complexity, we'll be able to convert
individual parts of the bonding to RCU, and test them easier in the
process. If this patchset is accepted in some form, I'll post followups
in the next weeks that gradually convert the bonding to RCU and remove the
need for the rwlocks.
For performance notes please refer to patch 5 (RCU conversion one).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch does the initial bonding conversion to RCU. After it the
following modes are protected by RCU alone: roundrobin, active-backup,
broadcast and xor. Modes ALB/TLB and 3ad still acquire bond->lock for
reading, and will be dealt with later. curr_active_slave needs to be
dereferenced via rcu in the converted modes because the only thing
protecting the slave after this patch is rcu_read_lock, so we need the
proper barrier for weakly ordered archs and to make sure we don't have
stale pointer. It's not tagged with __rcu yet because there's still work
to be done to remove the curr_slave_lock, so sparse will complain when
rcu_assign_pointer and rcu_dereference are used, but the alternative to use
rcu_dereference_protected would've created much bigger code churn which is
more difficult to test and review. That will be converted in time.
1. Active-backup mode
1.1 Perf recording while doing iperf -P 4
- old bonding: iperf spent 0.55% in bonding, system spent 0.29% CPU
in bonding
- new bonding: iperf spent 0.29% in bonding, system spent 0.15% CPU
in bonding
1.2. Bandwidth measurements
- old bonding: 16.1 gbps consistently
- new bonding: 17.5 gbps consistently
2. Round-robin mode
2.1 Perf recording while doing iperf -P 4
- old bonding: iperf spent 0.51% in bonding, system spent 0.24% CPU
in bonding
- new bonding: iperf spent 0.16% in bonding, system spent 0.11% CPU
in bonding
2.2 Bandwidth measurements
- old bonding: 8 gbps (variable due to packet reorderings)
- new bonding: 10 gbps (variable due to packet reorderings)
Of course the latency has improved in all converted modes, and moreover
while
doing enslave/release (since it doesn't affect tx anymore).
Also I've stress tested all modes doing enslave/release in a loop while
transmitting traffic.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bonding: factor out slave id tx code and simplify xmit paths
I factored out the tx xmit code which relies on slave id in
bond_xmit_slave_id. It is global because later it can be used also in
3ad mode xmit. Unnecessary obvious comments are removed. Active-backup
mode is simplified because bond_dev_queue_xmit always consumes the skb.
bond_xmit_xor becomes one line because of bond_xmit_slave_id.
bond_for_each_slave_from is not used in bond_xmit_slave_id because later
when RCU is used we can avoid important race condition by using standard
rculist routines.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
We don't need to start from the curr_active_slave as the frame will be
sent to all eligible slaves anyway, so we remove the unnecessary local
variables, checks and comments, and make it use the standard list API.
This has the nice side-effect that later when it's converted to RCU
a race condition will be avoided which could lead to double packet tx.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bonding: remove unnecessary read_locks of curr_slave_lock
In all the cases we already hold bond->lock for reading, so the slave
can't get away and the check != NULL is sufficient. curr_active_slave
can still change after the read_lock is unlocked prior to use of the
dereferenced value, so there's no need for it. It either contains a
valid slave which we use (and can't get away), or it is NULL which is
checked.
In some places the read_lock of curr_slave_lock was left because we need
it not to change while performing some action (e.g. syncing current
active slave's addresses, sending ARP requests through the active slave)
such cases will be dealt with individually while converting to RCU.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bonding: convert to list API and replace bond's custom list
This patch aims to remove struct bonding's first_slave and struct
slave's next and prev pointers, and replace them with the standard Linux
list API. The old macros are converted to list API as well and some new
primitives are available now. The checks if there're slaves that used
slave_cnt have been replaced by the list_empty macro.
Also a few small style fixes, changing longest -> shortest line in local
variable declarations, leaving an empty line before return and removing
unnecessary brackets.
This is the first step to gradual RCU conversion.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Delete 2001::804/64 won't cause prefix route deleted as well as rt in (3)
destinate to 2001::806 with source address as 2001::804/64. That's because
2001::803/64 is still alive, which make onlink=1 in ipv6_del_addr, this is
where the substantial difference between same prefix configuration and
different prefix configuration :) So packet are still transmitted out to
2001::806 with source address as 2001::804/64.
So bump genid will clear rt in (3), and up layer protocol will eventually
find the right one for themselves.
This problem arised from the discussion in here:
http://marc.info/?l=linux-netdev&m=137404469219410&w=4
Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
macvlan passthrough mode is special: it's not possible to switch to or
from it through a netlink command.
But if you try, the command will succeed, which is
confusing.
Validate input and return error to user.
Cc: Sridhar Samudrala <sri@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* /home/v4l/v4l/patchwork: (65 commits)
[media] stk1160: Build as a module if SND is m and audio support is selected
[media] cx23885-video: fix two warnings
[media] cx23885[v4]: Fix interrupt storm when enabling IR receiver
[media] S2255: Removal of unnecessary videobuf_queue_is_busy
[media] media: lirc: Allow lirc dev to talk to rc device
[media] media: rc: Add rc_open/close and use count to rc_dev
[media] coda: add CODA7541 decoding support
[media] coda: split encoder specific parts out of device_run and irq_handler
[media] coda: dynamic IRAM setup for decoder
[media] cx23885: Fix TeVii S471 regression since introduction of ts2020
[media] lirc: make transmit interface consistent
[media] lirc: validate transmission ir data
[media] rc: allowed_protos now is a bit field
[media] redrat3: errors on unplug
[media] DocBook: Fix typo in V4L2_CID_JPEG_COMPRESSION_QUALITY reference
[media] V4L: Merge struct v4l2_async_subdev_list with struct v4l2_subdev
[media] V4L: Rename subdev field of struct v4l2_async_notifier
[media] V4L: Add V4L2_ASYNC_MATCH_OF subdev matching type
[media] V4L: Rename v4l2_async_bus_* to v4l2_async_match_*
[media] V4L: Drop bus_type check in v4l2-async match functions
...
* /home/v4l/v4l/for_upstream:
[media] em28xx: fix assignment of the eeprom data
[media] hdpvr: fix iteration over uninitialized lists in hdpvr_probe()
[media] usbtv: fix dependency
[media] usbtv: Throw corrupted frames away
[media] usbtv: Fix deinterlacing
[media] v4l2: added missing mutex.h include to v4l2-ctrls.h
[media] DocBook: upgrade media_api DocBook version to 4.2
[media] ml86v7667: fix compile warning: 'ret' set but not used
[media] s5p-g2d: Fix registration failure
[media] media: coda: Fix DT driver data pointer for i.MX27
[media] s5p-mfc: Fix input/output format reporting
David S. Miller [Thu, 1 Aug 2013 23:14:29 +0000 (16:14 -0700)]
Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next
Marc Kleine-Budde says:
====================
this is a pull-request for net-next/master. It consists of two patches
by Fabio Estevam. Them first convert the flexcan driver to use
devm_ioremap_resource(), the second adds return value checking for
clk_prepare_enable().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Thu, 1 Aug 2013 14:30:59 +0000 (17:30 +0300)]
bnx2x: Revising locking scheme for MAC configuration
On very rare occasions, repeated load/unload stress test in the presence of
our storage driver (bnx2i/bnx2fc) causes a kernel panic in bnx2x code
(NULL pointer dereference). Stack traces indicate the issue happens during MAC
configuration; thorough code review showed that indeed several races exist
in which one thread can iterate over the list of configured MACs while another
deletes entries from the same list.
This patch adds a varient on the single-writer/Multiple-reader lock mechanism -
It utilizes an already exsiting bottom-half lock, using it so that Whenever
a writer is unable to continue due to the existence of another writer/reader,
it pends its request for future deliverance.
The writer / last readers will check for the existence of such requests and
perform them instead of the original initiator.
This prevents the writer from having to sleep while waiting for the lock
to be accessible, which might cause deadlocks given the locks already
held by the writer.
Another result of this patch is that setting of Rx Mode is now made in
sleepable context - Setting of Rx Mode is made under a bottom-half lock, which
was always nontrivial for the bnx2x driver, as the HW/FW configuration requires
wait for completions.
Since sleep was impossible (due to the sleepless-context), various mechanisms
were utilized to prevent the calling thread from sleep, but the truth was that
when the caller thread (i.e, the one calling ndo_set_rx_mode()) returned, the
Rx mode was still not set in HW/FW.
bnx2x_set_rx_mode() will now overtly schedule for the Rx changes to be
configured by the sp_rtnl_task which hold the RTNL lock and is sleepable
context.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tipc_core_start()
ret = tipc_subscr_start()
ret = tipc_server_start(){
server->enabled = 1;
ret = tipc_open_listening_sock()
}
I.e., the server->enabled flag is unconditionally set to 1, whatever
the return value of tipc_open_listening_sock().
This causes a crash when tipc_core_start() tries to clean up
resources after a failed initialization:
if (ret == failed)
tipc_subscr_stop()
tipc_server_stop(){
if (server->enabled)
tipc_close_conn(){
NULL reference of con->sock-sk
OOPS!
}
}
To avoid this, tipc_server_start() should only set server->enabled
to 1 in case of a succesful socket creation. In case of failure, it
should release all allocated resources before returning.
Problem introduced in commit c5fa7b3cf3cb22e4ac60485fc2dc187fe012910f
("tipc: introduce new TIPC server infrastructure") in v3.11-rc1.
Note that it won't be seen often; it takes a module load under memory
constrained conditions in order to trigger the failure condition.
Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bonding: fix system hang due to fast igmp timer rescheduling
After commit 4aa5dee4d9 ("net: convert resend IGMP to notifier event")
we try to acquire rtnl in bond_resend_igmp_join_requests but it can be
scheduled with rtnl already held (e.g. when bond_change_active_slave is
called with rtnl) causing a loop of immediate reschedules + calls because
rtnl_trylock fails each time since it's being already held.
For me this issue leads to system hangs very easy:
modprobe bonding; ifconfig bond0 up; ifenslave bond0 eth0; rmmod
bonding;
The fix is to introduce a small (1 jiffy) delay which is enough for the
sections holding rtnl to finish without putting any strain on the system.
Also adjust the timer in bond_change_active_slave to be 1 jiffy, since
most of the time it's called with rtnl already held.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Thu, 1 Aug 2013 03:10:25 +0000 (11:10 +0800)]
net: rename CONFIG_NET_LL_RX_POLL to CONFIG_NET_RX_BUSY_POLL
Eliezer renames several *ll_poll to *busy_poll, but forgets
CONFIG_NET_LL_RX_POLL, so in case of confusion, rename it too.
Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Thu, 1 Aug 2013 03:10:24 +0000 (11:10 +0800)]
net: fix a compile error when CONFIG_NET_LL_RX_POLL is not set
When CONFIG_NET_LL_RX_POLL is not set, I got:
net/socket.c: In function ‘sock_poll’:
net/socket.c:1165:4: error: implicit declaration of function ‘sk_busy_loop’ [-Werror=implicit-function-declaration]
Fix this by adding a nop when !CONFIG_NET_LL_RX_POLL.
Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net/mlx4_core: VFs must ignore the enable_64b_cqe_eqe module param
Slaves get the 64B CQE/EQE state from QUERY_HCA, not from the module parameter.
If the parameter is set to zero, the slave outputs an incorrect/irrelevant
warning message that 64B CQEs/EQEs are supported but not enabled (even if the
hypervisor has enabled 64B CQEs/EQEs).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Thu, 1 Aug 2013 16:55:00 +0000 (19:55 +0300)]
net/mlx4_core: Don't give VFs MAC addresses which are derived from the PF MAC
If the user has not assigned a MAC address to a VM, then don't give it MAC which
is based on the PF one. The current derivation scheme is wrong and leads to VM
MAC collisions when the number of cards/hypervisors becomes big enough.
Instead, just give it zeros and let them figure out what to do with that.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Metcalf [Thu, 1 Aug 2013 15:36:42 +0000 (11:36 -0400)]
tile: support PTP using the tilegx mPIPE (IEEE 1588)
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Metcalf [Thu, 1 Aug 2013 15:36:42 +0000 (11:36 -0400)]
tile: support multiple mPIPE shims in tilegx network driver
The initial driver support was for a single mPIPE shim on the chip
(as is the case for the Gx36 hardware). The Gx72 chip has two mPIPE
shims, so we extend the driver to handle that case.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: David S. Miller <davem@davemloft.net>