git.karo-electronics.de Git - karo-tx-linux.git/log

]> git.karo-electronics.de Git - karo-tx-linux.git/log

Jeff Skirvin [Fri, 9 Mar 2012 06:41:58 +0000 (22:41 -0800)]

isci: Distinguish between remote device suspension cases

For NCQ error conditions among others, there is no need to enable
the link layer hang detect timer.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Jeff Skirvin [Fri, 9 Mar 2012 06:41:57 +0000 (22:41 -0800)]

isci: Remove isci_device reqs_in_process and dev_node from isci_device.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Jeff Skirvin [Fri, 9 Mar 2012 06:41:56 +0000 (22:41 -0800)]

isci: Only set IDEV_GONE in the device stop path.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Jeff Skirvin [Fri, 9 Mar 2012 06:41:56 +0000 (22:41 -0800)]

isci: All pending TCs are terminated when the RNC is invalidated.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Jeff Skirvin [Fri, 9 Mar 2012 06:41:55 +0000 (22:41 -0800)]

isci: Device access in the error path does not depend on IDEV_GONE.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Jeff Skirvin [Fri, 9 Mar 2012 06:41:54 +0000 (22:41 -0800)]

isci: Add suspension cases for RNC INVALIDATING, POSTING states.

The RNC can be any of the states in the loop from suspended to
ready when the API "suspend" or "resume" are called. This change
adds destination states parameters that control the suspension /
resumption action of the RNC statemachine for those transition states.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Jeff Skirvin [Fri, 9 Mar 2012 06:41:54 +0000 (22:41 -0800)]

isci: Redesign device suspension, abort, cleanup.

This commit changes the means by which outstanding I/Os are handled
for cleanup.
The likelihood is that this commit will be broken into smaller pieces,
however that will be a later revision. Among the changes:

- All completion structures have been removed from the tmf and
abort paths.
- Now using one completed I/O list, with the I/O completed in host bit being
used to select error or normal callback paths.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Jeff Skirvin [Fri, 9 Mar 2012 06:41:53 +0000 (22:41 -0800)]

isci: Escalate to I_T_Nexus_Reset when the device is gone.

If LUN reset sees that the device is gone, it returns TMF_RESP_FUNC_FAILED
to cause libsas to escalate to an I_T_Nexus_Reset.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Jeff Skirvin [Fri, 9 Mar 2012 06:41:52 +0000 (22:41 -0800)]

isci: Remote device stop also suspends the RNC and terminates I/O.

Fixing the remote device state machine to suspend and terminate
all outstanding I/O before the device stopped state is reached.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Jeff Skirvin [Fri, 9 Mar 2012 06:41:52 +0000 (22:41 -0800)]

isci: Remote device must be suspended for NCQ cleanup.

When the remote device enters the NCQ error state, the device must
be suspended so that the I/O terminations can take place.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Jeff Skirvin [Fri, 9 Mar 2012 06:41:51 +0000 (22:41 -0800)]

isci: Manage device suspensions during TC terminations.

TCs must be terminated only while the RNC is suspended. This commit
adds remote device suspensions and resumptions in the abort, reset and
termination paths.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Jeff Skirvin [Fri, 9 Mar 2012 06:41:50 +0000 (22:41 -0800)]

isci: Terminate outstanding TCs on TX/RX RNC suspensions.

TCs must only be terminated when RNCs are suspended.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Jeff Skirvin [Fri, 9 Mar 2012 06:41:50 +0000 (22:41 -0800)]

isci: Handle all suspending TC completions

Add comprehensive decode for all TC completions that generate RNC
suspensions.

Note that this commit also removes unconditional resumptions of ATAPI
devices when in the SCI_STP_DEV_ATAPI_ERROR state, and STP devices
when in the SCI_STP_DEV_IDLE state. This is because the SCI_STP_DEV_IDLE
and SCI_STP_DEV_ATAPI state entry functions manage the RNC resumption.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Jeff Skirvin [Fri, 9 Mar 2012 06:41:49 +0000 (22:41 -0800)]

isci: Fixed bug in resumption from RNC Tx/Rx suspend state.

The resumption from the Tx/Rx suspended state should work the same
as the Tx suspended state.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Jeff Skirvin [Fri, 9 Mar 2012 06:41:48 +0000 (22:41 -0800)]

isci: Manage the link layer hang detect timer for RNC suspensions.

For STP devices under certain protocol conditions, an RNC will not
suspend until the current transfer state is broken with a SYNC/ESC
sequence from the SCU. The SYNC/ESC driven by expiration of the
SCU link layer hang detect timer, which has too small a dynamic
range to support slow SATA devices, so normally it is disabled.

This change enables the timer with the minimum period at the point
when the suspension is requested.

Note that there is potential collateral damage to other open
connections to slow SATA devices on the same port, since there
is no alternative but to enable the LLHANG timer on every phy in
the port for the current suspension request - there is no way to
tell on which phy the RNC in question is currently active.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Thu, 23 Feb 2012 09:12:10 +0000 (01:12 -0800)]

isci: fix controller stop

1/ notify waiters when controller stop completes (fixes 10 second stall
unloading the driver)
2/ make sure phy stop is after port and device stop

Cc: Richard Boyd <richard.g.boyd@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Wed, 15 Feb 2012 21:58:42 +0000 (13:58 -0800)]

isci: refactor initialization for S3/S4

Based on an original implementation by Ed Nadolski and Artur Wojcik

In preparation for S3/S4 support refactor initialization so that
driver-load and resume-from-suspend can share the common init path of
isci_host_init().  Organize the initialization into objects that are
self-contained to the driver (initialized by isci_host_init) versus
those that have some upward registration (initialized at allocation time
asd_sas_phy, asd_sas_port, dma allocations).  The largest change is
moving the the validation of the oem and module parameters from
isci_host_init() to isci_host_alloc().

The S3/S4 approach being taken is that libsas will be tasked with
remembering the state of the domain and the lldd is free to be
forgetful.  In the case of isci we'll just re-init using a subset of the
normal driver load path.

[clean up some unused / mis-indented function definitions in host.h]

Signed-off-by: Ed Nadolski <edmund.nadolski@intel.com>
Signed-off-by: Artur Wojcik <artur.wojcik@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Sat, 18 Feb 2012 00:30:47 +0000 (16:30 -0800)]

isci: kill isci_port.domain_dev_list

Another unused field, and isci_port_init is overkill.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Wed, 15 Feb 2012 21:20:31 +0000 (13:20 -0800)]

isci: kill ->status, and ->state_lock in isci_host

They serve no incremental purpose over the existing sas_ha state.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Tom Jackson [Fri, 24 Feb 2012 09:38:49 +0000 (09:38 +0000)]

isci: Don't filter BROADCAST CHANGE primitives

Per the SAS spec, several types of BROADCAST CHANGE primitives
must cause re-discovery of the originating expander.
Only the standard BROADCAST CHANGE primitive was being
sent to the LIBSAS layer. The other BC primitives have been
added to the sci_phy_event_handler()

Signed-off-by: Tom Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Wed, 1 Feb 2012 08:44:14 +0000 (00:44 -0800)]

isci: kill sci_phy_protocol and sci_request_protocol

Holdovers from the initial driver cleanup, replace with enum sas_protocol.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Wed, 1 Feb 2012 08:23:10 +0000 (00:23 -0800)]

isci: kill ->is_direct_attached

domain_device ->parent conveys the same information.

Occurrences of ->is_direct_attached appear next to incomplete open-coded
versions of dev_is_sata(), clean those up as well.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Fri, 10 Feb 2012 09:05:43 +0000 (01:05 -0800)]

isci: improve 'invalid state' warnings

Convert controller state machine warnings to emit the state number (it
missed the number to string conversion, but since these error rarely
happen not much motivation to go further).

Fix up the rnc warnings to use the state name.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Jesper Juhl [Sun, 5 Feb 2012 00:37:56 +0000 (01:37 +0100)]

isci: Just #include "host.h" once in host.c

There's no need to include the header twice.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Thu, 22 Mar 2012 04:09:12 +0000 (21:09 -0700)]

libsas: suspend / resume support

libsas power management routines to suspend and recover the sas domain
based on a model where the lldd is allowed and expected to be
"forgetful".

sas_suspend_ha - disable event processing allowing the lldd to take down
                 links without concern for causing hotplug events.
                 Regardless of whether the lldd actually posts link down
                 messages libsas notifies the lldd that all
                 domain_devices are gone.

sas_prep_resume_ha - on the way back up before the lldd starts link
                     training clean out any spurious events that were
                     generated on the way down, and re-enable event
                     processing

sas_resume_ha - after the lldd has started and decided that all phys
have posted link-up events this routine is called to let
libsas start it's own timeout of any phys that did not
resume.  After the timeout an lldd can cancel the
                phy teardown by posting a link-up event.

Storage for ex_change_count (u16) and phy_change_count (u8) are changed
to int so they can be set to -1 to indicate 'invalidated'.

Cc: Alan Stern <stern@rowland.harvard.edu>
Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
Tested-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Thu, 22 Mar 2012 04:09:11 +0000 (21:09 -0700)]

scsi, sd: limit the scope of the async probe domain

sd injects and synchronizes probe work on the global kernel-wide domain.
This runs into conflict with PM that wants to perform resume actions in
async context:

[  494.237079] INFO: task kworker/u:3:554 blocked for more than 120 seconds.
[  494.294396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  494.360809] kworker/u:3     D 0000000000000000     0   554      2 0x00000000
[  494.420739]  ffff88012e4d3af0 0000000000000046 ffff88013200c160 ffff88012e4d3fd8
[  494.484392]  ffff88012e4d3fd8 0000000000012500 ffff8801394ea0b0 ffff88013200c160
[  494.548038]  ffff88012e4d3ae0 00000000000001e3 ffffffff81a249e0 ffff8801321c5398
[  494.611685] Call Trace:
[  494.632649]  [<ffffffff8149dd25>] schedule+0x5a/0x5c
[  494.674687]  [<ffffffff8104b968>] async_synchronize_cookie_domain+0xb6/0x112
[  494.734177]  [<ffffffff810461ff>] ? __init_waitqueue_head+0x50/0x50
[  494.787134]  [<ffffffff8131a224>] ? scsi_remove_target+0x48/0x48
[  494.837900]  [<ffffffff8104b9d9>] async_synchronize_cookie+0x15/0x17
[  494.891567]  [<ffffffff8104ba49>] async_synchronize_full+0x54/0x70  <-- here we wait for async contexts to complete
[  494.943783]  [<ffffffff8104b9f5>] ? async_synchronize_full_domain+0x1a/0x1a
[  495.002547]  [<ffffffffa00114b1>] sd_remove+0x2c/0xa2 [sd_mod]
[  495.051861]  [<ffffffff812fe94f>] __device_release_driver+0x86/0xcf
[  495.104807]  [<ffffffff812fe9bd>] device_release_driver+0x25/0x32  <-- here we take device_lock()

[  853.511341] INFO: task kworker/u:4:549 blocked for more than 120 seconds.
[  853.568693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  853.635119] kworker/u:4     D ffff88013097b5d0     0   549      2 0x00000000
[  853.695129]  ffff880132773c40 0000000000000046 ffff880130790000 ffff880132773fd8
[  853.758990]  ffff880132773fd8 0000000000012500 ffff88013288a0b0 ffff880130790000
[  853.822796]  0000000000000246 0000000000000040 ffff88013097b5c8 ffff880130790000
[  853.886633] Call Trace:
[  853.907631]  [<ffffffff8149dd25>] schedule+0x5a/0x5c
[  853.949670]  [<ffffffff8149cc44>] __mutex_lock_common+0x220/0x351
[  854.001225]  [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4
[  854.049082]  [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4
[  854.097011]  [<ffffffff8149ce48>] mutex_lock_nested+0x2f/0x36   <-- here we wait for device_lock()
[  854.145591]  [<ffffffff81304bd7>] device_resume+0x58/0x1c4
[  854.192066]  [<ffffffff81304d61>] async_resume+0x1e/0x45
[  854.237019]  [<ffffffff8104bc93>] async_run_entry_fn+0xc6/0x173  <-- ...while running in async context

Provide a 'scsi_sd_probe_domain' so that async probe actions actions can
be flushed without regard for the state of PM, and allow for the resume
path to handle devices that have transitioned from SDEV_QUIESCE to
SDEV_DEL prior to resume.

Acked-by: Alan Stern <stern@rowland.harvard.edu>
[alan: uplevel scsi_sd_probe_domain, clarify scsi_device_resume]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Thu, 22 Mar 2012 04:09:10 +0000 (21:09 -0700)]

libsas: drop sata port multiplier infrastructure

On the way to add a new sata_device field, noticed that libsas is
carrying port multiplier infrastructure that is explicitly disabled by
sas_discover_sata(). The aic94xx touches the unused port_no, so leave
that field in case there was some use for it.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Thu, 22 Mar 2012 04:09:08 +0000 (21:09 -0700)]

libata: export ata_port suspend/resume infrastructure for sas

Reuse ata_port_{suspend|resume}_common for sas. This path is chosen
over adding coordination between ata-tranport and sas-transport because
libsas wants to revalidate the domain at resume-time at the host level.
It can not validate links have resumed properly until libata has had a
chance to perform its revalidation, and any sane placing of an ata_port
in the sas-transport model would delay it's resumption until after the
host.

Export the common portion of port suspend/resume (bypass pm_runtime),
and allow sas to perform these operations asynchronously (similar to the
libsas async-ata probe implmentation). Async operation is determined by
having an external, rather than stack based, location for storing the
result of the operation.

Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Thu, 22 Mar 2012 04:09:06 +0000 (21:09 -0700)]

libsas: continue revalidation

Continue running revalidation until no more broadcast devices are
discovered.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Tue, 20 Mar 2012 17:58:35 +0000 (10:58 -0700)]

libata: reset once

Hotplug testing with libsas currently encounters a 55 second wait for
link recovery to give up. In the case where the user trusts the
response time of their devices permit the recovery attempts to be
limited to one.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Carpenter [Thu, 8 Mar 2012 07:26:33 +0000 (10:26 +0300)]

mvsas: remove unused variable in mvs_task_exec()

We don't use "dev" any more after 07ec747a5f ("libsas: remove
ata_port.lock management duties from lldds") and it causes a compile
warning.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Jeff Skirvin [Sat, 10 Mar 2012 05:55:50 +0000 (05:55 +0000)]

libsas: sas_rediscover_dev did not look at the SMP exec status.

The discovery function "sas_rediscover_dev" had two bugs: 1) it did
not pay attention to the return status from the SMP task execution;
2) the stack variable used for the returned SAS address was compared
against 0 without being initialized.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
[djbw: todo sanitize smp_execute_task return values (see: residue)]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Thu, 9 Feb 2012 04:52:40 +0000 (20:52 -0800)]

libsas: trim sas_task of slow path infrastructure

The timer and the completion are only used for slow path tasks (smp, and
lldd tmfs), yet we incur the allocation space and cpu setup time for
every fast path task.

Cc: Christoph Hellwig <hch@lst.de>
Acked-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Wed, 1 Feb 2012 09:42:02 +0000 (01:42 -0800)]

isci: use sas eh strategy handlers

...now that the strategy handlers guarantee eh context and notify
the driver of bus reset.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Wed, 1 Feb 2012 09:12:23 +0000 (01:12 -0800)]

libsas: use ->lldd_I_T_nexus_reset for ->eh_bus_reset_handler

sas_eh_bus_reset_handler() amounts to sas_phy_reset() without
notification of the reset to the lldd. If this is triggered from
eh-cmnd recovery there may be sas_tasks for the lldd to terminate, so
->lldd_I_T_nexus_reset is warranted.

Cc: Xiangliang Yu <yuxiangl@marvell.com>
Cc: Luben Tuikov <ltuikov@yahoo.com>
Cc: Jack Wang <jack_wang@usish.com>
Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
[jacek: modify pm8001_I_T_nexus_reset to return -ENODEV]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Tue, 31 Jan 2012 08:07:27 +0000 (00:07 -0800)]

libsas: add sas_eh_abort_handler

When recovering failed eh-cmnds let the lldd attempt an abort via
scsi_abort_eh_cmnd before escalating.

Acked-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Tue, 31 Jan 2012 07:39:11 +0000 (23:39 -0800)]

libsas: enforce eh strategy handlers only in eh context

The strategy handlers may be called in places that are problematic for
libsas (i.e. sata resets outside of domain revalidation filtering /
libata link recovery), or problematic for userspace (non-blocking ioctl
to sleeping reset functions). However, these routines are also called
for eh escalations and recovery of scsi_eh_prep_cmnd(), so permit them
as long as we are running in the host's error handler, otherwise arrange
for them to be triggered in eh_context.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Tue, 20 Mar 2012 17:58:38 +0000 (10:58 -0700)]

scsi_transport_sas: fix delete vs scan race

The following crash results from cases where the end_device has been
removed before scsi_sysfs_add_sdev has had a chance to run.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
IP: [<ffffffff8115e100>] sysfs_create_dir+0x32/0xb6
...
Call Trace:
  [<ffffffff8125e4a8>] kobject_add_internal+0x120/0x1e3
  [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff8125e641>] kobject_add_varg+0x41/0x50
  [<ffffffff8125e70b>] kobject_add+0x64/0x66
  [<ffffffff8131122b>] device_add+0x12d/0x63a
  [<ffffffff814b65ea>] ? _raw_spin_unlock_irqrestore+0x47/0x56
  [<ffffffff8107de15>] ? module_refcount+0x89/0xa0
  [<ffffffff8132f348>] scsi_sysfs_add_sdev+0x4e/0x28a
  [<ffffffff8132dcbb>] do_scan_async+0x9c/0x145

...teach sas_rphy_remove to wait for async scanning to quiesce before
removing the end_device.  It seems this is a more general problem [1],
but this patch only addresses sas transport.

[1]: 23edb6e [SCSI] mpt2sas: Do not set sas_device->starget to NULL from
the slave_destroy callback when all the LUNS have been deleted

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Tue, 20 Mar 2012 17:50:27 +0000 (10:50 -0700)]

libsas: fix false positive 'device attached' conditions

Normalize phy->attached_sas_addr to return a zero-address in the case
when device-type == NO_DEVICE or the linkrate is invalid to handle
expanders that put non-zero sas addresses in the discovery response:

sas: ex 5001b4da000f903f phy02:U:0 attached: 0100000000000000 (no device)
sas: ex 5001b4da000f903f phy01:U:0 attached: 0100000000000000 (no device)
sas: ex 5001b4da000f903f phy03:U:0 attached: 0100000000000000 (no device)
sas: ex 5001b4da000f903f phy00:U:0 attached: 0100000000000000 (no device)

Reported-by: Andrzej Jakowski <andrzej.jakowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Fri, 6 Apr 2012 23:35:36 +0000 (16:35 -0700)]

scsi: fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations)

Rapid ata hotplug on a libsas controller results in cases where libsas
is waiting indefinitely on eh to perform an ata probe.

A race exists between scsi_schedule_eh() and scsi_restart_operations()
in the case when scsi_restart_operations() issues i/o to other devices
in the sas domain. When this happens the host state transitions from
SHOST_RECOVERY (set by scsi_schedule_eh) back to SHOST_RUNNING and
->host_busy is non-zero so we put the eh thread to sleep even though
->host_eh_scheduled is active.

Before putting the error handler to sleep we need to check if the
host_state needs to return to SHOST_RECOVERY for another trip through
eh.

Cc: Tejun Heo <tj@kernel.org>
Reported-by: Tom Jackson <thomas.p.jackson@intel.com>
Tested-by: Tom Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Thu, 22 Mar 2012 04:09:07 +0000 (21:09 -0700)]

libsas, libata: fix start of life for a sas ata_port

This changes the ordering of initialization and probing events from:
  1/ allocate rphy in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
  2/ allocate ata_port and schedule port probe in DISCE_PROBE
...to:
  1/ allocate ata_port in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
  2/ allocate rphy in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
  3/ schedule port probe in DISCE_PROBE

This ordering prevents PHYE_SIGNAL_LOSS_EVENTS from sneaking in to
destrory ata devices before they have been fully initialized:

  BUG: unable to handle kernel paging request at 0000000000003b10
  IP: [<ffffffffa0053d7e>] sas_ata_end_eh+0x12/0x5e [libsas]
  ...
  [<ffffffffa004d1af>] sas_unregister_common_dev+0x78/0xc9 [libsas]
  [<ffffffffa004d4d4>] sas_unregister_dev+0x4f/0xad [libsas]
  [<ffffffffa004d5b1>] sas_unregister_domain_devices+0x7f/0xbf [libsas]
  [<ffffffffa004c487>] sas_deform_port+0x61/0x1b8 [libsas]
  [<ffffffffa004bed0>] sas_phye_loss_of_signal+0x29/0x2b [libsas]

...and kills the awkward "sata domain_device briefly existing in the
domain without an ata_port" state.

Reported-by: Michal Kosciowski <michal.kosciowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Thu, 22 Mar 2012 04:09:05 +0000 (21:09 -0700)]

libata: make ata_print_id atomic

This variable is incremented from multiple contexts (module_init via
libata-lldds and the libsas discovery thread). Make it atomic to head
off any chance of libsas and libata creating duplicate ids.

Acked-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Tue, 20 Mar 2012 20:24:29 +0000 (13:24 -0700)]

libsas: fix ata_eh clobbering ex_phys via smp_ata_check_ready

The check_ready implementation in the expander-attached ata device case
polls on sas_ex_phy_discover().  The effect is that the ex_phy fields
(critically ->attached_sas_addr) can change.  When ata_eh ends and
libsas comes along to revalidate the domain
sas_unregister_devs_sas_addr() can fail to lookup devices to remove, or
fail to re-add an ata device that ata_eh marked as disabled.  So change
the code to skip the sas_address and change count updates when ata_eh is
active.

Cc: Jack Wang <jack_wang@usish.com>
Tested-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Tested-by: Bartek Nowakowski <bartek.nowakowski@intel.com>
Tested-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Tue, 20 Mar 2012 17:53:24 +0000 (10:53 -0700)]

libsas: unify domain_device sas_rphy lifetimes

Since the domain_device can out live the scsi_target we need the rphy to
follow suit otherwise we run into issues like:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
  IP: [<ffffffffa011561b>] sas_ata_printk+0x43/0x6f [libsas]
  PGD 0
  Oops: 0000 [#1] SMP
  CPU 1
  Modules linked in: ses enclosure isci libsas scsi_transport_sas fuse sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf microcode pcspkr igb joydev iTCO_wdt ioatdma iTCO_vendor_support i2c_i801 i2c_core dca wmi hed ipv6 pata_acpi ata_generic [last unloaded: scsi_wait_scan]

  Pid: 129, comm: kworker/u:3 Not tainted 3.3.0-rc5-isci+ #1 Intel Corporation SandyBridge Platform/To be filled by O.E.M.
  RIP: 0010:[<ffffffffa011561b>] [<ffffffffa011561b>] sas_ata_printk+0x43/0x6f [libsas]
  RSP: 0018:ffff88042232dd70 EFLAGS: 00010282
  RAX: 0000000000000000 RBX: ffff8804283165b8 RCX: ffff88042232dda0
  RDX: ffff88042232dd78 RSI: ffff8804283165b8 RDI: ffffffffa01188d7
  RBP: ffff88042232ddd0 R08: ffff880388454000 R09: ffff8803edfde1f8
  R10: ffff8803edfde1f8 R11: ffff8803edfde1f8 R12: ffff880428316750
  R13: ffff880388454000 R14: ffff8803f88b31d0 R15: ffff8803f8b21d50
  FS: 0000000000000000(0000) GS:ffff88042ee20000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000050 CR3: 0000000001a05000 CR4: 00000000000406e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kworker/u:3 (pid: 129, threadinfo ffff88042232c000, task ffff88042230c920)
  Stack:
  0000000000000000 ffff880400000018 ffff88042232dde0 ffff88042232dda0
  ffffffffa01188c4 ffff88042ee93af0 ffff88042232ddb0 ffffffff8100e047
  ffff88042232de10 ffff880420e5a2c8 ffff8803f8b21d50 ffff8803edfde1f8
  Call Trace:
  [<ffffffff8100e047>] ? load_TLS+0xb/0xf
  [<ffffffffa01156ad>] async_sas_ata_eh+0x66/0x95 [libsas]
  [<ffffffff810655e1>] async_run_entry_fn+0x9e/0x131

Reported-by: Tom Jackson <thomas.p.jackson@intel.com>
Tested-by: Tom Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Mon, 12 Mar 2012 18:38:26 +0000 (11:38 -0700)]

libsas: fix sas_get_port_device regression

Commit 899fcf4 "[SCSI] libsas: set attached device type and target
protocols for local phys" setup 'phy' to be dereferenced after
list_for_each_entry(phy, &port->phy_list, port_phy_el) (i.e. phy ==
&port->phy_list) resulting in reports like:

BUG: unable to handle kernel NULL pointer dereference at 00000000000002b0
IP: [<ffffffffa00ce948>] sas_discover_domain+0x29e/0x4fb [libsas]

...fix by deferring sas_phy_set_target() to the end of
sas_get_port_device().

Reported-by: Tom Jackson <thomas.p.jackson@intel.com>
Tested-by: Tom Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Thomas Jackson [Sat, 18 Feb 2012 02:33:10 +0000 (18:33 -0800)]

libsas: fix sas_find_bcast_phy() in the presence of 'vacant' phys

If an expander reports 'PHY VACANT' for a phy index prior to the one
that generated a BCN libsas fails rediscovery. Since a vacant phy is
defined as a valid phy index that will never have an attached device
just continue the search.

Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Fri, 2 Mar 2012 02:44:25 +0000 (18:44 -0800)]

libata, libsas: introduce sched_eh and end_eh port ops

When managing shost->host_eh_scheduled libata assumes that there is a
1:1 shost-to-ata_port relationship. libsas creates a 1:N relationship
so it needs to manage host_eh_scheduled cumulatively at the host level.
The sched_eh and end_eh port port ops allow libsas to track when domain
devices enter/leave the "eh-pending" state under ha->lock (previously
named ha->state_lock, but it is no longer just a lock for ha->state
changes).

Since host_eh_scheduled indicates eh without backing commands pinning
the device it can be deallocated at any time. Move the taking of the
domain_device reference under the port_lock to guarantee that the
ata_port stays around for the duration of eh.

Cc: Tejun Heo <tj@kernel.org>
Acked-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Maciej Trela [Mon, 5 Mar 2012 01:58:55 +0000 (17:58 -0800)]

libsas: cleanup spurious calls to scsi_schedule_eh

eh is woken up automatically by the presence of failed commands,
scsi_schedule_eh is reserved for cases where there are no failed
commands. This guarantees that host_eh_sceduled is only incremented
when an explicit eh request is made.

Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Maciej Trela <maciej.trela@intel.com>
[fixed spurious delete of sas_ata_task_abort]
Signed-off-by: Artur Wojcik <artur.wojcik@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Dan Williams [Fri, 9 Mar 2012 19:00:06 +0000 (11:00 -0800)]

libsas: introduce sas_work to fix sas_drain_work vs sas_queue_work

When requeuing work to a draining workqueue the last work instance may
not be idle, so sas_queue_work() must not touch work->entry.  Introduce
sas_work with a drain_node list_head to have a private list for
collecting work deferred due to drain collision.

Fixes reports like:
  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: [<ffffffff810410d4>] process_one_work+0x2e/0x338

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

commit | commitdiff | tree

Vikas Chaudhary [Mon, 27 Feb 2012 11:08:58 +0000 (03:08 -0800)]

[SCSI] qla4xxx: Update driver version to 5.02.00-k15

Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Vikas Chaudhary [Mon, 27 Feb 2012 11:08:57 +0000 (03:08 -0800)]

[SCSI] qla4xxx: trivial cleanup

1. Do not initialise globals to 0
2. Fix wrong spelling in debug message
3. Modified debug log messages

Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Vikas Chaudhary [Mon, 27 Feb 2012 11:08:56 +0000 (03:08 -0800)]

[SCSI] qla4xxx: Fix sparse warning

Fix following warning:-
drivers/scsi/qla4xxx/ql4_os.c:35:5: warning: symbol 'ql4xdisablesysfsboot' was not declared. Should it be static?

drivers/scsi/qla4xxx/ql4_iocb.c:461:5: warning: symbol 'qla4xxx_send_mbox_iocb' was not declared. Should it be static?

drivers/scsi/qla4xxx/ql4_os.c:3025:6: warning: symbol 'qla4xxx_do_work' was not declared. Should it be static?

Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Manish Rangankar [Mon, 27 Feb 2012 11:08:55 +0000 (03:08 -0800)]

[SCSI] qla4xxx: Add support for multiple session per host.

This patch will allow iscsiadm to create multiple session
for the same target on per host.

Signed-off-by: Manish Rangankar <manish.rangankar@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Mike Christie [Mon, 27 Feb 2012 11:08:54 +0000 (03:08 -0800)]

[SCSI] qla4xxx: Export CHAP index as sysfs attribute

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Mike Christie [Mon, 27 Feb 2012 11:08:53 +0000 (03:08 -0800)]

[SCSI] scsi_transport: Export CHAP index as sysfs attribute

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Nilesh Javali [Mon, 27 Feb 2012 11:08:52 +0000 (03:08 -0800)]

[SCSI] qla4xxx: Add support to display CHAP list and delete CHAP entry

For offload iSCSI like qla4xxx CHAP entries are stored in FLASH.
This patch adds support to list CHAP entries stored in FLASH and
delete specified CHAP entry from FLASH using iscsi tools.

Signed-off-by: Nilesh Javali <nilesh.javali@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Nilesh Javali [Mon, 27 Feb 2012 11:08:51 +0000 (03:08 -0800)]

[SCSI] iscsi_transport: Add support to display CHAP list and delete CHAP entry

For offload iSCSI like qla4xxx CHAP entries are stored in FLASH.
This patch adds support to list CHAP entries stored in FLASH and
delete specified CHAP entry from FLASH using iscsi tools.

Signed-off-by: Nilesh Javali <nilesh.javali@qlogic.com>
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Santosh Nayak [Sun, 26 Feb 2012 14:44:46 +0000 (20:14 +0530)]

[SCSI] pm8001: fix endian issue with code optimization.

1. Fix endian issue.
2. Fix the following warning :
" drivers/scsi/pm8001/pm8001_hwi.c:2932:32: warning: comparison
between ‘enum sas_device_type’ and ‘enum sas_dev_type’".
3. Few code optimization.

Signed-off-by: Santosh Nayak <santoshprasadnayak@gmail.com>
Acked-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Santosh Nayak [Sun, 26 Feb 2012 13:35:03 +0000 (19:05 +0530)]

[SCSI] pm8001: Fix possible racing condition.

There is a possble racing scenario.

'process_oq' is called by two routines, as shown below.

pm8001_8001_dispatch = {
         .......

        .isr             = pm8001_chip_isr --> process_oq,// A
        .isr_process_oq  = process_oq,                   //  B
        .....
}

process_oq() --> process_one_iomb() --> mpi_sata_completion()

In 'mpi_sata_completion', "pm8001_ha->lock" is first released.
It means lock is taken before,  which is true for
the context A, as 'pm8001_ha->lock' is taken in 'pm8001_chip_isr()'

But for context B there is no lock taken before and pm8001_ha->lock
is unlocked in 'mpi_sata_completion()'. This may unlock the lock
taken in context A. Possible racing ??

If 'pm8001_ha->lock' is taken in 'process_oq()' instead of
'pm8001_chip_isr' then the above issue can be avoided.

Signed-off-by: Santosh Nayak <santoshprasadnayak@gmail.com>
Acked-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Santosh Nayak [Sun, 26 Feb 2012 13:33:30 +0000 (19:03 +0530)]

[SCSI] pm8001: Fix bogus interrupt state flag issue.

Static checker is giving following warning:
" error: calling 'spin_unlock_irqrestore()' with bogus flags"

The code flow is as shown below:
process_oq() --> process_one_iomb --> mpi_sata_completion

In 'mpi_sata_completion'
the first call for 'spin_unlock_irqrestore()' is with flags=0,
which is as good as 'spin_unlock_irq()' ( unconditional interrupt
enabling).

So for better performance 'spin_unlock_irqrestore()' can be replaced
with 'spin_unlock_irq()' and 'spin_lock_irqsave()' can be replaced by
'spin_lock_irq()'.

Signed-off-by: Santosh Nayak <santoshprasadnayak@gmail.com>
Acked-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Wayne Boyer [Thu, 23 Feb 2012 19:54:55 +0000 (11:54 -0800)]

[SCSI] ipr: update PCI ID definitions for new adapters

This patch updates some PCI ID definitions for new adapters based on the next
generation 64 bit IOA PCI interface chip.

Signed-off-by: Wayne Boyer <wayneb@linux.vnet.ibm.com>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Carpenter [Tue, 21 Feb 2012 07:29:40 +0000 (10:29 +0300)]

[SCSI] qla2xxx: handle default case in qla2x00_request_firmware()

This silences a static checker warning. Also we're always adding new
types of firmware, so it might fix a bug in real life some day.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Andrzej Jakowski [Fri, 10 Feb 2012 09:18:54 +0000 (01:18 -0800)]

[SCSI] isci: improvements in driver unloading routine

This patch fixes scenario where driver removal should be possible
only when driver is in READY state. Also it removes redundant
invocation of routine disabling SCU interrupts - this method is
called somewhere else in driver deinitialization path.

Signed-off-by: Andrzej Jakowski <andrzej.jakowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Fri, 10 Feb 2012 09:18:49 +0000 (01:18 -0800)]

[SCSI] isci: improve phy event warnings

isci occasionally spews messages like:

isci 0000:03:00.0: sci_phy_event_handler: PHY starting substate machine received unexpected event_code b3940000

...which is not very helpful, since we don't know which controller,
which phy, the exact state, or a decode of the event.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Fri, 10 Feb 2012 09:18:44 +0000 (01:18 -0800)]

[SCSI] isci: debug, provide state-enum-to-string conversions

Debugging the driver requires tracing the state transtions and tracing
state names is less work than decoding numbers.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Tue, 31 Jan 2012 06:53:51 +0000 (22:53 -0800)]

[SCSI] scsi_transport_sas: 'enable' phys on reset

If userspace requests a phy reset, treat that as a request for the phy
to be enabled since that is the effect on hardware.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Tue, 31 Jan 2012 05:40:45 +0000 (21:40 -0800)]

[SCSI] libsas: don't recover end devices attached to disabled phys

If userspace has decided to disable a phy the kernel should honor that
and not inadvertantly re-enable the phy via error recovery. This is
more straightforward in the sata case where link recovery (via
libata-eh) is separate from sas_task cancelling in libsas-eh. Teach
libsas to accept -ENODEV as a successful response from I_T_nexus_reset
('successful' in terms of not escalating further).

This is a more comprehensive fix then "libsas: don't recover 'gone'
devices in sas_ata_hard_reset()", as it is no longer sata-specific.

aic94xx does check the return value from sas_phy_reset() so if the phy
is disabled we proceed with clearing the I_T_nexus.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Thu, 9 Feb 2012 07:20:41 +0000 (23:20 -0800)]

[SCSI] libsas: fixup target_port_protocols for expanders that don't report sata

If discovery returns 0 for target_port_protocols but shows an attached
sata device, just report SAS_PROTOCOL_SATA in the identify data so
userspace can reliably search for sata devices in the domain.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Sun, 29 Jan 2012 01:24:40 +0000 (17:24 -0800)]

[SCSI] libsas: set attached device type and target protocols for local phys

Before:
$ cat /sys/class/sas_phy/phy-6\:3/device_type
none
$ cat /sys/class/sas_phy/phy-6\:3/target_port_protocols
none

After:
$ cat /sys/class/sas_phy/phy-6\:3/device_type
end device
$ cat /sys/class/sas_phy/phy-6\:3/target_port_protocols
sata

Also downgrade the phy_list_lock to _irq instead of _irqsave since
libsas will never call sas_get_port_device with interrupts disbled.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Fri, 20 Jan 2012 23:26:03 +0000 (15:26 -0800)]

[SCSI] libsas: revert ata srst

libata issues follow up srsts when the controller has a hard time
recording the signature-fis after a reset, or if the link supports port
multipliers. libsas does not support port multipliers and no current
libsas lldds appear to need help retrieving the signature fis. Revert
it for now to remove confusion.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Fri, 20 Jan 2012 23:23:07 +0000 (15:23 -0800)]

[SCSI] libsas: fix lifetime of SAS_HA_FROZEN

Until all sas_tasks are known to no longer be in-flight this flag gates late
completions from colliding with error handling.  However, it must be cleared
prior to the submission of scsi_send_eh_cmnd() requests, otherwise those
commands will never be completed correctly.

This was spotted by slub debug:
=============================================================================
BUG sas_task: Objects remaining on kmem_cache_close()
-----------------------------------------------------------------------------

INFO: Slab 0xffffea001f0eba00 objects=34 used=1 fp=0xffff8807c3aecb00 flags=0x8000000000004080
Pid: 22919, comm: modprobe Not tainted 3.2.0-isci+ #2
Call Trace:
  [<ffffffff810fcdcd>] slab_err+0xb0/0xd2
  [<ffffffff810e1c50>] ? free_percpu+0x31/0x117
  [<ffffffff81100122>] ? kzalloc+0x14/0x16
  [<ffffffff81100122>] ? kzalloc+0x14/0x16
  [<ffffffff81100486>] kmem_cache_destroy+0x11d/0x270
  [<ffffffffa0112bdc>] sas_class_exit+0x10/0x12 [libsas]
  [<ffffffff81078fba>] sys_delete_module+0x1c4/0x23c
  [<ffffffff814797ba>] ? sysret_check+0x2e/0x69
  [<ffffffff8126479e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff81479782>] system_call_fastpath+0x16/0x1b
INFO: Object 0xffff8807c3aed280 @offset=21120
INFO: Allocated in sas_alloc_task+0x22/0x90 [libsas] age=4615311 cpu=2 pid=12966
  __slab_alloc.clone.3+0x1d1/0x234
  kmem_cache_alloc+0x52/0x10d
  sas_alloc_task+0x22/0x90 [libsas]
  sas_queuecommand+0x20e/0x230 [libsas]
  scsi_send_eh_cmnd+0xd1/0x30c
  scsi_eh_try_stu+0x4f/0x6b
  scsi_eh_ready_devs+0xba/0x6ef
  sas_scsi_recover_host+0xa35/0xab1 [libsas]
  scsi_error_handler+0x14b/0x5fa
  kthread+0x9d/0xa5
  kernel_thread_helper+0x4/0x10

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Thu, 19 Jan 2012 04:47:01 +0000 (20:47 -0800)]

[SCSI] libsas: async ata scanning

libsas ata error handling is already async but this does not help the
scan case. Move initial link recovery out from under host->scan_mutex,
and delay synchronization with eh until after all port probe/recovery
work has been queued.

Device ordering is maintained with scan order by still calling
sas_rphy_add() in order of domain discovery.

Since we now scan the domain list when invoking libata-eh we need to be
careful to check for fully initialized ata ports.

Acked-by: Jack Wang <jack_wang@usish.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Thu, 19 Jan 2012 04:14:01 +0000 (20:14 -0800)]

[SCSI] libsas: restore scan order

ata devices are always scanned after ssp. Prior to the ata error
handling reworks libsas would tend to scan devices in ascending expander
phy order. Restore this ordering by deferring ssp discovery to a
DISCE_PROBE event, and keep the probe order consistent with the
discovery order, not the placement of sata devices.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Fri, 20 Jan 2012 02:43:08 +0000 (18:43 -0800)]

[SCSI] libsas: delete device on sas address changed

If the phy is attached to a new sas address unregister the first address
before processing the new attachment.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Fri, 13 Jan 2012 01:57:35 +0000 (17:57 -0800)]

[SCSI] libsas: let libata recover links that fail to transmit initial sig-fis

libsas fails to discover all sata devices in the domain.  If a device fails
negotiation and does not transmit a signature fis the link needs recovery.
libata already understands how to manage slow to come up links, so treat these
conditions as ata device attach events for the purposes of creating an
ata_port.  This allows libata to manage retrying link bring up.

Rediscovery is modified to be careful about checking changes in dev_type.  It
looks like libsas leaks old devices if the sas address changes, but that's a
fix for another patch.

Acked-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Mon, 16 Jan 2012 21:54:28 +0000 (13:54 -0800)]

[SCSI] libsas: fix sas port naming

Make sas-port naming consistent with the expander-attached case whereby
the phy-id is the last digit in the port name. Otherwise we get the
random behavior of the allocation order.

Reported-by: Patrick Thomson <patrick.s.thomson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Mon, 16 Jan 2012 19:56:50 +0000 (11:56 -0800)]

[SCSI] libsas: improve debug statements

It's difficult to determine which domain_device is triggering error recovery,
so convert messages like:

  sas: ex 5001b4da000e703f phy08:T attached: 5001b4da000e7028
  sas: ex 5001b4da000e703f phy09:T attached: 5001b4da000e7029
  ...
  ata7: sas eh calling libata port error handler
  ata8: sas eh calling libata port error handler

...into:

  sas: ex 5001517e85cfefff phy05:T:9 attached: 5001517e85cfefe5 (stp)
  sas: ex 5001517e3b0af0bf phy11:T:8 attached: 5001517e3b0af0ab (stp)
  ...
  sas: ata7: end_device-21:1: dev error handler
  sas: ata8: end_device-20:0:5: dev error handler

which shows attached link rate, device type, and associates a
domain_device with its ata_port id to correlate messages emitted from
libata-eh.

As Doug notes, we can also take the opportunity to clarify expander phy
routing capabilities.

[dgilbert@interlog.com: clarify table2table with 'U']
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Maciej Trela [Fri, 13 Jan 2012 21:52:38 +0000 (21:52 +0000)]

[SCSI] libsas: kill spurious sas_put_device

Holdover from a patch rework, prior to the addition of SAS_DEV_DESTROY
we were holding a reference while the destruct was pending in case the
domain was torn down before the desctruct event ran. That case is
covered by SAS_DEV_DESTROY, and the sas_put_device() just corrupts freed
memory, or worse frees the memory while another agent holds a reference.

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Thu, 12 Jan 2012 19:47:24 +0000 (11:47 -0800)]

[SCSI] libsas: fix sas_unregister_ports vs sas_drain_work

We need to hold drain_mutex across the unregistration as port down events
queue device removal as chained events, so we need to make sure no other
drainers are active.

[ 1118.673968] WARNING: at kernel/workqueue.c:996 __queue_work+0x11a/0x326()
[ 1118.681982] Hardware name: S2600CP
[ 1118.686193] Modules linked in: isci(-) libsas scsi_transport_sas nls_utf8
ipv6 uinput sg iTCO_wdt iTCO_vendor_support i2c_i801 i2c_core ioatdma dca
sd_mod sr_mod cdrom ahci libahci libata [last unloaded: scsi_transport_sas]
[ 1118.709893] Pid: 6831, comm: rmmod Not tainted 3.2.0-isci+ #1
[ 1118.716727] Call Trace:
[ 1118.719867]  [<ffffffff8103e9f5>] warn_slowpath_common+0x85/0x9d
[ 1118.727000]  [<ffffffff8103ea27>] warn_slowpath_null+0x1a/0x1c
[ 1118.733942]  [<ffffffff81056d44>] __queue_work+0x11a/0x326
[ 1118.740481]  [<ffffffff81056f99>] queue_work_on+0x1b/0x22
[ 1118.746925]  [<ffffffff81057106>] queue_work+0x37/0x3e
[ 1118.753105]  [<ffffffffa0120e05>] ? sas_discover_event+0x55/0x82 [libsas]
[ 1118.761094]  [<ffffffff813217c3>] scsi_queue_work+0x42/0x44
[ 1118.767717]  [<ffffffffa0120e19>] sas_discover_event+0x69/0x82 [libsas]
[ 1118.775509]  [<ffffffffa0120f5b>] sas_unregister_dev+0xc3/0xcc [libsas]
[ 1118.783319]  [<ffffffffa0120fae>] sas_unregister_domain_devices+0x4a/0xc8 [libsas]
[ 1118.792731]  [<ffffffffa0120071>] sas_deform_port+0x60/0x1a6 [libsas]
[ 1118.800339]  [<ffffffffa01201ea>] sas_unregister_ports+0x33/0x44 [libsas]
[ 1118.808342]  [<ffffffffa011f7e5>] sas_unregister_ha+0x41/0x6b [libsas]
[ 1118.816055]  [<ffffffffa0134055>] isci_unregister+0x22/0x4d [isci]
[ 1118.823384]  [<ffffffffa0143040>] isci_pci_remove+0x2e/0x60 [isci]

Reported-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Wed, 11 Jan 2012 21:13:44 +0000 (13:13 -0800)]

[SCSI] libsas: route local link resets through ata-eh

Similar to the conversion of the transport-class reset we want bsg
initiated resets to be managed by libata.

Reported-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Wed, 11 Jan 2012 20:08:36 +0000 (12:08 -0800)]

[SCSI] libsas: fix mixed topology recovery

If we have a domain with sas and sata devices there may still be sas
recovery actions to take after peeling off the commands to send to
libata.

Reported-by: Andrzej Jakowski <andrzej.jakowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Tue, 10 Jan 2012 23:14:09 +0000 (15:14 -0800)]

[SCSI] libsas: close scsi_remove_target() vs libata-eh race

ata_port lifetime in libata follows the host.  In libsas it follows the
scsi_target.  Once scsi_remove_device() has caused all commands to be
completed it allows scsi_remove_target() to immediately proceed to
freeing the ata_port causing bug reports like:

[  848.393333] BUG: spinlock bad magic on CPU#4, kworker/u:2/5107
[  848.400262] general protection fault: 0000 [#1] SMP
[  848.406244] CPU 4
[  848.408310] Modules linked in: nls_utf8 ipv6 uinput i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma dca sg sd_mod sr_mod cdrom ahci libahci isci libsas libata scsi_transport_sas [last unloaded: scsi_wait_scan]
[  848.432060]
[  848.434137] Pid: 5107, comm: kworker/u:2 Not tainted 3.2.0-isci+ #8 Intel Corporation S2600CP/S2600CP
[  848.445310] RIP: 0010:[<ffffffff8126a68c>]  [<ffffffff8126a68c>] spin_dump+0x5e/0x8c
[  848.454787] RSP: 0018:ffff8807f868dca0  EFLAGS: 00010002
[  848.461137] RAX: 0000000000000048 RBX: ffff8807fe86a630 RCX: ffffffff817d0be0
[  848.469520] RDX: 0000000000000000 RSI: ffffffff814af1cf RDI: 0000000000000002
[  848.477959] RBP: ffff8807f868dcb0 R08: 00000000ffffffff R09: 000000006b6b6b6b
[  848.486327] R10: 000000000003fb8c R11: ffffffff81a19448 R12: 6b6b6b6b6b6b6b6b
[  848.494699] R13: ffff8808027dc520 R14: 0000000000000000 R15: 000000000000001e
[  848.503067] FS:  0000000000000000(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000
[  848.512899] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  848.519710] CR2: 00007ff77d001000 CR3: 00000007f7a5d000 CR4: 00000000000406e0
[  848.528072] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  848.536446] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  848.544831] Process kworker/u:2 (pid: 5107, threadinfo ffff8807f868c000, task ffff8807ff348000)
[  848.555327] Stack:
[  848.557959]  ffff8807fe86a630 ffff8807fe86a630 ffff8807f868dcd0 ffffffff8126a6e0
[  848.567072]  ffffffff817c142f ffff8807fe86a630 ffff8807f868dcf0 ffffffff8126a703
[  848.576190]  ffff8808027dc520 0000000000000286 ffff8807f868dd10 ffffffff814af1bb
[  848.585281] Call Trace:
[  848.588409]  [<ffffffff8126a6e0>] spin_bug+0x26/0x28
[  848.594357]  [<ffffffff8126a703>] do_raw_spin_unlock+0x21/0x88
[  848.601283]  [<ffffffff814af1bb>] _raw_spin_unlock_irqrestore+0x2c/0x65
[  848.609089]  [<ffffffffa001c103>] ata_scsi_port_error_handler+0x548/0x557 [libata]
[  848.618331]  [<ffffffff81061813>] ? async_schedule+0x17/0x17
[  848.625060]  [<ffffffffa004f30f>] async_sas_ata_eh+0x45/0x69 [libsas]
[  848.632655]  [<ffffffff810618aa>] async_run_entry_fn+0x97/0x125
[  848.639670]  [<ffffffff81057439>] process_one_work+0x207/0x38d
[  848.646577]  [<ffffffff8105738c>] ? process_one_work+0x15a/0x38d
[  848.653681]  [<ffffffff810576f7>] worker_thread+0x138/0x21c
[  848.660305]  [<ffffffff810575bf>] ? process_one_work+0x38d/0x38d
[  848.667493]  [<ffffffff8105b098>] kthread+0x9d/0xa5
[  848.673382]  [<ffffffff8106e1bd>] ? trace_hardirqs_on_caller+0x12f/0x166
[  848.681304]  [<ffffffff814b7704>] kernel_thread_helper+0x4/0x10
[  848.688324]  [<ffffffff814af534>] ? retint_restore_args+0x13/0x13
[  848.695530]  [<ffffffff8105affb>] ? __init_kthread_worker+0x5b/0x5b
[  848.702929]  [<ffffffff814b7700>] ? gs_change+0x13/0x13
[  848.709155] Code: 00 00 48 8d 88 38 04 00 00 44 8b 80 84 02 00 00 31 c0 e8 cf 1b 24 00 41 83 c8 ff 44 8b 4b 08 48 c7 c1 e0 0b 7d 81 4d 85 e4 74 10 <45> 8b 84 24 84 02 00 00 49 8d 8c 24 38 04 00 00 8b 53 04 48 89
[  848.732467] RIP  [<ffffffff8126a68c>] spin_dump+0x5e/0x8c
[  848.738905]  RSP <ffff8807f868dca0>
[  848.743743] ---[ end trace 143161646eee8caa ]---

...so arrange for the ata_port to have the same end of life as the domain
device.

Reported-by: Marcin Tomczak <marcin.tomczak@intel.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Tue, 10 Jan 2012 22:39:13 +0000 (14:39 -0800)]

[SCSI] libsas: mark all domain devices gone if root port disappears

If the top level expander is hot removed, mark all child devices as gone
before unregistration to short circuit futile recovery.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Mon, 9 Jan 2012 18:12:52 +0000 (10:12 -0800)]

[SCSI] libsas: pre-clean commands that won the eh vs completion race

When scrolling forward through the eh list (in a clear_q scenario) it is
possible to encounter commands that won the completion vs eh race. Rather
than sprinkle more "if (!task)" throughout the handler just make a pass
through the list and delete the race winners before handling the rest.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Tue, 13 Dec 2011 04:32:09 +0000 (20:32 -0800)]

[SCSI] isci: remove IDEV_EH hack to disable "discovery-time" ata resets

Prior to commit 61aaff49 "isci: filter broadcast change notifications
during SMP phy resets" we borrowed the MVS_DEV_EH approach from the
mvsas driver for preventing ->lldd_I_T_nexus_reset() events during ata
discovery. This hack was protecting against the old ->phy_reset() in
ata_bus_probe(), but since the conversion to the new error handling this
hack is preventing resets from reaching ata devices.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Thu, 8 Dec 2011 08:37:25 +0000 (00:37 -0800)]

[SCSI] isci: remove bus and reset handlers

Remove ->eh_device_reset_handler() and ->eh_bus_reset_handler() for the
same reason they are not implemented for libata hosts, they cannot be
implemented reliably with ata-eh. ATA error recovery wants to divert
all resets to the eh thread and wait for completion, these handlers may
be invoked from a non-blocking ioctl.

The other path they are called from is libsas-eh, and if we escalate
past I_T_nexus reset we have larger problems i.e. tear down all
in-flight commands in the domain potentially without notification to the
lldd if it has chosen not to implement ->lldd_clear_nexus_port() /
->lldd_clear_nexus_ha().

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Fri, 18 Nov 2011 02:01:38 +0000 (18:01 -0800)]

[SCSI] isci: ->lldd_ata_check_ready handler

Report to libata whether the link to the given domain_device is up and the
signature fis has been received.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Fri, 9 Dec 2011 07:20:44 +0000 (23:20 -0800)]

[SCSI] isci: stop interpreting ->lldd_lu_reset() as an ata soft-reset

Driving resets from libsas-eh is pre-mature as libata will make a
decision about performing a softreset. Currently libata determines
whether to perform a softreset based on ata_eh_followup_srst_needed(),
and none of those conditions apply to isci.

Remove the srst implementation and translate ->lldd_lu_reset() for ata
devices as a request to drive a reset via libata-eh.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Wed, 30 Nov 2011 19:57:34 +0000 (11:57 -0800)]

[SCSI] isci: fix interpretation of "hard" reset

A hard reset to isci in the direct-attached case is one where the driver
internally manages debouncing the link.  In the sas-expander-attached
case a hard reset is one that clears affiliations.  The driver should
not be prematurely dropping affiliations at run time, that decision (to
force expander hard resets to ata devices) is left to userspace to
manage.  So, arrange for I_T_nexus resets to be sas-link-resets in the
expander-attached case and isci-hard-resets in the direct-attached case.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Wed, 4 Jan 2012 07:26:15 +0000 (23:26 -0800)]

[SCSI] isci: kill isci_port->status

It only tracks whether the port is stopping in order to gate new devices
being discovered while the port is stopping. However, since the check
and subsequent handling is unlocked there is nothing to stop the port
from going down immediately after the check.

Driver is already prepared to handle devices arriving on stale ports,
and those will be cleaned up by an eventual ->lldd_dev_gone()
notification.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Wed, 4 Jan 2012 07:26:08 +0000 (23:26 -0800)]

[SCSI] isci: kill iphy->isci_port lookups

This field is a holdover from the OS abstraction conversion.  The stable
phy to port lookups are done via iphy->ownining_port under scic_lock.
After this conversion to use port->lldd_port the only volatile lookup is
the initial lookup in isci_port_formed().  After that point any lookup
via a successfully notified domain_device is guaranteed to be valid
until the domain_device is destroyed.

Delete ->start_complete as it is only set once and is set as a
consequence of the port going link up, by definition of getting a port
formed event the port is "ready".

While we are correcting port lookups also move the asd_sas_port table
out from under the isci_port.  This is to preclude any temptation to use
container_of() to convert an asd_sas_port to an isci_port, the
association is dynamic and under libsas control.

Tested-by: Maciej Trela <maciej.trela@intel.com>
[dmilburn@redhat.com: fix i686 compile error]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Thu, 22 Dec 2011 22:58:24 +0000 (14:58 -0800)]

[SCSI] libsas: don't recover 'gone' devices in sas_ata_hard_reset()

The commands that timeout when a disk is forcibly removed may trigger
libata to attempt recovery of the device. If libsas has decided to
remove the device don't permit ata to continue to issue resets to its
last known phy.

The primary motivation for this patch is hotplug testing by writing 0 to
/sys/class/sas_phy/phyX/enable. Without this check this test leads to
libata issuing a reset and re-enabling the device that wants to be torn
down.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Thu, 22 Dec 2011 05:33:17 +0000 (21:33 -0800)]

[SCSI] libsas: fix sas_find_local_phy(), take phy references

In the direct-attached case this routine returns the phy on which this
device was first discovered.  Which is broken if we want to support
wide-targets, as this phy reference can become stale even though the
port is still active.

In the expander-attached case this routine tries to lookup the phy by
scanning the attached sas addresses of the parent expander, and BUG_ONs
if it can't find it.  However since eh and the libsas workqueue run
independently we can still be attempting device recovery via eh after
libsas has recorded the device as detached.  This is even easier to hit
now that eh is blocked while device domain rediscovery takes place, and
that libata is fed more timed out commands increasing the chances that
it will try to recover the ata device.

Arrange for dev->phy to always point to a last known good phy, it may be
stale after the port is torn down, but it will catch up for wide port
reconfigurations, and never be NULL.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Wed, 21 Dec 2011 23:19:56 +0000 (15:19 -0800)]

[SCSI] libsas: check for 'gone' expanders in smp_execute_task()

No sense in issuing or retrying commands to an expander that has been
removed.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Wed, 21 Dec 2011 22:55:38 +0000 (14:55 -0800)]

[SCSI] libsas: don't mark expanders as gone when a child device is removed

Commit 56dd2c06 "[SCSI] libsas: Don't issue commands to devices that
have been hot-removed" marked the parent device of an end-device as gone
when all the phys to the end device have been deleted.

The expander device is still present until its parent is removed. This
is a benign change until the smp_execute_task() path is taught to check
->gone.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Fri, 18 Nov 2011 01:59:54 +0000 (17:59 -0800)]

[SCSI] libsas: poll for ata device readiness after reset

Use ata_wait_after_reset() to poll for link recovery after a reset.
This combined with sas_ha->eh_mutex prevents expander rediscovery from
probing phys in an intermediate state. Local discovery does not have a
mechanism to filter link status changes during this timeout, so it
remains the responsibility of lldds to prevent premature port teardown.
Although once all lldd's support ->lldd_ata_check_ready() that could be
used as a gate to local port teardown.

The signature fis is re-transmitted when the link comes back so we
should be revalidating the ata device class, but that is left to a future
patch.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Sun, 4 Dec 2011 09:06:24 +0000 (01:06 -0800)]

[SCSI] libsas: async ata-eh

Once sas_ata_hard_reset() starts honoring the 'deadline' parameter a
pathological configuration could take 25 seconds per ata device
(serialized) to recover. Run per-port recoveries in parallel.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Jeff Skirvin [Wed, 16 Nov 2011 09:44:13 +0000 (09:44 +0000)]

[SCSI] libsas: add mutex for SMP task execution

SAS does not tag SMP requests, and at least one lldd (isci) does not permit
more than one in-flight request at a time.

[jejb: fix sas_init_dev tab issues while we're at it]
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Jeff Skirvin [Fri, 16 Dec 2011 08:21:21 +0000 (08:21 +0000)]

[SCSI] libsas: Remove redundant phy state notification calls.

In the case of an explicit sas_phy_enable call to disable a phy,
the LLDD provides the calls to sas_phy_disconnected and the
PHYE_LOSS_OF_SIGNAL event.

NOTE: This assumes that the lldd(s) generate the notification, which
appears to be the case, but only verfied on isci.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit | commitdiff | tree

Dan Williams [Sun, 4 Dec 2011 08:06:57 +0000 (00:06 -0800)]

[SCSI] libsas: sas_phy_enable via transport_sas_phy_reset

Execute the link-reset triggered by sas_phy_enable via
transport_sas_phy_reset so that it can be managed by libata.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

Linux kernel for KaRo TX COM modules

RSS Atom