Maciej Trela [Mon, 12 Mar 2012 23:29:30 +0000 (23:29 +0000)]
isci: enable BCN in sci_port_add_phy()
Ensure we enable receiving BCN's from the
hardware when adding phy to isci_port.
Otherwise if we get BCN before the port is
created we won't see any BCN
Signed-off-by: Maciej Trela <maciej.trela@intel.com> Reported-by: Richard Boyd <richard.g.boyd@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Artur Wojcik [Fri, 24 Feb 2012 23:42:45 +0000 (23:42 +0000)]
isci: implement suspend/resume support
Provide a "simple-dev-pm-ops" implementation that shuts down the domain
and the device on suspend, and resumes the device and the domain on
resume. All of the mechanics of restoring domain connectivity are
handled by libsas once isci has notified libsas that all links should be
back up. libsas is in charge of handling links that did not resume, or
resumed out of order.
Signed-off-by: Artur Wojcik <artur.wojcik@intel.com> Signed-off-by: Jacek Danecki <jacek.danecki@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Fri, 2 Mar 2012 01:06:24 +0000 (17:06 -0800)]
isci: fix interrupt disable
There is a (dubious?) lost irq workaround in sci_controller_isr() that
effectively nullifies attempts to disable interrupts. Until the
workaround can be re-evaluated add some infrastructure to prevent the
interrupt handler from inadvertantly re-enabling interrupts.
The failure mode was interrupts continuing to run after the driver had
been removed and its iomappings torn down.
Reported-by: Jacek Danecki <jacek.danecki@intel.com> Tested-by: Jacek Danecki <jacek.danecki@intel.com>
[richard: clear remaining interrupts at the end of reset] Acked-by: Richard Boyd <richard.g.boyd@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 29 Feb 2012 09:07:56 +0000 (01:07 -0800)]
isci: fix 'link-up' events occur after 'start-complete'
The call to wait_for_start() is meant to ensure that all links have been
given a chance to come up before letting the kernel proceed with
probing. However, the implementation is not correctly syncing with the
port configuration agent. In the MPC case the ports are hard-coded, in
the APC case we need to wait for the port-configuration to form ports
from the started phys.
Towards that end increase the timeout for the APC agent to form ports,
and delay start complete until all phys are out of link-training.
Cc: <stable@vger.kernel.org> Cc: Richard Boyd <richard.g.boyd@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 15 Feb 2012 21:58:42 +0000 (13:58 -0800)]
isci: refactor initialization for S3/S4
Based on an original implementation by Ed Nadolski and Artur Wojcik
In preparation for S3/S4 support refactor initialization so that
driver-load and resume-from-suspend can share the common init path of
isci_host_init(). Organize the initialization into objects that are
self-contained to the driver (initialized by isci_host_init) versus
those that have some upward registration (initialized at allocation time
asd_sas_phy, asd_sas_port, dma allocations). The largest change is
moving the the validation of the oem and module parameters from
isci_host_init() to isci_host_alloc().
The S3/S4 approach being taken is that libsas will be tasked with
remembering the state of the domain and the lldd is free to be
forgetful. In the case of isci we'll just re-init using a subset of the
normal driver load path.
[clean up some unused / mis-indented function definitions in host.h]
Signed-off-by: Ed Nadolski <edmund.nadolski@intel.com> Signed-off-by: Artur Wojcik <artur.wojcik@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Tom Jackson [Fri, 24 Feb 2012 09:38:49 +0000 (09:38 +0000)]
isci: Don't filter BROADCAST CHANGE primitives
Per the SAS spec, several types of BROADCAST CHANGE primitives
must cause re-discovery of the originating expander.
Only the standard BROADCAST CHANGE primitive was being
sent to the LIBSAS layer. The other BC primitives have been
added to the sci_phy_event_handler()
Signed-off-by: Tom Jackson <thomas.p.jackson@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Fri, 10 Feb 2012 09:05:43 +0000 (01:05 -0800)]
isci: improve 'invalid state' warnings
Convert controller state machine warnings to emit the state number (it
missed the number to string conversion, but since these error rarely
happen not much motivation to go further).
Fix up the rnc warnings to use the state name.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Thu, 22 Mar 2012 04:09:12 +0000 (21:09 -0700)]
libsas: suspend / resume support
libsas power management routines to suspend and recover the sas domain
based on a model where the lldd is allowed and expected to be
"forgetful".
sas_suspend_ha - disable event processing allowing the lldd to take down
links without concern for causing hotplug events.
Regardless of whether the lldd actually posts link down
messages libsas notifies the lldd that all
domain_devices are gone.
sas_prep_resume_ha - on the way back up before the lldd starts link
training clean out any spurious events that were
generated on the way down, and re-enable event
processing
sas_resume_ha - after the lldd has started and decided that all phys
have posted link-up events this routine is called to let
libsas start it's own timeout of any phys that did not
resume. After the timeout an lldd can cancel the
phy teardown by posting a link-up event.
Storage for ex_change_count (u16) and phy_change_count (u8) are changed
to int so they can be set to -1 to indicate 'invalidated'.
Cc: Alan Stern <stern@rowland.harvard.edu> Reviewed-by: Jacek Danecki <jacek.danecki@intel.com> Tested-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Thu, 22 Mar 2012 04:09:11 +0000 (21:09 -0700)]
scsi, sd: limit the scope of the async probe domain
sd injects and synchronizes probe work on the global kernel-wide domain.
This runs into conflict with PM that wants to perform resume actions in
async context:
Provide a 'scsi_sd_probe_domain' so that async probe actions actions can
be flushed without regard for the state of PM, and allow for the resume
path to handle devices that have transitioned from SDEV_QUIESCE to
SDEV_DEL prior to resume.
Acked-by: Alan Stern <stern@rowland.harvard.edu>
[alan: uplevel scsi_sd_probe_domain, clarify scsi_device_resume] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Thu, 22 Mar 2012 04:09:10 +0000 (21:09 -0700)]
libsas: drop sata port multiplier infrastructure
On the way to add a new sata_device field, noticed that libsas is
carrying port multiplier infrastructure that is explicitly disabled by
sas_discover_sata(). The aic94xx touches the unused port_no, so leave
that field in case there was some use for it.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Thu, 22 Mar 2012 04:09:08 +0000 (21:09 -0700)]
libata: export ata_port suspend/resume infrastructure for sas
Reuse ata_port_{suspend|resume}_common for sas. This path is chosen
over adding coordination between ata-tranport and sas-transport because
libsas wants to revalidate the domain at resume-time at the host level.
It can not validate links have resumed properly until libata has had a
chance to perform its revalidation, and any sane placing of an ata_port
in the sas-transport model would delay it's resumption until after the
host.
Export the common portion of port suspend/resume (bypass pm_runtime),
and allow sas to perform these operations asynchronously (similar to the
libsas async-ata probe implmentation). Async operation is determined by
having an external, rather than stack based, location for storing the
result of the operation.
Reviewed-by: Jacek Danecki <jacek.danecki@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Tue, 20 Mar 2012 17:58:35 +0000 (10:58 -0700)]
libata: reset once
Hotplug testing with libsas currently encounters a 55 second wait for
link recovery to give up. In the case where the user trusts the
response time of their devices permit the recovery attempts to be
limited to one.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Jeff Skirvin [Sat, 10 Mar 2012 05:55:50 +0000 (05:55 +0000)]
libsas: sas_rediscover_dev did not look at the SMP exec status.
The discovery function "sas_rediscover_dev" had two bugs: 1) it did
not pay attention to the return status from the SMP task execution;
2) the stack variable used for the returned SAS address was compared
against 0 without being initialized.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
[djbw: todo sanitize smp_execute_task return values (see: residue)] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Thu, 9 Feb 2012 04:52:40 +0000 (20:52 -0800)]
libsas: trim sas_task of slow path infrastructure
The timer and the completion are only used for slow path tasks (smp, and
lldd tmfs), yet we incur the allocation space and cpu setup time for
every fast path task.
Cc: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jack_wang@usish.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Wed, 1 Feb 2012 09:12:23 +0000 (01:12 -0800)]
libsas: use ->lldd_I_T_nexus_reset for ->eh_bus_reset_handler
sas_eh_bus_reset_handler() amounts to sas_phy_reset() without
notification of the reset to the lldd. If this is triggered from
eh-cmnd recovery there may be sas_tasks for the lldd to terminate, so
->lldd_I_T_nexus_reset is warranted.
Cc: Xiangliang Yu <yuxiangl@marvell.com> Cc: Luben Tuikov <ltuikov@yahoo.com> Cc: Jack Wang <jack_wang@usish.com> Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
[jacek: modify pm8001_I_T_nexus_reset to return -ENODEV] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Tue, 31 Jan 2012 07:39:11 +0000 (23:39 -0800)]
libsas: enforce eh strategy handlers only in eh context
The strategy handlers may be called in places that are problematic for
libsas (i.e. sata resets outside of domain revalidation filtering /
libata link recovery), or problematic for userspace (non-blocking ioctl
to sleeping reset functions). However, these routines are also called
for eh escalations and recovery of scsi_eh_prep_cmnd(), so permit them
as long as we are running in the host's error handler, otherwise arrange
for them to be triggered in eh_context.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
...teach sas_rphy_remove to wait for async scanning to quiesce before
removing the end_device. It seems this is a more general problem [1],
but this patch only addresses sas transport.
[1]: 23edb6e [SCSI] mpt2sas: Do not set sas_device->starget to NULL from
the slave_destroy callback when all the LUNS have been deleted
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Normalize phy->attached_sas_addr to return a zero-address in the case
when device-type == NO_DEVICE or the linkrate is invalid to handle
expanders that put non-zero sas addresses in the discovery response:
Dan Williams [Fri, 6 Apr 2012 23:35:36 +0000 (16:35 -0700)]
scsi: fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations)
Rapid ata hotplug on a libsas controller results in cases where libsas
is waiting indefinitely on eh to perform an ata probe.
A race exists between scsi_schedule_eh() and scsi_restart_operations()
in the case when scsi_restart_operations() issues i/o to other devices
in the sas domain. When this happens the host state transitions from
SHOST_RECOVERY (set by scsi_schedule_eh) back to SHOST_RUNNING and
->host_busy is non-zero so we put the eh thread to sleep even though
->host_eh_scheduled is active.
Before putting the error handler to sleep we need to check if the
host_state needs to return to SHOST_RECOVERY for another trip through
eh.
Cc: Tejun Heo <tj@kernel.org> Reported-by: Tom Jackson <thomas.p.jackson@intel.com> Tested-by: Tom Jackson <thomas.p.jackson@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Thu, 22 Mar 2012 04:09:07 +0000 (21:09 -0700)]
libsas, libata: fix start of life for a sas ata_port
This changes the ordering of initialization and probing events from:
1/ allocate rphy in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
2/ allocate ata_port and schedule port probe in DISCE_PROBE
...to:
1/ allocate ata_port in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
2/ allocate rphy in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
3/ schedule port probe in DISCE_PROBE
This ordering prevents PHYE_SIGNAL_LOSS_EVENTS from sneaking in to
destrory ata devices before they have been fully initialized:
Dan Williams [Thu, 22 Mar 2012 04:09:05 +0000 (21:09 -0700)]
libata: make ata_print_id atomic
This variable is incremented from multiple contexts (module_init via
libata-lldds and the libsas discovery thread). Make it atomic to head
off any chance of libsas and libata creating duplicate ids.
Acked-by: Jacek Danecki <jacek.danecki@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Tue, 20 Mar 2012 20:24:29 +0000 (13:24 -0700)]
libsas: fix ata_eh clobbering ex_phys via smp_ata_check_ready
The check_ready implementation in the expander-attached ata device case
polls on sas_ex_phy_discover(). The effect is that the ex_phy fields
(critically ->attached_sas_addr) can change. When ata_eh ends and
libsas comes along to revalidate the domain
sas_unregister_devs_sas_addr() can fail to lookup devices to remove, or
fail to re-add an ata device that ata_eh marked as disabled. So change
the code to skip the sas_address and change count updates when ata_eh is
active.
Cc: Jack Wang <jack_wang@usish.com> Tested-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Tested-by: Bartek Nowakowski <bartek.nowakowski@intel.com> Tested-by: Jacek Danecki <jacek.danecki@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Tom Jackson <thomas.p.jackson@intel.com> Tested-by: Tom Jackson <thomas.p.jackson@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Mon, 12 Mar 2012 18:38:26 +0000 (11:38 -0700)]
libsas: fix sas_get_port_device regression
Commit 899fcf4 "[SCSI] libsas: set attached device type and target
protocols for local phys" setup 'phy' to be dereferenced after
list_for_each_entry(phy, &port->phy_list, port_phy_el) (i.e. phy ==
&port->phy_list) resulting in reports like:
BUG: unable to handle kernel NULL pointer dereference at 00000000000002b0
IP: [<ffffffffa00ce948>] sas_discover_domain+0x29e/0x4fb [libsas]
...fix by deferring sas_phy_set_target() to the end of
sas_get_port_device().
Reported-by: Tom Jackson <thomas.p.jackson@intel.com> Tested-by: Tom Jackson <thomas.p.jackson@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Thomas Jackson [Sat, 18 Feb 2012 02:33:10 +0000 (18:33 -0800)]
libsas: fix sas_find_bcast_phy() in the presence of 'vacant' phys
If an expander reports 'PHY VACANT' for a phy index prior to the one
that generated a BCN libsas fails rediscovery. Since a vacant phy is
defined as a valid phy index that will never have an attached device
just continue the search.
Cc: <stable@vger.kernel.org> Signed-off-by: Thomas Jackson <thomas.p.jackson@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Fri, 2 Mar 2012 02:44:25 +0000 (18:44 -0800)]
libata, libsas: introduce sched_eh and end_eh port ops
When managing shost->host_eh_scheduled libata assumes that there is a
1:1 shost-to-ata_port relationship. libsas creates a 1:N relationship
so it needs to manage host_eh_scheduled cumulatively at the host level.
The sched_eh and end_eh port port ops allow libsas to track when domain
devices enter/leave the "eh-pending" state under ha->lock (previously
named ha->state_lock, but it is no longer just a lock for ha->state
changes).
Since host_eh_scheduled indicates eh without backing commands pinning
the device it can be deallocated at any time. Move the taking of the
domain_device reference under the port_lock to guarantee that the
ata_port stays around for the duration of eh.
Cc: Tejun Heo <tj@kernel.org> Acked-by: Jacek Danecki <jacek.danecki@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Maciej Trela [Mon, 5 Mar 2012 01:58:55 +0000 (17:58 -0800)]
libsas: cleanup spurious calls to scsi_schedule_eh
eh is woken up automatically by the presence of failed commands,
scsi_schedule_eh is reserved for cases where there are no failed
commands. This guarantees that host_eh_sceduled is only incremented
when an explicit eh request is made.
Reviewed-by: Jacek Danecki <jacek.danecki@intel.com> Signed-off-by: Maciej Trela <maciej.trela@intel.com>
[fixed spurious delete of sas_ata_task_abort] Signed-off-by: Artur Wojcik <artur.wojcik@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Fri, 9 Mar 2012 19:00:06 +0000 (11:00 -0800)]
libsas: introduce sas_work to fix sas_drain_work vs sas_queue_work
When requeuing work to a draining workqueue the last work instance may
not be idle, so sas_queue_work() must not touch work->entry. Introduce
sas_work with a drain_node list_head to have a private list for
collecting work deferred due to drain collision.
Fixes reports like:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff810410d4>] process_one_work+0x2e/0x338
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Nilesh Javali [Mon, 27 Feb 2012 11:08:52 +0000 (03:08 -0800)]
[SCSI] qla4xxx: Add support to display CHAP list and delete CHAP entry
For offload iSCSI like qla4xxx CHAP entries are stored in FLASH.
This patch adds support to list CHAP entries stored in FLASH and
delete specified CHAP entry from FLASH using iscsi tools.
Signed-off-by: Nilesh Javali <nilesh.javali@qlogic.com> Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Nilesh Javali [Mon, 27 Feb 2012 11:08:51 +0000 (03:08 -0800)]
[SCSI] iscsi_transport: Add support to display CHAP list and delete CHAP entry
For offload iSCSI like qla4xxx CHAP entries are stored in FLASH.
This patch adds support to list CHAP entries stored in FLASH and
delete specified CHAP entry from FLASH using iscsi tools.
Signed-off-by: Nilesh Javali <nilesh.javali@qlogic.com> Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Santosh Nayak [Sun, 26 Feb 2012 14:44:46 +0000 (20:14 +0530)]
[SCSI] pm8001: fix endian issue with code optimization.
1. Fix endian issue.
2. Fix the following warning :
" drivers/scsi/pm8001/pm8001_hwi.c:2932:32: warning: comparison
between ‘enum sas_device_type’ and ‘enum sas_dev_type’".
3. Few code optimization.
Signed-off-by: Santosh Nayak <santoshprasadnayak@gmail.com> Acked-by: Jack Wang <jack_wang@usish.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
In 'mpi_sata_completion', "pm8001_ha->lock" is first released.
It means lock is taken before, which is true for
the context A, as 'pm8001_ha->lock' is taken in 'pm8001_chip_isr()'
But for context B there is no lock taken before and pm8001_ha->lock
is unlocked in 'mpi_sata_completion()'. This may unlock the lock
taken in context A. Possible racing ??
If 'pm8001_ha->lock' is taken in 'process_oq()' instead of
'pm8001_chip_isr' then the above issue can be avoided.
Signed-off-by: Santosh Nayak <santoshprasadnayak@gmail.com> Acked-by: Jack Wang <jack_wang@usish.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Santosh Nayak [Sun, 26 Feb 2012 13:33:30 +0000 (19:03 +0530)]
[SCSI] pm8001: Fix bogus interrupt state flag issue.
Static checker is giving following warning:
" error: calling 'spin_unlock_irqrestore()' with bogus flags"
The code flow is as shown below:
process_oq() --> process_one_iomb --> mpi_sata_completion
In 'mpi_sata_completion'
the first call for 'spin_unlock_irqrestore()' is with flags=0,
which is as good as 'spin_unlock_irq()' ( unconditional interrupt
enabling).
So for better performance 'spin_unlock_irqrestore()' can be replaced
with 'spin_unlock_irq()' and 'spin_lock_irqsave()' can be replaced by
'spin_lock_irq()'.
Signed-off-by: Santosh Nayak <santoshprasadnayak@gmail.com> Acked-by: Jack Wang <jack_wang@usish.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Wayne Boyer [Thu, 23 Feb 2012 19:54:55 +0000 (11:54 -0800)]
[SCSI] ipr: update PCI ID definitions for new adapters
This patch updates some PCI ID definitions for new adapters based on the next
generation 64 bit IOA PCI interface chip.
Signed-off-by: Wayne Boyer <wayneb@linux.vnet.ibm.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Carpenter [Tue, 21 Feb 2012 07:29:40 +0000 (10:29 +0300)]
[SCSI] qla2xxx: handle default case in qla2x00_request_firmware()
This silences a static checker warning. Also we're always adding new
types of firmware, so it might fix a bug in real life some day.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Andrzej Jakowski [Fri, 10 Feb 2012 09:18:54 +0000 (01:18 -0800)]
[SCSI] isci: improvements in driver unloading routine
This patch fixes scenario where driver removal should be possible
only when driver is in READY state. Also it removes redundant
invocation of routine disabling SCU interrupts - this method is
called somewhere else in driver deinitialization path.
Signed-off-by: Andrzej Jakowski <andrzej.jakowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Tue, 31 Jan 2012 05:40:45 +0000 (21:40 -0800)]
[SCSI] libsas: don't recover end devices attached to disabled phys
If userspace has decided to disable a phy the kernel should honor that
and not inadvertantly re-enable the phy via error recovery. This is
more straightforward in the sata case where link recovery (via
libata-eh) is separate from sas_task cancelling in libsas-eh. Teach
libsas to accept -ENODEV as a successful response from I_T_nexus_reset
('successful' in terms of not escalating further).
This is a more comprehensive fix then "libsas: don't recover 'gone'
devices in sas_ata_hard_reset()", as it is no longer sata-specific.
aic94xx does check the return value from sas_phy_reset() so if the phy
is disabled we proceed with clearing the I_T_nexus.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Thu, 9 Feb 2012 07:20:41 +0000 (23:20 -0800)]
[SCSI] libsas: fixup target_port_protocols for expanders that don't report sata
If discovery returns 0 for target_port_protocols but shows an attached
sata device, just report SAS_PROTOCOL_SATA in the identify data so
userspace can reliably search for sata devices in the domain.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Fri, 20 Jan 2012 23:26:03 +0000 (15:26 -0800)]
[SCSI] libsas: revert ata srst
libata issues follow up srsts when the controller has a hard time
recording the signature-fis after a reset, or if the link supports port
multipliers. libsas does not support port multipliers and no current
libsas lldds appear to need help retrieving the signature fis. Revert
it for now to remove confusion.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Fri, 20 Jan 2012 23:23:07 +0000 (15:23 -0800)]
[SCSI] libsas: fix lifetime of SAS_HA_FROZEN
Until all sas_tasks are known to no longer be in-flight this flag gates late
completions from colliding with error handling. However, it must be cleared
prior to the submission of scsi_send_eh_cmnd() requests, otherwise those
commands will never be completed correctly.
This was spotted by slub debug:
=============================================================================
BUG sas_task: Objects remaining on kmem_cache_close()
-----------------------------------------------------------------------------
Dan Williams [Thu, 19 Jan 2012 04:47:01 +0000 (20:47 -0800)]
[SCSI] libsas: async ata scanning
libsas ata error handling is already async but this does not help the
scan case. Move initial link recovery out from under host->scan_mutex,
and delay synchronization with eh until after all port probe/recovery
work has been queued.
Device ordering is maintained with scan order by still calling
sas_rphy_add() in order of domain discovery.
Since we now scan the domain list when invoking libata-eh we need to be
careful to check for fully initialized ata ports.
Acked-by: Jack Wang <jack_wang@usish.com> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Thu, 19 Jan 2012 04:14:01 +0000 (20:14 -0800)]
[SCSI] libsas: restore scan order
ata devices are always scanned after ssp. Prior to the ata error
handling reworks libsas would tend to scan devices in ascending expander
phy order. Restore this ordering by deferring ssp discovery to a
DISCE_PROBE event, and keep the probe order consistent with the
discovery order, not the placement of sata devices.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Fri, 13 Jan 2012 01:57:35 +0000 (17:57 -0800)]
[SCSI] libsas: let libata recover links that fail to transmit initial sig-fis
libsas fails to discover all sata devices in the domain. If a device fails
negotiation and does not transmit a signature fis the link needs recovery.
libata already understands how to manage slow to come up links, so treat these
conditions as ata device attach events for the purposes of creating an
ata_port. This allows libata to manage retrying link bring up.
Rediscovery is modified to be careful about checking changes in dev_type. It
looks like libsas leaks old devices if the sas address changes, but that's a
fix for another patch.
Acked-by: Jack Wang <jack_wang@usish.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Mon, 16 Jan 2012 21:54:28 +0000 (13:54 -0800)]
[SCSI] libsas: fix sas port naming
Make sas-port naming consistent with the expander-attached case whereby
the phy-id is the last digit in the port name. Otherwise we get the
random behavior of the allocation order.
Reported-by: Patrick Thomson <patrick.s.thomson@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
which shows attached link rate, device type, and associates a
domain_device with its ata_port id to correlate messages emitted from
libata-eh.
As Doug notes, we can also take the opportunity to clarify expander phy
routing capabilities.
[dgilbert@interlog.com: clarify table2table with 'U'] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Maciej Trela [Fri, 13 Jan 2012 21:52:38 +0000 (21:52 +0000)]
[SCSI] libsas: kill spurious sas_put_device
Holdover from a patch rework, prior to the addition of SAS_DEV_DESTROY
we were holding a reference while the destruct was pending in case the
domain was torn down before the desctruct event ran. That case is
covered by SAS_DEV_DESTROY, and the sas_put_device() just corrupts freed
memory, or worse frees the memory while another agent holds a reference.
Signed-off-by: Maciej Trela <maciej.trela@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Thu, 12 Jan 2012 19:47:24 +0000 (11:47 -0800)]
[SCSI] libsas: fix sas_unregister_ports vs sas_drain_work
We need to hold drain_mutex across the unregistration as port down events
queue device removal as chained events, so we need to make sure no other
drainers are active.
Reported-by: Jacek Danecki <jacek.danecki@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Wed, 11 Jan 2012 21:13:44 +0000 (13:13 -0800)]
[SCSI] libsas: route local link resets through ata-eh
Similar to the conversion of the transport-class reset we want bsg
initiated resets to be managed by libata.
Reported-by: Jacek Danecki <jacek.danecki@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Wed, 11 Jan 2012 20:08:36 +0000 (12:08 -0800)]
[SCSI] libsas: fix mixed topology recovery
If we have a domain with sas and sata devices there may still be sas
recovery actions to take after peeling off the commands to send to
libata.
Reported-by: Andrzej Jakowski <andrzej.jakowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Tue, 10 Jan 2012 23:14:09 +0000 (15:14 -0800)]
[SCSI] libsas: close scsi_remove_target() vs libata-eh race
ata_port lifetime in libata follows the host. In libsas it follows the
scsi_target. Once scsi_remove_device() has caused all commands to be
completed it allows scsi_remove_target() to immediately proceed to
freeing the ata_port causing bug reports like:
...so arrange for the ata_port to have the same end of life as the domain
device.
Reported-by: Marcin Tomczak <marcin.tomczak@intel.com> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Mon, 9 Jan 2012 18:12:52 +0000 (10:12 -0800)]
[SCSI] libsas: pre-clean commands that won the eh vs completion race
When scrolling forward through the eh list (in a clear_q scenario) it is
possible to encounter commands that won the completion vs eh race. Rather
than sprinkle more "if (!task)" throughout the handler just make a pass
through the list and delete the race winners before handling the rest.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Tue, 13 Dec 2011 04:32:09 +0000 (20:32 -0800)]
[SCSI] isci: remove IDEV_EH hack to disable "discovery-time" ata resets
Prior to commit 61aaff49 "isci: filter broadcast change notifications
during SMP phy resets" we borrowed the MVS_DEV_EH approach from the
mvsas driver for preventing ->lldd_I_T_nexus_reset() events during ata
discovery. This hack was protecting against the old ->phy_reset() in
ata_bus_probe(), but since the conversion to the new error handling this
hack is preventing resets from reaching ata devices.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Thu, 8 Dec 2011 08:37:25 +0000 (00:37 -0800)]
[SCSI] isci: remove bus and reset handlers
Remove ->eh_device_reset_handler() and ->eh_bus_reset_handler() for the
same reason they are not implemented for libata hosts, they cannot be
implemented reliably with ata-eh. ATA error recovery wants to divert
all resets to the eh thread and wait for completion, these handlers may
be invoked from a non-blocking ioctl.
The other path they are called from is libsas-eh, and if we escalate
past I_T_nexus reset we have larger problems i.e. tear down all
in-flight commands in the domain potentially without notification to the
lldd if it has chosen not to implement ->lldd_clear_nexus_port() /
->lldd_clear_nexus_ha().
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Fri, 9 Dec 2011 07:20:44 +0000 (23:20 -0800)]
[SCSI] isci: stop interpreting ->lldd_lu_reset() as an ata soft-reset
Driving resets from libsas-eh is pre-mature as libata will make a
decision about performing a softreset. Currently libata determines
whether to perform a softreset based on ata_eh_followup_srst_needed(),
and none of those conditions apply to isci.
Remove the srst implementation and translate ->lldd_lu_reset() for ata
devices as a request to drive a reset via libata-eh.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Wed, 30 Nov 2011 19:57:34 +0000 (11:57 -0800)]
[SCSI] isci: fix interpretation of "hard" reset
A hard reset to isci in the direct-attached case is one where the driver
internally manages debouncing the link. In the sas-expander-attached
case a hard reset is one that clears affiliations. The driver should
not be prematurely dropping affiliations at run time, that decision (to
force expander hard resets to ata devices) is left to userspace to
manage. So, arrange for I_T_nexus resets to be sas-link-resets in the
expander-attached case and isci-hard-resets in the direct-attached case.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Wed, 4 Jan 2012 07:26:15 +0000 (23:26 -0800)]
[SCSI] isci: kill isci_port->status
It only tracks whether the port is stopping in order to gate new devices
being discovered while the port is stopping. However, since the check
and subsequent handling is unlocked there is nothing to stop the port
from going down immediately after the check.
Driver is already prepared to handle devices arriving on stale ports,
and those will be cleaned up by an eventual ->lldd_dev_gone()
notification.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Wed, 4 Jan 2012 07:26:08 +0000 (23:26 -0800)]
[SCSI] isci: kill iphy->isci_port lookups
This field is a holdover from the OS abstraction conversion. The stable
phy to port lookups are done via iphy->ownining_port under scic_lock.
After this conversion to use port->lldd_port the only volatile lookup is
the initial lookup in isci_port_formed(). After that point any lookup
via a successfully notified domain_device is guaranteed to be valid
until the domain_device is destroyed.
Delete ->start_complete as it is only set once and is set as a
consequence of the port going link up, by definition of getting a port
formed event the port is "ready".
While we are correcting port lookups also move the asd_sas_port table
out from under the isci_port. This is to preclude any temptation to use
container_of() to convert an asd_sas_port to an isci_port, the
association is dynamic and under libsas control.
Tested-by: Maciej Trela <maciej.trela@intel.com>
[dmilburn@redhat.com: fix i686 compile error] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Thu, 22 Dec 2011 22:58:24 +0000 (14:58 -0800)]
[SCSI] libsas: don't recover 'gone' devices in sas_ata_hard_reset()
The commands that timeout when a disk is forcibly removed may trigger
libata to attempt recovery of the device. If libsas has decided to
remove the device don't permit ata to continue to issue resets to its
last known phy.
The primary motivation for this patch is hotplug testing by writing 0 to
/sys/class/sas_phy/phyX/enable. Without this check this test leads to
libata issuing a reset and re-enabling the device that wants to be torn
down.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Thu, 22 Dec 2011 05:33:17 +0000 (21:33 -0800)]
[SCSI] libsas: fix sas_find_local_phy(), take phy references
In the direct-attached case this routine returns the phy on which this
device was first discovered. Which is broken if we want to support
wide-targets, as this phy reference can become stale even though the
port is still active.
In the expander-attached case this routine tries to lookup the phy by
scanning the attached sas addresses of the parent expander, and BUG_ONs
if it can't find it. However since eh and the libsas workqueue run
independently we can still be attempting device recovery via eh after
libsas has recorded the device as detached. This is even easier to hit
now that eh is blocked while device domain rediscovery takes place, and
that libata is fed more timed out commands increasing the chances that
it will try to recover the ata device.
Arrange for dev->phy to always point to a last known good phy, it may be
stale after the port is torn down, but it will catch up for wide port
reconfigurations, and never be NULL.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Wed, 21 Dec 2011 22:55:38 +0000 (14:55 -0800)]
[SCSI] libsas: don't mark expanders as gone when a child device is removed
Commit 56dd2c06 "[SCSI] libsas: Don't issue commands to devices that
have been hot-removed" marked the parent device of an end-device as gone
when all the phys to the end device have been deleted.
The expander device is still present until its parent is removed. This
is a benign change until the smp_execute_task() path is taught to check
->gone.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Fri, 18 Nov 2011 01:59:54 +0000 (17:59 -0800)]
[SCSI] libsas: poll for ata device readiness after reset
Use ata_wait_after_reset() to poll for link recovery after a reset.
This combined with sas_ha->eh_mutex prevents expander rediscovery from
probing phys in an intermediate state. Local discovery does not have a
mechanism to filter link status changes during this timeout, so it
remains the responsibility of lldds to prevent premature port teardown.
Although once all lldd's support ->lldd_ata_check_ready() that could be
used as a gate to local port teardown.
The signature fis is re-transmitted when the link comes back so we
should be revalidating the ata device class, but that is left to a future
patch.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Sun, 4 Dec 2011 09:06:24 +0000 (01:06 -0800)]
[SCSI] libsas: async ata-eh
Once sas_ata_hard_reset() starts honoring the 'deadline' parameter a
pathological configuration could take 25 seconds per ata device
(serialized) to recover. Run per-port recoveries in parallel.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Jeff Skirvin [Wed, 16 Nov 2011 09:44:13 +0000 (09:44 +0000)]
[SCSI] libsas: add mutex for SMP task execution
SAS does not tag SMP requests, and at least one lldd (isci) does not permit
more than one in-flight request at a time.
[jejb: fix sas_init_dev tab issues while we're at it] Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Jeff Skirvin [Fri, 16 Dec 2011 08:21:21 +0000 (08:21 +0000)]
[SCSI] libsas: Remove redundant phy state notification calls.
In the case of an explicit sas_phy_enable call to disable a phy,
the LLDD provides the calls to sas_phy_disconnected and the
PHYE_LOSS_OF_SIGNAL event.
NOTE: This assumes that the lldd(s) generate the notification, which
appears to be the case, but only verfied on isci.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Sat, 3 Dec 2011 00:07:01 +0000 (16:07 -0800)]
[SCSI] libsas: execute transport link resets with libata-eh via host workqueue
Link resets leave ata affiliations intact, so arrange for libsas to make
an effort to avoid dropping the device due to a slow-to-recover link.
Towards this end carry out reset in the host workqueue so that it can
check for ata devices and kick the reset request to libata. Hard
resets, in contrast, bypass libata since they are meant for associating
an ata device with another initiator in the domain (tears down
affiliations).
Need to add a new transport_sas_phy_reset() since the current
sas_phy_reset() is a utility function to libsas lldds. They are not
prepared for it to loop back into eh.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Tue, 20 Dec 2011 09:03:48 +0000 (01:03 -0800)]
[SCSI] libsas: perform sas-transport resets in shost->workq context
Extend the sas transport class to allow transport users to attach extra
data to a sas_phy (->hostdata). Use this area in libsas to move resets
to workq context in preparation for scheduling ata device resets through
libata-eh.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Thu, 1 Dec 2011 07:23:33 +0000 (23:23 -0800)]
[SCSI] libsas: use libata-eh-reset for sata rediscovery fis transmit failures
Since sata devices can take several seconds to recover the link on reset
the 0.5 seconds that libsas currently waits may not be enough. Instead
if we are rediscovering a phy that was previously attached to a sata
device let libata handle any resets to encourage the device to transmit
the initial fis.
Once sas_ata_hard_reset() and lldds learn how to honor 'deadline' libsas
should stop encountering phys in an intermediate state, until then this
will loop until the fis is transmitted or ->attached_sas_addr gets
cleared, but in the more likely initial discovery case we keep existing
behavior.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Tue, 29 Nov 2011 22:54:28 +0000 (14:54 -0800)]
[SCSI] libsas: defer SAS_TASK_NEED_DEV_RESET commands to libata
lldds use the SAS_TASK_NEED_DEV_RESET interface to request that eh
perform a reset. In the sata device case defer the commands that
triggered the reset to libata-eh context so it can perform its pre and
post reset management.
In the sas_ata_post_internal() case the reset request is falling on deaf
ears as the sas_task is immediately destroyed without any reset action.
Since it is currently a nop, and likely superfluous given the conversion
to new-style libata-eh, just drop the request.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Tue, 29 Nov 2011 20:08:50 +0000 (12:08 -0800)]
[SCSI] libsas: let libata handle command timeouts
libsas-eh if it successfully aborts an ata command will hide the timeout
condition (AC_ERR_TIMEOUT) from libata. The command likely completes
with the all-zero task->task_status it started with. Instead, interpret
a TMF_RESP_FUNC_COMPLETE as the end of the sas_task but keep the scmd
around for libata-eh to handle.
Tested-by: Andrzej Jakowski <andrzej.jakowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Mon, 28 Nov 2011 19:29:20 +0000 (11:29 -0800)]
[SCSI] libsas: fix timeout vs completion race
Until we have told the lldd to forget a task a timed out operation can
return from the hardware at any time. Since completion frees the task
we need to make sure that no tasks run their normal completion handler
once eh has decided to manage the task. Similar to
ata_scsi_cmd_error_handler() freeze completions to let eh judge the
outcome of the race.
Task collector mode is problematic because it presents a situation where
a task can be timed out and aborted before the lldd has even seen it.
For this case we need to guarantee that a task that an lldd has been
told to forget does not get queued after the lldd says "never seen it".
With sas_scsi_timed_out we achieve this with the ->task_queue_flush
mutex, rather than adding more time.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Wed, 7 Dec 2011 07:24:42 +0000 (23:24 -0800)]
[SCSI] libsas: prevent double completion of scmds from eh
We invoke task->task_done() to free the task in the eh case, but at this
point we are prepared for scsi_eh_flush_done_q() to finish off the scmd.
Introduce sas_end_task() to capture the final response status from the
lldd and free the task.
Also take the opportunity to kill this warning.
drivers/scsi/libsas/sas_scsi_host.c: In function ‘sas_end_task’:
drivers/scsi/libsas/sas_scsi_host.c:102:3: warning: case value ‘2’ not in enumerated type ‘enum exec_status’ [-Wswitch]
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Mon, 28 Nov 2011 20:08:22 +0000 (12:08 -0800)]
[SCSI] libsas: close error handling vs sas_ata_task_done() race
Since sas_ata does not implement ->freeze(), completions for scmds and
internal commands can still arrive concurrent with
ata_scsi_cmd_error_handler() and sas_ata_post_internal() respectively.
By the time either of those is called libata has committed to completing
the qc, and the ATA_PFLAG_FROZEN flag tells sas_ata_task_done() it has
lost the race.
In the sas_ata_post_internal() case we take on the additional
responsibility of freeing the sas_task to close the race with
sas_ata_task_done() freeing the the task while sas_ata_post_internal()
is in the process of invoking ->lldd_abort_task().
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Williams [Tue, 29 Nov 2011 01:11:33 +0000 (17:11 -0800)]
[SCSI] libsas: kill invocation of scsi_eh_finish_cmd from sas_ata_task_done
Prior to the conversion to the new-style libata-eh sas_ata_task_done()
may have been the last opportunity to clean up the scmd, but now
libata-eh explicitly handles this case. It also races against sas-eh.
If a lldd completes a task after SAS_TASK_STATE_ABORTED is set it could
trigger a spurious decrement of shost->host_failed. Current lldds have
the band-aid of checking SAS_TASK_STATE_ABORTED before calling
->task_done(), but better to just let the scmds escalate to libata for
race free cleanup.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>