Nick Cheng [Fri, 18 Jun 2010 07:39:12 +0000 (15:39 +0800)]
[SCSI] arcmsr: Support 1024 scatter-gather list entries and improve AP while FW trapped and behaviors of EHs
1. To support 4M/1024 scatter-gather list entry, reorganize struct
ARCMSR_CDB and struct CommandControlBlock
2. To modify arcmsr_probe
3. In order to help fix F/W issue, add the driver mode for type B card
4. To improve AP's behavior while F/W resets
5. To unify struct MessageUnit_B's members' naming in all OS drivers'
6. To improve error handlers, arcmsr_bus_reset(), arcmsr_abort()
7. To fix the arcmsr_queue_command() in bus reset stage, just let the
commands pass down to FW, don't block
Signed-off-by: Nick Cheng <nick.cheng@areca.com.tw> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Joe Eykholt [Fri, 11 Jun 2010 23:44:57 +0000 (16:44 -0700)]
[SCSI] libfc: fix indefinite rport restart
Remote ports were restarting indefinitely after getting
rejects in PRLI.
Fix by adding a counter of restarts and limiting that with
the port login retry limit as well.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Joe Eykholt [Fri, 11 Jun 2010 23:44:51 +0000 (16:44 -0700)]
[SCSI] libfc: Fix remote port restart problem
This patch somewhat combines two fixes to remote port handing in libfc.
The first problem was that rport work could be queued on a deleted
and freed rport. This is handled by not resetting rdata->event
ton NONE if the rdata is about to be deleted.
However, that fix led to the second problem, described by
Bhanu Gollapudi, as follows:
> Here is the sequence of events. T1 is first LOGO receive thread, T2 is
> fc_rport_work() scheduled by T1 and T3 is second LOGO receive thread and
> T4 is fc_rport_work scheduled by T3.
>
> 1. (T1)Received 1st LOGO in state Ready
> 2. (T1)Delete port & enter to RESTART state.
> 3. (T1)schdule event_work, since event is RPORT_EV_NONE.
> 4. (T1)set event = RPORT_EV_LOGO
> 5. (T1)Enter RESTART state as disc_id is set.
> 6. (T2)remember to PLOGI, and set event = RPORT_EV_NONE
> 6. (T3)Received 2nd LOGO
> 7. (T3)Delete Port & enter to RESTART state.
> 8. (T3)schedule event_work, since event is RPORT_EV_NONE.
> 9. (T3)Enter RESTART state as disc_id is set.
> 9. (T3)set event = RPORT_EV_LOGO
> 10.(T2)work restart, enter PLOGI state and issues PLOGI
> 11.(T4)Since state is not RESTART anymore, restart is not set, and the
> event is not reset to RPORT_EV_NONE. (current event is RPORT_EV_LOGO).
> 12. Now, PLOGI succeeds and fc_rport_enter_ready() will not schedule
> event_work, and hence the rport will never be created, eventually losing
> the target after dev_loss_tmo.
So, the problem here is that we were tracking the desire for
the rport be restarted by state RESTART, which was otherwise
equivalent to DELETE. A contributing factor is that we dropped
the lock between steps 6 and 10 in thread T2, which allows the
state to change, and we didn't completely re-evaluate then.
This is hopefully corrected by the following minor redesign:
Simplify the rport restart logic by making the decision to
restart after deleting the transport rport. That decision
is based on a new STARTED flag that indicates fc_rport_login()
has been called and fc_rport_logoff() has not been called
since then. This replaces the need for the RESTART state.
Only restart if the rdata is still in DELETED state
and only if it still has the STARTED flag set.
Also now, since we clear the event code much later in the
work thread, allow for the possibility that the rport may
have become READY again via incoming PLOGI, and if so,
queue another event to handle that.
In the problem scenario, the second LOGO received will
cause the LOGO event to occur again.
Reported-by: Bhanu Gollapudi <bprakash@broadcom.com> Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Roel Kluin [Fri, 11 Jun 2010 23:44:46 +0000 (16:44 -0700)]
[SCSI] fnic: fnic_scsi.c: clean up
In fnic_abort_cmd() and fnic_device_reset() assign `rport' earlier to make
FNIC_SCSI_DBG() calls cleaner.
In fnic_clean_pending_aborts() `rport' is not used.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Acked-by: Abhijeet Joglekar <abjoglek@cisco.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
[SCSI] libfcoe: Check for order and missing critical descriptors for FIP ELS requests
As per FC-BB-5 rev.2, section 7.8.7.1, strict ordering of FIP descriptors
is required for ELS requests. Also, look for missing and duplicate critical
descriptors.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
[SCSI] libfcoe: Host doesnt handle CVL to NPIV ports
Clear virtual link for NPIV ports is now handled by resetting
the matching vnport.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
As per FC-BB-5 rev 2, section 7.8.6.2, malformed FIP frame shall be
discarded. Drop discovery adv, ELS and CLV's with duplicate critical
descriptors.
[Resending after incorporating Joe's review comments]
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Joe Eykholt [Fri, 11 Jun 2010 23:44:20 +0000 (16:44 -0700)]
[SCSI] libfcoe: update FIP FCF D flag from advertisments
Allow the D flag (indicating that keep-alives are not needed) to
be updated dynamically from received FIP advertisements.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Joe Eykholt [Fri, 11 Jun 2010 23:44:15 +0000 (16:44 -0700)]
[SCSI] libfcoe: Use fka_period as periodic timeouts to age out fcf if
keep alives are disabled due to fd_flags set and also
stop updating keep alive values in that case.
Update select fcf time only if fcf is not already selected or
select time is not already determined from parse adv, and then
have select time cleared only once after fcf is selected.
Changed deadline check to time_after_eq() from time_after()
since now next timeout will be on exact 2.5 times FKA followed
by first advertisement.
Signed-off-by: Vasu Dev <vasu.dev@intel.com> Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Joe Eykholt [Fri, 11 Jun 2010 23:44:10 +0000 (16:44 -0700)]
[SCSI] libfcoe: fix lenient aging of FCF advertisements
[This patch has several improvements to the code in
the fip timers. It hasn't been tested yet.
I'm sending it out for review. Vasu, perhaps you can
merge this with your patch and test it together.]
The current code allows an advertisement to be used
even if it has been 3 times the FCF keep-alive
advertisement period (FKA) since one was received from
that FCF. The spec. calls for 2.5 times FKA.
Fix this and make sure we detect missed keep-alives promptly.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Resubmitting after incorporating Joe's review comment.
Unsolicited PRLO request is now handled by sending LS_ACC,
and then relogin to the remote port if an N-port login
session exists for that remote port.
Note that this patch should be applied on top of Joe Eykholt's
"Fix remote port restart problem" patch.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Joe Eykholt [Fri, 11 Jun 2010 23:43:59 +0000 (16:43 -0700)]
[SCSI] fcoe: clean up TBD comments in FCoE prototype header
Some old comments in fc_fcoe.h say TBD long after the
standard has been passed by T11. Clean them up.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
[SCSI] libfc: Honor LS_ACC response codes for PRLI
As per FC-LS Rev 1.62 table 46, response codes are handled as follows:
1. If the Req executed is true, PRLI is accepted.
2. If Req executed is not set, if resp code is 5,
PRLI is not retried and port is logged out.
3. If resp code is anything apart from 1 or 5, PRLI is retired
upto max retry count.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Retry upto max_rport_retry_count when a target responds with
LS_RJT for a PRLI request.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Host does not send discovery solicitation messages if Disc. Adv
from FCF are dropped. It restarts sending solicitation only
after receiving a Discovery Adv. from FCF. Fix is to restart
solicitation immediately after CVL processing.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
[SCSI] libfcoe: Avoid hang when receiving non-critical descriptors
Avoid infinite loop while processing FIP ELS or discovery
advertisement with non-critical descriptors.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Acked-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Joe Eykholt [Fri, 11 Jun 2010 23:43:33 +0000 (16:43 -0700)]
[SCSI] libfcoe: FIP link keep-alive should continue while logged off
A check in fcoe_ctlr_send_keep_alive() returns if there's no
port_id for the local port. This could miss a keep alive if
we just did a host reset and have logged off and will log back in.
Return only if we are doing the port keep alive, in which case
we need to be logged in.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Wayne Boyer [Thu, 10 Jun 2010 21:46:34 +0000 (14:46 -0700)]
[SCSI] ipr: move setting of the allow_restart flag for vsets and disks
A problem was found where the call to scsi_add_device() fails intermittently
for an adapter. This is caused when __scsi_add_device() returns -ENODEV as
a result of not calling scsi_probe_and_add_lun() since the call to
scsi_host_scan_allowed() fails. scsi_host_scan_allowed() fails because the
adapter state is set to SHOST_RECOVERY instead of SHOST_RUNNING. The state of
the adapter is being set to SHOST_RECOVERY by scsi_eh_scmd_add() during
error handling.
This problem is avoided by moving the setting of the allow_restart flag to
later in the device initialization sequence. This prevents further error
handling if we get a NOT_READY response from a TUR command by causing
scsi_check_sense() to return SUCCESS. Therefore, scsi_eh_scmd_add() will
not run and the adapter state will remain as SHOST_RUNNING.
Signed-off-by: Wayne Boyer <wayneb@linux.vnet.ibm.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Dan Carpenter [Thu, 10 Jun 2010 07:53:05 +0000 (09:53 +0200)]
[SCSI] be2iscsi: fix null dereference on error path
"phba" is always null here so we can't dereference it.
Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Dan Carpenter [Thu, 10 Jun 2010 07:52:21 +0000 (09:52 +0200)]
[SCSI] be2iscsi: fix memory leak on error path
I added a kfree(pwrb_arr) in front of the return.
Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Wayne Boyer [Wed, 9 Jun 2010 15:24:55 +0000 (08:24 -0700)]
[SCSI] ipr: add writeq definition if needed
Compiling the driver will fail on 32 bit powerpc and other
architectures where writeq is not defined. This patch adds a
definition for writeq.
Signed-off-by: Wayne Boyer <wayneb@linux.vnet.ibm.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Mike Christie [Wed, 9 Jun 2010 08:30:08 +0000 (03:30 -0500)]
[SCSI] be2iscsi: fix disconnection cleanup
This patch fixes 4 bugs in the connection connect/disconnect
cleanup path.
1. If beiscsi_open_conn fails beiscsi_free_ep was always being
called, and if beiscsi_open_conn failed because beiscsi_get_cid
failed then we would free an unallocated cid.
2. If beiscsi_ep_connect failed due to a beiscsi_open_conn failure
it was leaking iscsi_endpoints.
3. beiscsi_ep_disconnect was leaking iscsi_endpoints.
beiscsi_ep_disconnect should free the iscsi_endpoint. We cannot
do it in beiscsi_conn_stop because that is only called for
iscsi connection cleanup. If beiscsi_ep_connect returns
success, but then the poll function fails or the connect
times out then beiscsi_ep_disconnect will be called to clean
up the ep. The conn_stop callout will not be called in that path.
4. beiscsi_conn_stop was freeing the iscsi_endpoint then accessing
it a couple lines later.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
James Smart [Tue, 8 Jun 2010 22:32:13 +0000 (18:32 -0400)]
[SCSI] lpfc 8.3.14: Update Driver version to 8.3.14
Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
James Smart [Tue, 8 Jun 2010 22:31:54 +0000 (18:31 -0400)]
[SCSI] lpfc 8.3.14: SCSI and SLI API fixes
- Fixed accounting of allocated SCSI buffers when post sgl fails.
- Restrict scsi buffer allocation based on LUN count (sdev_cnt).
- Create __lpfc_sli_free_rpi that doesn't take out the hbalock.
- Modify lpfc_sli_free_rpi to call __lpfc_sli_free_rpi.
- Call __lpfc_sli_free_rpi in lpfc_cleanup_pending_mbox.
- Do not swap the strings returned in mailbox commands and do
not swap byte aligned data in VPD.
Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
James Smart [Tue, 8 Jun 2010 22:31:37 +0000 (18:31 -0400)]
[SCSI] lpfc 8.3.14: FCoE Discovery Fixes
- Prevent unregistring of unused FCF when FLOGI is pending.
- Prevent point to point discovery on a FCoE HBA.
- Fixed FCF discovery failure after swapping FCoE port by
switching over to fast failover method when no FCF matches in-use FCF.
Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
James Smart [Tue, 8 Jun 2010 22:31:21 +0000 (18:31 -0400)]
[SCSI] lpfc 8.3.14: PCI fixes and enhancements
- Allow enabling MSI-X intterupts with fewer vectors than requested
by looking at the return value from pci_enable_msix.
- Implemented driver PCI AER error handling routines for supporting
AER error recovering on SLI4 devices.
- Remove redundant SLI_ACTIVE checks
Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
1. MSI-X interrupt support
2. Driver changes to support new maxRAID controller FW version. The
changes are mainly done to handle async notification changes done in
newer controller FW version.
3. Added state change notifications to notify applications of controller
states.
Signed-off-by: Anil Ravindranath <anil_ravindranath@pmc-sierra.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
James Smart [Mon, 7 Jun 2010 19:24:54 +0000 (15:24 -0400)]
[SCSI] lpfc 8.3.13: Update Driver Version to 8.3.13
Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
James Smart [Mon, 7 Jun 2010 19:24:45 +0000 (15:24 -0400)]
[SCSI] lpfc 8.3.13: Add TX Queue Support for SLI4 ELS commands.
Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
James Smart [Mon, 7 Jun 2010 19:24:29 +0000 (15:24 -0400)]
[SCSI] lpfc 8.3.13: Misc fixes
- Change the Max receive size on CIN FCFs to 0x800
- (From linux community) Check boundary before checking for NULL.
- Update last completion time for completed I/O to prevent heartbeat.
- Add Balius PCI Device IDs
Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
James Smart [Mon, 7 Jun 2010 19:24:12 +0000 (15:24 -0400)]
[SCSI] lpfc 8.3.13: SCSI specific changes
- Fix hba_queue_depth to reflect actual available XRIs
- Add support for new SLER specific firmware status codes.
- Free SCSI buffer when iotag allocation fails.
Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
James Smart [Mon, 7 Jun 2010 19:23:35 +0000 (15:23 -0400)]
[SCSI] lpfc 8.3.13: Initialization code clean up and fixes.
- Add poll or wait flag parameter to hba_init_link and hba_down_link.
- (From Linux Community) Make return with ENXIO negative.
- Remove unused INB code from driver.
- Prevent block_magmt_io from returning until mailbox is inactive.
Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
James Smart [Mon, 7 Jun 2010 19:23:17 +0000 (15:23 -0400)]
[SCSI] lpfc 8.3.13: FC Discovery Fixes and enhancements.
- Retry PLOGI up to 48 times when LS_RJT reason is
"Unable to supply requested data."
- When dev loss timeout occures do not change state if there
is an outstanding REG_LOGIN.
- Add logic to ignore REG_LOGIN completion if discovery is
restarted while waiting for REG_LOGIN.
- Only change state on REG_LOGIN completion if still in
state waiting for REG_LOGIN completion.
- Only send ADISCs to FCP-2 Targets (not Initiators).
Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Wayne Boyer [Fri, 4 Jun 2010 17:26:50 +0000 (10:26 -0700)]
[SCSI] ipr: add endian swap enablement for 64 bit adapters
A change in the hardware design of the chip for the new adapters changes the
default endianness of MMIO operations. This patch adds a register definition
which when written to with a predefined value will change the endianness
back to what the driver expects.
This patch also fixes two problems found during testing.
First, the first reserved field in the ipr_hostrcb64_fabirc_desc structure only
reserved one byte. The correct amount to reserve is 2 bytes.
Second, the reserved field of the ipr_hostrcb64_error structure only reserved
2 bytes. The correct amount to reserve is 16 bytes.
Signed-off-by: Wayne Boyer <wayneb@linux.vnet.ibm.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Santosh Vernekar [Fri, 28 May 2010 22:08:25 +0000 (15:08 -0700)]
[SCSI] qla2xxx: Handle outstanding mbx cmds on hung f/w scenarios.
Outstanding mailbox commands, have no way to recover on f/w hung, and we
timeout on waiting for mbx response. This in turn affects the recovery process
as follows:
- We might already be in dpc while waiting for mbx to complete, so recovery for
that pci function will never get invoked. Reset Timeout (10 sec) is far less
than mbx timeout (30 sec).
- Other mbx cmds will get stuck due to serial mbx access.
Solution is to identify fw-hung scenario and handle outstanding mbx commands to
have an early-exit instead of waiting for response.
Other mbx commands waiting for access will also do an early-exit if fw-hung is
still applicable.
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Andrew Vasquez [Fri, 28 May 2010 22:08:22 +0000 (15:08 -0700)]
[SCSI] qla2xxx: Make the FC port capability mutual exclusive.
In case of both target and initiator capabilities reported by fc port,
the fc port port capability is made mutualy exclusive with priority given
for target capabilities.
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
[SCSI] qla2xxx: Fix cpu-affinity usage for non-capable ISPs.
The TMFs used for pre-24xx ISPs incorrectly assumed 'cpu' tag
data could be valid. These chips have no multi-q/cpu-affinity
support. This corrects an oops seen on ISP23xx parts.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
We have 32 (MAXSGENTRIES) scatter gather elements embedded
in the command. With all these, the total command size is
about 576 bytes. However, the last entry in the block fetch table
is 35. (the block fetch table contains the number of 16-byte chunks
the firmware needs to fetch for a given number of scatter gather
elements.) 35 * 16 = 560 bytes, which isn't enough. It needs to be
36. (36 * 16 == 576) or, MAXSGENTRIES + 4. (plus 4 because there's a
bunch of stuff at the front of the command before the first scatter
gather element that takes up 4 * 16 bytes.) Without this fix, the
controller may have to perform two DMA operations to fetch the
command since the first one may not get the whole thing.
Signed-off-by: Don Brace <brace@beardog.cce.hp.com> Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Ryan Kuester [Mon, 26 Apr 2010 23:11:54 +0000 (18:11 -0500)]
[SCSI] mptsas: fix hangs caused by ATA pass-through
I may have an explanation for the LSI 1068 HBA hangs provoked by ATA
pass-through commands, in particular by smartctl.
First, my version of the symptoms. On an LSI SAS1068E B3 HBA running
01.29.00.00 firmware, with SATA disks, and with smartd running, I'm seeing
occasional task, bus, and host resets, some of which lead to hard faults of
the HBA requiring a reboot. Abusively looping the smartctl command,
# while true; do smartctl -a /dev/sdb > /dev/null; done
dramatically increases the frequency of these failures to nearly one per
minute. A high IO load through the HBA while looping smartctl seems to
improve the chance of a full scsi host reset or a non-recoverable hang.
I reduced what smartctl was doing down to a simple test case which
causes the hang with a single IO when pointed at the sd interface. See
the code at the bottom of this e-mail. It uses an SG_IO ioctl to issue
a single pass-through ATA identify device command. If the buffer
userspace gives for the read data has certain alignments, the task is
issued to the HBA but the HBA fails to respond. If run against the sg
interface, neither the test code nor smartctl causes a hang.
sd and sg handle the SG_IO ioctl slightly differently. Unless you
specifically set a flag to do direct IO, sg passes a buffer of its own,
which is page-aligned, to the block layer and later copies the result
into the userspace buffer regardless of its alignment. sd, on the other
hand, always does direct IO unless the userspace buffer fails an
alignment test at block/blk-map.c line 57, in which case a page-aligned
buffer is created and used for the transfer.
The alignment test currently checks for word-alignment, the default
setup by scsi_lib.c; therefore, userspace buffers of almost any
alignment are given directly to the HBA as DMA targets. The LSI 1068
hardware doesn't seem to like at least a couple of the alignments which
cross a page boundary (see the test code below). Curiously, many
page-boundary-crossing alignments do work just fine.
So, either the hardware has an bug handling certain alignments or the
hardware has a stricter alignment requirement than the driver is
advertising. If stricter alignment is required, then in no case should
misaligned buffers from userspace be allowed through without being
bounced or at least causing an error to be returned.
It seems the mptsas driver could use blk_queue_dma_alignment() to advertise
a stricter alignment requirement. If it does, sd does the right thing and
bounces misaligned buffers (see block/blk-map.c line 57). The following
patch to 2.6.34-rc5 makes my symptoms go away. I'm sure this is the wrong
place for this code, but it gets my idea across.
Acked-by: "Desai, Kashyap" <Kashyap.Desai@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Eric Moore [Thu, 22 Apr 2010 16:47:40 +0000 (10:47 -0600)]
[SCSI] mpt2sas: DIF Type 2 Protection Support
Adding DIF Type 2 protection support, as well as turning on 32 byte cdb's,
and setting the cdb length for > 16 byte in the SCSI_IO->control parameter.
Signed-off-by: Martin Petersen <martin.petersen@oracle.com> Signed-off-by: Eric Moore <eric.moore@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
srp_task needs to be setup before request_irq. The patch below fixes the oops.
Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Christof Schmitt [Mon, 21 Jun 2010 08:11:33 +0000 (10:11 +0200)]
[SCSI] zfcp: Update status read mempool
Commit 64deb6efdc5504ce97b5c1c6f281fffbc150bd93 changed the way status
read buffers are handled but forgot to adjust the mempool to the new
size. Add the call to resize the mempool after the exchange config
data. Also use the define instead of the hard coded number in the fsf
callback for consistency.
Christof Schmitt [Mon, 21 Jun 2010 08:11:32 +0000 (10:11 +0200)]
[SCSI] zfcp: Do not wait for SBALs on stopped queue
Trying to read the FC host statistics on an offline adapter results in
a 5 seconds wait. Reading the statistics tries to issue an exchange
port data request which first waits up to 5 seconds for an entry in
the request queue.
Change the strategy for getting a free SBAL to exit when the queue is
stopped. Reading the statistics will then fail without the wait.
Wayne Boyer [Thu, 3 Jun 2010 23:02:21 +0000 (16:02 -0700)]
[SCSI] ipr: fix resource path display and formatting
It was possible to overflow the buffer used to print out the formatted
version of the resource path. The fix is to limit the number of
bytes that get formatted.
This patch also updates the ipr_show_resource_path function to display the
resource address for devices that are attached to adapters that don't
support resource paths.
Signed-off-by: Wayne Boyer <wayneb@linux.vnet.ibm.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
wimax/i2400m: fix missing endian correction read in fw loader
net8139: fix a race at the end of NAPI
pktgen: Fix accuracy of inter-packet delay.
pkt_sched: gen_estimator: add a new lock
net: deliver skbs on inactive slaves to exact matches
ipv6: fix ICMP6_MIB_OUTERRORS
r8169: fix mdio_read and update mdio_write according to hw specs
gianfar: Revive the driver for eTSEC devices (disable timestamping)
caif: fix a couple range checks
phylib: Add support for the LXT973 phy.
net: Print num_rx_queues imbalance warning only when there are allocated queues
Linus Torvalds [Fri, 11 Jun 2010 21:18:47 +0000 (14:18 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: The file argument for fsync() is never null
Btrfs: handle ERR_PTR from posix_acl_from_xattr()
Btrfs: avoid BUG when dropping root and reference in same transaction
Btrfs: prohibit a operation of changing acl's mask when noacl mount option used
Btrfs: should add a permission check for setfacl
Btrfs: btrfs_lookup_dir_item() can return ERR_PTR
Btrfs: btrfs_read_fs_root_no_name() returns ERR_PTRs
Btrfs: unwind after btrfs_start_transaction() errors
Btrfs: btrfs_iget() returns ERR_PTR
Btrfs: handle kzalloc() failure in open_ctree()
Btrfs: handle error returns from btrfs_lookup_dir_item()
Btrfs: Fix BUG_ON for fs converted from extN
Btrfs: Fix null dereference in relocation.c
Btrfs: fix remap_file_pages error
Btrfs: uninitialized data is check_path_shared()
Btrfs: fix fallocate regression
Btrfs: fix loop device on top of btrfs
Linus Torvalds [Fri, 11 Jun 2010 21:15:44 +0000 (14:15 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: clear bridge resource range if BIOS assigned bad one
PCI: hotplug/cpqphp, fix NULL dereference
Revert "PCI: create function symlinks in /sys/bus/pci/slots/N/"
PCI: change resource collision messages from KERN_ERR to KERN_INFO
Yinghai Lu [Thu, 3 Jun 2010 20:43:03 +0000 (13:43 -0700)]
PCI: clear bridge resource range if BIOS assigned bad one
Yannick found that video does not work with 2.6.34. The cause of this
bug was that the BIOS had assigned the wrong range to the PCI bridge
above the video device. Before 2.6.34 the kernel would have shrunk
the size of the bridge window, but since d65245c PCI: don't shrink bridge resources
the kernel will avoid shrinking BIOS ranges.
So zero out the old range if we fail to claim it at boot time; this will
cause us to allocate a new range at startup, restoring the 2.6.34
behavior.
Jiri Slaby [Wed, 9 Jun 2010 20:31:13 +0000 (22:31 +0200)]
PCI: hotplug/cpqphp, fix NULL dereference
There are devices out there which are PCI Hot-plug controllers with
compaq PCI IDs, but are not bridges, hence have pdev->subordinate
NULL. But cpqphp expects the pointer to be non-NULL.
Add a check to the probe function to avoid oopses like:
BUG: unable to handle kernel NULL pointer dereference at 00000050
IP: [<f82e3c41>] cpqhpc_probe+0x951/0x1120 [cpqphp]
*pdpt = 0000000033779001 *pde = 0000000000000000
...
Bjorn Helgaas [Thu, 3 Jun 2010 19:47:18 +0000 (13:47 -0600)]
PCI: change resource collision messages from KERN_ERR to KERN_INFO
We can often deal with PCI resource issues by moving devices around. In
that case, there's no point in alarming the user with messages like these.
There are many bug reports where the message itself is the only problem,
e.g., https://bugs.launchpad.net/ubuntu/+source/linux/+bug/413419 .
Dan Carpenter [Sat, 29 May 2010 09:49:07 +0000 (09:49 +0000)]
Btrfs: The file argument for fsync() is never null
The "file" argument for fsync is never null so we can remove this check.
What drew my attention here is that 7ea8085910e: "drop unused dentry
argument to ->fsync" introduced an unconditional dereference at the
start of the function and that generated a smatch warning.
Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Sage Weil [Mon, 17 May 2010 17:15:27 +0000 (17:15 +0000)]
Btrfs: avoid BUG when dropping root and reference in same transaction
If btrfs_ioctl_snap_destroy() deletes a snapshot but finishes
with end_transaction(), the cleaner kthread may come in and
drop the root in the same transaction. If that's the case, the
root's refs still == 1 in the tree when btrfs_del_root() deletes
the item, because commit_fs_roots() hasn't updated it yet (that
happens during the commit).
This wasn't a problem before only because
btrfs_ioctl_snap_destroy() would commit the transaction before dropping
the dentry reference, so the dead root wouldn't get queued up until
after the fs root item was updated in the btree.
Since it is not an error to drop the root reference and the root in the
same transaction, just drop the BUG_ON() in btrfs_del_root().
Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Shi Weihua [Tue, 18 May 2010 00:51:54 +0000 (00:51 +0000)]
Btrfs: prohibit a operation of changing acl's mask when noacl mount option used
when used Posix File System Test Suite(pjd-fstest) to test btrfs,
some cases about setfacl failed when noacl mount option used.
I simplified used commands in pjd-fstest, and the following steps
can reproduce it.
------------------------
# cd btrfs-part/
# mkdir aaa
# setfacl -m m::rw aaa <- successed, but not expected by pjd-fstest.
------------------------
I checked ext3, a warning message occured, like as:
setfacl: aaa/: Operation not supported
Certainly, it's expected by pjd-fstest.
So, i compared acl.c of btrfs and ext3. Based on that, a patch created.
Fortunately, it works.
Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Shi Weihua [Tue, 18 May 2010 00:50:32 +0000 (00:50 +0000)]
Btrfs: should add a permission check for setfacl
On btrfs, do the following
------------------
# su user1
# cd btrfs-part/
# touch aaa
# getfacl aaa
# file: aaa
# owner: user1
# group: user1
user::rw-
group::rw-
other::r--
# su user2
# cd btrfs-part/
# setfacl -m u::rwx aaa
# getfacl aaa
# file: aaa
# owner: user1
# group: user1
user::rwx <- successed to setfacl
group::rw-
other::r--
------------------
but we should prohibit it that user2 changing user1's acl.
In fact, on ext3 and other fs, a message occurs:
setfacl: aaa: Operation not permitted
This patch fixed it. Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>