Stephen Cameron [Thu, 23 Apr 2015 14:32:22 +0000 (09:32 -0500)]
hpsa: allow lockup detected to be viewed via sysfs
expose a detected lockup via sysfs
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Hannes Reinecke <hare@Suse.de> Signed-off-by: Don Brace <don.brace@pmcs.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Stephen Cameron [Thu, 23 Apr 2015 14:32:16 +0000 (09:32 -0500)]
hpsa: hpsa decode sense data for io and tmf
In hba mode, we could get sense data in descriptor format so
we need to handle that.
It's possible for CommandStatus to have value 0x0D
"TMF Function Status", which we should handle. We will get
this from a P1224 when aborting a non-existent tag, for
example. The "ScsiStatus" field of the errinfo field
will contain the TMF function status value.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Stephen Cameron [Thu, 23 Apr 2015 14:32:11 +0000 (09:32 -0500)]
hpsa: decrement h->commands_outstanding in fail_all_outstanding_cmds
make tracking of outstanding commands more robust
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Hannes Reinecke <hare@Suse.de> Signed-off-by: Don Brace <don.brace@pmcs.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Stephen Cameron [Thu, 23 Apr 2015 14:32:06 +0000 (09:32 -0500)]
hpsa: clean up aborts
Do not send aborts to logical devices that do not support aborts
Instead of relying on what the Smart Array claims for supporting logical
drives, simply try an abort and see how it responds at device discovery
time. This way devices that do support aborts (e.g. MSA2000) can work
and we do not waste time trying to send aborts to logical drives that do
not support them (important for high IOPS devices.)
While rescanning devices only test whether devices support aborts
the first time we encounter a device rather than every time.
Some Smart Arrays required aborts to be sent with tags in
the wrong endian byte order. To avoid having to know about
this, we would send two aborts with tags with each endian order.
On high IOPS devices, this turns out to be not such a hot idea.
So we now have a list of the devices that got the tag backwards,
and we only send it one way.
If all available commands are outstanding and the abort handler
is invoked, the abort handler may not be able to allocate a command
and may busy-wait excessivly. Reserve a small number of commands
for the abort handler and limit the number of concurrent abort
requests to the number of reserved commands.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Hannes Reinecke <hare@Suse.de> Signed-off-by: Don Brace <don.brace@pmcs.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Allow driver initiated commands to have a timeout. It does not
yet try to do anything with timeouts on such commands.
We are sending a reset in order to get rid of a command we want to abort.
If we make it return on the same reply queue as the command we want to abort,
the completion of the aborted command will not race with the completion of
the reset command.
Rename hpsa_scsi_do_simple_cmd_core() to hpsa_scsi_do_simple_cmd(), since
this function is the interface for issuing commands to the controller and
not the "core" of that implementation. Add a parameter to it which allows
the caller to specify the reply queue to be used. Modify existing callers
to specify the default reply queue.
Rename __hpsa_scsi_do_simple_cmd_core() to hpsa_scsi_do_simple_cmd_core(),
since this routine is the "core" implementation of the "do simple command"
function and there is no longer any other function with a similar name.
Modify the existing callers of this routine (other than
hpsa_scsi_do_simple_cmd()) to instead call hpsa_scsi_do_simple_cmd(), since
it will now accept the reply_queue paramenter, and it provides a controller
lock-up check. (Also, tweak two related message strings to make them
distinct from each other.)
Submitting a command to a locked up controller always results in a timeout,
so check for controller lock-up before submitting.
This is to enable fixing a race between command completions and
abort completions on different reply queues in a subsequent patch.
We want to be able to specify which reply queue an abort completion
should occur on so that it cannot race the completion of the command
it is trying to abort.
The following race was possible in theory:
1. Abort command is sent to hardware.
2. Command to be aborted simultaneously completes on another
reply queue.
3. Hardware receives abort command, decides command has already
completed and indicates this to the driver via another different
reply queue.
4. driver processes abort completion finds that the hardware does not know
about the command, concludes that therefore the command cannot complete,
returns SUCCESS indicating to the mid-layer that the scsi_cmnd may be
re-used.
5. Command from step 2 is processed and completed back to scsi mid
layer (after we already promised that would never happen.)
Fix by forcing aborts to complete on the same reply queue as the command
they are aborting.
Piggybacking device rescanning functionality onto the lockup
detection thread is not a good idea because if the controller
locks up during device rescanning, then the thread could get
stuck, then the lockup isn't detected. Use separate work
queues for device rescanning and lockup detection.
Detect controller lockup in abort handler.
After a lockup is detected, return DO_NO_CONNECT which results in immediate
termination of commands rather than DID_ERR which results in retries.
Modify detect_controller_lockup() to return the result, to remove the need for
a separate check.
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Signed-off-by: Webb Scales <webbnh@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
We had a mix of formats used for specifying controller, bus, target,
and lun address of devices.
change to the format used by the scsi midlayer and upper layer (2:3:0:0)
so you can easily follow the information from hpsa to scsi midlayer
to sd upper layer.
Also add this information:
- product ID
- vendor ID
- RAID level
- SSD Smath Path capable and enabled
- exposure level (sg-only)
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Hannes Reinecke <hare@Suse.de> Signed-off-by: Robert Elliott <elliott@hp.com> Signed-off-by: Don Brace <don.brace@pmcs.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Stephen Cameron [Thu, 23 Apr 2015 14:31:47 +0000 (09:31 -0500)]
hpsa: add masked physical devices into h->dev[] array
Cache the ioaccel handle so that when we need to abort commands sent
down the ioaccel2 path, we can look up the LUN ID in h->dev[] instead of
having to do I/O to the controller.
Add a field to elements in h->dev[] to keep track of how the device is exposed
to the SCSI mid layer: Not at all, without an upper level driver
(no_uld_attach) or normally exposed.
Since masked physical devices are now present in h->dev[] array
it would be perfectly possible to do
and bring them online. This was previously not allowed for masked
physical devices.
Ensure that the mapping of physical disks to logical drives gets updated in a
consistent way when a RAID migration occurs and is not touched until updates
to it are complete.
now instead of doing CISS_REPORT_PHYSICAL to get the LUNID for
the physical disk in hpsa_get_pdisk_of_ioaccel2(), just get
it out of h->dev[] where we already have it cached.
do not touch phys_disk[] for ioaccel enabled logical drives during rescan
Reviewed-by: Scott Teel <scott.teel@pmcs.com> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Hannes Reinecke <hare@Suse.de> Signed-off-by: Don Brace <don.brace@pmcs.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Hannes Reinecke [Fri, 24 Apr 2015 11:18:41 +0000 (13:18 +0200)]
advansys: Remove call to dma_cache_sync()
Only required if the dma buffer has been allocated via
dma_alloc_noncoherent(), which this one is not.
With that call removed we can now also compile on ARM.
Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Hannes Reinecke [Fri, 24 Apr 2015 11:18:20 +0000 (13:18 +0200)]
advansys: use host_reset
The advansys_reset() function is actually a host reset, not a
bus reset. And there is no need to have a 'last_reset'
value; the same value exists in struct Scsi_Host.
Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
T10 PI is just another optional feature, LLDDs should work without
the infrastructure.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
John Soni Jose [Sat, 25 Apr 2015 02:48:25 +0000 (08:18 +0530)]
be2iscsi : Bump the driver version
Bump the driver version
[jejb: resolve conflict with 4627de9 MAINTAINERS, be2iscsi: change email domain] Signed-off-by: John Soni Jose <sony.john-n@emulex.com> Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Odin.com>
John Soni Jose [Sat, 25 Apr 2015 02:48:13 +0000 (08:18 +0530)]
be2iscsi : Logout of FW Boot Session
Once be2iscsi driver is loaded and operational close Boot
session established by FW.
Signed-off-by: John Soni Jose <sony.john-n@emulex.com> Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Odin.com>
John Soni Jose [Sat, 25 Apr 2015 02:47:45 +0000 (08:17 +0530)]
be2iscsi : Fix memory check before unmapping.
Check DMA memory before it is unmapped.
Signed-off-by: John Soni Jose <sony.john-n@emulex.com> Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Odin.com>
John Soni Jose [Sat, 25 Apr 2015 02:47:31 +0000 (08:17 +0530)]
be2iscsi : Fix memory leak in the unload path
Driver was not freeing the DMA memory allocated for EQ/CQ in the
unload path. This patch frees the DMA memory during the driver unload.
Signed-off-by: John Soni Jose <sony.john-n@emulex.com> Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Odin.com>
John Soni Jose [Sat, 25 Apr 2015 02:47:19 +0000 (08:17 +0530)]
be2iscsi : Fix the PCI request region reserving.
Reserve device PCI I/O and Memory resources.
Signed-off-by: John Soni Jose <sony.john-n@emulex.com> Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Odin.com>
John Soni Jose [Sat, 25 Apr 2015 02:46:57 +0000 (08:16 +0530)]
be2iscsi : Fix the retry count for boot targets
Increment the retry count to get the boot target info when
port async event is received by the driver. Update sysfs enteries
with the boot target parameters.
Signed-off-by: Minh Tran <minhduc.tran@emulex.com> Signed-off-by: John Soni Jose <sony.john-n@emulex.com> Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Odin.com>
megaraid_sas : swap whole register in megasas_register_aen
Swap the whole 32 bits we read from the hardware instead of swapping
just the 16bits we care about in place later.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
The fusion HBAs don't really use the instance template like the other
variants, as it branches off at a much higher level. So instead of
trying to squeeze megasas_fire_cmd_fusion into the wrong calling
convention call it locally with argument data types that match what
is passed.
[jejb: fix up 32 bit compile failure] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
megaraid_sas : add missing byte swaps to the sriov code
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
megaraid_sas : bytewise or should be done on native endian variables
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
megaraid_sas : move endianness conversion into caller of megasas_get_seq_num
Converting structure fields in place is always a bad idea, and in this case
by moving it into the only caller we also only have to do a single byte
swap as most fields of this structure are never used.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
megaraid_sas : add endianness conversions for all ones
Add noop conversions for all ones to make sparse happy.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
This adds endianness annotations to all data structures, and a few
variables directly referencing them.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
megaraid_sas : megasas_complete_outstanding_ioctls() can be static
drivers/scsi/megaraid/megaraid_sas_base.c:1701:6: sparse: symbol 'megasas_complete_outstanding_ioctls' was not declared. Should it be static?
From: Christoph Hellwig <hch@lst.de>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
megaraid_sas : Support for Avago's Single server High Availability product
This patch will add support for Single Server High Availability(SSHA) cluster
support. Here is the short decsription of changes done to add support for
SSHA-
1) Host will send system's Unique ID based on DMI_PRODUCT_UUID to firmware.
2) Toggle the devhandle in LDIO path for Remote LDs.
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
megaraid_sas : Add release date and update driver version
This patch will upgrade the driver version and add back the release date and
sysfs hook for the same. Some internal applications uses sysfs parameter for
release date, so they were broken because of removal of release date from
sysfs.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
megaraid_sas : Modify driver's meta data to reflect Avago
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
megaraid_sas : Use Block layer tag support for internal command indexing
megaraid_sas driver will use block layer provided tag for indexing internal
MPT frames to get any unique MPT frame tied with tag. Each IO request
submitted from SCSI mid layer will get associated MPT frame from MPT framepool
(retrieved and return back using spinlock inside megaraid_sas driver's
submission/completion call back). Getting MPT frame from MPT Frame pool is
very expensive operation because of associated spin lock operation (spinlock
overhead increase on multi NUMA node). This type of locking in driver is very
expensive call considering each IO request need - Acquire and Release of the
same lock.
With this support, in IO path driver will directly provide the unique command
index(which is based on block layer tag) and will get the MPT frame tied to
the tag and this way driver can get rid off lock, which synchronizes the
access to MPT frame pool while fetching and returning MPT frame from the pool.
This support in driver provides siginificant performance improvement(on multi
NUMA node system)on latest upstream with SCSI.MQ as well as on existing linux
distributions.
Here is the data for test executed at Avago-
- IO Tool- FIO
- 4 Socket SMC server. (4 NUMA node server)
- 12 SSDs in JBOD mode .
- 4K Rand READ, QD=32
- SCSI MQ x86_64 (Latest Upstream kernel)
- upto 300% Performance Improvement.
If IOs are running on single Node, perfromance gain is less, but as soon as
increase number of nodes, performance improvement is significant. IOs running
on all 4 NUMA nodes, with this patch applied IOPs observed was 1170K vs 344K
IOPs seen without this patch.
Logically, there are two parts of this patch- 1) Block layer tag support 2)
changes in calling convention of return_cmd. part 2 will revert the changes
done by patch- 90dc9d9 megaraid_sas : MFI MPT linked list corruption fix
because changes done in part 1 has fixed the problem of MFI MPT linked list
corruption. part 2 is very much dependent on part 1, so we decided to have
single patch for these two logical changes.
[jejb: remove chatty printk pointed out by hch] Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
megaraid_sas : Add separate function for setting up IRQs
This patch will create separate functions for- 1) setting up IRQs for MSI-x
interrupts 2) setting up IRQs for legacy interrupts 3) freeing up IRQs. and
enable interrupts after adapter's initialization. The reason behind
initialising adapter earlier is: by that time firmware is operational and can
send interrupts, so better to use interrupt based interface to send internal
DCMD to firmware instead of using polling method, since MFI frames' pool size
is reduced and polling method does not free up MFI frame for fusion adapters,
so sending more DCMDs with polled method may cause MFI frames's pool go out of
frames and end up failing DCMD.
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Dan Carpenter [Tue, 14 Apr 2015 14:32:19 +0000 (17:32 +0300)]
csiostor: fix an error code in csio_hw_init()
We should return -ENOMEM if kzalloc() fails here instead of returning
success.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Change driver version to follow the same format as other Chelsio drivers.
Added missing release date back to cxgb4i and version string back to libcxgbi.
Signed-off-by: Karen Xie <kxie@chelsio.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Dan Carpenter [Mon, 19 Jan 2015 14:41:12 +0000 (17:41 +0300)]
sd: fix an error return in probe()
If device_add() fails then it should return the error code but instead
the current code returns success.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
James Bottomley [Sun, 26 Apr 2015 18:52:46 +0000 (11:52 -0700)]
scsi_scan: fix queue depth initialisation problem
Currently we blindly use the value of cmd_per_lun as the initial setting for
queue_depth. This fails miserably (hangs the system) if it is zero, which is
the default value for anything uninitialised in the template. The net result
is that every host template has to set a value for cmd_per_lun. Instead, use
a default value of 1 if the actual value is unset. This should pave the way
for removing cmd_per_lun from all the templates and eventually from SCSI
itself.
Signed-off-by: James Bottomley <JBottomley@Odin.com>
James Smart [Fri, 15 May 2015 17:10:58 +0000 (13:10 -0400)]
MAINTAINERS: Revise lpfc maintainers for Avago Technologies ownership of Emulex
The old email addresses will go away very soon. Revising with new addresses.
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com> Signed-off-by: James Smart <james.smart@avagotech.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Minh Tran [Fri, 15 May 2015 06:16:17 +0000 (23:16 -0700)]
MAINTAINERS, be2iscsi: change email domain
be2iscsi change of ownership from Emulex to Avago Technologies recently. We
like to get the following updates in: changed "Emulex" to "Avago
Technologies", changed email addresses from "emulex.com" to "avagotech.com",
updated MAINTAINER list for be2iscsi driver.
Mark Hounschell [Wed, 13 May 2015 08:49:09 +0000 (10:49 +0200)]
sd: Disable support for 256 byte/sector disks
256 bytes per sector support has been broken since 2.6.X,
and no-one stepped up to fix this.
So disable support for it.
Signed-off-by: Mark Hounschell <dmarkh@cfl.rr.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Cc: stable@vger.kernel.org Signed-off-by: James Bottomley <JBottomley@Odin.com>
storvsc: Set the SRB flags correctly when no data transfer is needed
Set the SRB flags correctly when there is no data transfer. Without this
change some IHV drivers will fail valid commands such as TEST_UNIT_READY.
Cc: <stable@vger.kernel.org> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
The 3w-9xxx driver needs to tear down the dma mappings before returning
the command to the midlayer, as there is no guarantee the sglist and
count are valid after that point. Also remove the dma mapping helpers
which have another inherent race due to the request_id index.
Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Acked-by: Adam Radford <aradford@gmail.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
The 3w-xxxx driver needs to tear down the dma mappings before returning
the command to the midlayer, as there is no guarantee the sglist and
count are valid after that point. Also remove the dma mapping helpers
which have another inherent race due to the request_id index.
Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Acked-by: Adam Radford <aradford@gmail.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
The 3w-sas driver needs to tear down the dma mappings before returning
the command to the midlayer, as there is no guarantee the sglist and
count are valid after that point. Also remove the dma mapping helpers
which have another inherent race due to the request_id index.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Torsten Luettgert <ml-lkml@enda.eu> Tested-by: Bernd Kardatzki <Bernd.Kardatzki@med.uni-tuebingen.de> Cc: stable@vger.kernel.org Acked-by: Adam Radford <aradford@gmail.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Mike Christie [Tue, 21 Apr 2015 03:42:24 +0000 (22:42 -0500)]
SCSI: add 1024 max sectors black list flag
This works around a issue with qnap iscsi targets not handling large IOs
very well.
The target returns:
VPD INQUIRY: Block limits page (SBC)
Maximum compare and write length: 1 blocks
Optimal transfer length granularity: 1 blocks
Maximum transfer length: 4294967295 blocks
Optimal transfer length: 4294967295 blocks
Maximum prefetch, xdread, xdwrite transfer length: 0 blocks
Maximum unmap LBA count: 8388607
Maximum unmap block descriptor count: 1
Optimal unmap granularity: 16383
Unmap granularity alignment valid: 0
Unmap granularity alignment: 0
Maximum write same length: 0xffffffff blocks
Maximum atomic transfer length: 0
Atomic alignment: 0
Atomic transfer length granularity: 0
and it is *sometimes* able to handle at least one IO of size up to 8 MB. We
have seen in traces where it will sometimes work, but other times it
looks like it fails and it looks like it returns failures if we send
multiple large IOs sometimes. Also it looks like it can return 2 different
errors. It will sometimes send iscsi reject errors indicating out of
resources or it will send invalid cdb illegal requests check conditions.
And then when it sends iscsi rejects it does not seem to handle retries
when there are command sequence holes, so I could not just add code to
try and gracefully handle that error code.
The problem is that we do not have a good contact for the company,
so we are not able to determine under what conditions it returns
which error and why it sometimes works.
So, this patch just adds a new black list flag to set targets like this to
the old max safe sectors of 1024. The max_hw_sectors changes added in 3.19
caused this regression, so I also ccing stable.
Reported-by: Christian Hesse <list@eworm.de> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Andy Lutomirski [Sun, 26 Apr 2015 23:47:59 +0000 (16:47 -0700)]
x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue
AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with
SS == 0 results in an invalid usermode state in which SS is apparently
equal to __USER_DS but causes #SS if used.
Work around the issue by setting SS to __KERNEL_DS __switch_to, thus
ensuring that SYSRET never happens with SS set to NULL.
This was exposed by a recent vDSO cleanup.
Fixes: e7d6eefaaa44 x86/vdso32/syscall.S: Do not load __USER32_DS to %ss Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Brian Gerst <brgerst@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull intel drm fixes from Dave Airlie.
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915: vlv: fix save/restore of GFX_MAX_REQ_COUNT reg
drm/i915: Workaround to avoid lite restore with HEAD==TAIL
drm/i915: cope with large i2c transfers
Pull intel iommu updates from David Woodhouse:
"This lays a little of the groundwork for upcoming Shared Virtual
Memory support — fixing some bogus #defines for capability bits and
adding the new ones, and starting to use the new wider page tables
where we can, in anticipation of actually filling in the new fields
therein.
It also allows graphics devices to be assigned to VM guests again.
This got broken in 3.17 by disallowing assignment of RMRR-afflicted
devices. Like USB, we do understand why there's an RMRR for graphics
devices — and unlike USB, it's actually sane. So we can make an
exception for graphics devices, just as we do USB controllers.
Finally, tone down the warning about the X2APIC_OPT_OUT bit, due to
persistent requests. X2APIC_OPT_OUT was added to the spec as a nasty
hack to allow broken BIOSes to forbid us from using X2APIC when they
do stupid and invasive things and would break if we did.
Someone noticed that since Windows doesn't have full IOMMU support for
DMA protection, setting the X2APIC_OPT_OUT bit made Windows avoid
initialising the IOMMU on the graphics unit altogether.
This means that it would be available for use in "driver mode", where
the IOMMU registers are made available through a BAR of the graphics
device and the graphics driver can do SVM all for itself.
So they started setting the X2APIC_OPT_OUT bit on *all* platforms with
SVM capabilities. And even the platforms which *might*, if the
planets had been aligned correctly, possibly have had SVM capability
but which in practice actually don't"
* git://git.infradead.org/intel-iommu:
iommu/vt-d: support extended root and context entries
iommu/vt-d: Add new extended capabilities from v2.3 VT-d specification
iommu/vt-d: Allow RMRR on graphics devices too
iommu/vt-d: Print x2apic opt out info instead of printing a warning
iommu/vt-d: kill bogus ecap_niotlb_iunits()
Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"This has a mixture of merge window cleanups and bugfixes"
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: st: add include for pinctrl
i2c: mux: use proper dev when removing "channel-X" symlinks
i2c: digicolor: remove duplicate include
i2c: Mark adapter devices with pm_runtime_no_callbacks
i2c: pca-platform: fix broken email address
i2c: mxs: fix broken email address
i2c: rk3x: report number of messages transmitted
Merge branch 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"Filipe hit two problems in my block group cache patches. We finalized
the fixes last week and ran through more tests"
* 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: prevent list corruption during free space cache processing
Btrfs: fix inode cache writeout
Dave Airlie [Mon, 27 Apr 2015 00:35:15 +0000 (10:35 +1000)]
Merge tag 'drm-intel-next-fixes-2015-04-25' of git://anongit.freedesktop.org/drm-intel into drm-fixes
three fixes for i915.
* tag 'drm-intel-next-fixes-2015-04-25' of git://anongit.freedesktop.org/drm-intel:
drm/i915: vlv: fix save/restore of GFX_MAX_REQ_COUNT reg
drm/i915: Workaround to avoid lite restore with HEAD==TAIL
drm/i915: cope with large i2c transfers
Merge tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Another set of mainly bugfixes and a couple of cleanups. No new
functionality in this round.
Highlights include:
Stable patches:
- Fix a regression in /proc/self/mountstats
- Fix the pNFS flexfiles O_DIRECT support
- Fix high load average due to callback thread sleeping
Bugfixes:
- Various patches to fix the pNFS layoutcommit support
- Do not cache pNFS deviceids unless server notifications are enabled
- Fix a SUNRPC transport reconnection regression
- make debugfs file creation failure non-fatal in SUNRPC
- Another fix for circular directory warnings on NFSv4 "junctioned"
mountpoints
- Fix locking around NFSv4.2 fallocate() support
- Truncating NFSv4 file opens should also sync O_DIRECT writes
- Prevent infinite loop in rpcrdma_ep_create()
Features:
- Various improvements to the RDMA transport code's handling of
memory registration
- Various code cleanups"
* tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (55 commits)
fs/nfs: fix new compiler warning about boolean in switch
nfs: Remove unneeded casts in nfs
NFS: Don't attempt to decode missing directory entries
Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one"
NFS: Rename idmap.c to nfs4idmap.c
NFS: Move nfs_idmap.h into fs/nfs/
NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h
NFS: Add a stub for GETDEVICELIST
nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes
nfs: fix DIO good bytes calculation
nfs: Fetch MOUNTED_ON_FILEID when updating an inode
sunrpc: make debugfs file creation failure non-fatal
nfs: fix high load average due to callback thread sleeping
NFS: Reduce time spent holding the i_mutex during fallocate()
NFS: Don't zap caches on fallocate()
xprtrdma: Make rpcrdma_{un}map_one() into inline functions
xprtrdma: Handle non-SEND completions via a callout
xprtrdma: Add "open" memreg op
xprtrdma: Add "destroy MRs" memreg op
xprtrdma: Add "reset MRs" memreg op
...
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only
Merge tag 'pm+acpi-4.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management and ACPI updates from Rafael Wysocki:
"These are fixes mostly (intel_pstate, ACPI core, ACPI EC driver,
cpupower tool), a new CPU ID for the Intel RAPL driver and one
intel_pstate driver improvement that didn't make it to my previous
pull requests due to timing.
Specifics:
- Fix a build warning in the intel_pstate driver showing up in
non-SMP builds (Borislav Petkov)
- Change one of the intel_pstate's P-state selection parameters for
Baytrail and Cherrytrail CPUs to significantly improve performance
at the cost of a small increase in energy consumption (Kristen
Carlson Accardi)
- Fix a NULL pointer dereference in the ACPI EC driver due to an
unsafe list walk in the query handler removal routine (Chris
Bainbridge)
- Get rid of a false-positive lockdep warning in the ACPI container
hot-remove code (Rafael J Wysocki)
- Prevent the ACPI device enumeration code from creating device
objects of a wrong type in some cases (Rafael J Wysocki)
- Add Skylake processors support to the Intel RAPL power capping
driver (Brian Bian)
- Drop the stale MAINTAINERS entry for the ACPI dock driver that is
regarded as part of the ACPI core and maintained along with it now
(Chao Yu)
- Fix cpupower tool breakage caused by a library API change in libpci
3.3.0 (Lucas Stach)"
* tag 'pm+acpi-4.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / scan: Add a scan handler for PRP0001
ACPI / scan: Annotate physical_node_lock in acpi_scan_is_offline()
ACPI / EC: fix NULL pointer dereference in acpi_ec_remove_query_handler()
MAINTAINERS: remove maintainship entry of docking station driver
powercap / RAPL: Add support for Intel Skylake processors
cpufreq: intel_pstate: Fix an annoying !CONFIG_SMP warning
intel_pstate: Change the setpoint for Atom params
cpupower: fix breakage from libpci API change
Pull crypto fixes from Herbert Xu:
"This push fixes a build problem with img-hash under non-standard
configurations and a serious regression with sha512_ssse3 which can
lead to boot failures"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: img-hash - CRYPTO_DEV_IMGTEC_HASH should depend on HAS_DMA
crypto: x86/sha512_ssse3 - fixup for asm function prototype change
Merge tag 'platform-drivers-x86-v4.1-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
Pull x86 platform driver updates from Darren Hart:
"This series includes significant updates to the toshiba_acpi driver
and the reintroduction of the dell-laptop keyboard backlight additions
I had to revert previously. Also included are various fixes for
typos, warnings, correctness, and minor bugs.
Specifics:
dell-laptop:
- add support for keyboard backlight.
toshiba_acpi:
- adaptive keyboard, hotkey, USB sleep and charge, and backlight
updates. Update sysfs documentation.
toshiba_bluetooth:
- fix enabling/disabling loop on recent devices
apple-gmux:
- lock iGP IO to protect from vgaarb changes
* tag 'platform-drivers-x86-v4.1-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: (25 commits)
toshiba_acpi: Do not register vendor backlight when acpi_video bl is available
MAINTAINERS: Add me on list of Dell laptop drivers
platform: x86: dell-laptop: Add support for keyboard backlight
Documentation/ABI: Update sysfs-driver-toshiba_acpi entry
toshiba_acpi: Fix pr_* messages from USB Sleep Functions
toshiba_acpi: Update and fix USB Sleep and Charge modes
wmi: Use bool function return values of true/false not 1/0
toshiba_bluetooth: Fix enabling/disabling loop on recent devices
toshiba_bluetooth: Clean up *_add function and disable BT device at removal
toshiba_bluetooth: Add three new functions to the driver
toshiba_acpi: Fix the enabling of the Special Functions
toshiba_acpi: Use the Hotkey Event Type function for keymap choosing
toshiba_acpi: Add Hotkey Event Type function and definitions
x86/wmi: delete unused wmi_data_lock mutex causing gcc warning
apple-gmux: lock iGP IO to protect from vgaarb changes
MAINTAINERS: Add missing Toshiba devices and add myself as maintainer
toshiba_acpi: Update events in toshiba_acpi_notify
intel-oaktrail: Fix trivial typo in comment
thinkpad_acpi: off by one in adaptive_keyboard_hotkey_notify_hotkey()
thinkpad_acpi: signedness bugs getting current_mode
...
Merge tag 'chrome-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform
Pull chrome platform updates from Olof Johansson:
"Here's a set of updates to the Chrome OS platform drivers for this
merge window.
Main new things this cycle is:
- Driver changes to expose the lightbar to users. With this, you can
make your own blinkenlights on Chromebook Pixels.
- Changes in the way that the atmel_mxt trackpads are probed. The
laptop driver is trying to be smart and not instantiate the devices
that don't answer to probe. For the trackpad that can come up in
two modes (bootloader or regular), this gets complicated since the
driver already knows how to handle the two modes including the
actual addresses used. So now the laptop driver needs to know more
too, instantiating the regular address even if the bootloader one
is the probe that passed.
- mfd driver improvements by Javier Martines Canillas, and a few
bugfixes from him, kbuild and myself"
* tag 'chrome-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform:
platform/chrome: chromeos_laptop - instantiate Atmel at primary address
platform/chrome: cros_ec_lpc - Depend on X86 || COMPILE_TEST
platform/chrome: cros_ec_lpc - Include linux/io.h header file
platform/chrome: fix platform_no_drv_owner.cocci warnings
platform/chrome: cros_ec_lightbar - fix duplicate const warning
platform/chrome: cros_ec_dev - fix Unknown escape '%' warning
platform/chrome: Expose Chrome OS Lightbar to users
platform/chrome: Create sysfs attributes for the ChromeOS EC
mfd: cros_ec: Instantiate ChromeOS EC character device
platform/chrome: Add Chrome OS EC userspace device interface
platform/chrome: Add cros_ec_lpc driver for x86 devices
mfd: cros_ec: Add char dev and virtual dev pointers
mfd: cros_ec: Use fixed size arrays to transfer data with the EC
Merge tag 'cris-for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jesper/cris
Pull arch/cris updates from Jesper Nilsson:
"Some much needed love for the CRIS-port.
There's a bunch of changes this time, giving the CRISv32 port a bit of
modern makeover with device-tree, irq domain and gpiolib support, and
more switchover to generic frameworks.
Some small fixes and removal of the theoretical SMP support brings up
the rear"
* tag 'cris-for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jesper/cris:
cris: fix integer overflow in ELF_ET_DYN_BASE
CRISv32: use GENERIC_SCHED_CLOCK
CRISv32: use MMIO clocksource
CRISv32: use generic clockevents
CRIS: use generic headers via Kbuild
CRIS: use generic cmpxchg.h
CRIS: use generic atomic.h
CRIS: use generic atomic bitops
CRISv10: remove redundant macros from system.h
CRIS: remove SMP code
CRISv32: don't enable irqs in INIT_THREAD
CRISv32: handle multiple signals
CRISv32: prevent bogus restarts on sigreturn
CRISv32: don't attempt syscall restart on irq exit
Add binding documentation for CRIS
CRIS: add Axis 88 board device tree
CRISv32: add device tree support
CRISv32: add irq domains support
CRIS: enable GPIOLIB
Merge tag 'powerpc-4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux
Pull powerpc fixes from Michael Ellerman:
- fix for mm_dec_nr_pmds() from Scott.
- fixes for oopses seen with KVM + THP from Aneesh.
- build fixes from Aneesh & Shreyas.
* tag 'powerpc-4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
powerpc/mm: Fix build error with CONFIG_PPC_TRANSACTIONAL_MEM disabled
powerpc/kvm: Fix ppc64_defconfig + PPC_POWERNV=n build error
powerpc/mm/thp: Return pte address if we find trans_splitting.
powerpc/mm/thp: Make page table walk safe against thp split/collapse
KVM: PPC: Remove page table walk helpers
KVM: PPC: Use READ_ONCE when dereferencing pte_t pointer
powerpc/hugetlb: Call mm_dec_nr_pmds() in hugetlb_free_pmd_range()
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull second batch of KVM changes from Paolo Bonzini:
"This mostly includes the PPC changes for 4.1, which this time cover
Book3S HV only (debugging aids, minor performance improvements and
some cleanups). But there are also bug fixes and small cleanups for
ARM, x86 and s390.
The task_migration_notifier revert and real fix is still pending
review, but I'll send it as soon as possible after -rc1"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (29 commits)
KVM: arm/arm64: check IRQ number on userland injection
KVM: arm: irqfd: fix value returned by kvm_irq_map_gsi
KVM: VMX: Preserve host CR4.MCE value while in guest mode.
KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8
KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C
KVM: PPC: Book3S HV: Streamline guest entry and exit
KVM: PPC: Book3S HV: Use bitmap of active threads rather than count
KVM: PPC: Book3S HV: Use decrementer to wake napping threads
KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI
KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken
KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu
KVM: PPC: Book3S HV: Minor cleanups
KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update
KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT
KVM: PPC: Book3S HV: Add ICP real mode counters
KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode
KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock
KVM: PPC: Book3S HV: Add guest->host real mode completion counters
KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte
...
platform/chrome: chromeos_laptop - instantiate Atmel at primary address
The new Atmel MXT driver expects i2c client's address contain the
primary (main address) of the chip, and calculates the expected
bootloader address form the primary address. Unfortunately chrome_laptop
does probe the devices and if touchpad (or touchscreen, or both) comes
up in bootloader mode the i2c device gets instantiated with the
bootloader address which confuses the driver.
To work around this issue let's probe the primary address first. If the
device is not detected at the primary address we'll probe alternative
addresses as "dummy" devices. If any of them are found, destroy the
dummy client and instantiate client with proper name at primary address
still.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Olof Johansson <olof@lixom.net>
Al Viro [Fri, 24 Apr 2015 19:47:07 +0000 (15:47 -0400)]
RCU pathwalk breakage when running into a symlink overmounting something
Calling unlazy_walk() in walk_component() and do_last() when we find
a symlink that needs to be followed doesn't acquire a reference to vfsmount.
That's fine when the symlink is on the same vfsmount as the parent directory
(which is almost always the case), but it's not always true - one _can_
manage to bind a symlink on top of something. And in such cases we end up
with excessive mntput().
Cc: stable@vger.kernel.org # since 2.6.39 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
direct-io: only inc/dec inode->i_dio_count for file systems
do_blockdev_direct_IO() increments and decrements the inode
->i_dio_count for each IO operation. It does this to protect against
truncate of a file. Block devices don't need this sort of protection.
For a capable multiqueue setup, this atomic int is the only shared
state between applications accessing the device for O_DIRECT, and it
presents a scaling wall for that. In my testing, as much as 30% of
system time is spent incrementing and decrementing this value. A mixed
read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
better latencies too. Before:
In other setups, Robert Elliott reported seeing good performance
improvements:
https://lkml.org/lkml/2015/4/3/557
The more applications accessing the device, the worse it gets.
Add a new direct-io flags, DIO_SKIP_DIO_COUNT, which tells
do_blockdev_direct_IO() that it need not worry about incrementing
or decrementing the inode i_dio_count for this caller.
Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Elliott, Robert (Server Storage) <elliott@hp.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Johannes Berg [Wed, 22 Apr 2015 09:55:14 +0000 (11:55 +0200)]
fs/9p: fix readdir()
Al Viro's IOV changes broke 9p readdir() because the new code
didn't abort the read when it returned nothing. The original
code checked if the combined error/length was <= 0 but in the
new code that accidentally got changed to just an error check.
Add back the return from the function when nothing is read.
Cc: Al Viro <viro@zeniv.linux.org.uk> Fixes: e1200fe68f20 ("9p: switch p9_client_read() to passing struct iov_iter *") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Chris Mason [Fri, 24 Apr 2015 18:00:00 +0000 (11:00 -0700)]
Btrfs: prevent list corruption during free space cache processing
__btrfs_write_out_cache is holding the ctl->tree_lock while it prepares
a list of bitmaps to record in the free space cache. It was dropping
the lock while it worked on other components, which made a window for
free_bitmap() to free the bitmap struct without removing it from the
list.
This changes things to hold the lock the whole time, and also makes sure
we hold the lock during enospc cleanup.
Reported-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>