[SCSI] mvsas: add support for Task collector mode and fixed relative bugs
1. Add support for Task collector mode.
2. Fixed relative collector mode bug:
- I/O failed when disks is on two ports
- system hang when hotplug disk
- system hang when unplug disk during run IO
3. Unlock ap->lock within .lldd_execute_task for direct mode to
improve performance
Signed-off-by: Xiangliang Yu <yuxiangl@marvell.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Martin George [Tue, 26 Apr 2011 12:57:05 +0000 (18:27 +0530)]
[SCSI] scsi_dh_alua: Attach to UNAVAILABLE/OFFLINE AAS devices
The SCSI ALUA handler currently fails to attach to devices
reporting an UNAVAILABLE/OFFLINE AAS. But given that an
UNAVAILABLE/OFFLINE AAS can transition to other states
like ACTIVE/OPTIMIZED, ACTIVE/NON-OPTIMIZED, etc. as per
SPC4, this ALUA handler behavior should be rectified so
as to attach to devices which also report an
UNAVAILABLE/OFFLINE AAS.
Signed-off-by: Martin George <marting@netapp.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
[SCSI] bnx2fc: call scsi_done if session goes to not ready from ready
If the session is not ready yet, we ask the SCSI-ml to retry. However, if the
session is just uploaded, we should not retry, but instead call scsi_done to
fail the IO.
fc_remove_host() waits for flushing the workqueue, but it is stuck at flushing
the first work. The first work doesnt complete, because it is waiting for async
layer to complete the IOs. The async layer cannot complete the IO as the
terminate_rport_io for the second work was not called, which will be called
only when the first work completes. Hence the deadlock. To resolve this
deadlock, the workqueue allocation has been modified from
create_singlethread_workqueue() to alloc_workqueue().
In addition, fc_terminate_rport_io() should be called before the
scsi_flush_work() to avoid the similar deadlock as above.
scsi fc alloc queue. move terminate rport io before flush
James Smart [Sat, 16 Apr 2011 15:03:52 +0000 (11:03 -0400)]
[SCSI] lpfc 8.3.23: Update driver version to 8.3.23
Update driver version to 8.3.22
Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
James Smart [Sat, 16 Apr 2011 15:03:43 +0000 (11:03 -0400)]
[SCSI] lpfc 8.3.23: BSG additions and fixes
- Fixed the mixed declarations and codes which violate ISO C90
(declarations in subsections that assign at declaration)
- Add BSG data transfer size protection in mailbox command pass-through path
- Invoke BSG job_done while holding spinlock to fix deadlock
- Added support for checking SLI_CONFIG subcommands
- Fixed bug in BSG mailbox size check to non-embedded external buffer
Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
James Smart [Sat, 16 Apr 2011 15:03:33 +0000 (11:03 -0400)]
[SCSI] lpfc 8.3.23: Fixes related to new hardware
Fixes related to new hardware
- Restrict driver to look at BAR2 or BAR4 only for if_type 0.
- Allow SLI4 with FCOE_MODE not set for new SLI4 FC adapters.
- Add Temporary RPI field to the ELS request WQE.
- Do not override CT field in issue_els_flogi for SLI4 IF type 2
- For RQ_CREATE_V2 mbx cmd: fill in the rqe_size and page_size for RQ_CREATE.
Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
James Smart [Sat, 16 Apr 2011 15:03:17 +0000 (11:03 -0400)]
[SCSI] lpfc 8.3.23: Miscellaneous fixes
Miscellaneous fixes
- Do not limit RPI Count to a minimum of 64
- Fix FCFI incorrect on received unsolicited frames.
- Save the FCFI returned in the REG_FCFI mailbox command if it was successful.
- Fixed Vports not sending FDISC after lips.
- Align based on the SLI4_PAGE_SIZE.
- Fixed double byte swap on received RRQ.
- Fixed mask size for the wq_id mask from 0x7F to 0x7FFF.
- Clear FC_FABRIC flag when NPIV LOGO completes (and add a log message).
- Modified driver to skip round robin only when ulpStatus==LOCAL_REJECT
and word4=SEQUENCE_TIMEOUT to prevent FLOGI to disconnected FCF.
- Don't add rport if driver unloading
Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
James Smart [Sat, 16 Apr 2011 15:03:04 +0000 (11:03 -0400)]
[SCSI] lpfc 8.3.23: Debugfs enhancements
Debugfs enhancements
- Added iDiag support for new adapters.
- Added queue entry access methods.
- Fix host/port index in decimal
- Added Doorbell register access methods.
Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
[SCSI] bfa: Move debugfs initialization before bfa init.
Move the initialization of debugfs before bfa init, to enable us to
collect driver/firmware traces if init fails. Also add a printk to
display message on bfa_init failure.
Signed-off-by: Krishna Gudipati <kgudipat@brocade.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This patch includes fixes for two issues releated to firmware download
implementation: 1) Merged memory leak fix provided by Jesper Juhl
<jj@chaosbits.net>. Basically we need to call release_firmware() after
request_firmware(). 2) fixed issues with the firmware download interface
as pointed out by Rolf Eike Beer <eike@sf-mail.de> in linux-scsi. Rearranged
the code and fixed related function protypes.
Signed-off-by: Jing Huang <huangj@brocade.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Wayne Boyer [Tue, 12 Apr 2011 17:29:02 +0000 (10:29 -0700)]
[SCSI] ipr: improve interrupt service routine performance
During performance testing on P7 machines it was observed that the interrupt
service routine was doing unnecessary MMIO operations.
This patch rearranges the logic of the routine and moves some of the code out
of the main routine. The result is that there are now fewer MMIO operations in
the performance path of the code.
As a result of the above change, an existing condition was exposed where the
driver could get an "unexpected" hrrq interrupt. The original code would flag
the interrupt as unexpected and then reset the adapter. After further analysis
it was confirmed that this condition can occasionally occur and that the
interrupt can safely be ignored. Additional code in this patch detects this
condition, clears the interrupt and allows the driver to continue without
resetting the adapter.
Signed-off-by: Wayne Boyer <wayneb@linux.vnet.ibm.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
[SCSI] scsi_dh_rdac : decide whether to send mode select based on operating mode
Based on the operating modes, handler decides whether to send mode
select or not. Purpose here is to reduce io-shipping as much as
possible whenever there is an option.
Wayne Boyer [Thu, 7 Apr 2011 19:12:30 +0000 (12:12 -0700)]
[SCSI] ipr: remove unneeded volatile declarations
This patch removes three volatile declarations based on some feedback and code
analysis.
Signed-off-by: Wayne Boyer <wayneb@linux.vnet.ibm.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
[SCSI] mpt2sas : WarpDrive New product SSS6200 support added
This patch has Support for the new solid state device product SSS6200
from LSI and relavent features w.r.t SSS6200.
The major feature added in this driver is supporting Direct-I/O to the
SSS6200 storage.There are some additional changes done to avoid exposing
the RAID member disks to the OS and hiding/exposing drives based on the
OEM Specific Flag in Manufacturing Page10 (this is required to handle
specific changes in the SSS6200 firmware).
Each and every changes are listed below.
1. Hiding IR related messages.
For SSS6200, the driver is modified not to print IR related events.
Even if the debugging is enabled the IR related messages will not be displayed.
In some places if there is a need to display a message related to IR the
string "IR" is replaced with string "DD" and the string "volume" is replaced
with "direct drive". But the function names are not changed hence there are
some places where the reference to volume can be seen if debug level is set.
2. Removed RAID transport support
In Linux the user can retrieve RAID volume information from the sysfs directory.
This support is removed for SSS6200.
3. Direct I/O support.
The driver tries to enable direct I/O when a volume is reported to the driver
by the firmware through IRCC events and the driver does this just before
reporting to the OS, hence all the OS issued I/O can go through direct path
if they can, The first validation is to see whether the manufacturing page10
flag is set to expose all drives always. If that is set, the driver will not
enable direct I/O and displays the message "DDIO" is disabled globally as
drives are exposed. The driver checks whether there is more than one volume
in the controller, if so the direct I/O will be disabled globally for all
volumes in the controller and the message displayed will be "DDIO is disabled
globally as number of drives > 1.
If retrieving number of PD is failed the driver will not enable direct I/O
and displays the message Failure in computing number of drives DDIO disabled.
If memory allocation for RAIDVolumePage0 is failed, the driver will not enable
direct I/O and displays the message Memory allocation failure for
RVPG0 DDIO disabled. If retrieving RAIDVolumePage0 is failed the driver will
not enable direct I/O and displays the message Failure in retrieving
RVPG0 DDIO disabled
If the number of PD in a volume is greater than 8, then the direct I/O will
be disabled.
If any of individual drives handle retrieval is failed then the DD-IO will
be disabled.
If the volume is not RAID0 or if the block size is not 512 then the DD-IO will
be disabled.
If the volume size is greater than 2TB then the DD-IO will be disabled.
If the driver is not able to find a valid stripe exponent using the configured
stripe size then the DD-IO will be disabled
When the DD-IO is enabled the driver will check every I/O request issued to
the storage and checks whether the request is either
READ6/WRITE6/READ10/WRITE10, if it is and if the complete I/O transfer
is within a stripe size then the I/O is redirected to
the drive directly instead of the volume.
On completion of every I/O, if the completion is failure means if the reply
is address reply with a reply frame associated with it, then the type of I/O
will be checked, if the I/O is direct then the I/O will be retried to
the volume once.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com> Reviewed-by: Eric Moore <eric.moore@lsi.com> Reviewed-by: Sathya Prakash <sathya.prakash@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
[SCSI] fusion: do not check serial_number in the abort handler
The SCSI midlayer stops all command processing when in error handling, which
means there is no chance for command reuse when the abort handler is called.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
[SCSI] mpt2sas: do not check serial_number in the abort handler
The SCSI midlayer stops all command processing when in error handling, which
means there is no chance for command reuse when the abort handler is called.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: "Moore, Eric" <Eric.Moore@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
[SCSI] usb-storage: do not increment cmd->serial_number
The isd200 sub-driver increments the command serial number despite not
using it at all in it's routine for sending internal scsi commands.
Remove the increment to prepare for removing the serial_number field.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Vasu Dev [Fri, 1 Apr 2011 23:06:45 +0000 (16:06 -0700)]
[SCSI] fcoe: have fcoe log off and lport destroy before ndo_fcoe_disable
Currently fcoe interface cleanup is done after ndo_fcoe_disable
and that prevents logoff going out to the peer, so this patch
moves all netdev cleanup and its releasing inside
fcoe_interface_cleanup to have log off before ndo_fcoe_disable
disables the fcoe.
This patch also fixes asymmetric rtnl locking around fcoe_if_destroy,
as currently this function requires rtnl held by its caller
and then have this func drops the lock, instead now don't have
any processing under rtnl inside fcoe_if_destroy, this required
moving few func to get build working again.
Signed-off-by: Vasu Dev <vasu.dev@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Vasu Dev [Fri, 1 Apr 2011 23:06:40 +0000 (16:06 -0700)]
[SCSI] libfc: rec tov value and REC_TOV_CONST units usages is incorrect
Added REC_TOV_CONST intent was to have rec tov as e_d_tov + 1s
but currently it is e_d_tov + 1ms since e_d_tov is stored in ms
unit.
Also returned rec tov by get_fsp_rec_tov is in ms and this ms tov
is used as-is with fc_fcp_timer_set expecting jiffies tov.
Fixed this by having get_fsp_rec_tov return rec tov in jiffies
as e_d_tov + 1s and then use jiffies tov w/ fc_fcp_timer_set.
Also some cleanup, no need to cache get_fsp_rec_tov return value
in local rec_tov at various places.
Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Vasu Dev [Fri, 1 Apr 2011 23:06:35 +0000 (16:06 -0700)]
[SCSI] libfc: remove duplicate ema_list init
As ema_list is already initialized by libfc_host_alloc.
Signed-off-by: Vasu Dev <vasu.dev@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Yi Zou [Fri, 1 Apr 2011 23:06:30 +0000 (16:06 -0700)]
[SCSI] libfcoe: fix wrong comment in fcoe_transport_detach
fix typo of '_attach' -> '_detach' in the comment.
Reported-by: Frank Zhang <frank_1.zhang@intel.com> Signed-off-by: Yi Zou <yi.zou@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Yi Zou [Fri, 1 Apr 2011 23:06:25 +0000 (16:06 -0700)]
[SCSI] libfcoe: fix possible buffer overflow in fcoe_transport_show
possible buffer overflow in fcoe_transport_show when reaching the end of
buffer and crossing PAGE_SIZE boundary.
Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Tomas Henzl <thenzl@redhat.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Yi Zou [Fri, 1 Apr 2011 23:06:19 +0000 (16:06 -0700)]
[SCSI] libfcoe: clean up netdev mapping properly when the transport goes away
When rmmoving the underlying fcoe transport driver module by force when
it's attached and in use, the correspoding netdev mapping should be
cleaned up properly as well, otherwise the lookup for a given netdev
for the transport would still return non NULL pointer, causing "unable
to handle paging request" bug.
Signed-off-by: Yi Zou <yi.zou@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Robert Love [Fri, 1 Apr 2011 23:06:14 +0000 (16:06 -0700)]
[SCSI] libfc: Move host_lock usage into ramp_up/down routines
The host_lock is still used to protect the can_queue
value in the Scsi_Host, but it doesn't need to be held
and released by each caller. This patch moves the lock
usage into the fc_fcp_can_queue_ramp_up and
fc_fcp_can_queue_ramp_down routines.
Signed-off-by: Robert Love <robert.w.love@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
[SCSI] esp, scsi_tgt_lib, fcoe: use list_move() instead of list_del()/list_add() combination
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Yi Zou [Fri, 1 Apr 2011 23:06:04 +0000 (16:06 -0700)]
[SCSI] fcoe: remove unnecessary module state check
The check of module state being MODULE_STATE_LIVE is no longer needed for the
individual fcoe transport driver, e.g., fcoe.ko, as sysfs entries now go to
libfcoe now, if it reaches fcoe.ko, it has to be already registered. The module
state check for libfcoe will guard the possible race condition of sysfs being
writable before module_init function is called and after module_exit.
Signed-off-by: Yi Zou <yi.zou@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
These checks were initially added to avoid a lockdep
false positive when dealing with the s_active, rtnl
and fcoe_config_mutex mutexes. Recently the create,
destroy, enable and disable sysfs entries were moved
from fcoe.ko to libfcoe.ko. With this change the mutex
usage was shuffled around and the lockdep false
positive stopped happening. We can now remove these
checks.
Signed-off-by: Robert Love <robert.w.love@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This code was incorrectly ported from fcoe.c when the
fcoe transport infrastructure was put into place. It
was originally needed in fcoe.c when dealing with
the rtnl mutex. In that code it was only needed to
avoid a lockdep false positive. In libfcoe we don't
deal with the rtnl mutex, we don't get the lockdep
false positive and therefore we don't need these
checks.
Signed-off-by: Robert Love <robert.w.love@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Wayne Boyer [Thu, 31 Mar 2011 16:56:10 +0000 (09:56 -0700)]
[SCSI] ipr: fix synchronous request flags for better performance
In testing it was noticed that Extended Delay after Reset flag was being set
for gscsi and volume set devices. This had a negative effect on performance
for volume sets. The fix is to only set the flag for gscsi devices.
Signed-off-by: Wayne Boyer <wayneb@linux.vnet.ibm.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Chad Dupuis [Wed, 30 Mar 2011 18:46:32 +0000 (11:46 -0700)]
[SCSI] qla2xxx: Log fcport state transitions when debug messages are enabled.
Add the inline function qla2x00_set_port_state() so that when a fcport state
transition happens we can log the state transition if debug messages are
enabled.
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: Madhuranath Iyengar <Madhu.Iyengar@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Andrew Vasquez [Wed, 30 Mar 2011 18:46:31 +0000 (11:46 -0700)]
[SCSI] qla2xxx: Verify login-state has transitioned to PRLI-completed.
Before driver's own internal state is marked as PLOGI/PRLI
complete. This additional check closes a window seen with
dual-personality initiator/target devices where a driver's
PLOGI/PRLI request occurs within the window after the target's
PLOGI request has completed, but prior to the target's PRLI
arriving and processed by the firmware. Without this additional
check, the firmware will return port-information stating that the
port neither supports target nor initiator functions, causing the
driver to register the rport prematurely to the FC-transport
without the proper 'roles' being set.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: Madhuranath Iyengar <Madhu.Iyengar@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Mike Hernandez [Wed, 30 Mar 2011 18:46:20 +0000 (11:46 -0700)]
[SCSI] qla2xxx: Include request queue ID in the upper 16-bits of the I/O handle for Abort I/O IOCBs.
The upper 16-bits of the handle for all I/O in multi-queue supported
drivers carries the ID of the request queue it was submitted on. When
using Abort I/O IOCB, the driver needs to also populate the upper
16-bits in the handle_to_abort field so the fw can correlate with the
actual I/O.
Signed-off-by: Mike Hernandez <michael.hernandez@qlogic.com> Signed-off-by: Madhuranath Iyengar <Madhu.Iyengar@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Andrew Vasquez [Wed, 30 Mar 2011 18:46:18 +0000 (11:46 -0700)]
[SCSI] qla2xxx: Check for a match before attempting to set FCP-priority information.
Modifying qla24xx_get_fcp_prio() to return a 'found' status
allows the driver to short circuit the 'set FCP-priority' call
and reduce the amount of noise generated in the messages file:
scsi(5): Unable to activate fcp priority, ret=0x102
scsi(5): Unable to activate fcp priority, ret=0x102
Also make qla24xx_get_fcp_prio() static.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: Madhuranath Iyengar <Madhu.Iyengar@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: only force kblockd unplugging from the schedule() path
block: cleanup the block plug helper functions
block, blk-sysfs: Use the variable directly instead of a function call
block: move queue run on unplug to kblockd
block: kill queue_sync_plugs()
block: readd plug trace event
block: add callback function for unplug notification
block: add comment on why we save and disable interrupts in flush_plug_list()
block: fixup block IO unplug trace call
block: remove block_unplug_timer() trace point
block: splice plug list to local context
Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
UBIFS: fix compilation warnings when compiling with gcc 4.5
UBIFS: fix oops when R/O file-system is fsync'ed
vfs: fix incorrect dentry_update_name_case() BUG_ON() test
The case we should be verifying when updating the dentry name is that
the _parent_ inode (the directory) semaphore is held, not the semaphore
for the dentry itself. It's the directory locking that rename and
readdir() etc all care about.
The comment just above even says so - but then the BUG_ON() still
checked the dentry inode itself.
Very few people noticed, because this helper function really isn't used
for very much, so you had to be using ncpfs to ever hit it.
I think I should just remove the BUG_ON (the function really has just
one user), but let's run with it fixed for a while before getting rid of
it entirely.
Reported-and-tested-by: Bongani Hlope <bonganih@bankservafrica.com> Reported-and-tested-by: Bernd Feige <bernd.feige@uniklinik-freiburg.de> Cc: Petr Vandrovec <petr@vandrovec.name>, Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
block: only force kblockd unplugging from the schedule() path
For the explicit unplugging, we'd prefer to kick things off
immediately and not pay the penalty of the latency to switch
to kblockd. So let blk_finish_plug() do the run inline, while
the implicit-on-schedule-out unplug will punt to kblockd.
It's a bit of a mess currently. task->plug is being cleared
and reset in __blk_finish_plug(), and blk_finish_plug() is
testing for a NULL plug which cannot happen even from schedule()
anymore since it uses blk_needs_flush_plug() to determine
whether to call into this function at all.
Ben Hutchings [Thu, 14 Apr 2011 22:22:21 +0000 (15:22 -0700)]
mm/thp: use conventional format for boolean attributes
The conventional format for boolean attributes in sysfs is numeric ("0" or
"1" followed by new-line). Any boolean attribute can then be read and
written using a generic function. Using the strings "yes [no]", "[yes]
no" (read), "yes" and "no" (write) will frustrate this.
[akpm@linux-foundation.org: use kstrtoul()]
[akpm@linux-foundation.org: test_bit() doesn't return 1/0, per Neil] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Tested-by: David Rientjes <rientjes@google.com> Cc: NeilBrown <neilb@suse.de> Cc: <stable@kernel.org> [2.6.38.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bob Liu [Thu, 14 Apr 2011 22:22:20 +0000 (15:22 -0700)]
ramfs: fix memleak on no-mmu arch
On no-mmu arch, there is a memleak during shmem test. The cause of this
memleak is ramfs_nommu_expand_for_mapping() added page refcount to 2
which makes iput() can't free that pages.
The simple test file is like this:
int main(void)
{
int i;
key_t k = ftok("/etc", 42);
for ( i=0; i<100; ++i) {
int id = shmget(k, 10000, 0644|IPC_CREAT);
if (id == -1) {
printf("shmget error\n");
}
if(shmctl(id, IPC_RMID, NULL ) == -1) {
printf("shm rm error\n");
return -1;
}
}
printf("run ok...\n");
return 0;
}
And the result:
root:/> free
total used free shared buffers
Mem: 60320 17912 42408 0 0
-/+ buffers: 17912 42408
root:/> shmem
run ok...
root:/> free
total used free shared buffers
Mem: 60320 19096 41224 0 0
-/+ buffers: 19096 41224
root:/> shmem
run ok...
root:/> free
total used free shared buffers
Mem: 60320 20296 40024 0 0
-/+ buffers: 20296 40024
...
After this patch the test result is:(no memleak anymore)
root:/> free
total used free shared buffers
Mem: 60320 16668 43652 0 0
-/+ buffers: 16668 43652
root:/> shmem
run ok...
root:/> free
total used free shared buffers
Mem: 60320 16668 43652 0 0
-/+ buffers: 16668 43652
Signed-off-by: Bob Liu <lliubbo@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 8a5ec0ba "Lockless (and preemptless) fastpaths for slub" makes use
of this_cpu_cmpxchg_double() which needs this_cpu_cmpxchg16b_emu() on
x86_64. Implementing cmpxchg16b emulation for UML would introduce too
much complexity. So just disable it.
Signed-off-by: Richard Weinberger <richard@nod.at> Reported-by: Sergei Trofimovich <slyich@gmail.com> Acked-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 1de1502c ("x86, um: now we can get rid of trivial uml headers")
removed accidentally bug.h which broke UML's call tracer and bug
handler.
Without asm-generic/bug.h UML uses BUG() from arch/x86/ which makes use
of ud2. UML cannot use ud2, it raises SIGILL in user mode. As UML has
a different stack for handling signals the call trace will be cut off.
Signed-off-by: Richard Weinberger <richard@nod.at> Reported-by: Sergei Trofimovich <slyich@gmail.com> Tested-by: Sergei Trofimovich <slyich@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hans J. Koch [Thu, 14 Apr 2011 22:22:16 +0000 (15:22 -0700)]
MAINTAINERS: change mail adress of Hans J. Koch
My old mail address doesn't exist anymore. This patch changes all
occurences in MAINTAINERS to my new address.
Signed-off-by: Hans J. Koch <hjk@hansjkoch.de> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
RapidIO/mpc85xx: fix possible mport registration problems
Fix a possible problem with mport registration left non-cleared after
fsl_rio_setup() exits on link error. Abort mport initialization if
registration failed.
This patch is applicable to 2.6.39-rc1 only. The problem does not exist
for earlier versions.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is an almost-revert of commit 93b43fa ("oom: give the dying task a
higher priority").
That commit dramatically improved oom killer logic when a fork-bomb
occurs. But I've found that it has nasty corner case. Now cpu cgroup has
strange default RT runtime. It's 0! That said, if a process under cpu
cgroup promote RT scheduling class, the process never run at all.
If an admin inserts a !RT process into a cpu cgroup by setting
rtruntime=0, usually it runs perfectly because a !RT task isn't affected
by the rtruntime knob. But if it promotes an RT task via an explicit
setscheduler() syscall or an OOM, the task can't run at all. In short,
the oom killer doesn't work at all if admins are using cpu cgroup and don't
touch the rtruntime knob.
Eventually, kernel may hang up when oom kill occur. I and the original
author Luis agreed to disable this logic.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Luis Claudio R. Goncalves <lclaudio@uudg.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Two years later, I've found obvious meaningless code fragment and
restored original intention by following commit.
2010 Jun 04; commit bb21c7ce; vmscan: fix do_try_to_free_pages()
return value when priority==0
But, the logic didn't works when 32bit highmem system goes hibernation
and Minchan slightly changed the algorithm and fixed it .
2010 Sep 22: commit d1908362: vmscan: check all_unreclaimable
in direct reclaim path
But, recently, Andrey Vagin found the new corner case. Look,
struct zone {
..
int all_unreclaimable;
..
unsigned long pages_scanned;
..
}
zone->all_unreclaimable and zone->pages_scanned are neigher atomic
variables nor protected by lock. Therefore zones can become a state of
zone->page_scanned=0 and zone->all_unreclaimable=1. In this case, current
all_unreclaimable() return false even though zone->all_unreclaimabe=1.
This resulted in the kernel hanging up when executing a loop of the form
as described in
http://www.gossamer-threads.com/lists/linux/kernel/1348725#1348725
Is this ignorable minor issue? No. Unfortunately, x86 has very small dma
zone and it become zone->all_unreclamble=1 easily. and if it become
all_unreclaimable=1, it never restore all_unreclaimable=0. Why? if
all_unreclaimable=1, vmscan only try DEF_PRIORITY reclaim and
a-few-lru-pages>>DEF_PRIORITY always makes 0. that mean no page scan at
all!
Eventually, oom-killer never works on such systems. That said, we can't
use zone->pages_scanned for this purpose. This patch restore
all_unreclaimable() use zone->all_unreclaimable as old. and in addition,
to add oom_killer_disabled check to avoid reintroduce the issue of commit d1908362 ("vmscan: check all_unreclaimable in direct reclaim path").
Reported-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Ellerman [Thu, 14 Apr 2011 22:22:10 +0000 (15:22 -0700)]
mm: check that we have the right vma in __access_remote_vm()
In __access_remote_vm() we need to check that we have found the right
vma, not the following vma before we try to access it. Otherwise we
might call the vma's access routine with an address which does not fall
inside the vma.
It was discovered on a current kernel but with an unreleased driver,
from memory it was strace leading to a kernel bad access, but it
obviously depends on what the access implementation does.
Looking at other access implementations I only see:
The spufs one looks like it might behave badly given the wrong vma, it
assumes vma->vm_file->private_data is a spu_context, and looks like it
would probably blow up pretty quickly if it wasn't.
generic_access_phys() only uses the vma to check vm_flags and get the
mm, and then walks page tables using the address. So it should bail on
the vm_flags check, or at worst let you access some other VM_IO mapping.
And bin_access() just proxies to another access implementation.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5520e89 ("brk: fix min_brk lower bound computation for COMPAT_BRK")
tried to get the whole logic of brk randomization for legacy
(libc5-based) applications finally right.
It turns out that the way to detect whether brk has actually been
randomized in the end or not introduced by that patch still doesn't work
for those binaries, as reported by Geert:
: /sbin/init from my old m68k ramdisk exists prematurely.
:
: Before the patch:
:
: | brk(0x80005c8e) = 0x80006000
:
: After the patch:
:
: | brk(0x80005c8e) = 0x80005c8e
:
: Old libc5 considers brk() to have failed if the return value is not
: identical to the requested value.
I don't like it, but currently see no better option than a bit flag in
task_struct to catch the CONFIG_COMPAT_BRK && randomize_va_space == 2
case.
I found it difficult to make sense of transparent huge pages without
having any counters for its actions. Add some counters to vmstat for
allocation of transparent hugepages and fallback to smaller pages.
Optional patch, but useful for development and understanding the system.
Contains improvements from Andrea Arcangeli and Johannes Weiner
Joe Perches [Thu, 14 Apr 2011 22:22:05 +0000 (15:22 -0700)]
MAINTAINERS: update various tty patterns
Commits 4a6514e6d0 ("tty: move obsolete and broken tty drivers to
drivers/staging/tty/") and a6afd9f3e8 ("tty: move a number of tty drivers
from drivers/char/ to drivers/tty/") moved files around.
Update patterns and orphan some files that were moved to staging.
Signed-off-by: Joe Perches <joe@perches.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Alexander Clouter <alex@digriz.org.uk> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CC [M] lib/test-kstrtox.o
lib/test-kstrtox.c: In function 'test_kstrtou64_ok':
lib/test-kstrtox.c:318: warning: this decimal constant is unsigned only in ISO C90
...
Antonio Ospite [Thu, 14 Apr 2011 22:21:59 +0000 (15:21 -0700)]
leds/leds-regulator.c: fix handling of already enabled regulators
Make the driver aware of the initial status of the regulator.
The leds-regulator driver was ignoring the initial status of the
regulator; this resulted in rdev->use_count being incremented to 2 after
calling regulator_led_set_value() in the .probe method when a regulator
was already enabled at insmod time, which made it impossible to ever
disable the regulator.
Signed-off-by: Antonio Ospite <ospite@studenti.unina.it> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Antonio Ospite <ospite@studenti.unina.it> Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Liam Girdwood <lrg@slimlogic.co.uk> Cc: Daniel Ribeiro <drwyrm@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The memory hotplug case involves calling to build_all_zonelists() which
in turns calls in to setup_zone_pageset(). The latter is marked
__meminit while build_all_zonelists() itself has no particular
annotation. build_all_zonelists() is only handed a non-NULL pointer in
the case of memory hotplug through an existing __meminit path, so the
setup_zone_pageset() reference is always safe.
The options as such are either to flag build_all_zonelists() as __ref (as
per __build_all_zonelists()), or to simply discard the __meminit
annotation from setup_zone_pageset().
Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Timo Warns [Thu, 14 Apr 2011 22:21:56 +0000 (15:21 -0700)]
fs/partitions/ldm.c: fix oops caused by corrupted partition table
The kernel automatically evaluates partition tables of storage devices.
The code for evaluating LDM partitions (in fs/partitions/ldm.c) contains
a bug that causes a kernel oops on certain corrupted LDM partitions.
A kernel subsystem seems to crash, because, after the oops, the kernel no
longer recognizes newly connected storage devices.
The patch validates the value of vblk_size.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Timo Warns <warns@pre-sense.de> Cc: Eugene Teo <eugeneteo@kernel.sg> Cc: Harvey Harrison <harvey.harrison@gmail.com> Cc: Richard Russon <rich@flatcap.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Kiper [Thu, 14 Apr 2011 22:21:53 +0000 (15:21 -0700)]
mm: optimize pfn calculation in online_page()
If CONFIG_FLATMEM is enabled pfn is calculated in online_page() more than
once. It is possible to optimize that and use value established at
beginning of that function.
Signed-off-by: Daniel Kiper <dkiper@net-space.pl> Acked-by: Dave Hansen <dave@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Dumazet [Thu, 14 Apr 2011 22:21:52 +0000 (15:21 -0700)]
memcg: fix mem_cgroup_rotate_reclaimable_page()
commit 3f58a8294333 ("move memcg reclaimable page into tail of inactive
list") added inline keyword twice in its prototype.
CC arch/x86/kernel/asm-offsets.s
In file included from include/linux/swap.h:8,
from include/linux/suspend.h:4,
from arch/x86/kernel/asm-offsets.c:12:
include/linux/memcontrol.h:220: error: duplicate `inline'
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>