Joe Carnuccio [Tue, 12 Jan 2010 21:02:46 +0000 (13:02 -0800)]
[SCSI] qla2xxx: Enhance EEH support and enable AER support.
qla2xxx: EEH added call to pci_restore_state.
qla2xxx: EEH added delay in slot reset routine.
qla2xxx: EEH moved call to pci_save_state(), see (1).
qla2xxx: EEH additional changes for RHEL5.5.
qla2xxx: EEH added function call, removed function call, see (2).
(1) In qla2xxx_probe_one the call to pci_save_state() has been
moved to after the call to qla2xxx_request_irqs().
(2) Add call to pci_disable_pcie_error_reporting() in remove_one.
Delete call to pci_cleanup_aer_uncorrect_error_status() in pci_resume.
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Douglas Gilbert [Sun, 3 Jan 2010 18:51:15 +0000 (13:51 -0500)]
[SCSI] skip sense logging for some ATA PASS-THROUGH cdbs
Further to the lsml thread titled:
"does scsi_io_completion need to dump sense data for ata pass through (ck_cond =
1) ?"
This is a patch to skip logging when the sense data is
associated with a SENSE_KEY of "RECOVERED_ERROR" and the
additional sense code is "ATA PASS-THROUGH INFORMATION
AVAILABLE". This only occurs with the SAT ATA PASS-THROUGH
commands when CK_COND=1 (in the cdb). It indicates that
the sense data contains ATA registers.
Smartmontools uses such commands on ATA disks connected via
SAT. Periodic checks such as those done by smartd cause
nuisance entries into logs that are:
- neither errors nor warnings
- pointless unless the cdb that caused them are also logged
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Kashyap, Desai [Wed, 16 Dec 2009 13:31:58 +0000 (19:01 +0530)]
[SCSI] mptfusion: Added sysfs expander manufacture information at the time of expander add.
Added new function mptsas_exp_manufacture_info, which will
obtain the REPORT_MANUFACTURING, and fill the details into the
sas_expander_device object when the expander port is created.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
There is a 'ioprio' field in the BIO and the Request structure.
check this priority field and set MPI_SCSIIO_CONTROL_HEADOFQ
to pass down I/O priority.
An enhancement to the LSI Disk Array Controller firmware is being
developed to look at the Head Of Queue bit to allow I/Os with the HOQ bit
set to be processed before I/Os which do not have the HOQ bit set.
In order to set the HOQ bit, the mpt fusion driver needs to look at the
'ioprio' field in the request structure associated with the scsi command.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Finn Thain [Sat, 5 Dec 2009 01:30:42 +0000 (12:30 +1100)]
[SCSI] mac_esp: fix PIO mode, take 2
The mac_esp PIO algorithm no longer works in 2.6.31 and crashes my Centris
660av. So here's a better one.
Also, force async with esp_set_offset() rather than esp_slave_configure().
One of the SCSI drives I tested still doesn't like the PIO mode and fails
with "esp: esp0: Reconnect IRQ2 timeout" (the same drive works fine in
PDMA mode).
This failure happens when esp_reconnect_with_tag() tries to read in two
tag bytes but the chip only provides one (0x20). I don't know what causes
this. I decided not to waste any more time trying to fix it because the
best solution is to rip out the PIO mode altogether and use the DMA
engine.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Hannes Reinecke [Tue, 15 Dec 2009 08:26:06 +0000 (09:26 +0100)]
[SCSI] scsi_transport_fc: Remove capping from dev_loss_tmo
Currently dev_loss_tmo is capped by SCSI_DEVICE_BLOCK_MAX_TIMEOUT.
This causes problem with multipathing when the 'no_path_retry' setting
exceeds the dev_loss_tmo setting, as then the system might run into
a deadlock when all paths have been removed temporarily for longer
than dev_loss_tmo.
The principal reasons for the capping has been that we should
not allow a remote port to remain in status 'blocked' indefinitely,
so the capping is there to ensure that the port status is being reset
eventually.
However, the fast_io_fail_tmo will also move the remote port out of
the 'blocked' state, so for any HBA driver implementing both the
capping should really be on the fast_io_fail_tmo, and not on the
dev_loss_tmo.
This patch implements just that, ie the fast_io_fail_tmo is capped
to SCSI_DEVICE_BLOCK_TIMEOUT and the capping is removed from
dev_loss_tmo when fast_io_fail_tmo is set.
This allows us to synchronize the dev_loss_tmo setting to the
'no_path_retry' setting from multipathing thus avoiding the deadlock.
Signed-off-by: Hannes Reinecke <hare@suse.de> Acked-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Erik Ekman [Mon, 14 Dec 2009 20:21:56 +0000 (21:21 +0100)]
[SCSI] fusion: fix warning when not using procfs
Fixes the following warning:
drivers/message/fusion/mptbase.c:129: warning: 'mpt_proc_root_dir' defined but not used
also moves it from public data section since it is static.
Signed-off-by: Erik Ekman <erik@kryo.se> Acked-by: "Desai, Kashyap" <Kashyap.Desai@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Bart Van Assche [Fri, 4 Dec 2009 19:43:37 +0000 (20:43 +0100)]
[SCSI] ibmvscsi: fix a typo in a source code comment
Signed-off-by: Bart Van Assche <bart.vanassche@gmail.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Hannes Reinecke [Fri, 15 Jan 2010 12:07:34 +0000 (13:07 +0100)]
[SCSI] aic79xx: check for non-NULL scb in ahd_handle_nonpkt_busfree
When removing several devices aic79xx will occasionally Oops
in ahd_handle_nonpkt_busfree during rescan. Looking at the
code I found that we're indeed not checking if the scb in
question is NULL. So check for it before accessing it.
Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Swen Schillig [Thu, 14 Jan 2010 16:19:02 +0000 (17:19 +0100)]
[SCSI] zfcp: Set hardware timeout as requested by BSG request.
The hardware used with zfcp provides a timer for CT and ELS requests
instead of an abort capability for these commands. To correctly handle
the FC BSG timeouts, pass the timeout from the BSG requests to the
hardware.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Swen Schillig [Thu, 14 Jan 2010 16:19:01 +0000 (17:19 +0100)]
[SCSI] zfcp: Introduce bsg_timeout callback.
Introduce a zfcp callback for timeouts triggered from FC BSG. With
zfcp, the underlying hardware cannot abort CT or ELS requests, so
there is nothing to do when the block layer timeout expires. To avoid
interference with the block layer timeout, simply indicate that the
block layer timer should be reset. The timer running in the hardware
for the pending CT or ELS request will return the request when it
expires.
Swen Schillig [Thu, 14 Jan 2010 16:19:00 +0000 (17:19 +0100)]
[SCSI] scsi_transport_fc: Allow LLD to reset FC BSG timeout
The hardware used with zfcp cannot abort a currently pending CT or ELS
request. Therefore we need the option to postpone the timeout
triggered request abort within the fc layer, since there is nothing
zfcp can do to stop the request at this point.
Cc: James Smart <James.Smart@emulex.com> Signed-off-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Christof Schmitt [Wed, 13 Jan 2010 16:52:37 +0000 (17:52 +0100)]
[SCSI] zfcp: Fix linebreak in hba trace
Advance the correct pointer when inserting the linebreak for the HBA
trace. It was missing in the output since the pointer to the output
buffer was never advanced, and the linebreak character was overwritten
later.
Christof Schmitt [Wed, 13 Jan 2010 16:52:36 +0000 (17:52 +0100)]
[SCSI] zfcp: Issue zfcp_fc_wka_port_put after FC CT BSG request
The patch "zfcp: Simplify handling of ct and els requests"
accidentally removed the call to zfcp_fc_wka_port_put for FC CT BSG
requests, thus not issuing a "close" request for the WKA ports.
Introduce a CT specific handler to first call zfcp_fc_wka_port_put and
then continue with the generic handler when returning from FC CT BSG
requests.
Harish Zunjarrao [Tue, 12 Jan 2010 20:59:50 +0000 (12:59 -0800)]
[SCSI] fc-transport: Use packed modifier for fc_bsg_request structure.
The 32bit kernel does not add padding bytes in the fc_bsg_request structure
whereas the 64bit kernel adds padding bytes in the fc_bsg_request structure.
Due to this, structure elements gets mismatched with 32bit application and
64bit kernel.To resolve this, used packed modifier to avoid adding padding bytes. Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Andrew Vasquez [Tue, 12 Jan 2010 20:59:48 +0000 (12:59 -0800)]
[SCSI] qla2xxx: Correct FCP2 recovery handling.
The driver did not account for non-tape devices needing to employ
proper FCP2 recovery. Driver now checks the FCP2-capable flag
only, rather than using a midlayer-determined flag (TYPE_TAPE).
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Boaz Harrosh [Tue, 15 Dec 2009 15:25:43 +0000 (17:25 +0200)]
[SCSI] scsi_lib: Fix bug in completion of bidi commands
Because of the terrible structuring of scsi-bidi-commands
it breaks some of the life time rules of a scsi-command.
It is now not allowed to free up the block-request before
cleanup and partial deallocation of the scsi-command. (Which
is not so for none bidi commands)
The right fix to this problem would be to make bidi command
a first citizen by allocating a scsi_sdb pointer at scsi command
just like cmd->prot_sdb. The bidi sdb should be allocated/deallocated
as part of the get/put_command (Again like the prot_sdb) and the
current decoupling of scsi_cmnd and blk-request should be kept.
For now make sure scsi_release_buffers() is called before the
call to blk_end_request_all() which might cause the suicide of
the block requests. At best the leak of bidi buffers, at worse
a crash, as there is a race between the existence of the bidi_request
and the free of the associated bidi_sdb.
The reason this was never hit before is because only OSD has the potential
of doing asynchronous bidi commands. (So does bsg but it is never used)
And OSD clients just happen to do all their bidi commands synchronously, up
until recently.
CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
[SCSI] mptsas: Fix issue with chain pools allocation on katmai
Since commit 9d2e9d66a3f032667934144cd61c396ba49f090d
mptsas driver fails to allocate memory for the MPT chain buffers
for second LSI adapter on PPC440SPe Katmai platform:
...
ioc1: LSISAS1068E B3: Capabilities={Initiator}
mptbase: ioc1: ERROR - Unable to allocate Reply, Request, Chain Buffers!
mptbase: ioc1: ERROR - didn't initialize properly! (-3)
mptsas: probe of 0002:31:00.0 failed with error -3
This commit increased MPT_FC_CAN_QUEUE value but initChainBuffers()
doesn't differentiate between SAS and FC causing increased allocation
for SAS case, too. Later pci_alloc_consistent() fails to allocate
increased chain buffer pool size for SAS case.
Provide a fix by looking at the bus type and using appropriate
MPT_SAS_CAN_QUEUE value while calculation of the number of chain
buffers.
Signed-off-by: Anatolij Gustschin <agust@denx.de> Acked-by: Kashyap Desai <kashyap.desai@lsi.com> Cc: Stable Tree <stable@kernel.org> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
[SCSI] aacraid: fix File System going into read-only mode
These particular problems were reported by Cisco and SAP and customers
as well. Cisco reported on RHEL4 U6 and SAP reported on SLES9 SP4 and
SLES10 SP2. We added these fixes on RHEL4 U6 and gave a private build
to IBM and Cisco. Cisco and IBM tested it for more than 15 days and
they reported that they did not see the issue so far. Before the fix,
Cisco used to see the issue within 5 days. We generated a patch for
SLES9 SP4 and SLES10 SP2 and submitted to Novell. Novell applied the
patch and gave a test build to SAP. SAP tested and reported that the
build is working properly.
We also tested in our lab using the tools "dishogsync", which is IO
stress tool and the tool was provided by Cisco.
Issue1: File System going into read-only mode
Root cause: The driver tends to not free the memory (FIB) when the
management request exits prematurely. The accumulation of such
un-freed memory causes the driver to fail to allocate anymore memory
(FIB) and hence return 0x70000 value to the upper layer, which puts
the file system into read only mode.
Fix details: The fix makes sure to free the memory (FIB) even if the
request exits prematurely hence ensuring the driver wouldn't run out
of memory (FIBs).
Issue2: False Raid Alert occurs
When the Physical Drives and Logical drives are reported as deleted or
added, even though there is no change done on the system
Root cause: Driver IOCTLs is signaled with EINTR while waiting on
response from the lower layers. Returning "EINTR" will never initiate
internal retry.
Fix details: The issue was fixed by replacing "EINTR" with
"ERESTARTSYS" for mid-layer retries.
Signed-off-by: Penchala Narasimha Reddy <ServeRAIDDriver@hcl.in> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
KOSAKI Motohiro [Sat, 16 Jan 2010 01:01:18 +0000 (17:01 -0800)]
page allocator: update NR_FREE_PAGES only when necessary
commit f2260e6b (page allocator: update NR_FREE_PAGES only as necessary)
made one minor regression. if __rmqueue() was failed, NR_FREE_PAGES stat
go wrong. this patch fixes it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reported-by: Huang Shijie <shijie8@gmail.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 16 Jan 2010 20:34:56 +0000 (12:34 -0800)]
Merge branch 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
i2c: Do not use device name after device_unregister
i2c/pca: Don't use *_interruptible
i2c-ali1563: Remove sparse warnings
i2c: Test off by one in {piix4,vt596}_transaction()
i2c-core: Storage class should be before const qualifier
Linus Torvalds [Sat, 16 Jan 2010 20:31:42 +0000 (12:31 -0800)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, uv: Ensure hub revision set for all ACPI modes.
x86, uv: Add function retrieving node controller revision number
x86: xen: 64-bit kernel RPL should be 0
x86: kernel_thread() -- initialize SS to a known state
x86/agp: Fix agp_amd64_init and agp_amd64_cleanup
x86: SGI UV: Fix mapping of MMIO registers
x86: mce.h: Fix warning in header checks
Linus Torvalds [Sat, 16 Jan 2010 20:27:47 +0000 (12:27 -0800)]
Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf tools: Check if /dev/null can be used as the -o gcc argument
perf tools: Move QUIET_STDERR def to before first use
perf: Stop stack frame walking off kernel addresses boundaries
Linus Torvalds [Sat, 16 Jan 2010 20:27:25 +0000 (12:27 -0800)]
Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing/filters: Add comment for match callbacks
tracing/filters: Fix MATCH_FULL filter matching for PTR_STRING
tracing/filters: Fix MATCH_MIDDLE_ONLY filter matching
lib: Introduce strnstr()
tracing/filters: Fix MATCH_END_ONLY filter matching
tracing/filters: Fix MATCH_FRONT_ONLY filter matching
ftrace: Fix MATCH_END_ONLY function filter
tracing/x86: Derive arch from bits argument in recordmcount.pl
ring-buffer: Add rb_list_head() wrapper around new reader page next field
ring-buffer: Wrap a list.next reference with rb_list_head()
Mark Brown [Sat, 16 Jan 2010 01:01:40 +0000 (17:01 -0800)]
revert "drivers/video/s3c-fb.c: fix clock setting for Samsung SoC Framebuffer"
Fix divide by zero and broken output. Commit 600ce1a0fa ("fix clock
setting for Samsung SoC Framebuffer") introduced a mandatory refresh
parameter to the platform data for the S3C framebuffer but did not
introduce any validation code, causing existing platforms (none of which
have refresh set) to divide by zero whenever the framebuffer is
configured, generating warnings and unusable output.
Ben Dooks noted several problems with the patch:
- The platform data supplies the pixclk directly and should already
have taken care of the refresh rate.
- The addition of a window ID parameter doesn't help since only the
root framebuffer can control the pixclk.
- pixclk is specified in picoseconds (rather than Hz) as the patch
assumed.
and suggests reverting the commit so do that. Without fixing this no
mainline user of the driver will produce output.
[akpm@linux-foundation.org: don't revert the correct bit] Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: InKi Dae <inki.dae@samsung.com> Cc: Kyungmin Park <kmpark@infradead.org> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Ben Dooks <ben-linux@fluff.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Sat, 16 Jan 2010 01:01:39 +0000 (17:01 -0800)]
nommu: fix shared mmap after truncate shrinkage problems
Fix a problem in NOMMU mmap with ramfs whereby a shared mmap can happen
over the end of a truncation. The problem is that
ramfs_nommu_check_mappings() checks that the reduced file size against the
VMA tree, but not the vm_region tree.
The following sequence of events can cause the problem:
Mapping 'a' creates a vm_region covering 32KB of the file. Mapping 'b'
sees that the vm_region from 'a' is covering the region it wants and so
shares it, pinning it in memory.
Mapping 'a' then goes away and the file is truncated to the end of VMA
'b'. However, the region allocated by 'a' is still in effect, and has
_not_ been reduced.
Mapping 'c' is then created, and because there's a vm_region covering the
desired region, get_unmapped_area() is _not_ called to repeat the check,
and the mapping is granted, even though the pages from the latter half of
the mapping have been discarded.
However:
d = mmap(NULL, 16 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
Mapping 'd' should work, and should end up sharing the region allocated by
'a'.
To deal with this, we shrink the vm_region struct during the truncation,
lest do_mmap_pgoff() take it as licence to share the full region
automatically without calling the get_unmapped_area() file op again.
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Sat, 16 Jan 2010 01:01:36 +0000 (17:01 -0800)]
nommu: fix race between ramfs truncation and shared mmap
Fix the race between the truncation of a ramfs file and an attempt to make
a shared mmap of region of that file.
The problem is that do_mmap_pgoff() calls f_op->get_unmapped_area() to
verify that the file region is made of contiguous pages and to find its
base address - but there isn't any locking to guarantee this region until
vma_prio_tree_insert() is called by add_vma_to_mm().
Note that moving the functionality into f_op->mmap() doesn't help as that
is also called before vma_prio_tree_insert().
Instead make ramfs_nommu_check_mappings() grab nommu_region_sem whilst it
does its checks. This means that this function will wait whilst mmaps
take place.
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Sat, 16 Jan 2010 01:01:32 +0000 (17:01 -0800)]
nommu: fix SYSV SHM for NOMMU
Commit c4caa778157dbbf04116f0ac2111e389b5cd7a29 ("file
->get_unmapped_area() shouldn't duplicate work of get_unmapped_area()")
broke SYSV SHM for NOMMU by taking away the pointer to
shm_get_unmapped_area() from shm_file_operations.
Put it back conditionally on CONFIG_MMU=n.
file->f_ops->get_unmapped_area() is used to find out the base address for a
mapping of a mappable chardev device or mappable memory-based file (such as a
ramfs file). It needs to be called prior to file->f_ops->mmap() being called.
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Due to prototype mismatch, print_block_size() will sprintf() into
*attribute instead of *buf, hence user space will read the initial
zeros from *buf:
$ hexdump /sys/devices/system/memory/block_size_bytes 0000000 0000 0000 0000 0000 0000008
After patch:
cat /sys/devices/system/memory/block_size_bytes
0x8000000
Current mem_cgroup_force_empty() only ensures mem->res.usage == 0 on
success. But this doesn't guarantee memcg's LRU is really empty, because
there are some cases in which !PageCgrupUsed pages exist on memcg's LRU.
For example:
- Pages can be uncharged by its owner process while they are on LRU.
- race between mem_cgroup_add_lru_list() and __mem_cgroup_uncharge_common().
So there can be a case in which the usage is zero but some of the LRUs are not empty.
OTOH, mem_cgroup_del_lru_list(), which can be called asynchronously with
rmdir, accesses the mem_cgroup, so this access can cause a problem if it
races with rmdir because the mem_cgroup might have been freed by rmdir.
Actually, I saw a bug which seems to be caused by this race.
The problem here is pages on LRU may contain pointer to stale memcg. To
make res->usage to be 0, all pages on memcg must be uncharged or moved to
another(parent) memcg. Moved page_cgroup have already removed from
original LRU, but uncharged page_cgroup contains pointer to memcg withou
PCG_USED bit. (This asynchronous LRU work is for improving performance.)
If PCG_USED bit is not set, page_cgroup will never be added to memcg's
LRU. So, about pages not on LRU, they never access stale pointer. Then,
what we have to take care of is page_cgroup _on_ LRU list. This patch
fixes this problem by making mem_cgroup_force_empty() visit all LRUs
before exiting its loop and guarantee there are no pages on its LRU.
Jeff Mahoney [Sat, 16 Jan 2010 01:01:26 +0000 (17:01 -0800)]
virtio: fix section mismatch warnings
Fix fixes the following warnings by renaming the driver structures to be
suffixed with _driver.
WARNING: drivers/virtio/virtio_balloon.o(.data+0x88): Section mismatch in reference from the variable virtio_balloon to the function .devexit.text:virtballoon_remove()
WARNING: drivers/char/hw_random/virtio-rng.o(.data+0x88): Section mismatch in reference from the variable virtio_rng to the function .devexit.text:virtrng_remove()
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Sat, 16 Jan 2010 01:01:25 +0000 (17:01 -0800)]
vmscan: kswapd: don't retry balance_pgdat() if all zones are unreclaimable
Commit f50de2d3 (vmscan: have kswapd sleep for a short interval and double
check it should be asleep) can cause kswapd to enter an infinite loop if
running on a single-CPU system. If all zones are unreclaimble,
sleeping_prematurely return 1 and kswapd will call balance_pgdat() again.
but it's totally meaningless, balance_pgdat() doesn't anything against
unreclaimable zone!
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Reported-by: Will Newton <will.newton@gmail.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Tested-by: Will Newton <will.newton@gmail.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David John [Sat, 16 Jan 2010 01:01:23 +0000 (17:01 -0800)]
smp_call_function_any(): pass the node value to cpumask_of_node()
The change in acpi_cpufreq to use smp_call_function_any causes a warning
when it is called since the function erroneously passes the cpu id to
cpumask_of_node rather than the node that the cpu is on. Fix this.
Roland Dreier [Sat, 16 Jan 2010 01:01:22 +0000 (17:01 -0800)]
kernel.h: add BUILD_BUG_ON_NOT_POWER_OF_2()
Add BUILD_BUG_ON_NOT_POWER_OF_2()
When code relies on a constant being a power of 2:
#define FOO 512 /* must be a power of 2 */
it would be nice to be able to do:
BUILD_BUG_ON(!is_power_of_2(FOO));
However applying an inline function does not result in a compile-time
constant that can be used with BUILD_BUG_ON(), so trying that gives
results in:
error: bit-field '<anonymous>' width not an integer constant
As suggested by akpm, rather than monkeying around with is_power_of_2()
and risking gcc warts about constant expressions, just create a macro
BUILD_BUG_ON_NOT_POWER_OF_2() to encapsulate this common requirement.
Signed-off-by: Roland Dreier <rolandd@cisco.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: David Dillow <dave@thedillows.org> Cc: "Robert P. J. Day" <rpjday@crashcourse.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/page_alloc: fix the range check for backward merging
The current check for 'backward merging' within add_active_range() does
not seem correct. start_pfn must be compared against
early_node_map[i].start_pfn (and NOT against .end_pfn) to find out whether
the new region is backward-mergeable with the existing range.
Signed-off-by: Kazuhisa Ichikawa <ki@epsilou.com> Acked-by: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andi Kleen [Sat, 16 Jan 2010 01:01:17 +0000 (17:01 -0800)]
kfifo: document everywhere that size has to be power of two
On my first try using them I missed that the fifos need to be power of
two, resulting in a runtime bug. Document that requirement everywhere
(and fix one grammar bug)
Andi Kleen [Sat, 16 Jan 2010 01:01:15 +0000 (17:01 -0800)]
kfifo: sanitize *_user error handling
Right now for kfifo_*_user it's not easily possible to distingush between
a user copy failing and the FIFO not containing enough data. The problem
is that both conditions are multiplexed into the same return code.
Avoid this by moving the "copy length" into a separate output parameter
and only return 0/-EFAULT in the main return value.
I didn't fully adapt the weird "record" variants, those seem
to be unused anyways and were rather messy (should they be just removed?)
I would appreciate some double checking if I did all the conversions
correctly.
Andi Kleen [Sat, 16 Jan 2010 01:01:12 +0000 (17:01 -0800)]
kfifo: use void * pointers for user buffers
The pointers to user buffers are currently unsigned char *, which requires
a lot of casting in the caller for any non-char typed buffers. Use void *
instead.
Randy Dunlap [Sat, 16 Jan 2010 01:01:11 +0000 (17:01 -0800)]
tty.h: make tty_port_get() static inline
I get a few dozen of these warnings when using
gcc (GCC) 4.4.1 20090725 (Red Hat 4.4.1-2):
In file included from mmotm-2010-0113-1217/init/do_mounts.c:5:
mmotm-2010-0113-1217/include/linux/tty.h: In function 'tty_port_get':
mmotm-2010-0113-1217/include/linux/tty.h:469: warning: '______f' is static but declared in inline function 'tty_port_get' which is not static
so make the function static inline.
[akpm@linux-foundation.org: may as well convert tty_port_users() also] Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tamas Vincze [Sat, 16 Jan 2010 01:01:10 +0000 (17:01 -0800)]
edac: i5000_edac critical fix panic out of bounds
EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4)
Kernel panic - not syncing: EDAC MC0: Uncorrected Error (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
This happens because FERR_NF_FBD bit 28 is not updated on i5000. Due to
that, both bits 28 and 29 may be equal to one, returning channel = 3. As
this value is invalid, EDAC core generates the panic.
john stultz [Sat, 16 Jan 2010 01:01:09 +0000 (17:01 -0800)]
m68knommu: fix invalid flags on coldfire pit clocksource
The m68knommu coldfire pit clocksource looks like it was incorrectly
marked as a continuous clocksource. Running with it marked as a
continuous clocksource could cause hangs when the system switches to
highres mode or enables nohz.
This patch removes the CLOCK_SOURCE_IS_CONTINUOUS flag on the coldfire pit
clocksource. This will disallow systems using this clocksource from
entering oneshot mode (disabling highres timers and nohz).
Signed-off-by: John Stultz <johnstul@us.ibm.com> Acked-by: Greg Ungerer <gerg@snapgear.com> Cc: Steven King <sfking@fdwdc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hui Zhu [Sat, 16 Jan 2010 01:01:07 +0000 (17:01 -0800)]
markup_oops.pl: fix error with x86
When I try to use markup_oops.pl in x86, I always get:
cat 1 | perl markup_oops.pl ./vmlinux
objdump: --start-address: bad number: NaN
No matching code found
This is because in line:
if ($line =~ /EIP is at ([a-zA-Z0-9\_]+)\+0x([0-9a-f]+)\/[a-f0-9]/) {
$function = $1;
$func_offset = $2;
}
$func_offset will get a number like "0x2"
But in follow code:
my $decodestart = Math::BigInt->from_hex("0x$target") -
Math::BigInt->from_hex("0x$func_offset");
It add other ox to ox2. Then this value will be set to NaN.
So I made a small patch to fix it.
Signed-off-by: Hui Zhu <teawater@gmail.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Erik-Jan Post [Sat, 16 Jan 2010 01:01:06 +0000 (17:01 -0800)]
viafb: fix acceleration for some chips
Fix a regression in hardware acceleration which made the accelerated
framebuffer unusable on some chips. These need extra initialization and
an extra flag which is no longer needed/available on current chips.
Signed-off-by: Erik-Jan Post <ej.lfs@xs4all.nl> Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: Scott Fang <ScottFang@viatech.com.cn> Cc: Joseph Chan <JosephChan@via.com.tw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Although I'd consider this a hardware bug, as there is hardware out that
for whatever reason does not support hardware cursors on LCD output we
have to care about it in the driver. This fixes a regression (invisible
cursor) introduced by:
viafb: cleanup viafb_cursor
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Reported-by: Julian Wollrath <jwollrath@web.de> Tested-by: Julian Wollrath <jwollrath@web.de> Cc: Scott Fang <ScottFang@viatech.com.cn> Cc: Joseph Chan <JosephChan@via.com.tw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Márton Németh [Sat, 16 Jan 2010 19:43:13 +0000 (20:43 +0100)]
i2c-ali1563: Remove sparse warnings
Remove the following sparse warnings (see "make C=1"):
* drivers/i2c/busses/i2c-ali1563.c:91:3: warning: do-while statement
is not a compound statement
* drivers/i2c/busses/i2c-ali1563.c:161:3: warning: do-while statement
is not a compound statement
Signed-off-by: Márton Németh <nm127@freemail.hu> Signed-off-by: Jean Delvare <khali@linux-fr.org>
Linus Torvalds [Sat, 16 Jan 2010 18:44:38 +0000 (10:44 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel:
drm/i915: enable 36bit physical address for hardware status page
drm/i915: fix eDP pipe mask
drm/i915: fix pixel color depth setting on eDP
drm/i915: parse eDP panel color depth from VBT block
drm/i915: disable LVDS downclock by default
drm/i915: Fix the incorrect cursor A bit definition in DSPFW2 register
drm/i915: Remove chatty execbuf failure message.
drm/i915: remove loop in Ironlake interrupt handler
drm/i915: Don't wait interruptible for possible plane buffer flush
drm/i915: try another possible DDC bus for the SDVO device with multiple outputs
drm/i915: Read the response after issuing DDC bus switch command
drm/i915: Don't use the child device parsed from VBT to setup HDMI/DP
drm/i915: Fix resume regression on MSI Wind U100 w/o KMS
drm/i915: Fix Ironlake M/N/P ranges to match the spec
drm/i915: Use find_pll function to calculate DPLL setting for LVDS downclock
drm/i915: Add HP nx9020/SamsungSX20S to ACPI LID quirk list
drm/i915: disable TV hotplug status check
Trivial conflicts in drivers/gpu/drm/i915/i915_drv.c due to i915
non-modeset suspend fix with different comment.
Linus Torvalds [Fri, 15 Jan 2010 22:53:24 +0000 (14:53 -0800)]
Merge branch 'for-linus/samsung' of git://git.fluff.org/bjdooks/linux
* 'for-linus/samsung' of git://git.fluff.org/bjdooks/linux:
ARM: MINI2440: Fixup __initdata usage
ARM: MINI2440: Fix crash on boot due to improper __initdata qualifier
ARM: SMDK6410: Specify no GPIO for B_PWR_5V regulator
ARM: S3C: NAND: Check the existence of nr_map before copying
Linus Torvalds [Fri, 15 Jan 2010 22:51:57 +0000 (14:51 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: sentelic - fix left/right horizontal scroll mapping
Input: pmouse - move Sentelic probe down the list
Input: add compat support for sysfs and /proc capabilities output
Input: i8042 - add Dritek quirk for Acer Aspire 5610.
Input: xbox - do not use GFP_KERNEL under spinlock
Input: psmouse - fix Synaptics detection when protocol is disabled
Input: bcm5974 - report ABS_MT events
Input: davinci_keyscan - add device_enable method to platform data
Input: evdev - be less aggressive about sending SIGIO notifies
Input: atkbd - fix canceling event_work in disconnect
Input: serio - fix potential deadlock when unbinding drivers
Input: gf2k - fix &&/|| confusion in gf2k_connect()
Linus Torvalds [Fri, 15 Jan 2010 22:51:39 +0000 (14:51 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha-2.6:
alpha: cpumask_of_node() should handle -1 as a node
alpha: add myself as a maintainer, and drop mention of 2.4
Eric Paris [Fri, 15 Jan 2010 17:12:25 +0000 (12:12 -0500)]
inotify: only warn once for inotify problems
inotify will WARN() if it finds that the idr and the fsnotify internals
somehow got out of sync. It was only supposed to do this once but due
to this stupid bug it would warn every single time a problem was
detected.
Signed-off-by: Eric Paris <eparis@redhat.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Paris [Fri, 15 Jan 2010 17:12:24 +0000 (12:12 -0500)]
inotify: do not reuse watch descriptors
Since commit 7e790dd5fc937bc8d2400c30a05e32a9e9eef276 ("inotify: fix
error paths in inotify_update_watch") inotify changed the manor in which
it gave watch descriptors back to userspace. Previous to this commit
inotify acted like the following:
inotify_add_watch(X, Y, Z) = 1
inotify_rm_watch(X, 1);
inotify_add_watch(X, Y, Z) = 2
but after this patch inotify would return watch descriptors like so:
inotify_add_watch(X, Y, Z) = 1
inotify_rm_watch(X, 1);
inotify_add_watch(X, Y, Z) = 1
which I saw as equivalent to opening an fd where
open(file) = 1;
close(1);
open(file) = 1;
seemed perfectly reasonable. The issue is that quite a bit of userspace
apparently relies on the behavior in which watch descriptors will not be
quickly reused. KDE relies on it, I know some selinux packages rely on
it, and I have heard complaints from other random sources such as debian
bug 558981.
Although the man page implies what we do is ok, we broke userspace so
this patch almost reverts us to the old behavior. It is still slightly
racey and I have patches that would fix that, but they are rather large
and this will fix it for all real world cases. The race is as follows:
- task1 creates a watch and blocks in idr_new_watch() before it updates
the hint.
- task2 creates a watch and updates the hint.
- task1 updates the hint with it's older wd
- task removes the watch created by task2
- task adds a new watch and will reuse the wd originally given to task2
it requires moving some locking around the hint (last_wd) but this should
solve it for the real world and be -stable safe.
As a side effect this patch papers over a bug in the lib/idr code which
is causing a large number WARN's to pop on people's system and many
reports in kerneloops.org. I'm working on the root cause of that idr
bug seperately but this should make inotify immune to that issue.
Signed-off-by: Eric Paris <eparis@redhat.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>