block: Turn bvec_k{un,}map_irq() into static inline functions
Convert bvec_k{un,}map_irq() from macros to static inline functions if
!CONFIG_HIGHMEM, so we can more easily detect mistakes like the one fixed in
93055c31045a2d5599ec613a0c6cdcefc481a460 ("ps3disk: passing wrong variable to
bvec_kunmap_irq()").
The cause is incorrect accounting of hd_struct->in_flight when a bio is
merged, by ELEVATOR_FRONT_MERGE, into a request that belongs to a different
partition.
The detailed root cause is as follows.
Assume that there are two partitions, sda1 and sda2.
1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight
is 0 and sda2's one is 1.
2. A bio that belongs to sda1 is issued and is merged into the request mentioned
in step 1 by ELEVATOR_FRONT_MERGE. The first sector of the request is changed
from the sda2 region to the sda1 region. However, the two partitions'
hd_struct->in_flight counters are not changed.
The patch fixes the problem by caching the partition lookup
inside the request structure, hence making sure that the increment
and decrement will always happen on the same partition struct. This
also speeds up IO with accounting enabled, since it cuts down on
the number of lookups we have to do.
When reloading partition tables, quiesce IO to ensure that no
request still references the partition struct. When it is safe
to free the partition table, the IO for that device is restarted
again.
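The caching idea can be illustrated with a small user-space sketch. It is not the kernel code: struct part, struct request and the helper names below are simplified, hypothetical stand-ins. The partition is looked up once when accounting starts and remembered in the request, so a later change of the request's start sector cannot move the decrement to another partition.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct part {
    unsigned long start_sect;   /* first sector of the partition */
    unsigned long nr_sects;     /* length in sectors */
    int in_flight;              /* requests currently outstanding */
};

struct request {
    unsigned long sector;       /* first sector of the request */
    struct part *part;          /* partition cached when accounting starts */
};

/* Find the partition containing a sector (linear scan for brevity). */
static struct part *lookup_part(struct part *tbl, size_t n,
                                unsigned long sector)
{
    for (size_t i = 0; i < n; i++)
        if (sector >= tbl[i].start_sect &&
            sector < tbl[i].start_sect + tbl[i].nr_sects)
            return &tbl[i];
    return NULL;
}

/* Start of I/O: look up once, cache the result, then increment. */
static void account_io_start(struct request *rq, struct part *tbl, size_t n)
{
    rq->part = lookup_part(tbl, n, rq->sector);
    if (rq->part)
        rq->part->in_flight++;
}

/* A front merge moves the start sector; the cached partition is untouched. */
static void front_merge(struct request *rq, unsigned long new_sector)
{
    rq->sector = new_sector;
}

/* Completion: decrement the cached partition, with no fresh lookup. */
static void account_io_done(struct request *rq)
{
    if (rq->part)
        rq->part->in_flight--;
}
```

With two partitions covering sectors 0-99 and 100-199, a request started at sector 150 and then front-merged down to sector 90 still decrements the second partition's counter on completion, so both counters return to zero.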
block: Make the integrity mapped property a bio flag
Previously we tracked whether the integrity metadata had been remapped
using a request flag. This was fine for low-level retries. However, if
an I/O was redriven by upper layers we would end up remapping again,
causing the retry to fail.
Deprecate the REQ_INTEGRITY flag and introduce BIO_MAPPED_INTEGRITY
which enables filesystems to notify lower layers that the bio in
question has already been remapped.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Physical block size was declared unsigned int to accommodate the maximum
size reported by READ CAPACITY(16). Make sure we use the right type in
the related functions.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Vivek Goyal [Fri, 1 Oct 2010 19:16:42 +0000 (21:16 +0200)]
blkio-throttle: Fix possible multiplication overflow in iops calculations
o A user can specify a 32-bit max iops value (up to UINT_MAX) through the cgroup
interface. If a user has specified, say, 4294967294 (UINT_MAX - 1), then
on a 32-bit platform the following multiplication can overflow.
io_allowed = (tg->iops[rw] * jiffy_elapsed_rnd)
o Explicitly cast the multiplication to 64-bit and then perform the division, and
then check whether the result is still greater than UINT_MAX.
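A minimal user-space sketch of the widen, divide, clamp sequence described above. The function name and the hz parameter (standing in for the kernel's HZ) are assumptions made for illustration, and unsigned int is assumed to be 32 bits wide.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Number of I/Os allowed in the elapsed window, computed in 64 bits.
 * hz is ticks per second and must be non-zero.
 */
static unsigned int io_allowed(unsigned int iops_limit,
                               unsigned int jiffies_elapsed,
                               unsigned int hz)
{
    /* Widen one operand first so the product cannot wrap at 2^32. */
    uint64_t tmp = (uint64_t)iops_limit * jiffies_elapsed;

    tmp /= hz;

    /* Clamp to the 32-bit range the caller works with. */
    if (tmp > UINT_MAX)
        return UINT_MAX;
    return (unsigned int)tmp;
}
```

Without the cast, the product of two 32-bit values is itself computed in 32 bits and silently wraps before the division is performed.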
Vivek Goyal [Fri, 1 Oct 2010 12:51:14 +0000 (14:51 +0200)]
blkio-throttle: Fix link failure on i386
o Randy Dunlap reported the following linux-next failure. This patch fixes it.
on i386:
blk-throttle.c:(.text+0x1abb8): undefined reference to `__udivdi3'
blk-throttle.c:(.text+0x1b1dc): undefined reference to `__udivdi3'
o The bytes_per_second interface is 64-bit and I was continuing to do 64-bit
division even on 32-bit platforms without the help of special macros/functions,
hence the failure.
Vivek Goyal [Fri, 1 Oct 2010 12:49:49 +0000 (14:49 +0200)]
blkio: Recalculate the throttled bio dispatch time upon throttle limit change
o Currently any cgroup throttle limit changes are processed asynchronously and
the change does not take effect till a new bio is dispatched from the same group.
o It might happen that a user sets a ridiculously low limit on throttling.
Say 1 byte per second on reads. In such cases simple operations like mounting
a disk can wait for a very long time.
o Once a bio is throttled, there is no easy way to come out of that wait even if
the user increases the read limit later.
o This patch fixes it. Now if a user changes the cgroup limits, we recalculate
the bio dispatch time according to the new limits.
o Can't take the queue lock under blkcg_lock, hence after the change I wake
up the dispatch thread again which recalculates the time. So there are some
variables being synchronized across two threads without a lock and I had to
make use of barriers. Hoping I have used barriers correctly. Any review of
memory barrier code especially will help.
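To see why recalculation matters, here is a rough arithmetic sketch. It is not the throttling code: the helper name is hypothetical, and treating a limit of 0 as "unlimited" is an assumption of this sketch.

```c
#include <stdint.h>

/*
 * Seconds a bio of 'bytes' must wait under a limit of 'bps' bytes per
 * second, rounded up. A limit of 0 is treated as "no limit" here.
 */
static uint64_t wait_secs(uint64_t bytes, uint64_t bps)
{
    if (bps == 0)
        return 0;
    return (bytes + bps - 1) / bps;
}
```

A 4096-byte read queued under a 1 byte/s limit is scheduled 4096 seconds out. If the limit is later raised to 1 MiB/s, recomputing gives 1 second, whereas a wait that is never recomputed would still run for over an hour.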
Vivek Goyal [Fri, 1 Oct 2010 12:49:48 +0000 (14:49 +0200)]
blkio: Add root group to td->tg_list
o Currently all the dynamically allocated groups, except the root group, are added
to td->tg_list. This was not a problem so far but in the next patch I will
traverse td->tg_list to process any updates of limits on the group.
If the root group is not in tg_list, then the root group's updates are not
processed.
o It is better to add the root group to tg_list as well instead of doing special
processing for it during limit updates.
Vivek Goyal [Fri, 1 Oct 2010 12:49:44 +0000 (14:49 +0200)]
blkio: deletion of a cgroup was causing oops
o Now a cgroup list of blkg elements can contain blkg from multiple policies.
Before sending an unlink event, make sure the blkg belongs to the policy. If
the policy does not own the blkg, do not send an update for this blkg.
Vivek Goyal [Fri, 1 Oct 2010 12:49:41 +0000 (14:49 +0200)]
blkio: Do not export throttle files if CONFIG_BLK_DEV_THROTTLING=n
Currently, throttling-related files are visible even if the user has disabled
throttling using config options. Disabling it switched off background throttling
of bios but not the cgroup files. This patch fixes it.
Malahal Naineni [Fri, 1 Oct 2010 12:45:27 +0000 (14:45 +0200)]
block: set the bounce_pfn to the actual DMA limit rather than to max memory
The bounce_pfn of the request queue in 64 bit systems is set to the
current max_low_pfn. Adding more memory later makes this incorrect.
Memory allocated beyond this boot time max_low_pfn appears to require
bounce buffers (bounce buffers are actually not allocated but used in
calculating segments that may result in "over max segments limit"
errors).
Mark Lord [Sat, 25 Sep 2010 09:17:22 +0000 (11:17 +0200)]
Fix compile error in blk-exec.c for !CONFIG_DETECT_HUNG_TASK
Ensure that 'sysctl_hung_task_timeout_secs' is defined
even when CONFIG_DETECT_HUNG_TASK is not set.
This way we can safely reference it without the need for
ifdefs elsewhere in the code, e.g. in block/blk-exec.c
Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Mark Lord [Fri, 24 Sep 2010 13:51:13 +0000 (09:51 -0400)]
block: Prevent hang_check firing during long I/O
During long I/O operations, the hang_check timer may fire,
triggering stack dumps that unnecessarily alarm the user.
Eg. hdparm --security-erase NULL /dev/sdb ## can take *hours* to complete
So, if hang_check is armed, we should wake up periodically
to prevent it from triggering. This patch uses a wake-up interval
equal to half the hang_check timer period, which keeps overhead low enough.
Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
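The periodic wake-up can be sketched as a user-space simulation. struct fake_io and both helpers are hypothetical; the timed wait is simulated by subtracting from a counter rather than by sleeping, and times are in milliseconds.

```c
#include <stdbool.h>

/* Hypothetical simulated I/O: how much longer it still needs to run. */
struct fake_io {
    unsigned long remaining_ms;
};

/* Simulated timed wait: returns true if the I/O finished within timeout. */
static bool fake_timed_wait(struct fake_io *io, unsigned long timeout_ms)
{
    if (io->remaining_ms <= timeout_ms) {
        io->remaining_ms = 0;
        return true;
    }
    io->remaining_ms -= timeout_ms;
    return false;
}

/*
 * Wait for the I/O, waking at half the hang-check period so the waiter is
 * never blocked for a full period. Returns the number of intermediate
 * wake-ups. hang_check_ms == 0 means the check is disabled, in which case
 * a single untimed wait is simulated.
 */
static unsigned int wait_for_io(struct fake_io *io,
                                unsigned long hang_check_ms)
{
    unsigned int wakeups = 0;
    unsigned long interval = hang_check_ms / 2;

    if (hang_check_ms == 0) {
        io->remaining_ms = 0;
        return 0;
    }
    if (interval == 0)
        interval = 1;   /* avoid a zero-length wait that never progresses */

    while (!fake_timed_wait(io, interval))
        wakeups++;
    return wakeups;
}
```

For a two-hour operation and a 120-second hang-check period, the waiter wakes 119 times before the final wait completes, which is negligible overhead for an operation of that length.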
Fsync performance for small files achieved by cfq on high-end disks is
lower than what deadline can achieve, due to idling introduced between
the sync write happening in process context and the journal commit.
Moreover, when competing with a sequential reader, a process writing
small files and fsync-ing them is starved.
This patch fixes the two problems by:
- marking journal commits as WRITE_SYNC, so that they get the REQ_NOIDLE
flag set,
- forcing all queues that have REQ_NOIDLE requests to be put in the noidle
tree.
Having the queue associated to the fsync-ing process and the one associated
to journal commits in the noidle tree allows:
- switching between them without idling,
- fairness vs. competing idling queues, since they will be serviced only
after the noidle tree expires its slice.
Acked-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
init/do_mounts.c:71: error: implicit declaration of function 'dev_to_part'
init/do_mounts.c:71: warning: initialization makes pointer from integer without a cast
init/do_mounts.c:73: error: dereferencing pointer to incomplete type
init/do_mounts.c:76: error: dereferencing pointer to incomplete type
init/do_mounts.c:76: error: dereferencing pointer to incomplete type
init/do_mounts.c:102: error: implicit declaration of function 'part_pack_uuid'
init/do_mounts.c:104: error: 'block_class' undeclared (first use in this function)
Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
When a new disk is being discovered, add_disk() first ties the bdev to gendisk
(via register_disk()->blkdev_get()) and only after that calls
bdi_register_bdev(). Because register_disk() also creates disk's kobject, it
can happen that userspace manages to open and modify the device's data (or
inode) before its BDI is properly initialized leading to a warning in
__mark_inode_dirty().
Fix the problem by registering BDI early enough.
This patch addresses https://bugzilla.kernel.org/show_bug.cgi?id=16312
Cc: stable@kernel.org Reported-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
o Actual implementation of throttling policy in block layer. Currently it
implements READ and WRITE bytes per second throttling logic. IOPS throttling
comes in later patches.
blk-cgroup: Prepare the base for supporting more than one IO control policy
o This patch prepares the base for introducing new IO control policies.
Currently all the code is written knowing there is only one policy
and that is proportional bandwidth. Creating infrastructure for newer
policies to come in.
o Also there were many functions which were generated using macros. It was
very confusing. Got rid of those.
blk-cgroup: Kill the header printed at the start of blkio.weight_device file
o Kill extra "dev weight" header which is printed when somebody reads
blkio.weight_device file. This really seems to be out of convention. No other
blkio files are printing any header at the start of file. I think it is ok
to just print values and how to interpret values should be part of
documentation.
core: match_dev_by_uuid() should not be marked __init
It is also called outside the scope of init functions. Stephen
reports:
WARNING: init/mounts.o(.text+0x21a): Section mismatch in reference from the function name_to_dev_t() to the function .init.text:match_dev_by_uuid()
The function name_to_dev_t() references
the function __init match_dev_by_uuid().
This is often because name_to_dev_t lacks a __init
annotation or the annotation of match_dev_by_uuid is wrong.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Namhyung Kim [Thu, 16 Sep 2010 03:55:57 +0000 (12:55 +0900)]
sg: fix a warning in blk_rq_aligned() call
The 2nd argument of blk_rq_aligned() was changed to 'unsigned long' by
the previous commit 'block: fix an address space warning in blk-map.c'.
That commit neglected to update a user of that function.
Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Will Drewry [Tue, 31 Aug 2010 20:47:07 +0000 (15:47 -0500)]
init: add support for root devices specified by partition UUID
This is the third patch in a series which adds support for
storing partition metadata, optionally, off of the hd_struct.
One major use for that data is being able to resolve partitions
by identities other than just the index on a block device. Device
enumeration varies by platform and there's a benefit to being able
to use something like EFI GPT's GUIDs to determine the correct
block device and partition to mount as the root.
This change adds that support to root= by adding support for
the following syntax:
root=PARTUUID=hex-uuid
Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
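As an illustration of the syntax, the following user-space sketch parses a PARTUUID= argument into 16 bytes. It is not the kernel's parser: parse_partuuid() and hex_val() are hypothetical names, and the choices to skip dashes and to store bytes in the order the digits appear are assumptions of this sketch.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if the character is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = (char)tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/*
 * Parse "PARTUUID=<hex-uuid>" into 16 bytes, in the order the digits
 * appear. Dashes are skipped. Returns 0 on success, -1 on any error.
 */
static int parse_partuuid(const char *arg, unsigned char out[16])
{
    static const char prefix[] = "PARTUUID=";
    size_t n = 0;
    int hi = -1;        /* pending high nibble, or -1 if none */

    if (strncmp(arg, prefix, sizeof(prefix) - 1) != 0)
        return -1;

    for (const char *p = arg + sizeof(prefix) - 1; *p; p++) {
        int v;

        if (*p == '-')
            continue;
        v = hex_val(*p);
        if (v < 0 || n == 16)
            return -1;  /* bad character or too many digits */
        if (hi < 0) {
            hi = v;
        } else {
            out[n++] = (unsigned char)(hi << 4 | v);
            hi = -1;
        }
    }
    /* Exactly 32 hex digits are required. */
    return (n == 16 && hi < 0) ? 0 : -1;
}
```

A value such as /dev/sda1 is rejected by the prefix check, so a caller could fall through to the existing device-name handling.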
Will Drewry [Tue, 31 Aug 2010 20:47:05 +0000 (15:47 -0500)]
block, partition: add partition_meta_info to hd_struct
I'm reposting this patch series as v4 since there have been no additional
comments, and I cleaned up one extra bit of unneeded code (in 3/3). The patches
are against Linus's tree: 2bfc96a127bc1cc94d26bfaa40159966064f9c8c
(2.6.36-rc3).
Would this patchset be suitable for inclusion in an mm branch?
This change adds a partition_meta_info struct which itself contains a
union of structures that provide partition table specific metadata.
This change leaves the union empty. The subsequent patch includes an
implementation for CONFIG_EFI_PARTITION-based metadata.
Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Namhyung Kim [Wed, 15 Sep 2010 11:08:27 +0000 (13:08 +0200)]
block: fix an address space warning in blk-map.c
Change type of 2nd parameter of blk_rq_aligned() into unsigned long
and remove unnecessary casting. Now we can call it with 'uaddr'
instead of 'ubuf' in __blk_rq_map_user() so that it can remove
following warnings from sparse:
block/blk-map.c:57:31: warning: incorrect type in argument 2 (different address spaces)
block/blk-map.c:57:31: expected void *addr
block/blk-map.c:57:31: got void [noderef] <asn:1>*ubuf
However blk_rq_map_kern() needs one more local variable to handle it.
Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
zfcp: Report scatter gather limit for DIX protection information
When sending DIX integrity segments with an I/O request, the
restriction for the maximum number of segments is still the same for
the zfcp hardware. Report the new sg_prot_tablesize for the SCSI host,
so that the number of integrity segments plus the number of data
segments is not larger than the hardware limit. This results in using
half of the hardware segments for integrity data and the other half
for regular data.
block/scsi: Provide a limit on the number of integrity segments
Some controllers have a hardware limit on the number of protection
information scatter-gather list segments they can handle.
Introduce a max_integrity_segments limit in the block layer and provide
a new scsi_host_template setting that allows HBA drivers to provide a
value suitable for the hardware.
Add support for honoring the integrity segment limit when merging both
bios and requests.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
Linus Torvalds [Sun, 22 Aug 2010 18:27:36 +0000 (11:27 -0700)]
Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PIT: free irq source id in handling error path
KVM: destroy workqueue on kvm_create_pit() failures
KVM: fix poison overwritten caused by using wrong xstate size
Linus Torvalds [Sun, 22 Aug 2010 18:03:27 +0000 (11:03 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel: (58 commits)
drm/i915,intel_agp: Add support for Sandybridge D0
drm/i915: fix render pipe control notify on sandybridge
agp/intel: set 40-bit dma mask on Sandybridge
drm/i915: Remove the conflicting BUG_ON()
drm/i915/suspend: s/IS_IRONLAKE/HAS_PCH_SPLIT/
drm/i915/suspend: Flush register writes before busy-waiting.
i915: disable DAC on Ironlake also when doing CRT load detection.
drm/i915: wait for actual vblank, not just 20ms
drm/i915: make sure eDP PLL is enabled at the right time
drm/i915: fix VGA plane disable for Ironlake+
drm/i915: eDP mode set sequence corrections
drm/i915: add panel reset workaround
drm/i915: Enable RC6 on Ironlake.
drm/i915/sdvo: Only set is_lvds if we have a valid fixed mode.
drm/i915: Set up a render context on Ironlake
drm/i915 invalidate indirect state pointers at end of ring exec
drm/i915: Wake-up wait_request() from elapsed hang-check (v2)
drm/i915: Apply i830 errata for cursor alignment
drm/i915: Only update i845/i865 CURBASE when disabled (v2)
drm/i915: FBC is updated within set_base() so remove second call in mode_set()
...
Chris Wilson [Sun, 15 Aug 2010 09:52:34 +0000 (10:52 +0100)]
drm/i915: Remove the conflicting BUG_ON()
We now attempt to free "active" objects following a GPU hang as either
the GPU will be reset or the hang is permanent. In either case, the GPU
writes will not be flushed to main memory and it should be safe to
return that memory back to the system.
The BUG_ON(active) is thus overkill and can erroneously fire after an
EIO.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Eric Anholt <eric@anholt.net>
Jesse Barnes [Wed, 18 Aug 2010 20:20:54 +0000 (13:20 -0700)]
drm/i915: wait for actual vblank, not just 20ms
Waiting for a hard coded 20ms isn't always enough to make sure a vblank
period has actually occurred, so add code to make sure we really have
passed through a vblank period (or that the pipe is off when disabling).
This prevents problems with mode setting and link training, and seems to
fix a bug like https://bugs.freedesktop.org/show_bug.cgi?id=29278, but
on an HP 8440p instead. Hopefully also fixes
https://bugs.freedesktop.org/show_bug.cgi?id=29141.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Eric Anholt <eric@anholt.net>
Arjan van de Ven [Sat, 21 Aug 2010 20:07:26 +0000 (13:07 -0700)]
workqueue: Add basic tracepoints to track workqueue execution
With the introduction of the new unified work queue thread pools,
we lost one feature: It's no longer possible to know which worker
is causing the CPU to wake out of idle. The result is that PowerTOP
now reports a lot of "kworker/a:b" instead of more readable results.
This patch adds a pair of tracepoints to the new workqueue code,
similar in style to the timer/hrtimer tracepoints.
With this pair of tracepoints, the next PowerTOP can correctly
report which work item caused the wakeup (and how long it took):
Interrupt (43) i915 time 3.51ms wakeups 141
Work ieee80211_iface_work time 0.81ms wakeups 29
Work do_dbs_timer time 0.55ms wakeups 24
Process Xorg time 21.36ms wakeups 4
Timer sched_rt_period_timer time 0.01ms wakeups 1
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Samuel Thibault [Sat, 21 Aug 2010 19:32:41 +0000 (21:32 +0200)]
Replace Configure with Enable in description of MAXSMP
The "Configure" word tends to make users believe they have to say 'yes'
to be able to choose the number of procs/nodes. "Enable" should be
unambiguous enough.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 20 Aug 2010 23:49:40 +0000 (16:49 -0700)]
mm: make stack guard page logic use vm_prev pointer
Like the mlock() change previously, this makes the stack guard check
code use vma->vm_prev to see what the mapping below the current stack
is, rather than have to look it up with find_vma().
Also, accept an abutting stack segment, since that happens naturally if
you split the stack with mlock or mprotect.
Tested-by: Ian Campbell <ijc@hellion.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 20 Aug 2010 23:24:55 +0000 (16:24 -0700)]
mm: make the vma list be doubly linked
It's a really simple list, and several of the users want to go backwards
in it to find the previous vma. So rather than have to look up the
previous entry with 'find_vma_prev()' or something similar, just make it
doubly linked instead.
Tested-by: Ian Campbell <ijc@hellion.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
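A small user-space sketch of the idea behind this change and the stack guard change that builds on it. struct vma here is a pared-down, hypothetical stand-in, not the kernel's vm_area_struct. Once every entry carries a prev pointer, "what is the mapping just below this one?" becomes a single pointer dereference.

```c
#include <stddef.h>

/* Hypothetical, pared-down vma: an address range plus list links. */
struct vma {
    unsigned long start, end;   /* covers [start, end) */
    struct vma *next, *prev;
};

/* Insert 'v' after 'prev', or at the head of the list when prev is NULL. */
static void vma_link(struct vma **head, struct vma *prev, struct vma *v)
{
    v->prev = prev;
    if (prev) {
        v->next = prev->next;
        prev->next = v;
    } else {
        v->next = *head;
        *head = v;
    }
    if (v->next)
        v->next->prev = v;
}

/* Does the mapping directly below 'v' end exactly where 'v' starts? */
static int abuts_below(const struct vma *v)
{
    return v->prev && v->prev->end == v->start;
}
```

The cost is one extra pointer per entry and one extra store on insertion, in exchange for not walking or searching the list to find a predecessor.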
Tilman Sauerbeck [Fri, 20 Aug 2010 21:01:47 +0000 (14:01 -0700)]
mtd: nand: Fix probe of Samsung NAND chips
Apparently, the check for a 6-byte ID string introduced by commit 426c457a3216fac74e3d44dd39729b0689f4c7ab ("mtd: nand: extend NAND flash
detection to new MLC chips") is NOT sufficient to determine whether or
not a Samsung chip uses their new MLC detection scheme or the old,
standard scheme. This adds a condition to check cell type.
Signed-off-by: Tilman Sauerbeck <tilman@code-monkey.de> Signed-off-by: Brian Norris <norris@broadcom.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Cc: stable@kernel.org
Stefan Richter [Thu, 19 Aug 2010 21:13:43 +0000 (14:13 -0700)]
Documentation: kernel-locking: mutex_trylock cannot be used in interrupt context
Chapter 6 is right about mutex_trylock, but chapter 10 wasn't. This error
was introduced during semaphore-to-mutex conversion of the Unreliable
guide. :-)
If user context which performs mutex_lock() or mutex_trylock() is
preempted by interrupt context which performs mutex_trylock() on the same
mutex instance, a deadlock occurs. This is because these functions do not
disable local IRQs when they operate on mutex->wait_lock.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Thu, 19 Aug 2010 21:13:42 +0000 (14:13 -0700)]
drivers/scsi/qla4xxx: fix build
gcc-4.0.2:
drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4_8xxx_error_recovery':
drivers/scsi/qla4xxx/ql4_glbl.h:135: sorry, unimplemented: inlining failed in call to 'qla4_8xxx_set_drv_active': function body not available
drivers/scsi/qla4xxx/ql4_os.c:2377: sorry, unimplemented: called from here
drivers/scsi/qla4xxx/ql4_glbl.h:135: sorry, unimplemented: inlining failed in call to 'qla4_8xxx_set_drv_active': function body not available
drivers/scsi/qla4xxx/ql4_os.c:2393: sorry, unimplemented: called from here
Cc: Ravi Anand <ravi.anand@qlogic.com> Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Thu, 19 Aug 2010 21:13:40 +0000 (14:13 -0700)]
uml: fix compile error in dma_get_cache_alignment()
Fix uml compile error:
include/linux/dma-mapping.h:145: error: redefinition of 'dma_get_cache_alignment'
arch/um/include/asm/dma-mapping.h:99: note: previous definition of 'dma_get_cache_alignment' was here
Introduced by commit 4565f0170dfc ("dma-mapping: unify
dma_get_cache_alignment implementations")
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Jeff Dike <jdike@addtoit.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Thu, 19 Aug 2010 21:13:39 +0000 (14:13 -0700)]
oom: __task_cred() need rcu_read_lock()
dump_tasks() needs to hold the RCU read lock around its access of the
target task's UID. To this end it should use task_uid() as it only needs
that one thing from the creds.
The fact that dump_tasks() holds tasklist_lock is insufficient to prevent the
target process replacing its credentials on another CPU.
So this patch changes the code to call rcu_read_lock() explicitly.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Thu, 19 Aug 2010 21:13:39 +0000 (14:13 -0700)]
oom: fix tasklist_lock leak
Commit 0aad4b3124 ("oom: fold __out_of_memory into out_of_memory")
introduced a tasklist_lock leak. It then caused the following obviously
dangerous warnings and a panic.
================================================
[ BUG: lock held when returning to user space! ]
------------------------------------------------
rsyslogd/1422 is leaving the kernel with locks still held!
1 lock held by rsyslogd/1422:
#0: (tasklist_lock){.+.+.+}, at: [<ffffffff810faf64>] out_of_memory+0x164/0x3f0
BUG: scheduling while atomic: rsyslogd/1422/0x00000002
INFO: lockdep is turned off.
This patch fixes it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kyungmin Park [Thu, 19 Aug 2010 21:13:37 +0000 (14:13 -0700)]
drivers/mmc/host/sdhci-s3c.c: use the correct mutex and card detect function
There's a merge problem between the sdhci core and the sdhci-s3c host. After
the mutex was changed to a spinlock, the host needs to use spin lock functions
and the correct card detection function.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Thu, 19 Aug 2010 21:13:33 +0000 (14:13 -0700)]
lib/radix-tree.c: fix overflow in radix_tree_range_tag_if_tagged()
When radix_tree_maxindex() is ~0UL, it can happen that scanning overflows
index and tree traversal code goes astray reading memory until it hits
unreadable memory. Check for overflow and exit in that case.
Signed-off-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
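The wrap-around hazard can be sketched in isolation. The helper below is a hypothetical illustration of guarding an index advance against overflow, not the check used in lib/radix-tree.c: it tests whether the step fits before adding, so the index never wraps to a small value and restarts the scan.

```c
#include <limits.h>

/*
 * Advance *index by 'step' toward 'last'. Returns 1 and updates *index
 * when the scan may continue, or 0 when it must stop because the range is
 * exhausted. Comparing against (last - *index) avoids ever computing a
 * sum that could wrap past ULONG_MAX.
 */
static int next_index(unsigned long *index, unsigned long step,
                      unsigned long last)
{
    if (*index > last || step > last - *index)
        return 0;
    *index += step;
    return 1;
}
```

With last equal to ULONG_MAX (the ~0UL case from the description), an index already at ULONG_MAX is left unchanged and the scan terminates.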
: A second review after I've received a data sheet for this device from
: Fintek has turned up a few bugs.
:
: Unfortunately neither Giel nor I have time to fix this in time for the 2.6.36
: cycle. Therefore I would like to see this patch reverted as not having any
: support for the hwmon function of this superio chip is better than having
: unreliable support.
Cc: Giel van Schijndel <me@mortis.eu> Cc: Jean Delvare <khali@linux-fr.org> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Jonathan Cameron <jic23@cam.ac.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ondrej Zary [Thu, 19 Aug 2010 21:13:25 +0000 (14:13 -0700)]
matroxfb: fix incorrect use of memcpy_toio()
Screen is completely corrupted since 2.6.34. Bisection revealed that it's
caused by commit 6175ddf06b61720 ("x86: Clean up mem*io functions.").
H. Peter Anvin explained that memcpy_toio() does not copy data in 32bit
chunks anymore on x86.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Petr Vandrovec <vandrove@vc.cvut.cz> Cc: Jean Delvare <khali@linux-fr.org> Cc: <stable@kernel.org> [2.6.34.x, 2.6.35.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Borislav Petkov [Thu, 19 Aug 2010 18:10:29 +0000 (20:10 +0200)]
x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues
When testing cpu hotplug code on 32-bit we kept hitting the "CPU%d:
Stuck ??" message due to multiple cores concurrently accessing the
cpu_callin_mask, among others.
Since these codepaths are not protected from concurrent access due to
the fact that there's no sane reason for making an already complex
code unnecessarily more complex - we hit the issue only when insanely
switching cores off- and online - serialize hotplugging cores on the
sysfs level and be done with it.
[ v2.1: fix !HOTPLUG_CPU build ]
Cc: <stable@kernel.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100819181029.GC17171@aftab> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Brian Norris [Wed, 18 Aug 2010 18:25:04 +0000 (11:25 -0700)]
mtd: nand: Fix regression in BBM detection
Commit c7b28e25cb9beb943aead770ff14551b55fa8c79 ("mtd: nand: refactor BB
marker detection") caused a regression in detection of factory-set bad
block markers, especially for certain small-page NAND. This fix removes
some unneeded constraints on using NAND_SMALL_BADBLOCK_POS, making the
detection code more correct.
This regression can be seen, for example, in Hynix HY27US081G1M and
similar.
Signed-off-by: Brian Norris <norris@broadcom.com> Tested-by: Michael Guntsche <mike@it-loops.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Linus Torvalds [Wed, 18 Aug 2010 22:45:23 +0000 (15:45 -0700)]
Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Fix an Oops in the NFSv4 atomic open code
NFS: Fix the selection of security flavours in Kconfig
NFS: fix the return value of nfs_file_fsync()
rpcrdma: Fix SQ size calculation when memreg is FRMR
xprtrdma: Do not truncate iova_start values in frmr registrations.
nfs: Remove redundant NULL check upon kfree()
nfs: Add "lookupcache" to displayed mount options
NFS: allow close-to-open cache semantics to apply to root of NFS filesystem
SUNRPC: fix NFS client over TCP hangs due to packet loss (Bug 16494)
Linus Torvalds [Wed, 18 Aug 2010 22:29:38 +0000 (15:29 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
USB HID: Add ID for eGalax Multitouch used in JooJoo tablet
HID: hiddev: fix memory corruption due to invalid intfdata
HID: hiddev: protect against disconnect/NULL-dereference race
HID: picolcd: correct ordering of framebuffer freeing
HID: picolcd: testing the wrong variable
Jesse Barnes [Fri, 13 Aug 2010 22:11:26 +0000 (15:11 -0700)]
drm/i915: fix VGA plane disable for Ironlake+
We need to use I/O port instructions to access VGA registers on
Ironlake+, and it doesn't hurt on other platforms, so switch the VGA
plane disable function over to using them. Move it to init time as well
while we're at it, no need to repeatedly disable the VGA plane with
every mode set and DPMS event.
Jesse Barnes [Wed, 11 Aug 2010 17:06:44 +0000 (10:06 -0700)]
drm/i915: eDP mode set sequence corrections
We should disable the panel first when shutting down an eDP link. And
when turning one on, the panel needs to be enabled before link training
or eDP I/O won't be enabled.
Jesse Barnes [Wed, 11 Aug 2010 17:04:43 +0000 (10:04 -0700)]
drm/i915: add panel reset workaround
Ironlake requires that we clear the reset panel bit during power
sequences and restore it afterwards. Unconditionally add code to do that
since it should be harmless on SNB+.
Missed the declaration of sys_execve in the ia64 asm/unistd.h (perhaps
because there is no reason for it to be there ... it might be a left over
from the COMPAT code?). Just delete the conflicting version.
Linus Torvalds [Wed, 18 Aug 2010 16:35:08 +0000 (09:35 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
fs: brlock vfsmount_lock
fs: scale files_lock
lglock: introduce special lglock and brlock spin locks
tty: fix fu_list abuse
fs: cleanup files_lock locking
fs: remove extra lookup in __lookup_hash
fs: fs_struct rwlock to spinlock
apparmor: use task path helpers
fs: dentry allocation consolidation
fs: fix do_lookup false negative
mbcache: Limit the maximum number of cache entries
hostfs ->follow_link() braino
hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy
remove SWRITE* I/O types
kill BH_Ordered flag
vfs: update ctime when changing the file's permission by setfacl
cramfs: only unlock new inodes
fix reiserfs_evict_inode end_writeback second call
Kusanagi Kouichi [Wed, 18 Aug 2010 16:32:37 +0000 (13:32 -0300)]
perf tools: Fix build error on read only source.
Parts of the build process were generating files outside the specified
O= directory, causing the build to fail on systems where the sources are
in a read only file system.
Fix it by using $(OUTPUT) on these locations.
Also check that $(OUTPUT) actually exists, just like the top level
kernel Makefile does. Otherwise the failure message emitted is
completely misleading.
Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20100817140841.0859362C03A@msa106.auone-net.jp> Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>