Lin Ming [Tue, 17 Nov 2009 05:49:50 +0000 (13:49 +0800)]
timekeeping: Fix clock_gettime vsyscall time warp
Since commit 0a544198 "timekeeping: Move NTP adjusted clock multiplier
to struct timekeeper" the clock multiplier of vsyscall is updated with
the unmodified clock multiplier of the clock source and not with the
NTP adjusted multiplier of the timekeeper.
Add a new argument "mult" to update_vsyscall() and hand in the
timekeeping internal NTP adjusted multiplier.
Signed-off-by: Lin Ming <ming.m.lin@intel.com> Cc: "Zhang Yanmin" <yanmin_zhang@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tony Luck <tony.luck@intel.com>
LKML-Reference: <1258436990.17765.83.camel@minggr.sh.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* git://git.infradead.org/users/dwmw2/iommu-2.6.32:
intel-iommu: Support PCIe hot-plug
intel-iommu: Obey coherent_dma_mask for alloc_coherent on passthrough
intel-iommu: Check for 'DMAR at zero' BIOS error earlier.
Linus Torvalds [Sat, 14 Nov 2009 21:03:24 +0000 (13:03 -0800)]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
MIPS: SMTC: Fix lockup in smtc_distribute_timer
MIPS: TXx9: Update rbtx49xx_defconfig
MIPS: Make local arrays with CL_SIZE static __initdata
MIPS: Add DMA declare coherent memory support
MIPS: Fix emulation of 64-bit FPU on FPU-less 64-bit CPUs.
Linus Torvalds [Sat, 14 Nov 2009 21:00:17 +0000 (13:00 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: psmouse - remove unneeded '\n' from psmouse.proto parameter
Input: atkbd - restore LED state at reconnect
Input: force LED reset on resume
Input: fix locking in memoryless force-feedback devices
Linus Torvalds [Sat, 14 Nov 2009 20:59:32 +0000 (12:59 -0800)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
md/raid5: Allow dirty-degraded arrays to be assembled when only party is degraded.
Don't unconditionally set in_sync on newly added device in raid5_reshape
md: allow v0.91 metadata to record devices as being active but not in-sync.
md: factor out updating of 'recovery_offset'.
Linus Torvalds [Sat, 14 Nov 2009 20:59:06 +0000 (12:59 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: restructure pcpu_extend_area_map() to fix bugs and improve readability
Linus Torvalds [Sat, 14 Nov 2009 20:57:39 +0000 (12:57 -0800)]
Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] s390: fix single stepping on svc0
[S390] sclp: undo quiesce handler override on resume
[S390] reset cputime accounting after IPL from NSS
[S390] monreader: fix use after free bug with suspend/resume
Oliver Neukum [Fri, 13 Nov 2009 13:26:23 +0000 (14:26 +0100)]
fix memory leak in fixed btusb_close
If the waker is killed before it can replay outstanding URBs, these URBs
won't be freed or will be replayed at the next open. This patch closes
the window by explicitely discarding outstanding URBs.
Petr Vandrovec [Sat, 14 Nov 2009 09:47:07 +0000 (10:47 +0100)]
Fix memory corruption caused by nfsd readdir+
Commit 8177e6d6dfb9cd03d9bdeb647c32161f8f58f686 ("nfsd: clean up
readdirplus encoding") introduced single character typo in nfs3 readdir+
implementation. Unfortunately that typo has quite bad side effects:
random memory corruption, followed (on my box) with immediate
spontaneous box reboot.
Using 'p1' instead of 'p' fixes my Linux box rebooting whenever VMware
ESXi box tries to list contents of my home directory.
Signed-off-by: Petr Vandrovec <petr@vandrovec.name> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kevin D. Kissell [Tue, 10 Nov 2009 19:45:46 +0000 (11:45 -0800)]
MIPS: SMTC: Fix lockup in smtc_distribute_timer
1. At the end of smtc_distribute_timer, nextstamp is valid and has already
passed so we goto repeat.
2. Nothing updates nextstamp (only updated if the timeout is in the future
And we just decided it is in the past)
3. At the end nextstamp still has the same value so it is still valid and
in the past.
4. This repeats until read_c0_count has a value which causes nextstamp to
be in the future.
Reported and initial patch and testing by Mikael Starvik
<mikael.starvik@axis.com>.
Signed-off-by: Kevin D. Kissell <kevink@paralogos.com> Cc: Mikael Starvik <mikael.starvik@axis.com> Cc: linux-mips@linux-mips.org Cc: Jesper Nilsson <Jesper.Nilsson@axis.com>
Patchwork: http://patchwork.linux-mips.org/patch/621/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Atsushi Nemoto [Sat, 7 Nov 2009 17:20:37 +0000 (02:20 +0900)]
MIPS: Make local arrays with CL_SIZE static __initdata
Since commit 22242681cff52bfb7cba5d2a37b91802be7a4e4c ("MIPS: Extend
COMMAND_LINE_SIZE"), CL_SIZE is 4096 and local array variables with this
size will cause an build failure with default CONFIG_FRAME_WARN settings.
Although current users of such array variables are all early bootstrap
code and not likely to cause real stack overflow (thread_info corruption),
it is preferable to to declare these arrays static with __initdata.
David Daney [Mon, 2 Nov 2009 19:33:46 +0000 (11:33 -0800)]
MIPS: Fix emulation of 64-bit FPU on FPU-less 64-bit CPUs.
Running a 64-bit kernel on a 64-bit CPU without an FPU would cause the
emulator to run in 32-bit mode. The c0_Status.FR bit is wired to zero
on systems without an FPU, so using that bit to decide how the emulator
behaves doesn't allow for proper emulation on 64-bit FPU-less
processors.
Instead, we need to select the emulator mode based on the user-space
ABI. Since the thread flag TIF_32BIT_REGS is used to set c0_Status.FR,
we can just use it to decide if the emulator should be in 32-bit or
64-bit mode.
Signed-off-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
On s390 there are two ways of specifying the system call number for
the svc instruction. The standard way is to use the immediate field
in the instruction (or to use EXecute for values unknown during
assemble time). This can encode 256 system calls.
The kernel ABI also allows to put the system call number in r1 and
then execute svc 0 to enable system call numbers > 255.
It turns out that single stepping svc 0 is broken, since the PER
program check handler uses r1. We have to use a different register.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[S390] sclp: undo quiesce handler override on resume
In a system where the ctrl-alt-del init action initiated by signal
quiesce suspends the machine the quiesce handler override for
_machine_restart, _machine_halt and _machine_power_off needs to be
undone, otherwise the override is still present in the resumed
system. The next shutdown would then load the quiesce state psw
instead of performing the correct shutdown action.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[S390] reset cputime accounting after IPL from NSS
After an IPL from NSS the uptime of the system is incorrect. The reason
is that the startup code in head.S is not executed in case of an IPL
from NSS. Due to that sched_clock_base_cc which is used to initialze
wall_to_monotonic contains the time stamp when the NSS has been created
instead of the time stamp of the system start.
Reinitialize the cputime accounting values in create_kernel_nss after
the SAVESYS CP command that created the NSS segment.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Gerald Schaefer [Fri, 13 Nov 2009 14:43:51 +0000 (15:43 +0100)]
[S390] monreader: fix use after free bug with suspend/resume
The monreader device driver doesn't set dev->driver_data to NULL after
freeing the corresponding data structure. This leads to a use after
free bug in the freeze/thaw suspend/resume functions after the device
has been opened and closed once. Fix this by clearing dev->driver_data
in the close() function.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Dmitry Torokhov [Fri, 13 Nov 2009 07:19:05 +0000 (23:19 -0800)]
Input: atkbd - restore LED state at reconnect
Even though input core tells us to restore LED state and repeat rate
at resume keyboard may be reconnected either by request from userspace
(via sysfs) or just by pulling it from the box and plugging it back in.
In these cases we still need to restore state ourselves.
Dmitry Torokhov [Fri, 13 Nov 2009 07:19:05 +0000 (23:19 -0800)]
Input: force LED reset on resume
We should be sending EV_LED event down to drivers upon resume even in cases
when in-kernel state of the LED is off since device could come up with some
leds turned on.
Reported-and-tested-by: Mikko Vinni <mmvinni@yahoo.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
NeilBrown [Fri, 13 Nov 2009 06:47:00 +0000 (17:47 +1100)]
md/raid5: Allow dirty-degraded arrays to be assembled when only party is degraded.
Normally is it not safe to allow a raid5 that is both dirty and
degraded to be assembled without explicit request from that admin, as
it can cause hidden data corruption.
This is because 'dirty' means that the parity cannot be trusted, and
'degraded' means that the parity needs to be used.
However, if the device that is missing contains only parity, then
there is no issue and assembly can continue.
This particularly applies when a RAID5 is being converted to a RAID6
and there is an unclean shutdown while the conversion is happening.
So check for whether the degraded space only contains parity, and
in that case, allow the assembly.
NeilBrown [Fri, 13 Nov 2009 06:40:51 +0000 (17:40 +1100)]
Don't unconditionally set in_sync on newly added device in raid5_reshape
When a reshape finds that it can add spare devices into the array,
those devices might already be 'in_sync' if they are beyond the old
size of the array, or they might not if they are within the array.
The first case happens when we change an N-drive RAID5 to an
N+1-drive RAID5.
The second happens when we convert an N-drive RAID5 to an
N+1-drive RAID6.
So set the flag more carefully.
Also, ->recovery_offset is only meaningful when the flag is clear,
so only set it in that case.
This change needs the preceding two to ensure that the non-in_sync
device doesn't get evicted from the array when it is stopped, in the
case where v0.90 metadata is used.
NeilBrown [Fri, 13 Nov 2009 06:40:48 +0000 (17:40 +1100)]
md: allow v0.91 metadata to record devices as being active but not in-sync.
This is a combination that didn't really make sense before.
However when a reshape is converting e.g. raid5 -> raid6, the extra
device is not fully in-sync, but is certainly active and contains
important data.
So allow that start to be meaningful and in particular get
the 'recovery_offset' value (which is needed for any non-in-sync
active device) from the reshape_position.
Tejun Heo [Wed, 11 Nov 2009 06:35:18 +0000 (15:35 +0900)]
percpu: restructure pcpu_extend_area_map() to fix bugs and improve readability
pcpu_extend_area_map() had the following two bugs.
* It should return 1 if pcpu_lock was dropped and reacquired but it
returned 0. This could lead to oops if free_percpu() races with
area map extension.
* pcpu_mem_free() was called under pcpu_lock. pcpu_mem_free() might
end up calling vfree() which isn't IRQ safe. This could lead to
deadlock through lock order inversion via IRQ.
In addition, Linus pointed out that the temporary lock dropping and
subtle three-way return value of pcpu_extend_area_map() was very ugly
and suggested to split the function into two - pcpu_need_to_extend()
and pcpu_extend_area_map().
This patch restructures pcpu_extend_area_map() as suggested and fixes
the two bugs.
Mike Hommey [Wed, 11 Nov 2009 22:26:55 +0000 (14:26 -0800)]
__generic_block_fiemap(): fix for files bigger than 4GB
Because of an integer overflow on start_blk, various kind of wrong results
would be returned by the generic_block_fiemap() handler, such as no
extents when there is a 4GB+ hole at the beginning of the file, or wrong
fe_logical when an extent starts after the first 4GB.
Signed-off-by: Mike Hommey <mh@glandium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Eric Sandeen <sandeen@sgi.com> Cc: Josef Bacik <jbacik@redhat.com> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rodolfo Giometti [Wed, 11 Nov 2009 22:26:54 +0000 (14:26 -0800)]
pps: events reporting fix up
PPS events must be recorded according to PPS's mode settings.
If a process asks for (i.e.) capture-assert events only, when the PPS
client calls the pps_event() function to save the current PPS event, we
should verify the event type and then discard unwanted ones.
Also, without this patch userland processes waiting for a specific PPS
event (assert or clear but not both) may be awakened at wrong time.
Signed-off-by: Rodolfo Giometti <giometti@linux.it> Tested-by: William S. Brasher <billb958@door.net> Tested-by: Reg Clemens <clemens@dwf.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sergei Shtylyov [Wed, 11 Nov 2009 22:26:50 +0000 (14:26 -0800)]
gpiolib: fix device_create() result check
In case of failure, device_create() returns not NULL but the error code.
The current code checks for non-NULL though which causes kernel oops in
sysfs_create_group() when device_create() fails. Check for error using
IS_ERR() and propagate the error value using PTR_ERR() instead of fixed
-ENODEV code returned now...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Scott Valentine [Wed, 11 Nov 2009 22:26:49 +0000 (14:26 -0800)]
rtc: v3020: fix v3020_mmio_read_bit()
v3020_mmio_read_bit() always returns 0 when left_shift > 7.
v3020_mmio_read_bit()'s return type is (unsigned char). The code returns
a value masked by (1 << left_shift) that is casted to the return type. If
left_shift is larger than 7, the cast will always result in a 0 return
value. The problem was discovered with left_shift = 16, and the included
patch corrects the problem.
The bug was introduced in the last (Apr 3 2009) commit of the file, kernel
versions 2.6.30 and later.
Yoichi Yuasa [Wed, 11 Nov 2009 22:26:48 +0000 (14:26 -0800)]
rtc-vr41xx: fix do_div() warning
drivers/rtc/rtc-vr41xx.c: In function 'vr41xx_rtc_irq_set_freq':
drivers/rtc/rtc-vr41xx.c:217: warning: comparison of distinct pointer types lacks a cast
drivers/rtc/rtc-vr41xx.c:217: warning: right shift count >= width of type
drivers/rtc/rtc-vr41xx.c:217: warning: passing argument 1 of '__div64_32' from incompatible pointer type
include/asm-generic/div64.h:35: note: expected 'uint64_t *' but argument is of type 'long unsigned int *'
Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Paul Gortmaker <p_gortmaker@yahoo.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rtc: pcf50633: consider alrm->enable in pcf50633_rtc_set_alarm
According to Documentation/rtc.txt, RTC_WKALM_SET sets the alarm time and
enables/disables the alarm. We implement RTC_WKALM_SET through
pcf50633_rtc_set_alarm. The enabling/disabling part was missing.
Signed-off-by: Werner Almesberger <werner@openmoko.org> Reported-by: Michael 'Mickey' Lauer <mickey@openmoko.org> Signed-off-by: Paul Fertser <fercerpav@gmail.com> Cc: Paul Gortmaker <p_gortmaker@yahoo.com> Cc: Balaji Rao <balajirrao@openmoko.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicolas Ferre [Wed, 11 Nov 2009 22:26:35 +0000 (14:26 -0800)]
atmel_lcdfb: new alternate pixel clock formula
at91sam9g45 non ES lots have an alternate pixel clock calculation formula.
Introduce this one with condition on the cpu_is_xxxxx() macros.
Newer 9g45 SOC will not have good pixel clock calculation without this
fix.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Krzysztof Helt <krzysztof.h1@wp.pl> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Heiko Carstens [Wed, 11 Nov 2009 22:26:34 +0000 (14:26 -0800)]
fs: add missing compat_ptr handling for FS_IOC_RESVSP ioctl
For FS_IOC_RESVSP and FS_IOC_RESVSP64 compat_sys_ioctl() uses its
arg argument as a pointer to userspace. However it is missing a
a call to compat_ptr() which will do a proper pointer conversion.
This was introduced with 3e63cbb1 "fs: Add new pre-allocation ioctls
to vfs for compatibility with legacy xfs ioctls".
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ankit Jain <me@ankitjain.org> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Arnd Bergmann <arndbergmann@googlemail.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: <stable@kernel.org> [2.6.31.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pidns: fix a leak in /proc dentries and inodes with pid namespaces.
Daniel Lezcano reported a leak in 'struct pid' and 'struct pid_namespace'
that is discussed in:
http://lkml.org/lkml/2009/10/2/159.
To summarize the thread, when container-init is terminated, it sets the
PF_EXITING flag, zaps other processes in the container and waits to reap
them. As a part of reaping, the container-init should flush any /proc
dentries associated with the processes. But because the container-init is
itself exiting and the following PF_EXITING check, the dentries are not
flushed, resulting in leak in /proc inodes and dentries.
This fix reverts the commit 7766755a2f249e7e0 ("Fix /proc dcache deadlock
in do_exit") which introduced the check for PF_EXITING. At the time of
the commit, shrink_dcache_parent() flushed dentries from other filesystems
also and could have caused a deadlock which the commit fixed. But as
pointed out by Eric Biederman, after commit 0feae5c47aabdde59,
shrink_dcache_parent() no longer affects other filesystems. So reverting
the commit is now safe.
As pointed out by Jan Kara, the leak is not as critical since the
unclaimed space will be reclaimed under memory pressure or by:
echo 3 > /proc/sys/vm/drop_caches
But since this check is no longer required, its best to remove it.
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Reported-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Jan Kara <jack@ucw.cz> Cc: Andrea Arcangeli <andrea@cpushare.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
alpha: Clean up linker script using new linker script macros.
Note that .data.page_aligned and .data.cacheline_aligned are now after
_data; it was probably a bug that they were before it.
Also, some explicit ALIGN(8)'s between various initcall sections were
removed; this should be harmless as the implicit alignment of
initcall_t was already 8.
Cc: Geoffrey Thomas <geofft@ksplice.com> Cc: Tim Abbott <tabbott@ksplice.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Krzysztof Helt [Wed, 11 Nov 2009 22:26:25 +0000 (14:26 -0800)]
savagefb: fix blanking mode on CRT display
Fix wrong bit mask for blanking register. Due to the error a CRT monitor
blanks off due to wrong frequency (out of range) instead of PM signal
(vertical and horizontal frequencies cut off).
Just compare the mask with bits set in the switch(blank) clause below the
changed line.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Krzysztof Helt [Wed, 11 Nov 2009 22:26:22 +0000 (14:26 -0800)]
fb: remove fb_save_state() and fb_restore_state operations
Remove fb_save_state() and fb_restore_state operations from frame buffer layer.
They are used only in two drivers:
1. savagefb - and cause bug #11248
2. uvesafb
Usage of these operations is misunderstood in both drivers so kill these
operations, fix the bug #11248 and avoid confusion in the future.
Tested on Savage 3D/MV card and the patch fixes the bug #11248.
The frame buffer layer uses these funtions during switch between graphics
and text mode of the console, but these drivers saves state before
switching of the frame buffer (in the fb_open) and after releasing it (in
the fb_release). This defeats the purpose of these operations.
Mel Gorman [Wed, 11 Nov 2009 22:26:17 +0000 (14:26 -0800)]
page allocator: Do not allow interrupts to use ALLOC_HARDER
Commit 341ce06f69abfafa31b9468410a13dbd60e2b237 ("page allocator:
calculate the alloc_flags for allocation only once") altered watermark
logic slightly by allowing rt_tasks that are handling an interrupt to set
ALLOC_HARDER. This patch brings the watermark logic more in line with
2.6.30.
This change results in a reduction of the number high-order GFP_ATOMIC
allocation failures reported. See
http://www.gossamer-threads.com/lists/linux/kernel/1144153
[rientjes@google.com: Spotted the problem] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Wed, 11 Nov 2009 22:26:14 +0000 (14:26 -0800)]
page allocator: always wake kswapd when restarting an allocation attempt after direct reclaim failed
If a direct reclaim makes no forward progress, it considers whether it
should go OOM or not. Whether OOM is triggered or not, it may retry the
allocation afterwards. In times past, this would always wake kswapd as
well but currently, kswapd is not woken up after direct reclaim fails.
For order-0 allocations, this makes little difference but if there is a
heavy mix of higher-order allocations that direct reclaim is failing for,
it might mean that kswapd is not rewoken for higher orders as much as it
did previously.
This patch wakes up kswapd when an allocation is being retried after a
direct reclaim failure. It would be expected that kswapd is already
awake, but this has the effect of telling kswapd to reclaim at the higher
order as well.
Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Williamson [Wed, 4 Nov 2009 22:59:34 +0000 (15:59 -0700)]
intel-iommu: Obey coherent_dma_mask for alloc_coherent on passthrough
The model for IOMMU passthrough is that decent devices that can cope
with DMA to all of memory get passthrough; crappy devices with a limited
dma_mask don't -- they get to use the IOMMU anyway.
This is done on the basis that IOMMU passthrough is usually wanted for
performance reasons, and it's only the decent PCI devices that you
really care about performance for, while the crappy 32-bit ones like
your USB controller can just use the IOMMU and you won't really care.
Unfortunately, the check for this was only looking at dev->dma_mask, not
at dev->coherent_dma_mask. And some devices have a 32-bit
coherent_dma_mask even though they have a full 64-bit dma_mask.
Even more unfortunately, fixing that simple oversight would upset
certain broken HP devices. Not only do they have a 32-bit
coherent_dma_mask, but they also have a tendency to do stray DMA to
unmapped addresses. And then they die when they take the DMA fault they
so richly deserve.
So if we do the 'correct' fix, it'll mean that affected users have to
disable IOMMU support completely on "a large percentage of servers from
a major vendor."
Personally, I have little sympathy -- given that this is the _same_
'major vendor' who is shipping machines which claim to have IOMMU
support but have obviously never _once_ booted a VT-d capable OS to do
any form of QA. But strictly speaking, it _would_ be a regression even
though it only ever worked by fluke.
For 2.6.33, we'll come up with a quirk which gives swiotlb support
for this particular device, and other devices with an inadequate
coherent_dma_mask will just get normal IOMMU mapping.
The simplest fix for 2.6.32, though, is just to jump through some hoops
to try to allocate coherent DMA memory for such devices in a place that
they can reach. We'd use dma_generic_alloc_coherent() for this if it
existed on IA64.
Signed-off-by: Alex Williamson <alex.williamson@hp.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
NeilBrown [Thu, 12 Nov 2009 01:08:04 +0000 (12:08 +1100)]
md: factor out updating of 'recovery_offset'.
Each device has its own 'recovery_offset' showing how far
recovery has progressed on the device.
As the only real significance of this is that fact that it can
be stored in the metadata and recovered at restart, and as
only 1.x metadata can do this, we were only updating
'recovery_offset' to 'curr_resync_completed' when updating
v1.x metadata.
But this is wrong, and we will shortly make limited use of this
field in v0.90 metadata.
Mike Turquette [Wed, 11 Nov 2009 19:00:38 +0000 (11:00 -0800)]
omap3: Decrease cpufreq transition latency
Adjust OMAP3 frequency transition latency from 10,000,000uS to a more
reasonable 300,000uS. This causes ondemand and conservative governors to
sample CPU load more often resulting in more responsive behavior.
Tested on Android 2.6.29; using this value and conservative governor, CORE
power consumption on Zoom2 was comparable to the old and unresponsive
10,000,000uS value while UI responsiveness was greatly improved.
Signed-off-by: Mike Turquette <mturquette@ti.com> Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: fix panic when trying to destroy a newly allocated
Btrfs: allow more metadata chunk preallocation
Btrfs: fallback on uncompressed io if compressed io fails
Btrfs: find ideal block group for caching
Btrfs: avoid null deref in unpin_extent_cache()
Btrfs: skip btrfs_release_path in btrfs_update_root and btrfs_del_root
Btrfs: fix some metadata enospc issues
Btrfs: fix how we set max_size for free space clusters
Btrfs: cleanup transaction starting and fix journal_info usage
Btrfs: fix data allocation hint start
Linus Torvalds [Wed, 11 Nov 2009 21:32:29 +0000 (13:32 -0800)]
btusb bluetooth driver: wait for 'waker' work too before closing
Rafael debugged a resume-time hang (with oopses in workqueue handling)
on his laptop that was due to the 'waker' workqueue entry being
disconnected and then released without the workqueue entry having been
synchronized.
Several people were involved, with Oleg Nesterov doing a debugging patch
showing what workqueue entry was corrupt etc.
This was a regression introduced by commit 7bee549e19 ("Bluetooth: Add
USB autosuspend support to btusb driver") as Rafael points out (not
actually bisected, but it became clear once the bug was found).
Tested-and-reported-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Oliver Neukum <oliver@neukum.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Josef Bacik [Wed, 11 Nov 2009 20:53:34 +0000 (15:53 -0500)]
Btrfs: fix panic when trying to destroy a newly allocated
There is a problem where iget5_locked will look for an inode, not find it, and
then subsequently try to allocate it. Another CPU will have raced in and
allocated the inode instead, so when iget5_locked gets the inode spin lock again
and does a search, it finds the new inode. So it goes ahead and calls
destroy_inode on the inode it just allocated. The problem is we don't set
BTRFS_I(inode)->root until the new inode is completely initialized. This patch
makes us set root to NULL when alloc'ing a new inode, so when we get to
btrfs_destroy_inode and we see that root is NULL we can just free up the memory
and continue on. This fixes the panic
Linus Torvalds [Wed, 11 Nov 2009 19:52:22 +0000 (11:52 -0800)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
JBD/JBD2: free j_wbuf if journal init fails.
ext3: Wait for proper transaction commit on fsync
ext3: retry failed direct IO allocations
Linus Torvalds [Wed, 11 Nov 2009 19:34:14 +0000 (11:34 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
x86/PCI: Adjust GFP mask handling for coherent allocations
PCI ASPM: fix oops on root port removal
Linus Torvalds [Wed, 11 Nov 2009 19:33:08 +0000 (11:33 -0800)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: pasemi_defconfig update
powerpc: 2.6.32 update of defconfigs for embedded 6xx/7xxx, 8xx, 8{3,5,6}xxx
powerpc/8xxx: enable IPsec ESP by default on mpc83xx/mpc85xx
powerpc/83xx: Fix u-boot partion size for MPC8377E-WLAN boards
powerpc/85xx: Fix USB GPIOs for MPC8569E-MDS boards
powerpc/82xx: kmalloc failure ignored in ep8248e_mdio_probe()
powerpc/85xx: sbc8548 - fixup of PCI-e related DTS fields
Linus Torvalds [Wed, 11 Nov 2009 19:32:04 +0000 (11:32 -0800)]
Merge branch 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (52 commits)
drm/kms: Init the CRTC info fields for modes forced from the command line.
drm/radeon/r600: CS parser updates
drm/radeon/kms: add debugfs for power management for AtomBIOS devices
drm/radeon/kms: initial mode validation support
drm/radeon/kms/atom/dce3: call transmitter init on mode set
drm/radeon/kms: store detailed connector info
drm/radeon/kms/atom/dce3: fix up usPixelClock calculation for Transmitter tables
drm/radeon/kms/r600: fix rs880 support v2
drm/radeon/kms/r700: fix some typos in chip init
drm/radeon/kms: remove some misleading debugging output
drm/radeon/kms: stop putting VRAM at 0 in MC space on r600s.
drm/radeon/kms: disable D1VGA and D2VGA if enabled
drm/radeon/kms: Don't RMW CP_RB_CNTL
drm/radeon/kms: fix coherency issues on AGP cards.
drm/radeon/kms: fix rc410 suspend/resume.
drm/radeon/kms: add quirk for hp dc5750
drm/radeon/kms/atom: fix potential oops in spread spectrum code
drm/kms: typo fix
drm/radeon/kms/atom: Make card_info per device
drm/radeon/kms/atom: Fix DVO support
...
Linus Torvalds [Wed, 11 Nov 2009 19:30:15 +0000 (11:30 -0800)]
Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
highmem: Fix debug_kmap_atomic() to also handle KM_IRQ_PTE, KM_NMI, and KM_NMI_PTE
highmem: Fix race in debug_kmap_atomic() which could cause warn_count to underflow
rcu: Fix long-grace-period race between forcing and initialization
uids: Prevent tear down race
Linus Torvalds [Wed, 11 Nov 2009 19:29:34 +0000 (11:29 -0800)]
Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf tools: Fix permission checks
perf_events: Fix some typo in the perf events config description
Linus Torvalds [Wed, 11 Nov 2009 19:29:24 +0000 (11:29 -0800)]
Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Use root_task_group_empty only with FAIR_GROUP_SCHED
sched: Fix kernel-doc function parameter name
Linus Torvalds [Wed, 11 Nov 2009 19:29:10 +0000 (11:29 -0800)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, amd-ucode: Check UCODE_MAGIC before loading the container file
x86: Fix error return sequence in __ioremap_caller()
x86: Add Phoenix/MSC BIOSes to lowmem corruption list
Linus Torvalds [Wed, 11 Nov 2009 19:28:11 +0000 (11:28 -0800)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: partial revert to fix double brelse WARNING()
ext4: Fix return value of ext4_split_unwritten_extents() to fix direct I/O
ext4: code clean up for dio fallocate handling
ext4: skip conversion of uninit extents after direct IO if there isn't any
ext4: fix ext4_ext_direct_IO()'s return value after converting uninit extents
ext4: discard preallocation when restarting a transaction during truncate
Linus Torvalds [Wed, 11 Nov 2009 19:27:26 +0000 (11:27 -0800)]
Merge branch 'fixes-s3c-2632-rc6' of git://git.fluff.org/bjdooks/linux
* 'fixes-s3c-2632-rc6' of git://git.fluff.org/bjdooks/linux:
ARM: S3C64XX: DMA: Free node for non-circular queues
ARM: S3C64XX: DMA: Callback with correct buffer pointer
ARM: S3C64XX: DMA: Make src and dst transfer size same
ARM: S3C64XX: DMA: Unify callback functions for success/failure
ARM: S3C64XX: DMA: Protect buffer pointers while manipulation
ARM: S3C64XX: Tidy definition and comments in s3c_dma_has_circular()
ARM: S3C64XX: Remove duplicate s3c_dma_has_circular() definition for S3C64xx.
ARM: SMDK6410: Allocate more GPIO space for WM1190-EV1
ARM: SMDK6410: Configure GPIO pull up for WM835x IRQ line
Linus Torvalds [Wed, 11 Nov 2009 19:26:42 +0000 (11:26 -0800)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (27 commits)
V4L/DVB (13314): saa7134: set ts_force_val for the Hauppauge WinTV HVR-1150
V4L/DVB (13313): saa7134: add support for FORCE_TS_VALID mode for mpeg ts input
V4L/DVB (13311): uvcvideo: Fix compilation warning with 2.6.32 due to type mismatch with abs()
V4L/DVB (13309): uvcvideo: Ignore the FIX_BANDWIDTH for compressed video
V4L/DVB (13287): ce6230 - saa7164-cmd: Fix wrong sizeof
V4L/DVB (13286): pxa-camera: Fix missing sched.h
V4L/DVB (13264): gspca_mr97310a: Change vstart for CIF sensor type 1 cams
V4L/DVB (13257): gspca - m5602-s5k4aa: Add vflip for Fujitsu Amilo Xi 2528
V4L/DVB (13256): gspca - m5602-s5k4aa: Add another MSI GX700 vflip quirk
V4L/DVB (13255): gspca - m5602-s5k4aa: Add vflip quirk for the Bruneinit laptop
V4L/DVB (13240): firedtv: fix regression: tuning fails due to bogus error return
V4L/DVB (13237): firedtv: length field corrupt in ca2host if length>127
V4L/DVB (13230): s2255drv: Don't conditionalize video buffer completion on waiting processes
V4L/DVB (13202): smsusb: add autodetection support for three additional Hauppauge USB IDs
V4L/DVB (13190): em28xx: fix panic that can occur when starting audio streaming
V4L/DVB (13170): bttv: Fix reversed polarity error when switching video standard
V4L/DVB (13169): bttv: Fix potential out-of-order field processing
V4L/DVB (13167): pt1: Fix a compile error on arm
V4L/DVB (13132): fix use-after-free Oops, resulting from a driver-core API change
V4L/DVB (13131): pxa_camera: fix camera pixel format configuration
...
Chris Mason [Wed, 11 Nov 2009 15:16:57 +0000 (10:16 -0500)]
Btrfs: allow more metadata chunk preallocation
On an FS where all of the space has not been allocated into chunks yet,
the enospc can return enospc just because the existing metadata chunks
are full.
We get around this by allowing more metadata chunks to be allocated up
to a certain limit, and finding the right limit is a little fuzzy. The
problem is the reservations for delalloc would preallocate way too much
of the FS as metadata. We need to start saying no and just force some
IO to happen.
But we also need to let a reasonable amount of the FS become metadata.
This bumps the hard limit up, later releases will have a better system.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Josef Bacik [Wed, 11 Nov 2009 02:23:48 +0000 (21:23 -0500)]
Btrfs: fallback on uncompressed io if compressed io fails
Currently compressed IO does not deal with not having its entire extent able to
be allocated. So if we have enough free space to allocate for the extent, but
its not contiguous, it will fail spectacularly. This patch fixes this by
falling back on uncompressed IO which lets us spread the delalloc extent across
multiple extents. I tested this by making us randomly think the reservation had
failed to make it fallback on the uncompressed io way and it seemed to work
fine. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Josef Bacik [Wed, 11 Nov 2009 02:23:48 +0000 (21:23 -0500)]
Btrfs: find ideal block group for caching
This patch changes a few things. Hopefully the comments are helpfull, but
I'll try and be as verbose here.
Problem:
My fedora box was taking 1 minute and 21 seconds to boot with btrfs as root.
Part of this problem was we pick the first block group we can find and start
caching it, even if it may not have enough free space. The other problem is
we only search for cached block groups the first time around, which we won't
find any cached block groups because this is a newly mounted fs, so we end up
caching several block groups during bootup, which with alot of fragmentation
takes around 30-45 seconds to complete, which bogs down the system. So
Solution:
1) Don't cache block groups willy-nilly at first. Instead try and figure out
which block group has the most free, and therefore will take the least amount
of time to cache.
2) Don't be so picky about cached block groups. The other problem is once
we've filled up a cluster, if the block group isn't finished caching the next
time we try and do the allocation we'll completely ignore the cluster and
start searching from the beginning of the space, which makes us cache more
block groups, which slows us down even more. So instead of skipping block
groups that are not finished caching when we have a hint, only skip the block
group if it hasn't started caching yet.
There is one other tweak in here. Before if we allocated a chunk and still
couldn't find new space, we'd end up switching the space info to force another
chunk allocation. This could make us end up with way too many chunks, so keep
track of this particular case.
With this patch and my previous cluster fixes my fedora box now boots in 43
seconds, and according to the bootchart is not held up by our block group
caching at all.
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Josef Bacik [Wed, 11 Nov 2009 02:23:48 +0000 (21:23 -0500)]
Btrfs: fix how we set max_size for free space clusters
This patch fixes a problem where max_size can be set to 0 even though we
filled the cluster properly. We set max_size to 0 if we restart the cluster
window, but if the new start entry is big enough to be our new cluster then we
could return with a max_size set to 0, which will mean the next time we try to
allocate from this cluster it will fail. So set max_extent to the entry's
size. Tested this on my box and now we actually allocate from the cluster
after we fill it. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Josef Bacik [Wed, 11 Nov 2009 02:23:48 +0000 (21:23 -0500)]
Btrfs: cleanup transaction starting and fix journal_info usage
We use journal_info to tell if we're in a nested transaction to make sure we
don't commit the transaction within a nested transaction. We use another
method to see if there are any outstanding ioctl trans handles, so if we're
starting one do not set current->journal_info, since it will screw with other
filesystems. This patch also cleans up the starting stuff so there aren't any
magic numbers.
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Josef Bacik [Wed, 11 Nov 2009 02:23:47 +0000 (21:23 -0500)]
Btrfs: fix data allocation hint start
Sometimes our start allocation hint when we cow a file can be either
EXTENT_HOLE or some other such place holder, which is not optimal. So if we
find that our em->block_start is one of these special values, check to see
where the first block of the inode is stored, and use that as a hint. If that
block is also a special value, just fallback on a hint of 0 and let the
allocator figure out a good place to put the data.
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
This patch updates defconfig to enable options needed to properly
boot OMAP3 pandora board. It also enables MMC, OTG, GPIO LEDs,
TWL4030 GPIO and sound drivers.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
The original TWL4030 keypad driver from linux-omap used KEY()
macro defined as (col, row), but while it was merged upstream
it was changed to use matrix keypad infrastructure, which uses
(row, col) format. Update the keymap in board file to match
layout of mainline driver.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
- keep kernel small enough to boot with standard tools,
- ensure compatibility with both new and legacy distros,
- turn on support for recently added or fixed hardware features.
Created and tested against linux-2.6.32-rc5.
Signed-off-by: Janusz Krzysztofik <jkrzysz@tis.icnet.pl> Signed-off-by: Tony Lindgren <tony@atomide.com>
The definite reason for broken omapfb/lcdc behavoiur in PM mode
appeared to be ARM_IDLECT1:IDLIF_ARM (bit 6) put into idle regardless of LCD
DMA possibly running. The bit were set based on return value of the
omap_dma_running() function that did not check for dedicated LCD DMA
channel status. The patch below fixes this.
Note that the hardcoded register value will be fixed during the next merge
cycle to use OMAP_LCDC_ defines. Currently the OMAP_LCDC_ defines are local
to drivers/video/omap/lcdc.c, so let's not start moving those right now.
Created against linux-2.6.32-rc6
Tested on Amstrad Delta
Signed-off-by: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl> Signed-off-by: Tony Lindgren <tony@atomide.com>
Tao Ma [Tue, 10 Nov 2009 09:13:22 +0000 (17:13 +0800)]
JBD/JBD2: free j_wbuf if journal init fails.
If journal init fails, we need to free j_wbuf.
Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
Jan Kara [Fri, 16 Oct 2009 17:26:15 +0000 (19:26 +0200)]
ext3: Wait for proper transaction commit on fsync
We cannot rely on buffer dirty bits during fsync because pdflush can come
before fsync is called and clear dirty bits without forcing a transaction
commit. What we do is that we track which transaction has last changed
the inode and which transaction last changed allocation and force it to
disk on fsync.
Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>