Yinghai Lu [Tue, 12 Oct 2010 21:07:09 +0000 (14:07 -0700)]
memblock, bootmem: Round pfn properly for memory and reserved regions
We need to round memory regions correctly -- specifically, we need to
round reserved region in the more expansive direction (lower limit
down, upper limit up) whereas usable memory regions need to be rounded
in the more restrictive direction (lower limit up, upper limit down).
Although they are antisymmetric (and therefore are technically
duplicates) the use of the different inlines explicitly documents the
programmer's intention.
The lack of proper rounding caused a bug on ARM, which was then found
to also affect other architectures.
Reported-by: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CB4CDFD.4020105@kernel.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Mon, 11 Oct 2010 19:34:09 +0000 (12:34 -0700)]
memblock: Annotate memblock functions with __init_memblock
Stephen found
WARNING: mm/built-in.o(.text+0x25ab8): Section mismatch in reference from the function memblock_find_base() to the function .init.text:memblock_find_region()
The function memblock_find_base() references
the function __init memblock_find_region().
This is often because memblock_find_base lacks a __init
annotation or the annotation of memblock_find_region is wrong.
So let memblock_find_region() to use __init_memblock instead of __init
directly.
Also fix one function that did not have __init* to be __init_memblock.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CB366B1.40405@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The Xen setup code needs to call memblock_x86_reserve_range() very early,
so allow it to initialize the memblock subsystem before doing so. The
second memblock_init() is ignored.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
LKML-Reference: <4CACFDAD.3090900@goop.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Before that commit, register_active_regions() is called for every SRAT memory
entry right away.
Use nodememblk_range[] instead of nodes[] in order to make sure we
capture the actual memory blocks registered with each node. nodes[]
contains an extended range which spans all memory regions associated
with a node, but that does not mean that all the memory in between are
included.
Reported-by: Russ Anderson <rja@sgi.com> Tested-by: Russ Anderson <rja@sgi.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CB27BDF.5000800@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@kernel.org> 2.6.33 .34 .35 .36 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Borislav Petkov [Fri, 8 Oct 2010 10:08:34 +0000 (12:08 +0200)]
x86, AMD, MCE thresholding: Fix the MCi_MISCj iteration order
This fixes possible cases of not collecting valid error info in
the MCE error thresholding groups on F10h hardware.
The current code contains a subtle problem of checking only the
Valid bit of MSR0000_0413 (which is MC4_MISC0 - DRAM
thresholding group) in its first iteration and breaking out if
the bit is cleared.
But (!), this MSR contains an offset value, BlkPtr[31:24], which
points to the remaining MSRs in this thresholding group which
might contain valid information too. But if we bail out only
after we checked the valid bit in the first MSR and not the
block pointer too, we miss that other information.
The thing is, MC4_MISC0[BlkPtr] is not predicated on
MCi_STATUS[MiscV] or MC4_MISC0[Valid] and should be checked
prior to iterating over the MCI_MISCj thresholding group,
irrespective of the MC4_MISC0[Valid] setting.
Linus Torvalds [Thu, 7 Oct 2010 20:59:32 +0000 (13:59 -0700)]
Merge branch 'hwpoison-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6
* 'hwpoison-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6:
HWPOISON: Stop shrinking at right page count
HWPOISON: Report correct address granuality for AO huge page errors
HWPOISON: Copy si_addr_lsb to user
page-types.c: fix name of unpoison interface
Linus Torvalds [Thu, 7 Oct 2010 20:50:48 +0000 (13:50 -0700)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
md: check return code of read_sb_page
md/raid1: minor bio initialisation improvements.
md/raid1: avoid overflow in raid1 resync when bitmap is in use.
Linus Torvalds [Thu, 7 Oct 2010 20:47:20 +0000 (13:47 -0700)]
Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm: don't drop handle reference on unload
drm/ttm: Fix two race conditions + fix busy codepaths
Linus Torvalds [Thu, 7 Oct 2010 20:44:30 +0000 (13:44 -0700)]
Merge branch 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
of/i2c: Fix module load order issue caused by of_i2c.c
i2c: Fix checks which cause legacy suspend to never get called
i2c-pca: Fix waitforcompletion() return value
i2c: Fix for suspend/resume issue
i2c: Remove obsolete cleanup for clientdata
Eric Dumazet [Thu, 7 Oct 2010 19:59:29 +0000 (12:59 -0700)]
sysctl: fix min/max handling in __do_proc_doulongvec_minmax()
When proc_doulongvec_minmax() is used with an array of longs, and no
min/max check requested (.extra1 or .extra2 being NULL), we dereference a
NULL pointer for the second element of the array.
Noticed while doing some changes in network stack for the "16TB problem"
Fix is to not change min & max pointers in __do_proc_doulongvec_minmax(),
so that all elements of the vector share an unique min/max limit, like
proc_dointvec_minmax().
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Americo Wang <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kyungmin Park [Thu, 7 Oct 2010 19:59:28 +0000 (12:59 -0700)]
MAINTAINERS: add Samsung S5P series FIMC maintainers
Add Samsung S5P series FIMC(Camera Interface) maintainers.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Sylwester Nawrocki <s.nawrocki@samsung.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to check parent's thresholds if parent has use_hierarchy == 1 to
be sure that parent's threshold events will be triggered even if parent
itself is not active (no MEM_CGROUP_EVENTS).
HWPOISON: Report correct address granuality for AO huge page errors
The SIGBUS user space signalling is supposed to report the
address granuality of a corruption. Pass this information correctly
for huge pages by querying the hpage order.
The original hwpoison code added a new siginfo field si_addr_lsb to
pass the granuality of the fault address to user space. Unfortunately
this field was never copied to user space. Fix this here.
I added explicit checks for the MCEERR codes to avoid having
to patch all potential callers to initialize the field.
Jens Axboe [Thu, 7 Oct 2010 07:35:16 +0000 (09:35 +0200)]
elevator: fix oops on early call to elevator_change()
2.6.36 introduces an API for drivers to switch the IO scheduler
instead of manually calling the elevator exit and init functions.
This API was added since q->elevator must be cleared in between
those two calls. And since we already have this functionality
directly from use by the sysfs interface to switch schedulers
online, it was prudent to reuse it internally too.
But this API needs the queue to be in a fully initialized state
before it is called, or it will attempt to unregister elevator
kobjects before they have been added. This results in an oops
like this:
Dave Airlie [Thu, 7 Oct 2010 04:01:17 +0000 (14:01 +1000)]
drm: don't drop handle reference on unload
since the handle references are all tied to a file_priv, and when it disappears
all the handle refs go with it.
The fbcon ones we'd only notice on unload, but the nouveau notifier one
would would happen on reboot.
nouveau: Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
nouveau: Tested-by: Marc Dionne <marc.c.dionne@gmail.com>
i915 unload: Reported-by: Keith Packard <keithp@keithp.com> Acked-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Johannes Weiner [Fri, 1 Oct 2010 07:43:54 +0000 (07:43 +0000)]
xfs: properly account for reclaimed inodes
When marking an inode reclaimable, a per-AG counter is increased, the
inode is tagged reclaimable in its per-AG tree, and, when this is the
first reclaimable inode in the AG, the AG entry in the per-mount tree
is also tagged.
When an inode is finally reclaimed, however, it is only deleted from
the per-AG tree. Neither the counter is decreased, nor is the parent
tree's AG entry untagged properly.
Since the tags in the per-mount tree are not cleared, the inode
shrinker iterates over all AGs that have had reclaimable inodes at one
point in time.
The counters on the other hand signal an increasing amount of slab
objects to reclaim. Since "70e60ce xfs: convert inode shrinker to
per-filesystem context" this is not a real issue anymore because the
shrinker bails out after one iteration.
But the problem was observable on a machine running v2.6.34, where the
reclaimable work increased and each process going into direct reclaim
eventually got stuck on the xfs inode shrinking path, trying to scan
several million objects.
Fix this by properly unwinding the reclaimable-state tracking of an
inode when it is reclaimed.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: stable@kernel.org Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
Vasiliy Kulikov [Fri, 1 Oct 2010 21:18:12 +0000 (14:18 -0700)]
md: check return code of read_sb_page
Function read_sb_page may return ERR_PTR(...). Check for it.
Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 7 Oct 2010 01:00:50 +0000 (12:00 +1100)]
md/raid1: minor bio initialisation improvements.
When performing a resync we pre-allocate some bios and repeatedly use
them. This requires us to re-initialise them each time.
One field (bi_comp_cpu) and some flags weren't being initiaised
reliably.
NeilBrown [Thu, 7 Oct 2010 00:54:46 +0000 (11:54 +1100)]
md/raid1: avoid overflow in raid1 resync when bitmap is in use.
bitmap_start_sync returns - via a pass-by-reference variable - the
number of sectors before we need to check with the bitmap again.
Since commit ef4256733506f245 this number can be substantially larger,
2^27 is a common value.
Unfortunately it is an 'int' and so when raid1.c:sync_request shifts
it 9 places to the left it becomes 0. This results in a zero-length
read which the scsi layer justifiably complains about.
This patch just removes the shift so the common case becomes safe with
a trivially-correct patch.
In the next merge window we will convert this 'int' to a 'sector_t'
Linus Torvalds [Wed, 6 Oct 2010 20:27:19 +0000 (13:27 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus:
MIPS: Octeon: Place cnmips_cu2_setup in __init memory.
MIPS: Don't place cu2 notifiers in __cpuinitdata
MIPS: Calculate VMLINUZ_LOAD_ADDRESS based on the length of vmlinux.bin
MIPS: Alchemy: Resolve prom section mismatches
MIPS: Fix syscall 64 bit number comments.
MIPS: Hookup fanotify_init, fanotify_mark, and prlimit64 syscalls.
MIPS: TX49xx: Rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGN
MIPS: N32: Fix getdents64 syscall for n32
MIPS: Remove pr_<level> uses of KERN_<level>
MIPS: PNX8550: Sort out machine halt, restart and powerdown functions.
MIPS: GIC: Remove dependencies from Malta files.
MIPS: Kconfig: Fix and clarify kconfig help text for VSMP and SMTC.
MIPS: DMA: Fix computation of DMA flags from device's coherent_dma_mask.
MIPS: Audit: Fix hang in entry.S.
MIPS: Document why RELOC_HIDE is there.
MIPS: Octeon: Determine if helper needs to be built
MIPS: Use generic atomic64 for 32-bit kernels
MIPS: RM7000: Symbol should be static
MIPS: kspd: Adjust confusing if indentation
MIPS: Fix a typo.
Linus Torvalds [Wed, 6 Oct 2010 16:51:28 +0000 (09:51 -0700)]
Merge branch 'v2.6.36-rc6-urgent-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm
* 'v2.6.36-rc6-urgent-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm:
xen: do not initialize PV timers on HVM if !xen_have_vector_callback
xen: do not set xenstored_ready before xenbus_probe on hvm
Yinghai Lu [Tue, 5 Oct 2010 23:15:15 +0000 (16:15 -0700)]
x86-32, memblock: Make add_highpages honor early reserved ranges
Originally the only early reserved range that is overlapped with high
pages is "KVA RAM", but we already do remove that from the active ranges.
However, It turns out Xen could have that kind of overlapping to support memory
ballooning.x
So we need to make add_highpage_with_active_regions() to subtract
memblock reserved just like low ram; this is the proper design anyway.
In this patch, refactering get_freel_all_memory_range() to make it can
be used by add_highpage_with_active_regions(). Also we don't need to
remove "KVA RAM" from active ranges.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CABB183.1040607@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Tue, 5 Oct 2010 23:05:14 +0000 (16:05 -0700)]
x86, memblock: Fix crashkernel allocation
Cai Qian found crashkernel is broken with the x86 memblock changes.
1. crashkernel=128M@32M always reported that range is used, even if
the first kernel is small and does not usethat range
2. we always got following report when using "kexec -p"
Could not find a free area of memory of a000 bytes...
locate_hole failed
The root cause is that generic memblock_find_in_range() will try to
allocate from the top of the range, whereas the kexec code was written
assuming that allocation was always near the bottom and that it could
blindly extend memory upward. Unfortunately the kexec code doesn't
have a system for requesting the range that it really needs, so this
is subject to probabilistic failures.
This patch hacks around the problem by limiting the target range
heuristically to below the traditional bzImage max range. This number
is arbitrary and not always correct, and a much better result would be
obtained by having kexec communicate this number based on the kernel
header information and any appropriate command line options.
Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CABAF2A.5090501@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Since powerpc uses -Werror on arch powerpc, the build was broken like
this:
cc1: warnings being treated as errors
arch/powerpc/kernel/module.c: In function 'module_finalize':
arch/powerpc/kernel/module.c:66: error: unused variable 'err'
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Hellstrom [Thu, 30 Sep 2010 10:36:45 +0000 (12:36 +0200)]
drm/ttm: Fix two race conditions + fix busy codepaths
This fixes a race pointed out by Dave Airlie where we don't take a buffer
object about to be destroyed off the LRU lists properly. It also fixes a rare
case where a buffer object could be destroyed in the middle of an
accelerated eviction.
The patch also adds a utility function that can be used to prematurely
release GPU memory space usage of an object waiting to be destroyed.
For example during eviction or swapout.
The above mentioned commit didn't queue the buffer on the delayed destroy
list under some rare circumstances. It also didn't completely honor the
remove_all parameter.
Linus Torvalds [Tue, 5 Oct 2010 18:57:37 +0000 (11:57 -0700)]
Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf trace scripting: Fix extern struct definitions
perf ui hist browser: Fix segfault on 'a' for annotate
perf tools: Fix build breakage
perf, x86: Handle in flight NMIs on P4 platform
oprofile, ARM: Release resources on failure
oprofile: Add Support for Intel CPU Family 6 / Model 29
The "flags" member of "struct wait_queue_t" is used in several places in
the kernel code without beeing initialized by init_wait(). "flags" is
used in bitwise operations.
If "flags" not initialized then unexpected behaviour may take place.
Incorrect flags might used later in code.
Added initialization of "wait_queue_t.flags" with zero value into
"init_wait".
Signed-off-by: Evgeny Kuznetsov <EXT-Eugeny.Kuznetsov@nokia.com>
[ The bit we care about does end up being initialized by both
prepare_to_wait() and add_to_wait_queue(), so this doesn't seem to
cause actual bugs, but is definitely the right thing to do -Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 5 Oct 2010 18:29:27 +0000 (11:29 -0700)]
modules: Fix module_bug_list list corruption race
With all the recent module loading cleanups, we've minimized the code
that sits under module_mutex, fixing various deadlocks and making it
possible to do most of the module loading in parallel.
However, that whole conversion totally missed the rather obscure code
that adds a new module to the list for BUG() handling. That code was
doubly obscure because (a) the code itself lives in lib/bugs.c (for
dubious reasons) and (b) it gets called from the architecture-specific
"module_finalize()" rather than from generic code.
Calling it from arch-specific code makes no sense what-so-ever to begin
with, and is now actively wrong since that code isn't protected by the
module loading lock any more.
So this commit moves the "module_bug_{finalize,cleanup}()" calls away
from the arch-specific code, and into the generic code - and in the
process protects it with the module_mutex so that the list operations
are now safe.
Future fixups:
- move the module list handling code into kernel/module.c where it
belongs.
- get rid of 'module_bug_list' and just use the regular list of modules
(called 'modules' - imagine that) that we already create and maintain
for other reasons.
Reported-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Adrian Bunk <bunk@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
xen: do not initialize PV timers on HVM if !xen_have_vector_callback
if !xen_have_vector_callback do not initialize PV timer unconditionally
because we still don't know how many cpus are available and if there is
more than one we won't be able to receive the timer interrupts on
cpu > 0.
This patch fixes an hang at boot when Xen does not support vector
callbacks and the guest has multiple vcpus.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
xen: do not set xenstored_ready before xenbus_probe on hvm
Register_xenstore_notifier should guarantee that the caller gets
notified even if xenstore is already up.
Therefore we revert "do not notify callers from
register_xenstore_notifier" and set xenstored_read at the right time for
PV on HVM guests too.
In fact in case of PV on HVM guests xenstored is ready only after the
platform pci driver has completed the initialization, so do not set
xenstored_ready before the call to xenbus_probe().
This patch fixes a shutdown_event watcher registration bug that causes
"xm shutdown" not to work properly.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
wacom_open() takes wacom->lock and calls usb_autopm_get_interface(),
which in turn calls wacom_resume() which tries to acquire the lock
again.
The fix is to call usb_autopm_get_interface() first, before we take
the lock.
Since we do not do usb_autopm_put_interface() until wacom_close()
is called runtime PM is effectively disabled for the driver, however
changing it now would risk regressions so the complete fix will
have to wait till the next merge window.
Linus Torvalds [Mon, 4 Oct 2010 18:15:59 +0000 (11:15 -0700)]
Merge branch 'fix/misc' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'fix/misc' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: i2c/other/ak4xx-adda: Fix a compile warning with CONFIG_PROCFS=n
ALSA: prevent heap corruption in snd_ctl_new()
Linus Torvalds [Mon, 4 Oct 2010 18:13:22 +0000 (11:13 -0700)]
Merge branch 'merge-spi' of git://git.secretlab.ca/git/linux-2.6
* 'merge-spi' of git://git.secretlab.ca/git/linux-2.6:
of/spi: Fix OF-style driver binding of spi devices
spi: spi-gpio.c tests SPI_MASTER_NO_RX bit twice, but not SPI_MASTER_NO_TX
spi/mpc8xxx: fix buffer overrun on large transfers
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
vlan: dont drop packets from unknown vlans in promiscuous mode
Phonet: Correct header retrieval after pskb_may_pull
um: Proper Fix for f25c80a4: remove duplicate structure field initialization
ip_gre: Fix dependencies wrt. ipv6.
net-2.6: SYN retransmits: Add new parameter to retransmits_timed_out()
iwl3945: queue the right work if the scan needs to be aborted
mac80211: fix use-after-free
Linus Torvalds [Mon, 4 Oct 2010 18:10:26 +0000 (11:10 -0700)]
Merge branch 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ickle/drm-intel
* 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ickle/drm-intel:
drm/i915: Rephrase pwrite bounds checking to avoid any potential overflow
drm/i915: Sanity check pread/pwrite
drm/i915: Use pipe state to tell when pipe is off
drm/i915: vblank status not valid while training display port
drivers/gpu/drm/i915/i915_gem.c: Add missing error handling code
drm/i915: Fix refleak during eviction.
drm/i915: fix GMCH power reporting
Hugh Dickins [Sun, 3 Oct 2010 00:49:08 +0000 (17:49 -0700)]
ksm: fix bad user data when swapping
Building under memory pressure, with KSM on 2.6.36-rc5, collapsed with
an internal compiler error: typically indicating an error in swapping.
Perhaps there's a timing issue which makes it now more likely, perhaps
it's just a long time since I tried for so long: this bug goes back to
KSM swapping in 2.6.33.
Notice how reuse_swap_page() allows an exclusive page to be reused, but
only does SetPageDirty if it can delete it from swap cache right then -
if it's currently under Writeback, it has to be left in cache and we
don't SetPageDirty, but the page can be reused. Fine, the dirty bit
will get set in the pte; but notice how zap_pte_range() does not bother
to transfer pte_dirty to page_dirty when unmapping a PageAnon.
If KSM chooses to share such a page, it will look like a clean copy of
swapcache, and not be written out to swap when its memory is needed;
then stale data read back from swap when it's needed again.
We could fix this in reuse_swap_page() (or even refuse to reuse a
page under writeback), but it's more honest to fix my oversight in
KSM's write_protect_page(). Several days of testing on three machines
confirms that this fixes the issue they showed.
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Sun, 3 Oct 2010 00:46:06 +0000 (17:46 -0700)]
ksm: fix page_address_in_vma anon_vma oops
2.6.36-rc1 commit 21d0d443cdc1658a8c1484fdcece4803f0f96d0e "rmap:
resurrect page_address_in_vma anon_vma check" was right to resurrect
that check; but now that it's comparing anon_vma->roots instead of
just anon_vmas, there's a danger of oopsing on a NULL anon_vma.
In most cases no NULL anon_vma ever gets here; but it turns out that
occasionally KSM, when enabled on a forked or forking process, will
itself call page_address_in_vma() on a "half-KSM" page left over from
an earlier failed attempt to merge - whose page_anon_vma() is NULL.
It's my bug that those should be getting here at all: I thought they
were already dealt with, this oops proves me wrong, I'll fix it in
the next release - such pages are effectively pinned until their
process exits, since rmap cannot find their ptes (though swapoff can).
For now just work around it by making page_address_in_vma() safe (and
add a comment on why that check is wanted anyway). A similar check
in __page_check_anon_rmap() is safe because do_page_add_anon_rmap()
already excluded KSM pages.
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shmulik Ladkani [Tue, 31 Aug 2010 10:24:19 +0000 (13:24 +0300)]
MIPS: Calculate VMLINUZ_LOAD_ADDRESS based on the length of vmlinux.bin
Fix VMLINUZ_LOAD_ADDRESS calculation to be based on the length of
vmlinux.bin, the actual uncompressed kernel binary.
Previously it was based on the length of KBUILD_IMAGE (the unstripped ELF
vmlinux), which is bigger than vmlinux.bin. As a result, vmlinuz was
loaded into a memory address higher then actually needed - a problem for
small memory platforms.
FUJITA Tomonori [Sat, 14 Aug 2010 07:02:37 +0000 (16:02 +0900)]
MIPS: TX49xx: Rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGN
Architectures need to set ARCH_DMA_MINALIGN to the minimum DMA
alignment (commit a6eb9fe105d5de0053b261148cee56c94b4720ca). Defining
ARCH_KMALLOC_MINALIGN doesn't work anymore.
Bernhard Walle [Fri, 3 Sep 2010 08:15:34 +0000 (10:15 +0200)]
MIPS: N32: Fix getdents64 syscall for n32
Commit 31c984a5acabea5d8c7224dc226453022be46f33 introduced a new syscall
getdents64. However, in the syscall table, the new syscall still refers to
the old getdents which doesn't work.
The problem appeared with a system that uses the eglibc 2.12-r11187 (that
utilizes that new syscall) is very confused. The fix has been tested with
that eglibc version.
Signed-off-by: Bernhard Walle <walle@corscience.de>
To: linux-mips@linux-mips.org Cc: ddaney@caviumnetworks.com Cc: akpm@linux-foundation.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/1567/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
MIPS: PNX8550: Sort out machine halt, restart and powerdown functions.
No rubbish printks - those belong to userspace. The halt function now
actually halts the system and the poweroff function was deleted because
it didn't actually power down the system.
MIPS: DMA: Fix computation of DMA flags from device's coherent_dma_mask.
This only matters for ISA devices with a 24-bit DMA limit or for devices
with a 32-bit DMA limit on systems with ZONE_DMA32 enabled. The latter
currently only affects 32-bit PCI cards on Sibyte-based systems with more
than 1GB RAM installed.
_TIF_WORK_MASK false had _TIF_SYSCALL_AUDIT set. If a thread's
_TIF_SYSCALL_AUDIT is ever set this will lead to an endless loop on the
way out from a syscall.
Currently this is only a theoretic bug as init/Kconfig doesn't allow
AUDIT_SYSCALL to be enabled for MIPS.
Andreas Bießmann [Wed, 11 Aug 2010 16:49:53 +0000 (18:49 +0200)]
MIPS: Octeon: Determine if helper needs to be built
This patch adds an config switch to determine if we need to build some
workaround helper files.
The staging driver octeon-ethernet references some symbols which are only
built when PCI is enabled. The new config switch enables these symbols in
bothe cases.
Signed-off-by: Andreas Bießmann <biessmann@corscience.de>
To: linux-kernel@vger.kernel.org Cc: Andreas Bießmann <biessmann@corscience.de> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/1543/ Acked-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Deng-Cheng Zhu [Wed, 9 Jun 2010 04:35:25 +0000 (12:35 +0800)]
MIPS: Use generic atomic64 for 32-bit kernels
The 64-bit kernel has already had its atomic64 functions. Except for that,
we use the generic spinlocked version. The atomic64 types and related
functions are needed for the Linux performance counter subsystem.
Julia Lawall [Thu, 5 Aug 2010 20:17:22 +0000 (22:17 +0200)]
MIPS: kspd: Adjust confusing if indentation
Indent the branch of an if.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r disable braces4@
position p1,p2;
statement S1,S2;
@@
(
if (...) { ... }
|
if (...) S1@p1 S2@p2
)
@script:python@
p1 << r.p1;
p2 << r.p2;
@@
if (p1[0].column == p2[0].column):
cocci.print_main("branch",p1)
cocci.print_secs("after",p2)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
To: linux-mips@linux-mips.org
To: linux-kernel@vger.kernel.org
To: kernel-janitors@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/1539/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Both python_scripting_ops and perl_scripting_ops have two global definitions.
One in trace-event-scripting.c and one in their respective scripting-engine
modules.
The issue is that depending on the linker order one definition or the other
is chosen. One is uninitialized (bss), while the other is initialized. If
the uninitialized version is chosen, then perf does not function properly.
This patch fixes this by adding the extern prefix to the definitions in
trace-event-scripting.c.
Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4c97e41a.078fd80a.7a8b.3cc9@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
writeback: always use sb->s_bdi for writeback purposes
We currently use struct backing_dev_info for various different purposes.
Originally it was introduced to describe a backing device which includes
an unplug and congestion function and various bits of readahead information
and VM-relevant flags. We're also using for tracking dirty inodes for
writeback.
To make writeback properly find all inodes we need to only access the
per-filesystem backing_device pointed to by the superblock in ->s_bdi
inside the writeback code, and not the instances pointeded to by
inode->i_mapping->backing_dev which can be overriden by special devices
or might not be set at all by some filesystems.
Long term we should split out the writeback-relevant bits of struct
backing_device_info (which includes more than the current bdi_writeback)
and only point to it from the superblock while leaving the traditional
backing device as a separate structure that can be overriden by devices.
The one exception for now is the block device filesystem which really
wants different writeback contexts for it's different (internal) inodes
to handle the writeout more efficiently. For now we do this with
a hack in fs-writeback.c because we're so late in the cycle, but in
the future I plan to replace this with a superblock method that allows
for multiple writeback contexts per filesystem.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Chris Wilson [Sun, 26 Sep 2010 19:50:05 +0000 (20:50 +0100)]
drm/i915: Sanity check pread/pwrite
Move the access control up from the fast paths, which are no longer
universally taken first, up into the caller. This then duplicates some
sanity checking along the slow paths, but is much simpler.
Tracked as CVE-2010-2962.
Reported-by: Kees Cook <kees@ubuntu.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org
hwmon: f71882fg: use a muxed resource lock for the Super I/O port
Sleep while acquiring a resource lock on the Super I/O port. This should
prevent collisions from causing the hardware probe to fail with -EBUSY.
Signed-off-by: Giel van Schijndel <me@mortis.eu> Acked-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Keith Packard [Sun, 3 Oct 2010 07:33:06 +0000 (00:33 -0700)]
drm/i915: Use pipe state to tell when pipe is off
Instead of waiting for the display line value to settle, we can simply
wait for the pipe configuration register 'state' bit to turn off.
Contrarywise, disabling the plane will not cause the display line
value to stop changing, so instead we wait for the vblank interrupt
bit to get set. And, we only do this when we're not about to wait for
the pipe to turn off.
Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Keith Packard [Sun, 3 Oct 2010 07:33:05 +0000 (00:33 -0700)]
drm/i915: vblank status not valid while training display port
While the display port is in training mode, vblank interrupts don't
occur. Because we have to wait for the display port output to turn on
before starting the training sequence, enable the output in 'normal'
mode so that we can tell when a vblank has occurred, then start the
training sequence.
Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Sinan Akman [Sun, 3 Oct 2010 03:28:29 +0000 (21:28 -0600)]
of/spi: Fix OF-style driver binding of spi devices
This patch adds the OF hook to the spi core so that devices
can automatically be registered based on device tree data. This fixes
a problem with spi devices not binding to drivers after the cleanup of
the spi & i2c binding code.
Signed-off-by: Sinan Akman <sinan@writeme.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Roel Kluin [Sat, 2 Oct 2010 12:03:32 +0000 (14:03 +0200)]
spi: spi-gpio.c tests SPI_MASTER_NO_RX bit twice, but not SPI_MASTER_NO_TX
The SPI_MASTER_NO_TX bit (can't do buffer write) wasn't tested. This
code was introduced in commit 3c8e1a84 (spi/spi-gpio: add support for
controllers without MISO or MOSI pin). This patch fixes a bug in
choosing which transfer ops to use.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Axel Lin [Fri, 1 Oct 2010 05:56:27 +0000 (13:56 +0800)]
regulator: max8649 - fix setting extclk_freq
The SYNC bits are BIT6 and BIT7 of MAX8649_SYNC register.
pdata->extclk_freq could be [0|1|2].
(MAX8649_EXTCLK_26MHZ|MAX8649_EXTCLK_13MHZ|MAX8649_EXTCLK_19MHZ)
It requires to left shift 6 bits to properly set extclk_freq.
Signed-off-by: Axel Lin <axel.lin@gmail.com> Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: prevent infinite recursion in cifs_reconnect_tcon
cifs: set backing_dev_info on new S_ISREG inodes
David Howells [Fri, 1 Oct 2010 09:31:03 +0000 (10:31 +0100)]
MN10300: Fix flush_icache_range()
flush_icache_range() is given virtual addresses to describe the region. It
deals with these by attempting to translate them through the current set of
page tables.
This is fine for userspace memory and vmalloc()'d areas as they are governed by
page tables. However, since the regions above 0x80000000 aren't translated
through the page tables by the MMU, the kernel doesn't bother to set up page
tables for them (see paging_init()).
This means flush_icache_range() as it stands cannot be used to flush regions of
the VM area between 0x80000000 and 0x9fffffff where the kernel resides if the
data cache is operating in WriteBack mode.
To fix this, make flush_icache_range() first check for addresses in the upper
half of VM space and deal with them appropriately, before dealing with any
range in the page table mapped area.
Ordinarily, this is not a problem, but it has the capacity to make kprobes and
kgdb malfunction. It should not affect gdbstub, signal frame setup or module
loading as gdb has its own flush functions, and the others take place in the
page table mapped area only.
Linus Torvalds [Fri, 1 Oct 2010 17:58:31 +0000 (10:58 -0700)]
Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
vmwgfx: Fix fb VRAM pinning failure due to fragmentation
vmwgfx: Remove initialisation of dev::devname
vmwgfx: Enable use of the vblank system
vmwgfx: vt-switch (master drop) fixes
drm/vmwgfx: Fix breakage introduced by commit "drm: block userspace under allocating buffer and having drivers overwrite it (v2)"
drm: Hold the mutex when dropping the last GEM reference (v2)
drm/gem: handlecount isn't really a kref so don't make it one.
drm: i810/i830: fix locked ioctl variant
drm/radeon/kms: add quirk for MSI K9A2GM motherboard
drm/radeon/kms: fix potential segfault in r600_ioctl_wait_idle
drm: Prune GEM vma entries
drm/radeon/kms: fix up encoder info messages for DFP6
drm/radeon: fix PCI ID 5657 to be an RV410