Krzysztof Helt [Tue, 4 Mar 2008 22:28:39 +0000 (14:28 -0800)]
tridentfb: resource management fixes in probe function
Correct error paths in probe function.
The probe function enables mmio mode so it important to disable the mmio
mode before exiting the probe function. Otherwise, the console is left in
unusable state (garbled fonts at least, lock up at worst).
[akpm@linux-foundation.org: cleanups] Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Balbir Singh [Tue, 4 Mar 2008 22:28:39 +0000 (14:28 -0800)]
Memory controller: rename to Memory Resource Controller
Rename Memory Controller to Memory Resource Controller. Reflect the same
changes in the CONFIG definition for the Memory Resource Controller. Group
together the config options for Resource Counters and Memory Resource
Controller.
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move kprobes examples from Documentation/kprobes.txt to under samples/.
Patch originally by Randy Dunlap.
o Updated the patch to apply on 2.6.25-rc3
o Modified examples code to build on multiple architectures. Currently,
the kprobe and jprobe examples code works for x86 and powerpc
o Cleaned up unneeded #includes
o Cleaned up Kconfig per Sam Ravnborg's suggestions to fix build break
on archs that don't have kretprobes
o Implemented suggestions by Mathieu Desnoyers on CONFIG_KRETPROBES
o Included Andrew Morton's cleanup based on x86-git
o Modified kretprobe_example to act as a arch-agnostic module to
determine routine execution times:
Use 'modprobe kretprobe_example func=<func_name>' to determine
execution time of func_name in nanoseconds.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add CONFIG_HAVE_KRETPROBES to the arch/<arch>/Kconfig file for relevant
architectures with kprobes support. This facilitates easy handling of
in-kernel modules (like samples/kprobes/kretprobe_example.c) that depend on
kretprobes being present in the kernel.
Thanks to Sam Ravnborg for helping make the patch more lean.
Per Mathieu's suggestion, added CONFIG_KRETPROBES and fixed up dependencies.
Samuel Thibault [Tue, 4 Mar 2008 22:28:36 +0000 (14:28 -0800)]
VT notifier fix for VT switch
VT notifier callbacks need to be aware of console switches. This is already
partially done from console_callback(), but at that time fg_console, cursor
positions, etc. are not yet updated and hence screen readers fetch the old
values.
This adds an update notify after all of the values are updated in
redraw_screen(vc, 1).
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Dumazet [Tue, 4 Mar 2008 22:28:35 +0000 (14:28 -0800)]
alloc_percpu() fails to allocate percpu data
Some oprofile results obtained while using tbench on a 2x2 cpu machine were
very surprising.
For example, loopback_xmit() function was using high number of cpu cycles
to perform the statistic updates, supposed to be real cheap since they use
percpu data
struct pcpu_lstats is a small structure containing two longs. It appears
that on my 32bits platform, alloc_percpu(8) allocates a single cache line,
instead of giving to each cpu a separate cache line.
Using the following patch gave me impressive boost in various benchmarks
( 6 % in tbench)
(all percpu_counters hit this bug too)
Long term fix (ie >= 2.6.26) would be to let each CPU allocate their own
block of memory, so that we dont need to roudup sizes to L1_CACHE_BYTES, or
merging the SGI stuff of course...
Note : SLUB vs SLAB is important here to *show* the improvement, since they
dont have the same minimum allocation sizes (8 bytes vs 32 bytes). This
could very well explain regressions some guys reported when they switched
to SLUB.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Tue, 4 Mar 2008 22:28:33 +0000 (14:28 -0800)]
vfs: fix NULL pointer dereference in fsync_buffers_list()
Fix NULL pointer dereference in fsync_buffers_list() introduced by recent fix
of races in private_list handling. Since bh->b_assoc_map has been cleared in
__remove_assoc_queue() we should really use original value stored in the
'mapping' variable.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Riesen [Tue, 4 Mar 2008 22:28:31 +0000 (14:28 -0800)]
Fix "Malformed early option 'loglevel'"
Keith Mannthey said:
The parameter hotadd_percent is setup right but there is a "Malformed
early option 'numa'" message.
Rusty Russell said:
This happens when the function registered with early_param() returns
non-zero. __setup() functions return 1 if OK, module_param() and
early_param() return 0 or a -ve error code.
Acked-by: Yinghai Lu <yhlu.kernel@gmai.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Keith Mannthey <kmannth@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Tue, 4 Mar 2008 22:28:30 +0000 (14:28 -0800)]
core dump: user_regset writeback
This makes the user_regset-based core dump code call user_regset writeback
hooks when available. This is necessary groundwork to allow IA64 to set
CORE_DUMP_USE_REGSET.
Cc: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Roland McGrath <roland@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 4 Mar 2008 22:28:27 +0000 (14:28 -0800)]
gpio: <linux/gpio.h> and "no GPIO support here" stubs
Add a <linux/gpio.h> defining fail/warn stubs for GPIO calls on platforms that
don't support the GPIO programming interface. That includes the arch-specific
implementation glue otherwise.
This facilitates a new model for GPIO usage: drivers that can use GPIOs if
they're available, but don't require them. One example of such a driver is
NAND driver for various FreeScale chips. On platforms update with GPIO
support, they can be used instead of a worst-case delay to verify that the
BUSY signal is off.
(Also includes a couple minor unrelated doc updates.)
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Byron Bradley [Tue, 4 Mar 2008 22:28:25 +0000 (14:28 -0800)]
rtc: add support for the S-35390A RTC chip
This adds basic get/set time support for the Seiko Instruments S-35390A.
This chip communicates using I2C and is used on the QNAP TS-109/TS-209 NAS
devices.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Byron Bradley <byron.bbradley@gmail.com> Acked-by: Jean Delvare <khali@linux-fr.org> Acked-by: David Brownell <david-b@pacbell.net> Tested-by: Tim Ellis <tim@ngndg.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Balbir Singh [Tue, 4 Mar 2008 22:28:24 +0000 (14:28 -0800)]
Memory Resource Controller use strstrip while parsing arguments
The memory controller has a requirement that while writing values, we need
to use echo -n. This patch fixes the problem and makes the UI more consistent.
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Nilsson [Tue, 4 Mar 2008 22:28:23 +0000 (14:28 -0800)]
CRIS v10: Include mm.h instead of vmstat.h in kernel/time.c
Commit 2f569afd9ced9ebec9a6eb3dbf6f83429be0a7b4
(CONFIG_HIGHPTE vs. sub-page page tables) introduced use of
inc_zone_page_state and dec_zone_page_state in include/linux/mm.h.
Those are defined in include/linux/vmstat.h, but after it includes
mm.h, making it impossible to include vmstat.h since inc_zone_page_state
and dec_zone_page_state then would be undefined.
arch/cris/arch-v10/kernel/time.c does just this, which makes the
CRIS v10 build break with the following error:
...
CC arch/cris/arch-v10/kernel/time.o
In file included from include/linux/vmstat.h:7,
from arch/cris/arch-v10/kernel/time.c:17:
include/linux/mm.h: In function 'pgtable_page_ctor':
include/linux/mm.h:902: error: implicit declaration of function 'inc_zone_page_state'
include/linux/mm.h: In function 'pgtable_page_dtor':
include/linux/mm.h:908: error: implicit declaration of function 'dec_zone_page_state'
make[2]: *** [arch/cris/arch-v10/kernel/time.o] Error 1
make[1]: *** [arch/cris/arch-v10/kernel] Error 2
make: *** [sub-make] Error 2
...
By changing kernel/time.c to include linux/mm.h, the build succeeds.
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Mikael Starvik <mikael.starvik@axis.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Tue, 4 Mar 2008 22:28:20 +0000 (14:28 -0800)]
update checkpatch.pl to version 0.15
This version brings a number of minor fixes updating the type detector and
the unary tracker. It also brings a few small fixes for false positives.
It also reverts the --file warning. Of note:
- limit CVS checks to added lines
- improved type detections
- fixes to the unary tracker
Andy Whitcroft (13):
Version: 0.15
EXPORT_SYMBOL checks need to accept array variables
export checks must match DECLARE_foo and LIST_HEAD
possible types: cleanup debugging missing line
values: track values through preprocessor conditional paths
typeof is actually a type
possible types: detect definitions which cross lines
values: include line numbers on value debug information
values: ensure we find correctly record pending brackets
values: simplify the brace history stack
CVS keyword checks should only apply to added lines
loosen spacing for comments
allow braces for single statement blocks with multiline conditionals
Li Zefan [Tue, 4 Mar 2008 22:28:19 +0000 (14:28 -0800)]
cgroup: fix default notify_on_release setting
The documentation says the default value of notify_on_release of a child
cgroup is inherited from its parent, which is reasonable, but the
implementation just sets the flag disabled.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 4 Mar 2008 19:27:32 +0000 (11:27 -0800)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
[PATCH] fs/ocfs2/aops.c: Correct use of ! and &
[2.6 patch] ocfs2: make dlm_do_assert_master() static
[2.6 patch] make ocfs2_downconvert_thread() static
[2.6 patch] fs/ocfs2/: possible cleanups
[PATCH] ocfs2: le*_add_cpu conversion
ocfs2: Fix writeout in ocfs2_data_convert_worker()
ocfs2: Enable localalloc for local mounts
Linus Torvalds [Tue, 4 Mar 2008 19:26:01 +0000 (11:26 -0800)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
ioat: fix 'ack' handling, driver must ensure that 'ack' is zero
dmaengine: fix sparse warning
fsldma: do not cleanup descriptors in hardirq context
dmaengine: add driver for Freescale MPC85xx DMA controller
Linus Torvalds [Tue, 4 Mar 2008 17:22:32 +0000 (09:22 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
x86/xen: fix DomU boot problem
x86: not set node to cpu_to_node if the node is not online
x86, i387: fix ptrace leakage using init_fpu()
Linus Torvalds [Tue, 4 Mar 2008 17:22:05 +0000 (09:22 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
x86: disable KVM for Voyager and friends
KVM: VMX: Avoid rearranging switched guest msrs while they are loaded
KVM: MMU: Fix race when instantiating a shadow pte
KVM: Route irq 0 to vcpu 0 exclusively
KVM: Avoid infinite-frequency local apic timer
KVM: make MMU_DEBUG compile again
KVM: move alloc_apic_access_page() outside of non-preemptable region
KVM: SVM: fix Windows XP 64 bit installation crash
KVM: remove the usage of the mmap_sem for the protection of the memory slots.
KVM: emulate access to MSR_IA32_MCG_CTL
KVM: Make the supported cpuid list a host property rather than a vm property
KVM: Fix kvm_arch_vcpu_ioctl_set_sregs so that set_cr0 works properly
KVM: SVM: set NM intercept when enabling CR0.TS in the guest
KVM: SVM: Fix lazy FPU switching
Dan Williams [Sat, 1 Mar 2008 14:51:17 +0000 (07:51 -0700)]
fsldma: do not cleanup descriptors in hardirq context
"Cleaning" descriptors involves calling pending callbacks and clients
assume that their callback will only ever happen in softirq context.
Delay cleanup to the tasklet.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Zhang Wei <wei.zhang@freescale.com>
Zhang Wei [Sat, 1 Mar 2008 14:42:48 +0000 (07:42 -0700)]
dmaengine: add driver for Freescale MPC85xx DMA controller
The driver implements DMA engine API for Freescale MPC85xx DMA controller,
which could be used by devices in the silicon. The driver supports the
Basic mode of Freescale MPC85xx DMA controller. The MPC85xx processors
supported include MPC8540/60, MPC8555, MPC8548, MPC8641 and so on.
The MPC83xx(MPC8349, MPC8360) are also supported.
[kamalesh@linux.vnet.ibm.com: build fix]
[dan.j.williams@intel.com: merge mm fixes, rebase on async_tx-2.6.25] Signed-off-by: Zhang Wei <wei.zhang@freescale.com> Signed-off-by: Ebony Zhu <ebony.zhu@freescale.com> Acked-by: Kumar Gala <galak@gate.crashing.org> Cc: Shannon Nelson <shannon.nelson@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Peter Zijlstra [Mon, 25 Feb 2008 16:34:02 +0000 (17:34 +0100)]
sched: revert load_balance_monitor() changes
The following commits cause a number of regressions:
commit 58e2d4ca581167c2a079f4ee02be2f0bc52e8729
Author: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Date: Fri Jan 25 21:08:00 2008 +0100
sched: group scheduling, change how cpu load is calculated
commit 6b2d7700266b9402e12824e11e0099ae6a4a6a79
Author: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Date: Fri Jan 25 21:08:00 2008 +0100
sched: group scheduler, fix fairness of cpu bandwidth allocation for task groups
Namely:
- very frequent wakeups on SMP, reported by PowerTop users.
- cacheline trashing on (large) SMP
- some latencies larger than 500ms
While there is a mergeable patch to fix the latter, the former issues
are not fixable in a manner suitable for .25 (we're at -rc3 now).
Ian Campbell [Thu, 28 Feb 2008 23:16:49 +0000 (23:16 +0000)]
x86/xen: fix DomU boot problem
Construct Xen guest e820 map with a hole between 640K-1M.
It's pure luck that Xen kernels have gotten away with it in the past.
The patch below seems like the right thing to do. It certainly boots in
a domU without the DMI problem (without any of the other related patches
such as Alexander's).
Signed-off-by: Ian Campbell <ijc@hellion.org.uk> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Tested-by: Mark McLoughlin <markmc@redhat.com> Acked-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de>
Yinghai Lu [Tue, 19 Feb 2008 23:35:54 +0000 (15:35 -0800)]
x86: not set node to cpu_to_node if the node is not online
resolve boot problem reported by Mel Gorman:
http://lkml.org/lkml/2008/2/13/404
init_cpu_to_node will use cpu->apic (from MADT or mptable) and
apic->node(from SRAT or AMD config space with k8_bus_64.c) to have
cpu->node mapping, and later identify_cpu will overwrite them
again...(with nearby_node...)
this patch checks if the node is online, otherwise it will not
update cpu_node map. so keep cpu_node map to online node before
identify_cpu..., to prevent possible error.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de>
Current usage of unlazy_fpu() in ptrace specific routines is wrong.
unlazy_fpu() will not init fpu if the task never used math. So the
ptrace calls can expose the parent tasks FPU data in some cases.
Replace it with the init_fpu() which will init the math state, if the
task never used math before.
Linus Torvalds [Tue, 4 Mar 2008 16:08:05 +0000 (08:08 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: fix blkdev_issue_flush() not detecting and passing EOPNOTSUPP back
block: fix shadowed variable warning in blk-map.c
block: remove extern on function definition
cciss: remove READ_AHEAD define and use block layer defaults
make cdrom.c:check_for_audio_disc() static
block/genhd.c: proper externs
unexport blk_rq_map_user_iov
unexport blk_{get,put}_queue
block/genhd.c: cleanups
proper prototype for blk_dev_init()
block/blk-tag.c should #include "blk.h"
Fix DMA access of block device in 64-bit kernel on some non-x86 systems with 4GB or upper 4GB memory
block: separate out padding from alignment
block: restore the meaning of rq->data_len to the true data length
resubmit: cciss: procfs updates to display info about many
splice: only return -EAGAIN if there's hope of more data
block: fix kernel-docbook parameters and files
Greg Ungerer [Tue, 4 Mar 2008 06:52:01 +0000 (16:52 +1000)]
m68knommu: fix fec driver interrupt races
The FEC driver has a common interrupt handler for all interrupt event
types. It is raised on a number of distinct interrupt vectors.
This handler can't be re-entered while processing an interrupt, so
make sure all requested vectors are flagged as IRQF_DISABLED.
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] remove unused variable
[CIFS] consolidate duplicate code in posix/unix inode handling
[CIFS] fix build break when proc disabled
[CIFS] factoring out common code in get_inode_info functions
[CIFS] fix prepath conversion when server supports posix paths
[CIFS] Only convert / when server does not support posix paths
[CIFS] Fix mixed case name in structure dfs_info3_param
[CIFS] fixup prefixpaths which contain multiple path components
[CIFS] fix typo
[CIFS] patch to fix incorrect encoding of number of aces on set mode
[CIFS] Fix typo in quota operations
[CIFS] clean up some hard to read ifdefs
[CIFS] reduce checkpatch warnings
[CIFS] fix warning in cifs_spnego.c
Roland McGrath [Tue, 4 Mar 2008 04:22:05 +0000 (20:22 -0800)]
freezer vs stopped or traced
This changes the "freezer" code used by suspend/hibernate in its treatment
of tasks in TASK_STOPPED (job control stop) and TASK_TRACED (ptrace) states.
As I understand it, the intent of the "freezer" is to hold all tasks
from doing anything significant. For this purpose, TASK_STOPPED and
TASK_TRACED are "frozen enough". It's possible the tasks might resume
from ptrace calls (if the tracer were unfrozen) or from signals
(including ones that could come via timer interrupts, etc). But this
doesn't matter as long as they quickly block again while "freezing" is
in effect. Some minor adjustments to the signal.c code make sure that
try_to_freeze() very shortly follows all wakeups from both kinds of
stop. This lets the freezer code safely leave stopped tasks unmolested.
Changing this fixes the longstanding bug of seeing after resuming from
suspend/hibernate your shell report "[1] Stopped" and the like for all
your jobs stopped by ^Z et al, as if you had freshly fg'd and ^Z'd them.
It also removes from the freezer the arcane special case treatment for
ptrace'd tasks, which relied on intimate knowledge of ptrace internals.
Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Wed, 20 Feb 2008 17:20:08 +0000 (09:20 -0800)]
x86: disable KVM for Voyager and friends
Most classic Pentiums don't have hardware virtualization extension,
and building kvm with Voyager, Visual Workstation, or NUMAQ
generates spurious failures.
Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Avi Kivity [Wed, 27 Feb 2008 14:06:57 +0000 (16:06 +0200)]
KVM: VMX: Avoid rearranging switched guest msrs while they are loaded
KVM tries to run as much as possible with the guest msrs loaded instead of
host msrs, since switching msrs is very expensive. It also tries to minimize
the number of msrs switched according to the guest mode; for example,
MSR_LSTAR is needed only by long mode guests. This optimization is done by
setup_msrs().
However, we must not change which msrs are switched while we are running with
guest msr state:
- switch to guest msr state
- call setup_msrs(), removing some msrs from the list
- switch to host msr state, leaving a few guest msrs loaded
An easy way to trigger this is to kexec an x86_64 linux guest. Early during
setup, the guest will switch EFER to not include SCE. KVM will stop saving
MSR_LSTAR, and on the next msr switch it will leave the guest LSTAR loaded.
The next host syscall will end up in a random location in the kernel.
Fix by reloading the host msrs before changing the msr list.
Avi Kivity [Tue, 26 Feb 2008 20:12:10 +0000 (22:12 +0200)]
KVM: MMU: Fix race when instantiating a shadow pte
For improved concurrency, the guest walk is performed concurrently with other
vcpus. This means that we need to revalidate the guest ptes once we have
write-protected the guest page tables, at which point they can no longer be
modified.
The current code attempts to avoid this check if the shadow page table is not
new, on the assumption that if it has existed before, the guest could not have
modified the pte without the shadow lock. However the assumption is incorrect,
as the racing vcpu could have modified the pte, then instantiated the shadow
page, before our vcpu regains control:
vcpu0 vcpu1
fault
walk pte
modify pte
fault in same pagetable
instantiate shadow page
lookup shadow page
conclude it is old
instantiate spte based on stale guest pte
We could do something clever with generation counters, but a test run by
Marcelo suggests this is unnecessary and we can just do the revalidation
unconditionally. The pte will be in the processor cache and the check can
be quite fast.
Avi Kivity [Mon, 25 Feb 2008 08:28:31 +0000 (10:28 +0200)]
KVM: Route irq 0 to vcpu 0 exclusively
Some Linux versions allow the timer interrupt to be processed by more than
one cpu, leading to hangs due to tsc instability. Work around the issue
by only disaptching the interrupt to vcpu 0.
Problem analyzed (and patch tested) by Sheng Yang.
Joerg Roedel [Wed, 13 Feb 2008 15:30:28 +0000 (16:30 +0100)]
KVM: SVM: fix Windows XP 64 bit installation crash
While installing Windows XP 64 bit wants to access the DEBUGCTL and the last
branch record (LBR) MSRs. Don't allowing this in KVM causes the installation to
crash. This patch allow the access to these MSRs and fixes the issue.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Markus Rechberger <markus.rechberger@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
Izik Eidus [Sun, 10 Feb 2008 16:04:15 +0000 (18:04 +0200)]
KVM: remove the usage of the mmap_sem for the protection of the memory slots.
This patch replaces the mmap_sem lock for the memory slots with a new
kvm private lock, it is needed beacuse untill now there were cases where
kvm accesses user memory while holding the mmap semaphore.
Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
Harvey Harrison [Tue, 4 Mar 2008 10:31:22 +0000 (11:31 +0100)]
block: fix shadowed variable warning in blk-map.c
Introduced between 2.6.25-rc2 and -rc3
block/blk-map.c:154:14: warning: symbol 'bio' shadows an earlier one
block/blk-map.c:110:13: originally declared here
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Mike Miller [Tue, 4 Mar 2008 10:25:15 +0000 (11:25 +0100)]
cciss: remove READ_AHEAD define and use block layer defaults
This patch removes the #define READ_AHEAD 1024 from the driver and uses the
block layer defaults, instead. We have found that under certain workloads
the setting can cause a disk connected to the e200 controller to go offline.
If the disk hiccups the link may try to downshift but the controller is
never notified that the link successfully completed the renegotiation.
We've also found that performance using the block layer default of 32 pages
was on par with the 1024 setting. We tried setting it to zero at one time
based on info from our firmware guys but that killed performance. Turns out
we were talking about 2 different read ahead settings.
Please consider this for inclusion.
Signed-off-by: Mike Miller <mike.miller@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Tue, 4 Mar 2008 10:18:17 +0000 (11:18 +0100)]
block: separate out padding from alignment
Block layer alignment was used for two different purposes - memory
alignment and padding. This causes problems in lower layers because
drivers which only require memory alignment ends up with adjusted
rq->data_len. Separate out padding such that padding occurs iff
driver explicitly requests it.
FUJITA Tomonori [Tue, 4 Mar 2008 10:17:11 +0000 (11:17 +0100)]
block: restore the meaning of rq->data_len to the true data length
The meaning of rq->data_len was changed to the length of an allocated
buffer from the true data length. It breaks SG_IO friends and
bsg. This patch restores the meaning of rq->data_len to the true data
length and adds rq->extra_len to store an extended length (due to
drain buffer and padding).
This patch also removes the code to update bio in blk_rq_map_user
introduced by the commit 40b01b9bbdf51ae543a04744283bf2d56c4a6afa.
The commit adjusts bio according to memory alignment
(queue_dma_alignment). However, memory alignment is NOT padding
alignment. This adjustment also breaks SG_IO friends and bsg. Padding
alignment needs to be fixed in a proper way (by a separate patch).
Mike Miller [Thu, 21 Feb 2008 07:54:03 +0000 (08:54 +0100)]
resubmit: cciss: procfs updates to display info about many
volumes
This patch allows us to display information about all of the logical volumes
configured on a particular controller without stepping on memory even when
there are many volumes (128 or more) configured.
Please consider this for inclusion.
Signed-off-by: Mike Miller <mike.miller@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Wed, 20 Feb 2008 09:34:51 +0000 (10:34 +0100)]
splice: only return -EAGAIN if there's hope of more data
sys_tee() currently is a bit eager in returning -EAGAIN, it may do so
even if we don't have a chance of anymore data becoming available. So
improve the logic and only return -EAGAIN if we have an attached writer
to the input pipe.
Reported by Johann Felix Soden <johfel@gmx.de> and
Patrick McManus <mcmanus@ducksong.com>.
Tested-by: Johann Felix Soden <johfel@users.sourceforge.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Julia Lawall [Tue, 26 Feb 2008 20:45:56 +0000 (21:45 +0100)]
[PATCH] fs/ocfs2/aops.c: Correct use of ! and &
In commit e6bafba5b4765a5a252f1b8d31cbf6d2459da337, a bug was fixed that
involved converting !x & y to !(x & y). The code below shows the same
pattern, and thus should perhaps be fixed in the same way.
This is not tested and clearly changes the semantics, so it is only
something to consider.
Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Adrian Bunk [Mon, 28 Jan 2008 22:11:41 +0000 (00:11 +0200)]
[2.6 patch] fs/ocfs2/: possible cleanups
This patch contains the following cleanups that are now possible:
- make the following needlessly global functions static:
- dlmglue.c:ocfs2_process_blocked_lock()
- heartbeat.c:ocfs2_node_map_init()
- #if 0 the following unused global function plus support functions:
- heartbeat.c:ocfs2_node_map_is_only()
Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Mark Fasheh [Fri, 29 Feb 2008 01:16:03 +0000 (17:16 -0800)]
ocfs2: Fix writeout in ocfs2_data_convert_worker()
Commit f1f540688eae66c274ff1c1133b5d9c687b28f58 "optimized"
ocfs2_data_convert_worker() to "only do work for regular files".
Unfortunately, I left out a '!', which casued it to *skip* regular files.
This was hidden from testing until recently because the default data
journaling mode (data=ordered) doesn't exercise this code.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
Sunil Mushran [Wed, 6 Feb 2008 20:11:17 +0000 (12:11 -0800)]
ocfs2: Enable localalloc for local mounts
Commit 2fbe8d1ebe004425b4f7b8bba345623d2280be82 disabled localalloc
for local mounts. This caused issues as ocfs2 uses localalloc to
provide write locality. This patch enables localalloc for local mounts.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Linus Torvalds [Mon, 3 Mar 2008 23:00:09 +0000 (15:00 -0800)]
Merge branch 'slab-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm
* 'slab-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm:
slub: fix possible NULL pointer dereference
slub: Add kmalloc_large_node() to support kmalloc_node fallback
slub: look up object from the freelist once
slub: Fix up comments
slub: Rearrange #ifdef CONFIG_SLUB_DEBUG in calculate_sizes()
slub: Remove BUG_ON() from ksize and omit checks for !SLUB_DEBUG
slub: Use the objsize from the kmem_cache_cpu structure
slub: Remove useless checks in alloc_debug_processing
slub: Remove objsize check in kmem_cache_flags()
slub: rename slab_objects to show_slab_objects
Revert "unique end pointer" patch
slab: avoid double initialization & do initialization in 1 place
p->exit_state != 0 doesn't mean this process is dead, it may have
sub-threads. Change the code to use "p->exit_state && thread_group_empty(p)"
instead.
Without this patch, ^Z doesn't deliver SIGTSTP to the foreground process
if the main thread has exited.
However, the new check is not perfect either. There is a window when
exit_notify() drops tasklist and before release_task(). Suppose that
the last (non-leader) thread exits. This means that entire group exits,
but thread_group_empty() is not true yet.
As Eric pointed out, is_global_init() is wrong as well, but I did not
dare to do other changes.
Just for the record, has_stopped_jobs() is absolutely wrong too. But we
can't fix it now, we should first fix SIGNAL_STOP_STOPPED issues.
Even with this patch ^Z doesn't play well with the dead main thread.
The task is stopped correctly but do_wait(WSTOPPED) won't see it. This
is another unrelated issue, will be (hopefully) fixed separately.
Samuel Thibault [Mon, 3 Mar 2008 01:23:49 +0000 (01:23 +0000)]
Fix default compose table initialization
Oddly enough, unsigned int c = '\300'; puts a "negative" value in c, not
0300... This fixes the default unicode compose table by using integers
instead of character constants.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cyrill Gorcunov [Sun, 2 Mar 2008 20:28:24 +0000 (23:28 +0300)]
slub: fix possible NULL pointer dereference
This patch fix possible NULL pointer dereference if kzalloc
failed. To be able to return proper error code the function
return type is changed to ssize_t (according to callees and
sysfs definitions).
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Christoph Lameter <clameter@sgi.com>
slub: Remove BUG_ON() from ksize and omit checks for !SLUB_DEBUG
The BUG_ONs are useless since the pointer derefs will lead to
NULL deref errors anyways. Some of the checks are not necessary
if no debugging is possible.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
There is no page->offset anymore and also no associated limit on the number
of objects. The page->offset field was removed for 2.6.24. So the check
in kmem_cache_flags() is now also obsolete (should have been dropped
earlier, somehow a hunk vanished).
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-by: Christoph Lameter <clameter@sgi.com>
This only made sense for the alternate fastpath which was reverted last week.
Mathieu is working on a new version that addresses the fastpath issues but that
new code first needs to go through mm and it is not clear if we need the
unique end pointers with his new scheme.
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com>