Instead of relying on static allocations for the sched_domain and
sched_group trees, dynamically allocate and RCU free them.
Allocating this dynamically also allows for some build_sched_groups()
simplification since we can now (like with other simplifications) rely
on the sched_domain tree instead of hard-coded knowledge.
One tricky to note is that detach_destroy_domains() needs to hold
rcu_read_lock() over the entire tear-down, per-cpu is not sufficient
since that can lead to partial sched_group existance (could possibly
be solved by doing the tear-down backwards but this is much more
robust).
A concequence of the above is that we can no longer print the
sched_domain debug stuff from cpu_attach_domain() since that might now
run with preemption disabled (due to classic RCU etc.) and
sched_domain_debug() does some GFP_KERNEL allocations.
Another thing to note is that we now fully rely on normal RCU and not
RCU-sched, this is because with the new and exiting RCU flavours we
grew over the years BH doesn't necessarily hold off RCU-sched grace
periods (-rt is known to break this). This would in fact already cause
us grief since we do sched_domain/sched_group iterations from softirq
context.
This patch is somewhat larger than I would like it to be, but I didn't
find any means of shrinking/splitting this.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20110407122942.245307941@chello.nl Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Thu, 7 Apr 2011 12:09:49 +0000 (14:09 +0200)]
sched: Simplify sched_groups_power initialization
Again, instead of relying on knowing the possible domains and their
order, simply rely on the sched_domain tree and whatever domains are
present in there to initialize the sched_group cpu_power.
Note: we need to iterate the CPU mask backwards because of the
cpumask_first() condition for iterating up the tree. By iterating the
mask backwards we ensure all groups of a domain are set-up before
starting on the parent groups that rely on its children to be
completely done.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20110407122942.187335414@chello.nl Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Thu, 7 Apr 2011 12:09:48 +0000 (14:09 +0200)]
sched: Simplify finding the lowest sched_domain
Instead of relying on knowing the build order and various CONFIG_
flags simply remember the bottom most sched_domain when we created the
domain hierarchy.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20110407122942.134511046@chello.nl Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Thu, 7 Apr 2011 12:09:47 +0000 (14:09 +0200)]
sched: Simplify sched_group creation
Instead of calling build_sched_groups() for each possible sched_domain
we might have created, note that we can simply iterate the
sched_domain tree and call it for each sched_domain present.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20110407122942.077862519@chello.nl Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Thu, 7 Apr 2011 12:09:45 +0000 (14:09 +0200)]
sched: Change NODE sched_domain group creation
The NODE sched_domain is 'special' in that it allocates sched_groups
per CPU, instead of sharing the sched_groups between all CPUs.
While this might have some benefits on large NUMA and avoid remote
memory accesses when iterating the sched_groups, this does break
current code that assumes sched_groups are shared between all
sched_domains (since the dynamic cpu_power patches).
So refactor the NODE groups to behave like all other groups.
(The ALLNODES domain again shared its groups across the CPUs for some
reason).
If someone does measure a performance decrease due to this change we
need to revisit this and come up with another way to have both dynamic
cpu_power and NUMA work nice together.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20110407122941.978111700@chello.nl Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Thu, 7 Apr 2011 12:09:44 +0000 (14:09 +0200)]
sched: Simplify build_sched_groups()
Notice that the mask being computed is the same as the domain span we
just computed. By using the domain_span we can avoid some mask
allocations and computations.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20110407122941.925028189@chello.nl Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Thu, 7 Apr 2011 12:09:43 +0000 (14:09 +0200)]
sched: Simplify ->cpu_power initialization
The code in update_group_power() does what init_sched_groups_power()
does and more, so remove the special init_ code and call the generic
code instead.
Also move the sd->span_weight initialization because
update_group_power() needs it.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20110407122941.875856012@chello.nl Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ken Chen [Fri, 8 Apr 2011 19:20:16 +0000 (12:20 -0700)]
sched: Fix erroneous all_pinned logic
The scheduler load balancer has specific code to deal with cases of
unbalanced system due to lots of unmovable tasks (for example because of
hard CPU affinity). In those situation, it excludes the busiest CPU that
has pinned tasks for load balance consideration such that it can perform
second 2nd load balance pass on the rest of the system.
This all works as designed if there is only one cgroup in the system.
However, when we have multiple cgroups, this logic has false positives and
triggers multiple load balance passes despite there are actually no pinned
tasks at all.
The reason it has false positives is that the all pinned logic is deep in
the lowest function of can_migrate_task() and is too low level:
load_balance_fair() iterates each task group and calls balance_tasks() to
migrate target load. Along the way, balance_tasks() will also set a
all_pinned variable. Given that task-groups are iterated, this all_pinned
variable is essentially the status of last group in the scanning process.
Task group can have number of reasons that no load being migrated, none
due to cpu affinity. However, this status bit is being propagated back up
to the higher level load_balance(), which incorrectly think that no tasks
were moved. It kick off the all pinned logic and start multiple passes
attempt to move load onto puller CPU.
To fix this, move the all_pinned aggregation up at the iterator level.
This ensures that the status is aggregated over all task-groups, not just
last one in the list.
Ken Chen [Fri, 8 Apr 2011 00:23:22 +0000 (17:23 -0700)]
sched: Fix sched-domain avg_load calculation
In function find_busiest_group(), the sched-domain avg_load isn't
calculated at all if there is a group imbalance within the domain. This
will cause erroneous imbalance calculation.
The reason is that calculate_imbalance() sees sds->avg_load = 0 and it
will dump entire sds->max_load into imbalance variable, which is used
later on to migrate entire load from busiest CPU to the puller CPU.
This has two really bad effect:
1. stampede of task migration, and they won't be able to break out
of the bad state because of positive feedback loop: large load
delta -> heavier load migration -> larger imbalance and the cycle
goes on.
2. severe imbalance in CPU queue depth. This causes really long
scheduling latency blip which affects badly on application that
has tight latency requirement.
The fix is to have kernel calculate domain avg_load in both cases. This
will ensure that imbalance calculation is always sensible and the target
is usually half way between busiest and puller CPU.
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: hda - Don't query connections for widgets have no connections
ALSA: HDA: Fix single internal mic on ALC275 (Sony Vaio VPCSB1C5E)
ALSA: hda - HDMI: Fix MCP7x audio infoframe checksums
ALSA: usb-audio: define another USB ID for a buggy USB MIDI cable
ALSA: HDA: Fix dock mic for Lenovo X220-tablet
ASoC: format_register_str: Don't clip register values
ASoC: PXA: Fix oops in __pxa2xx_pcm_prepare
ASoC: zylonite: set .codec_dai_name in initializer
* git://git.infradead.org/mtd-2.6:
mtd: atmel_nand: use CPU I/O when buffer is in vmalloc(ed) region
mtd: atmel_nand: modify test case for using DMA operations
mtd: atmel_nand: fix support for CPUs that do not support DMA access
mtd: atmel_nand: trivial: change DMA usage information trace
mtd: mtdswap: fix printk format warning
Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Change initial mount authflavor only when server returns NFS4ERR_WRONGSEC
NFS: Fix a signed vs. unsigned secinfo bug
Revert "net/sunrpc: Use static const char arrays"
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
quota: Don't write quota info in dquot_commit()
ext3: Fix writepage credits computation for ordered mode
Peter Korsgaard [Wed, 30 Mar 2011 13:48:22 +0000 (15:48 +0200)]
watchdog: mpc8xxx_wdt: fix build
Since 1c48a5c93da6313 (dt: Eliminate of_platform_{,un}register_driver)
mpc8xxx_wdt no longer builds as it tries to refer to a 'match' variable
rather than ofdev->dev.of_match that it checks just before.
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk> Acked-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
NFS: Change initial mount authflavor only when server returns NFS4ERR_WRONGSEC
When attempting an initial mount, we should only attempt other
authflavors if AUTH_UNIX receives a NFS4ERR_WRONGSEC error.
This allows other errors to be passed back to userspace programs.
Merge branch 'fbdev-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-2.6
* 'fbdev-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-2.6:
efifb: Add override for 11" Macbook Air 3,1
efifb: Support overriding fields FW tells us with the DMI data.
fb: Reduce priority of resource conflict message
savagefb: Remove obsolete else clause in savage_setup_i2c_bus
savagefb: Set up I2C based on chip family instead of card id
savagefb: Replace magic register address with define
drivers/video/bfin-lq035q1-fb.c: introduce missing kfree
video: s3c-fb: fix checkpatch errors and warning
efifb: support AMD Radeon HD 6490
s3fb: fix Virge/GX2
fbcon: Remove unused 'display *p' variable from fb_flashcursor()
fbdev: sh_mobile_lcdcfb: fix module lock acquisition
fbdev: sh_mobile_lcdcfb: add blanking support
viafb: initialize margins correct
viafb: refresh rate bug collection
sh: mach-ap325rxa: move backlight control code
sh: mach-ecovec24: support for main lcd backlight
Merge branch 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
* 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
ARM: arch-shmobile: only run FSI init on respective boards
ARM: arch-shmobile: only run HDMI init on respective boards
ARM: mach-shmobile: Correctly check for CONFIG_MACH_MACKEREL
Merge branches 'x86-fixes-for-linus', 'sched-fixes-for-linus', 'timers-fixes-for-linus', 'irq-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-32, fpu: Fix FPU exception handling on non-SSE systems
x86, hibernate: Initialize mmu_cr4_features during boot
x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change
x86: visws: Fixup irq overhaul fallout
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Clean up rebalance_domains() load-balance interval calculation
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/mrst/vrtc: Fix boot crash in mrst_rtc_init()
rtc, x86/mrst/vrtc: Fix boot crash in rtc_read_alarm()
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
genirq: Fix cpumask leak in __setup_irq()
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf probe: Fix listing incorrect line number with inline function
perf probe: Fix to find recursively inlined function
perf probe: Fix multiple --vars options behavior
perf probe: Fix to remove redundant close
perf probe: Fix to ensure function declared file
Merge branch 'for-linus' of git://git.infradead.org/ubifs-2.6
* 'for-linus' of git://git.infradead.org/ubifs-2.6:
UBI: do not select KALLSYMS_ALL
UBI: do not compare array with NULL
UBI: check if we are in RO mode in the erase routine
UBIFS: fix debugging failure in dbg_check_space_info
UBIFS: fix error path in dbg_debugfs_init_fs
UBIFS: unify error path dbg_debugfs_init_fs
UBIFS: do not select KALLSYMS_ALL
UBIFS: fix assertion warnings
UBIFS: fix oops on error path in read_pnode
UBIFS: do not read flash unnecessarily
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: Add support for CH Pro Throttle
HID: hid-magicmouse: Increase evdev buffer size
HID: add FF support for Logitech G25/G27
HID: roccat: Add support for wireless variant of Pyra
HID: Fix typo Keyoutch -> Keytouch
HID: add support for Skycable 0x3f07 wireless presenter
Youquan Song [Wed, 6 Apr 2011 06:35:12 +0000 (14:35 +0800)]
fix build fail for hv_mouse indefine udelay
Fix build failure issue for hv_mouse
When build 2.6.39-rc1 kernel, it will be blocked at build hv_mouse.
drivers/staging/hv/hv_mouse.c: In function ‘ReleaseInputDevice’:
drivers/staging/hv/hv_mouse.c:293: error: implicit declaration of function ‘udelay’
Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The normal mmap paths all avoid creating a mapping where the pgoff
inside the mapping could wrap around due to overflow. However, an
expanding mremap() can take such a non-wrapping mapping and make it
bigger and cause a wrapping condition.
Noticed by Robert Swiecki when running a system call fuzzer, where it
caused a BUG_ON() due to terminally confusing the vma_prio_tree code. A
vma dumping patch by Hugh then pinpointed the crazy wrapped case.
ALSA: HDA: Fix single internal mic on ALC275 (Sony Vaio VPCSB1C5E)
In cases where there is only one internal mic connected to ADC 0x11,
alc275_setup_dual_adc won't handle the case, so we need to add the
ADC node to the array of candidates.
The MCP7x hardware computes the audio infoframe channel count
automatically, but requires the audio driver to set the audio
infoframe checksum manually via the Nv_VERB_SET_Info_Frame_Checksum
control verb.
When audio starts playing, nvhdmi_8ch_7x_pcm_prepare sets the checksum
to (0x71 - chan - chanmask). For example, for 2ch audio, chan == 1
and chanmask == 0 so the checksum is set to 0x70. When audio playback
finishes and the device is closed, nvhdmi_8ch_7x_pcm_close resets the
channel formats, causing the channel count to revert to 8ch. Since
the checksum is not reset, the hardware starts generating audio
infoframes with invalid checksums. This causes some displays to blank
the video.
Fix this by updating the checksum and channel mask when the device is
closed and also when it is first initialized. In addition, make sure
that the channel mask is appropriate for an 8ch infoframe by setting
it to 0x13 (FL FR LFE FC RL RR RLC RRC).
Hans Rosenfeld [Wed, 6 Apr 2011 16:06:43 +0000 (18:06 +0200)]
x86-32, fpu: Fix FPU exception handling on non-SSE systems
On 32bit systems without SSE (that is, they use FSAVE/FRSTOR for FPU
context switches), FPU exceptions in user mode cause Oopses, BUGs,
recursive faults and other nasty things:
staging: usbip: bugfix for isochronous packets and optimization
For isochronous packets the actual_length is the sum of the actual
length of each of the packets, however between the packets might be
padding, so it is not sufficient to just send the first actual_length
bytes of the buffer. To fix this and simultanesouly optimize the
bandwidth the content of the isochronous packets are send without the
padding, the padding is restored on the receiving end.
staging: usbip: bugfix add number of packets for isochronous frames
The number_of_packets was not transmitted for RET_SUBMIT packets. The
linux client used the stored number_of_packet from the submitted
request. The windows userland client does not do this however and needs
to know the number_of_packets to determine the size of the transmission.
staging: usbip: bugfixes related to kthread conversion
When doing a usb port reset do a queued reset instead to prevent a
deadlock: the reset will cause the driver to unbind, causing the
usb_driver_lock_for_reset to stall.
staging: hv: Fix GARP not sent after Quick Migration
After Quick Migration, the network is not immediately operational in the
current context when receiving RNDIS_STATUS_MEDIA_CONNECT event. So, I added
another netif_notify_peers() into a scheduled work, otherwise GARP packet will
not be sent after quick migration, and cause network disconnection.
Thanks to Mike Surcouf <mike@surcouf.co.uk> for reporting the bug and
testing the patch.
Reported-by: Mike Surcouf <mike@surcouf.co.uk> Tested-by: Mike Surcouf <mike@surcouf.co.uk> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Hank Janssen <hjanssen@microsoft.com> Signed-off-by: Abhishek Kane <v-abkane@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
rpc_authflavor_t is cast from an unsigned int, but the
initial code tried to use it as a signed int. I fix
this by passing an rpc_authflavor_t pointer around, and
returning signed integers from functions.
thereby breaking resume from hibernate. This restores previous
functionality in approximately the same place, and corrects the
reading of %cr4 on pre-CPUID hardware (%cr4 exists if and only if
CPUID is supported.)
However, part of the problem is that the hibernate suspend/resume
sequence should manage the save/restore of %cr4 explicitly.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <201104020154.57136.rjw@sisk.pl>
ARM: arch-shmobile: only run FSI init on respective boards
If several boards are enabled in the kernel configuration,
fsi_init_pm_clock() functions from board-ap4evb.c
will run on any of them. Prevent this by calling these functions from the
.init_machine() callback instead of using device_initcall().
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Magnus Damm <damm@opensource.se> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
ARM: arch-shmobile: only run HDMI init on respective boards
If several boards are enabled in the kernel configuration,
hdmi_init_pm_clock() functions from board-ap4evb.c and board-mackerel.c
will run on any of them. Prevent this by calling these functions from the
.init_machine() callback instead of using device_initcall().
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Cc: Magnus Damm <damm@opensource.se> Tested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
savagefb: Set up I2C based on chip family instead of card id
In practice this means enabling I2C (for DDC2) on all prosavage cards,
like the xorg ddx does. The savage4 and savage2000 families have only
one member each, so there is no change for those.
Tested on TwisterK.
Signed-off-by: Tormod Volden <debian.tormod@gmail.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Jingoo Han [Fri, 1 Apr 2011 07:17:27 +0000 (07:17 +0000)]
video: s3c-fb: fix checkpatch errors and warning
This patch fixes the checkpatch errors listed below:
ERROR: space required before the open parenthesis '('
ERROR: need consistent spacing around '+' (ctx:WxV)
ERROR: space prohibited before that close parenthesis ')'
Also, following warning is fixed by adding 'platid' variable
which can reduce number of lines exceeding 80 characters.
WARNING: line over 80 characters
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Chase Douglas [Fri, 1 Apr 2011 21:03:39 +0000 (17:03 -0400)]
HID: hid-magicmouse: Increase evdev buffer size
The evdev buffer isn't big enough when you get many fingers on the
device. Bump up the buffer to a reasonable size, matching what other
multitouch devices use. Without this change, events may be discarded in
the evdev buffer before they are read.
Reported-by: Simon Budig <simon@budig.de> Cc: Henrik Rydberg <rydberg@euromail.se> Cc: Jiri Kosina <jkosina@suse.cz> Cc: stable@kernel.org Signed-off-by: Chase Douglas <chase.douglas@canonical.com> Acked-by: Henrik Rydberg <rydberg@euromail.se> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Andre Przywara [Thu, 31 Mar 2011 14:58:49 +0000 (16:58 +0200)]
KVM: move and fix substitue search for missing CPUID entries
If KVM cannot find an exact match for a requested CPUID leaf, the
code will try to find the closest match instead of simply confessing
it's failure.
The implementation was meant to satisfy the CPUID specification, but
did not properly check for extended and standard leaves and also
didn't account for the index subleaf.
Beside that this rule only applies to CPUID intercepts, which is not
the only user of the kvm_find_cpuid_entry() function.
So fix this algorithm and call it from kvm_emulate_cpuid().
This fixes a crash of newer Linux kernels as KVM guests on
AMD Bulldozer CPUs, where bogus values were returned in response to
a CPUID intercept.
Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Andre Przywara [Wed, 30 Mar 2011 13:01:45 +0000 (15:01 +0200)]
KVM: fix XSAVE bit scanning
When KVM scans the 0xD CPUID leaf for propagating the XSAVE save area
leaves, it assumes that the leaves are contigious and stops at the
first zero one. On AMD hardware there is a gap, though, as LWP uses
leaf 62 to announce it's state save area.
So lets iterate through all 64 possible leaves and simply skip zero
ones to also cover later features.
Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Tue, 1 Feb 2011 11:21:47 +0000 (13:21 +0200)]
KVM: Enable async page fault processing
If asynchronous hva_to_pfn() is requested call GUP with FOLL_NOWAIT to
avoid sleeping on IO. Check for hwpoison is done at the same time,
otherwise check_user_page_hwpoison() will call GUP again and will put
vcpu to sleep.
Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
irqfd in kvm used flush_work incorrectly: it assumed that work scheduled
previously can't run after flush_work, but since kvm uses a non-reentrant
workqueue (by means of schedule_work) we need flush_work_sync to get that
guarantee.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reported-by: Jean-Philippe Menil <jean-philippe.menil@univ-nantes.fr> Tested-by: Jean-Philippe Menil <jean-philippe.menil@univ-nantes.fr> Signed-off-by: Avi Kivity <avi@redhat.com>
ALSA: usb-audio: define another USB ID for a buggy USB MIDI cable
There are many USB MIDI cables out there that have buggy
firmware that reports it can do more than 4 bytes in a
packet when they can only properly handle 4
This patch adds the ID of yet another one of those cables
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block:
ide: always ensure that blk_delay_queue() is called if we have pending IO
block: fix request sorting at unplug
dm: improve block integrity support
fs: export empty_aops
ide: ide_requeue_and_plug() reinstate "always plug" behaviour
blk-throttle: don't call xchg on bool
ufs: remove unessecary blk_flush_plug
block: make the flush insertion use the tail of the dispatch list
block: get rid of elv_insert() interface
block: dump request state on seeing a corrupted request completion
Eric Paris [Tue, 5 Apr 2011 21:20:50 +0000 (17:20 -0400)]
inotify: fix double free/corruption of stuct user
On an error path in inotify_init1 a normal user can trigger a double
free of struct user. This is a regression introduced by a2ae4cc9a16e
("inotify: stop kernel memory leak on file creation failure").
We fix this by making sure that if a group exists the user reference is
dropped when the group is cleaned up. We should not explictly drop the
reference on error and also drop the reference when the group is cleaned
up.
The new lifetime rules are that an inotify group lives from
inotify_new_group to the last fsnotify_put_group. Since the struct user
and inotify_devs are directly tied to this lifetime they are only
changed/updated in those two locations. We get rid of all special
casing of struct user or user->inotify_devs.
Signed-off-by: Eric Paris <eparis@redhat.com> Cc: stable@kernel.org (2.6.37 and up) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ide: always ensure that blk_delay_queue() is called if we have pending IO
Just because we are not requeuing a request does not mean that
some aren't pending. So always issue a blk_delay_queue() if
either we are requeueing OR there's pending IO.
Comparison function for list_sort() must be anticommutative,
otherwise it is not sorting in ordinary meaning.
But fortunately list_sort() always check ((*cmp)(priv, a, b) <= 0)
it not distinguish negative and zero, so comparison function can
implement only less-or-equal instead of full three-way comparison.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Mike Snitzer [Fri, 1 Apr 2011 19:02:31 +0000 (21:02 +0200)]
dm: improve block integrity support
The current block integrity (DIF/DIX) support in DM is verifying that
all devices' integrity profiles match during DM device resume (which
is past the point of no return). To some degree that is unavoidable
(stacked DM devices force this late checking). But for most DM
devices (which aren't stacking on other DM devices) the ideal time to
verify all integrity profiles match is during table load.
Introduce the notion of an "initialized" integrity profile: a profile
that was blk_integrity_register()'d with a non-NULL 'blk_integrity'
template. Add blk_integrity_is_initialized() to allow checking if a
profile was initialized.
Update DM integrity support to:
- check all devices with _initialized_ integrity profiles match
during table load; uninitialized profiles (e.g. for underlying DM
device(s) of a stacked DM device) are ignored.
- disallow a table load that would result in an integrity profile that
conflicts with a DM device's existing (in-use) integrity profile
- avoid clearing an existing integrity profile
- validate all integrity profiles match during resume; but if they
don't all we can do is report the mismatch (during resume we're past
the point of no return)
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
With the ->sync_page() hook gone, we have a few users that
add their own static address_space_operations without any
functions defined.
fs/inode.c already has an empty_aops that it uses for init
purposes. Lets export that and use it in the places where
an otherwise empty aops was defined.
Jens Axboe [Wed, 30 Mar 2011 07:52:30 +0000 (09:52 +0200)]
block: get rid of elv_insert() interface
Merge it with __elv_add_request(), it's pretty pointless to
have a function with only two callers. The main interface
is elv_add_request()/__elv_add_request().
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/pseries: Fix build without CONFIG_HOTPLUG_CPU
powerpc: Set nr_cpu_ids early and use it to free PACAs
powerpc/pseries: Don't register global initcall
powerpc/kexec: Fix mismatched ifdefs for PPC64/SMP.
edac/mpc85xx: Limit setting/clearing of HID1[RFXE] to e500v1/v2 cores
powerpc/85xx: Update dts for PCIe memory maps to match u-boot of Px020RDB
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: don't warn in btrfs_add_orphan
Btrfs: fix free space cache when there are pinned extents and clusters V2
Btrfs: Fix uninitialized root flags for subvolumes
btrfs: clear __GFP_FS flag in the space cache inode
Btrfs: fix memory leak in start_transaction()
Btrfs: fix memory leak in btrfs_ioctl_start_sync()
Btrfs: fix subvol_sem leak in btrfs_rename()
Btrfs: Fix oops for defrag with compression turned on
Btrfs: fix /proc/mounts info.
Btrfs: fix compiler warning in file.c
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
ipv6: Don't pass invalid dst_entry pointer to dst_release().
mlx4: fix kfree on error path in new_steering_entry()
tcp: len check is unnecessarily devastating, change to WARN_ON
sctp: malloc enough room for asconf-ack chunk
sctp: fix auth_hmacs field's length of struct sctp_cookie
net: Fix dev dev_ethtool_get_rx_csum() for forced NETIF_F_RXCSUM
usbnet: use eth%d name for known ethernet devices
starfire: clean up dma_addr_t size test
iwlegacy: fix bugs in change_interface
carl9170: Fix tx aggregation problems with some clients
iwl3945: disable hw scan by default
wireless: rt2x00: rt2800usb.c add and identify ids
iwl3945: do not deprecate software scan
mac80211: fix aggregation frame release during timeout
cfg80211: fix BSS double-unlinking (continued)
cfg80211:: fix possible NULL pointer dereference
mac80211: fix possible NULL pointer dereference
mac80211: fix NULL pointer dereference in ieee80211_key_alloc()
ath9k: fix a chip wakeup related crash in ath9k_start
mac80211: fix a crash in minstrel_ht in HT mode with no supported MCS rates
...
Masami Hiramatsu [Wed, 30 Mar 2011 09:26:05 +0000 (18:26 +0900)]
perf probe: Fix listing incorrect line number with inline function
Fix a bug showing incorrect line number when a probe is put on the head of an
inline function. This patch updates find_perf_probe_point() and introduces new
rules to get correct line number.
- If debuginfo doesn't have a correct file name, we shouldn't return line
number too, because, without file name, line number is meaningless.
- If the address is in a function, it stores the function name and the offset
from the function entry.
- If the address is on a line, it tries to get the relative line number from
the function entry line, except for the address is same as the entry
address of the function (in this case, the relative line number should
be 0).
- If the address is in an inline function entry (call-site), it uses the
inline function call line number as the line on which the address is.
- If the address is in an inline function body, it stores the inline
function name and offset from the inline function call site instead of the
(non-inlined) function.
Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20110330092605.2132.11629.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is broken in the same manner as for VGA: trying to write to an
invalid address on the (currently 7-bit) i2c bus.
One notable failure appears to be for MacBooks. The scary part was that
it gave the appearance of working (i.e. reporting the absence of the
panel) on various all-in-one machines with ghost LVDS panels and not
failing for laptops.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Dave Airlie <airlied@linux.ie> Signed-off-by: Keith Packard <keithp@keithp.com>
Following the fix to reset the GMBUS controller after a NAK, we finally
utilize the 0xa0 probe for a CRT connection. And discover that the code
is broken. Shock.
There are a number of issues, but following a key insight from Dave
Airlie, that 0xA0 is an invalid address on a 7-bit bus (though not if we
were to enable 10-bit addressing), and would look like the EDID port
0x50, it is possible to see where the confusion starts.
In short, a write to 0xA0 is accepted by the GMBUS controller which we
interpreted as meaning the existence of a connection (a slave on the
other end of the wire ACKing the write). That was false.
During testing with a broken GMBUS implementation, which never reset an
earlier NAK, this test always reported a NAK and so we proceeded on to
the next test.
Reported-and-tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35904 Reported-and-tested-by: Riccardo Magliocchetti <riccardo.magliocchetti@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=32612 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Dave Airlie <airlied@linux.ie> Signed-off-by: Keith Packard <keithp@keithp.com>
Peter Zijlstra [Tue, 5 Apr 2011 08:14:25 +0000 (10:14 +0200)]
sched: Clean up rebalance_domains() load-balance interval calculation
Instead of the possible multiple-evaluation of num_online_cpus()
in rebalance_domains() that Linus reported, avoid it altogether
in the normal case since it's implemented with a Hamming weight
function over a cpu bitmask which can be darn expensive for those
with big iron.
This also makes it cleaner, smaller and documents the code.