Jesse Barnes [Tue, 28 Jun 2011 20:04:16 +0000 (13:04 -0700)]
drm/i915: load a ring frequency scaling table v3
The ring frequency scaling table tells the PCU to treat certain GPU
frequencies as if they were a given CPU frequency for purposes of
scaling the ring frequency. Normally the PCU will scale the ring
frequency based on the CPU P-state, but with the table present, it will
also take the GPU frequency into account.
The main downside of keeping the ring frequency high while the CPU is
at a low frequency (or asleep altogether) is increased power
consumption. But then if you're keeping your GPU busy, you probably
want the extra performance.
v2:
- add units to debug table header (from Eric)
- use tsc_khz as a fallback if the cpufreq driver doesn't give us a freq
(from Chris)
v3:
- fix comments & debug output
- remove unneeded force wake get/put
Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Tested-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Keith Packard <keithp@keithp.com>
Jesse Barnes [Tue, 28 Jun 2011 17:59:12 +0000 (10:59 -0700)]
cpufreq: expose a cpufreq_quick_get_max routine
This allows drivers and other code to get the max reported CPU frequency.
Initial use is for scaling ring frequency with GPU frequency in the i915
driver.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Keith Packard <keithp@keithp.com>
Chris Wilson [Tue, 28 Jun 2011 10:48:51 +0000 (11:48 +0100)]
drm/i915: Use chipset-specific irq installers
Konstantin Belousov pointed out that 4697995b98417 replaced the generic
i915_driver_irq_*install() functions with chipset specific routines
accessible only through driver->irq_*install(). So update the sanity
check in i915_request_wait() to match.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Keith Packard <keithp@keithp.com>
Ben Widawsky [Fri, 24 Jun 2011 21:31:47 +0000 (14:31 -0700)]
drm/i915: forcewake fix after reset
The failure is as follows:
1. Userspace gets forcewake lock, lock count >=1
2. GPU hang/reset occurs (forcewake bit is reset)
3. count is now incorrect
The failure can only occur when using the forcewake userspace lock.
This has the unfortunate consequence of messing up the driver as well as
userspace, unless userspace closes the debugfs file, the kernel will
never end up waking the GT since the refcount will be > 1.
The solution is to try to recover the correct forcewake state based on
the refcount. There is a period of time where userspace reads/writes may
occur after the reset, before the GT has been forcewaked. The interface
was never designed to be a perfect solution for userspace reads/writes,
and the kernel portion is fixed by this patch.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Keith Packard <keithp@keithp.com>
Hugh Dickins [Mon, 27 Jun 2011 23:18:20 +0000 (16:18 -0700)]
drm/i915: more struct_mutex locking
When auditing the locking in i915_gem.c (for a prospective change which I
then abandoned), I noticed two places where struct_mutex is not held
across GEM object manipulations that would usually require it. Since one
is in initial setup and the other in driver unload, I'm guessing the mutex
is not required for either; but post a patch in case it is.
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Keith Packard <keithp@keithp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Keith Packard <keithp@keithp.com>
Ben Widawsky [Wed, 22 Jun 2011 16:55:01 +0000 (09:55 -0700)]
drm/i915: save/resume forcewake lock fixes
The lock must be held for the saving and restoring of VGA state.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> CC: Alexander Zhaunerchyk <alex.vizor@gmail.com> CC: Andrey Rahmatullin <wrar@wrar.name> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Keith Packard <keithp@keithp.com>
Eric Anholt [Tue, 14 Jun 2011 23:43:09 +0000 (16:43 -0700)]
Revert "drm/i915: Kill GTT mappings when moving from GTT domain"
This reverts commit 4a684a4117abd756291969336af454e8a958802f.
Userland has always been required to set the object's domain to GTT
before using it through a GTT mapping, it's not something that the
kernel is supposed to enforce. (The pagefault support is so that we
can handle multiple mappings without userland having to pin across
them, not so that userland can use GTT after GPU domains without
telling the kernel).
Fixes 19.2% +/- 0.8% (n=6) performance regression in cairo-gl
firefox-talos-gfx on my T420 latop.
Reported-and-tested-by: nkalkhof@web.de
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38529 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Keith Packard <keithp@keithp.com>
Linus Torvalds [Tue, 21 Jun 2011 03:13:49 +0000 (20:13 -0700)]
vfs: i_state needs to be 'unsigned long' for now
Commit 13e12d14e2dc ("vfs: reorganize 'struct inode' layout a bit")
moved things around a bit changed i_state to be unsigned int instead of
unsigned long. That was to help structure layout for the 64-bit case,
and shrink 'struct inode' a bit (admittedly that only happened when
spinlock debugging was on and i_flags didn't pack with i_lock).
However, Meelis Roos reports that this results in unaligned exceptions
on sprc, and it turns out that the bit-locking primitives that we use
for the I_NEW bit want to use the bitops. Which want 'unsigned long',
not 'unsigned int'.
We really should fix the bit locking code to not have that kind of
requirement, but that's a much bigger change. So for now, revert that
field back to 'unsigned long' (but keep the other re-ordering changes
from the commit that caused this).
Andi points out that we have played games with this in 'struct page', so
it's solvable with other hacks too, but since right now the struct inode
size advantage only happens with some rare config options, it's not
worth fighting.
It _would_ be worth fixing the bitlocking code, though. Especially
since there is no type safety in the bitlocking code (this never caused
any warnings, and worked fine on x86-64, because the bitlocks take a
'void *' and x86-64 doesn't care that deeply about alignment). So it's
currently a very easy problem to trigger by mistake and never notice.
Reported-by: Meelis Roos <mroos@linux.ee> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 21 Jun 2011 03:12:48 +0000 (20:12 -0700)]
Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms/r6xx+: voltage fixes
drm/nouveau: drop leftover debugging
drm/radeon: avoid warnings from r600/eg irq handlers on powered off card.
drm/radeon/kms: add missing param for dce3.2 DP transmitter setup
drm/radeon/kms/atom: fix duallink on some early DCE3.2 cards
drm/nouveau: fix assumption that semaphore dmaobj is valid in x-chan sync
drm/nv50/disp: fix gamma with page flipping overlay turned on
drm/nouveau/pm: Prevent overflow in nouveau_perf_init()
drm/nouveau: fix big-endian switch
Linus Torvalds [Tue, 21 Jun 2011 03:10:52 +0000 (20:10 -0700)]
Merge branch 'for-2.6.40' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.40' of git://linux-nfs.org/~bfields/linux:
nfsd4: fix break_lease flags on nfsd open
nfsd: link returns nfserr_delay when breaking lease
nfsd: v4 support requires CRYPTO
nfsd: fix dependency of nfsd on auth_rpcgss
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (40 commits)
pxa168_eth: fix race in transmit path.
ipv4, ping: Remove duplicate icmp.h include
netxen: fix race in skb->len access
sgi-xp: fix a use after free
hp100: fix an skb->len race
netpoll: copy dev name of slaves to struct netpoll
ipv4: fix multicast losses
r8169: fix static initializers.
inet_diag: fix inet_diag_bc_audit()
gigaset: call module_put before restart of if_open()
farsync: add module_put to error path in fst_open()
net: rfs: enable RFS before first data packet is received
fs_enet: fix freescale FCC ethernet dp buffer alignment
netdev: bfin_mac: fix memory leak when freeing dma descriptors
vlan: don't call ndo_vlan_rx_register on hardware that doesn't have vlan support
caif: Bugfix - XOFF removed channel from caif-mux
tun: teach the tun/tap driver to support netpoll
dp83640: drop PHY status frames in the driver.
dp83640: fix phy status frame event parsing
phylib: Allow BCM63XX PHY to be selected only on BCM63XX.
...
Linus Torvalds [Tue, 21 Jun 2011 03:09:15 +0000 (20:09 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
devcgroup_inode_permission: take "is it a device node" checks to inlined wrapper
fix comment in generic_permission()
kill obsolete comment for follow_down()
proc_sys_permission() is OK in RCU mode
reiserfs_permission() doesn't need to bail out in RCU mode
proc_fd_permission() is doesn't need to bail out in RCU mode
nilfs2_permission() doesn't need to bail out in RCU mode
logfs doesn't need ->permission() at all
coda_ioctl_permission() is safe in RCU mode
cifs_permission() doesn't need to bail out in RCU mode
bad_inode_permission() is safe from RCU mode
ubifs: dereferencing an ERR_PTR in ubifs_mount()
Richard Cochran [Sun, 19 Jun 2011 21:48:06 +0000 (21:48 +0000)]
pxa168_eth: fix race in transmit path.
Because the socket buffer is freed in the completion interrupt, it is not
safe to access it after submitting it to the hardware.
Cc: stable@kernel.org Cc: Sachin Sanap <ssanap@marvell.com> Cc: Zhangfei Gao <zgao6@marvell.com> Cc: Philip Rakity <prakity@marvell.com> Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 19 Jun 2011 20:26:15 +0000 (20:26 +0000)]
netxen: fix race in skb->len access
As soon as skb is given to hardware, TX completion can free skb under
us.
Therefore, we should update dev stats before kicking the device.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 20 Jun 2011 16:01:33 +0000 (09:01 -0700)]
Merge branch 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/setup: Fix for incorrect xen_extra_mem_start.
xen: When calling power_off, don't call the halt function.
xen: Fix compile warning when CONFIG_SMP is not defined.
xen: support CONFIG_MAXSMP
xen: partially revert "xen: set max_pfn_mapped to the last pfn mapped"
Linus Torvalds [Mon, 20 Jun 2011 15:59:46 +0000 (08:59 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: sh_keysc - 8x8 MODE_6 fix
Input: omap-keypad - add missing input_sync()
Input: evdev - try to wake up readers only if we have full packet
Input: properly assign return value of clamp() macro.
Linus Torvalds [Mon, 20 Jun 2011 15:58:53 +0000 (08:58 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: avoid delayed metadata items during commits
btrfs: fix uninitialized return value
btrfs: fix wrong reservation when doing delayed inode operations
btrfs: Remove unused sysfs code
btrfs: fix dereference of ERR_PTR value
Btrfs: fix relocation races
Btrfs: set no_trans_join after trying to expand the transaction
Btrfs: protect the pending_snapshots list with trans_lock
Btrfs: fix path leakage on subvol deletion
Btrfs: drop the delalloc_bytes check in shrink_delalloc
Btrfs: check the return value from set_anon_super
Linus Torvalds [Mon, 20 Jun 2011 15:58:07 +0000 (08:58 -0700)]
Merge branch 'kvm-updates/3.0' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/3.0' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Fix register corruption in pvclock_scale_delta
KVM: MMU: fix opposite condition in mapping_level_dirty_bitmap
KVM: VMX: do not overwrite uptodate vcpu->arch.cr3 on KVM_SET_SREGS
KVM: MMU: Fix build warnings in walk_addr_generic()
Al Viro [Sun, 19 Jun 2011 17:01:04 +0000 (13:01 -0400)]
devcgroup_inode_permission: take "is it a device node" checks to inlined wrapper
inode_permission() calls devcgroup_inode_permission() and almost all such
calls are _not_ for device nodes; let's at least keep the common path
straight...
Dan Carpenter [Mon, 20 Jun 2011 07:10:24 +0000 (10:10 +0300)]
ubifs: dereferencing an ERR_PTR in ubifs_mount()
d251ed271d5 "ubifs: fix sget races" left out the goto from this
error path so the static checkers complain that we're dereferencing
"sb" when it's an ERR_PTR.
Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
J. Bruce Fields [Tue, 7 Jun 2011 15:50:23 +0000 (11:50 -0400)]
nfsd4: fix break_lease flags on nfsd open
Thanks to Casey Bodley for pointing out that on a read open we pass 0,
instead of O_RDONLY, to break_lease, with the result that a read open is
treated like a write open for the purposes of lease breaking!
Reported-by: Casey Bodley <cbodley@citi.umich.edu> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Dave Airlie [Mon, 20 Jun 2011 02:02:38 +0000 (12:02 +1000)]
Merge branch 'drm-nouveau-fixes' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes
* 'drm-nouveau-fixes' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
drm/nouveau: fix assumption that semaphore dmaobj is valid in x-chan sync
drm/nv50/disp: fix gamma with page flipping overlay turned on
drm/nouveau/pm: Prevent overflow in nouveau_perf_init()
drm/nouveau: fix big-endian switch
Dave Airlie [Sat, 18 Jun 2011 03:59:51 +0000 (03:59 +0000)]
drm/radeon: avoid warnings from r600/eg irq handlers on powered off card.
Since we were calling the wptr function before checking if the IH was
even enabled, or the GPU wasn't shutdown, we'd get spam in the logs when
the GPU readback 0xffffffff. This reorders things so we return early
in the no IH and GPU shutdown cases.
Reported-and-tested-by: ManDay on #radeon Reviewed-by: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Deucher [Fri, 17 Jun 2011 17:13:52 +0000 (13:13 -0400)]
drm/radeon/kms/atom: fix duallink on some early DCE3.2 cards
Certain revisions of the vbios on DCE3.2 cards have a bug
in the transmitter control table which prevents duallink from
being enabled properly on some cards. The action switch statement
jumps to the wrong offset for the OUTPUT_ENABLE action. The fix
is to use the ENABLE action rather than the OUTPUT_ENABLE action
on the affected cards. In fixed version of the vbios, both
actions jump to the same offset, so the change should be safe.
Reported-and-tested-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexdeucher@gmail.com> Cc: stable@kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
Eric Dumazet [Sun, 19 Jun 2011 12:43:33 +0000 (12:43 +0000)]
hp100: fix an skb->len race
As soon as skb is given to hardware and spinlock released, TX completion
can free skb under us. Therefore, we should update netdev stats before
spinlock release.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Zachary Amsden [Thu, 16 Jun 2011 03:50:04 +0000 (20:50 -0700)]
KVM: Fix register corruption in pvclock_scale_delta
The 128-bit multiply in pvclock.h was missing an output constraint for
EDX which caused a register corruption to appear. Thanks to Ulrich for
diagnosing the EDX corruption and Avi for providing this fix.
Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Borislav Petkov [Mon, 30 May 2011 20:11:17 +0000 (22:11 +0200)]
KVM: MMU: Fix build warnings in walk_addr_generic()
On 3.0-rc1 I get
In file included from arch/x86/kvm/mmu.c:2856:
arch/x86/kvm/paging_tmpl.h: In function ‘paging32_walk_addr_generic’:
arch/x86/kvm/paging_tmpl.h:124: warning: ‘ptep_user’ may be used uninitialized in this function
In file included from arch/x86/kvm/mmu.c:2852:
arch/x86/kvm/paging_tmpl.h: In function ‘paging64_walk_addr_generic’:
arch/x86/kvm/paging_tmpl.h:124: warning: ‘ptep_user’ may be used uninitialized in this function
Linus Torvalds [Sun, 19 Jun 2011 16:00:18 +0000 (09:00 -0700)]
Merge branches 'perf-urgent-for-linus', 'sched-urgent-for-linus', 'timers-urgent-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tools/perf: Fix static build of perf tool
tracing: Fix regression in printk_formats file
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
clocksource: Make watchdog robust vs. interruption
timerfd: Fix wakeup of processes when timer is cancelled on clock change
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, MAINTAINERS: Add x86 MCE people
x86, efi: Do not reserve boot services regions within reserved areas
Linus Torvalds [Sun, 19 Jun 2011 15:56:56 +0000 (08:56 -0700)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: Move RCU_BOOST #ifdefs to header file
rcu: use softirq instead of kthreads except when RCU_BOOST=y
rcu: Use softirq to address performance regression
rcu: Simplify curing of load woes
x86, efi: Do not reserve boot services regions within reserved areas
Commit 916f676f8dc started reserving boot service code since some systems
require you to keep that code around until SetVirtualAddressMap is called.
However, in some cases those areas will overlap with reserved regions.
The proper medium-term fix is to fix the bootloader to prevent the
conflicts from occurring by moving the kernel to a better position,
but the kernel should check for this possibility, and only reserve regions
which can be reserved.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com> Link: http://lkml.kernel.org/r/4DF7A005.1050407@gmail.com Acked-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Eric Dumazet [Sat, 18 Jun 2011 18:59:18 +0000 (11:59 -0700)]
ipv4: fix multicast losses
Knut Tidemann found that first packet of a multicast flow was not
correctly received, and bisected the regression to commit b23dd4fe42b4
(Make output route lookup return rtable directly.)
Special thanks to Knut, who provided a very nice bug report, including
sample programs to demonstrate the bug.
Reported-and-bisectedby: Knut Tidemann <knut.andre.tidemann@jotron.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise the updated evdev driver (commit cdda911c34006f1089f3c87b1a1f,
"Input: evdev - only signal polls on full packets") no longer works on
top of omap-keypad.
Tested on Amstrad Delta.
Signed-off-by: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl> Reviewed-by: Henrik Rydberg <rydberg@euromail.se> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Dmitry Torokhov [Sat, 18 Jun 2011 09:50:11 +0000 (02:50 -0700)]
Input: evdev - try to wake up readers only if we have full packet
We should only wake waiters on the event device when we actually post
an EV_SYN/SYN_REPORT to the queue. Otherwise we end up making waiting
threads runnable only to go right back to sleep because the device
still isn't readable.
Reported-by: Jeffrey Brown <jeffbrown@android.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Guenter Roeck [Tue, 24 May 2011 19:34:55 +0000 (12:34 -0700)]
hwmon: (s3c) Initialize sysfs attributes
Initialize dynamically allocated sysfs attributes before device_create_file()
call to suppress lockdep_init_map() warning if lockdep debugging is enabled.
Guenter Roeck [Tue, 24 May 2011 19:34:12 +0000 (12:34 -0700)]
hwmon: (ibmpex) Initialize sysfs attributes
Initialize dynamically allocated sysfs attributes before device_create_file()
call to suppress lockdep_init_map() warning if lockdep debugging is enabled.
Guenter Roeck [Tue, 24 May 2011 19:33:26 +0000 (12:33 -0700)]
hwmon: (ibmaem) Initialize sysfs attributes
Initialize dynamically allocated sysfs attributes before device_create_file()
call to suppress lockdep_init_map() warning if lockdep debugging is enabled.
Linus Torvalds [Sat, 18 Jun 2011 04:13:43 +0000 (21:13 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] powernow-k8: Don't try to transition if the pstate is incorrect
[CPUFREQ] powernow-k8: Don't notify of successful transition if we failed (vid case).
[CPUFREQ] Don't set stat->last_index to -1 if the pol->cur has incorrect value.
Linus Torvalds [Sat, 18 Jun 2011 02:05:36 +0000 (19:05 -0700)]
mm: avoid anon_vma_chain allocation under anon_vma lock
Hugh Dickins points out that lockdep (correctly) spots a potential
deadlock on the anon_vma lock, because we now do a GFP_KERNEL allocation
of anon_vma_chain while doing anon_vma_clone(). The problem is that
page reclaim will want to take the anon_vma lock of any anonymous pages
that it will try to reclaim.
So re-organize the code in anon_vma_clone() slightly: first do just a
GFP_NOWAIT allocation, which will usually work fine. But if that fails,
let's just drop the lock and re-do the allocation, now with GFP_KERNEL.
End result: not only do we avoid the locking problem, this also ends up
getting better concurrency in case the allocation does need to block.
Tim Chen reports that with all these anon_vma locking tweaks, we're now
almost back up to the spinlock performance.
Reported-and-tested-by: Hugh Dickins <hughd@google.com> Tested-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Zijlstra [Fri, 17 Jun 2011 11:54:23 +0000 (13:54 +0200)]
mm: avoid repeated anon_vma lock/unlock sequences in unlink_anon_vmas()
This matches the anon_vma_clone() case, and uses the same lock helper
functions. Because of the need to potentially release the anon_vma's,
it's a bit more complex, though.
We traverse the 'vma->anon_vma_chain' in two phases: the first loop gets
the anon_vma lock (with the helper function that only takes the lock
once for the whole loop), and removes any entries that don't need any
more processing.
The second phase just traverses the remaining list entries (without
holding the anon_vma lock), and does any actual freeing of the
anon_vma's that is required.
Signed-off-by: Peter Zijlstra <peterz@infradead.org> Tested-by: Hugh Dickins <hughd@google.com> Tested-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 17 Jun 2011 03:44:51 +0000 (20:44 -0700)]
mm: avoid repeated anon_vma lock/unlock sequences in anon_vma_clone()
In anon_vma_clone() we traverse the vma->anon_vma_chain of the source
vma, locking the anon_vma for each entry.
But they are all going to have the same root entry, which means that
we're locking and unlocking the same lock over and over again. Which is
expensive in locked operations, but can get _really_ expensive when that
root entry sees any kind of lock contention.
In fact, Tim Chen reports a big performance regression due to this: when
we switched to use a mutex instead of a spinlock, the contention case
gets much worse.
So to alleviate this all, this commit creates a small helper function
(lock_anon_vma_root()) that can be used to take the lock just once
rather than taking and releasing it over and over again.
We still have the same "take the lock and release" it behavior in the
exit path (in unlink_anon_vmas()), but that one is a bit harder to fix
since we're actually freeing the anon_vma entries as we go, and that
will touch the lock too.
Reported-and-tested-by: Tim Chen <tim.c.chen@linux.intel.com> Tested-by: Hugh Dickins <hughd@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel J Blueman [Fri, 17 Jun 2011 18:32:19 +0000 (11:32 -0700)]
drm/i915: Fix gen6 (SNB) missed BLT ring interrupts.
The failure appeared in dmesg as:
[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt
ring idle [waiting on 35064155, at 35064155], missed IRQ?
This works around that problem on by making the blitter command
streamer write interrupt state to the Hardware Status Page when a
MI_USER_INTERRUPT command is decoded, which appears to force the seqno
out to memory before the interrupt happens.
v1->v2: Moved to prior interrupt handler installation and RMW flags as
per feedback.
v2->v3: Removed RMW of flags (by anholt)
Cc: stable@kernel.org Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> Signed-off-by: Eric Anholt <eric@anholt.net> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> [v1] Tested-by: Eric Anholt <eric@anholt.net> [v1,v3]
(incidence of the bug with a testcase went from avg 2/1000 to
0/12651 in the latest test run (plus more for v1)) Tested-by: Kenneth Graunke <kenneth@whitecape.org> [v1] Tested-by: Robert Hooker <robert.hooker@canonical.com> [v1]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33394 Signed-off-by: Dave Airlie <airlied@redhat.com>
Jeff Ohlstein [Fri, 17 Jun 2011 20:55:38 +0000 (13:55 -0700)]
msm: timer: compensate for timer shift in msm_read_timer_count
Some msm targets have timers whose lower bits are unreliable. So, we
present our timers as lower frequency than they actually are, and ignore
the bottom 5 bits on such targets. This compensation was erroneously
removed from the msm_read_timer_count function, so restore it.
This was broken by 94790ec25 "msm: timer: SMP timer support for msm".
Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org>
Chris Mason [Fri, 17 Jun 2011 20:14:09 +0000 (16:14 -0400)]
Btrfs: avoid delayed metadata items during commits
Snapshot creation has two phases. One is the initial snapshot setup,
and the second is done during commit, while nobody is allowed to modify
the root we are snapshotting.
The delayed metadata insertion code can break that rule, it does a
delayed inode update on the inode of the parent of the snapshot,
and delayed directory item insertion.
This makes sure to run the pending delayed operations before we
record the snapshot root, which avoids corruptions.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Eric Dumazet [Fri, 17 Jun 2011 20:25:39 +0000 (16:25 -0400)]
inet_diag: fix inet_diag_bc_audit()
A malicious user or buggy application can inject code and trigger an
infinite loop in inet_diag_bc_audit()
Also make sure each instruction is aligned on 4 bytes boundary, to avoid
unaligned accesses.
Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Shved [Fri, 17 Jun 2011 06:25:11 +0000 (06:25 +0000)]
gigaset: call module_put before restart of if_open()
if_open() calls try_module_get(), and after an attempt to lock a mutex
the if_open() function may return -ERESTARTSYS without
putting the module. Then, when if_open() is executed again,
try_module_get() is called making the reference counter of THIS_MODULE
greater than one at successful exit from if_open(). The if_close()
function puts the module only once, and as a result it can't be
unloaded.
This patch adds module_put call before the return from if_open().
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Pavel Shved <shved@ispras.ru> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
Pavel Shved [Fri, 17 Jun 2011 06:25:10 +0000 (06:25 +0000)]
farsync: add module_put to error path in fst_open()
The fst_open() function, after a successful try_module_get() may return
an error code if hdlc_open() returns it. However, it does not put the
module on this error path.
This patch adds the necessary module_put() call.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Pavel Shved <shved@ispras.ru> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
Eric Dumazet [Fri, 17 Jun 2011 03:45:15 +0000 (03:45 +0000)]
net: rfs: enable RFS before first data packet is received
Le jeudi 16 juin 2011 à 23:38 -0400, David Miller a écrit :
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Fri, 17 Jun 2011 00:50:46 +0100
>
> > On Wed, 2011-06-15 at 04:15 +0200, Eric Dumazet wrote:
> >> @@ -1594,6 +1594,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
> >> goto discard;
> >>
> >> if (nsk != sk) {
> >> + sock_rps_save_rxhash(nsk, skb->rxhash);
> >> if (tcp_child_process(sk, nsk, skb)) {
> >> rsk = nsk;
> >> goto reset;
> >>
> >
> > I haven't tried this, but it looks reasonable to me.
> >
> > What about IPv6? The logic in tcp_v6_do_rcv() looks very similar.
>
> Indeed ipv6 side needs the same fix.
>
> Eric please add that part and resubmit. And in fact I might stick
> this into net-2.6 instead of net-next-2.6
>
OK, here is the net-2.6 based one then, thanks !
[PATCH v2] net: rfs: enable RFS before first data packet is received
First packet received on a passive tcp flow is not correctly RFS
steered.
One sock_rps_record_flow() call is missing in inet_accept()
But before that, we also must record rxhash when child socket is setup.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Tom Herbert <therbert@google.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
The RIPTR and TIPTR (receive/transmit internal temporary data pointer),
used by microcode as a temporary buffer for data, must be 32-byte aligned
according to the RM for MPC8247.
Tested on mgcoge.
Signed-off-by: Clive Stubbings <clive.stubbings@xentech.co.uk> Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
cc: Pantelis Antoniou <pantelis.antoniou@gmail.com>
cc: Vitaly Bordug <vbordug@ru.mvista.com>
cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@conan.davemloft.net>
Miao Xie [Wed, 15 Jun 2011 10:47:30 +0000 (10:47 +0000)]
btrfs: fix wrong reservation when doing delayed inode operations
We have migrated the space for the delayed inode items from
trans_block_rsv to global_block_rsv, but we forgot to set trans->block_rsv to
global_block_rsv when we doing delayed inode operations, and the following Oops
happened: