Steven Rostedt [Fri, 15 Jul 2011 03:02:27 +0000 (23:02 -0400)]
ftrace: Fix regression where ftrace breaks when modules are loaded
Enabling function tracer to trace all functions, then load a module and
then disable function tracing will cause ftrace to fail.
This can also happen by enabling function tracing on the command line:
ftrace=function
and during boot up, modules are loaded, then you disable function tracing
with 'echo nop > current_tracer' you will trigger a bug in ftrace that
will shut itself down.
The reason is, the new ftrace code keeps ref counts of all ftrace_ops that
are registered for tracing. When one or more ftrace_ops are registered,
all the records that represent the functions that the ftrace_ops will
trace have a ref count incremented. If this ref count is not zero,
when the code modification runs, that function will be enabled for tracing.
If the ref count is zero, that function will be disabled from tracing.
To make sure the accounting was working, FTRACE_WARN_ON()s were added
to updating of the ref counts.
If the ref count hits its max (> 2^30 ftrace_ops added), or if
the ref count goes below zero, a FTRACE_WARN_ON() is triggered which
disables all modification of code.
Since it is common for ftrace_ops to trace all functions in the kernel,
instead of creating > 20,000 hash items for the ftrace_ops, the hash
count is just set to zero, and it represents that the ftrace_ops is
to trace all functions. This is where the issues arrise.
If you enable function tracing to trace all functions, and then add
a module, the modules function records do not get the ref count updated.
When the function tracer is disabled, all function records ref counts
are subtracted. Since the modules never had their ref counts incremented,
they go below zero and the FTRACE_WARN_ON() is triggered.
The solution to this is rather simple. When modules are loaded, and
their functions are added to the the ftrace pool, look to see if any
ftrace_ops are registered that trace all functions. And for those,
update the ref count for the module function records.
Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Thu, 7 Jul 2011 15:09:22 +0000 (11:09 -0400)]
ftrace: Fix regression of :mod:module function enabling
The new code that allows different utilities to pick and choose
what functions they trace broke the :mod: hook that allows users
to trace only functions of a particular module.
The reason is that the :mod: hook bypasses the hash that is setup
to allow individual users to trace their own functions and uses
the global hash directly. But if the global hash has not been
set up, it will cause a bug:
Steven Rostedt [Tue, 5 Jul 2011 18:32:51 +0000 (14:32 -0400)]
tracing: Have "enable" file use refcounts like the "filter" file
The "enable" file for the event system can be removed when a module
is unloaded and the event system only has events from that module.
As the event system nr_events count goes to zero, it may be freed
if its ref_count is also set to zero.
Like the "filter" file, the "enable" file may be opened by a task and
referenced later, after a module has been unloaded and the events for
that event system have been removed.
Although the "filter" file referenced the event system structure,
the "enable" file only references a pointer to the event system
name. Since the name is freed when the event system is removed,
it is possible that an access to the "enable" file may reference
a freed pointer.
Update the "enable" file to use the subsystem_open() routine that
the "filter" file uses, to keep a reference to the event system
structure while the "enable" file is opened.
Cc: <stable@kernel.org> Reported-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Tue, 5 Jul 2011 15:36:06 +0000 (11:36 -0400)]
tracing: Fix bug when reading system filters on module removal
The event system is freed when its nr_events is set to zero. This happens
when a module created an event system and then later the module is
removed. Modules may share systems, so the system is allocated when
it is created and freed when the modules are unloaded and all the
events under the system are removed (nr_events set to zero).
The problem arises when a task opened the "filter" file for the
system. If the module is unloaded and it removed the last event for
that system, the system structure is freed. If the task that opened
the filter file accesses the "filter" file after the system has
been freed, the system will access an invalid pointer.
By adding a ref_count, and using it to keep track of what
is using the event system, we can free it after all users
are finished with the event system.
Cc: <stable@kernel.org> Reported-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Linus Torvalds [Tue, 21 Jun 2011 17:36:06 +0000 (10:36 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/qib: Ensure that LOS and DFE are being turned off
RDMA/cxgb4: Couple of abort fixes
RDMA/cxgb4: Don't truncate MR lengths
RDMA/cxgb4: Don't exceed hw IQ depth limit for user CQs
Linus Torvalds [Tue, 21 Jun 2011 17:22:35 +0000 (10:22 -0700)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd2: Fix oops in jbd2_journal_remove_journal_head()
jbd2: Remove obsolete parameters in the comments for some jbd2 functions
ext4: fixed tracepoints cleanup
ext4: use FIEMAP_EXTENT_LAST flag for last extent in fiemap
ext4: Fix max file size and logical block counting of extent format file
ext4: correct comments for ext4_free_blocks()
Linus Torvalds [Tue, 21 Jun 2011 03:13:49 +0000 (20:13 -0700)]
vfs: i_state needs to be 'unsigned long' for now
Commit 13e12d14e2dc ("vfs: reorganize 'struct inode' layout a bit")
moved things around a bit changed i_state to be unsigned int instead of
unsigned long. That was to help structure layout for the 64-bit case,
and shrink 'struct inode' a bit (admittedly that only happened when
spinlock debugging was on and i_flags didn't pack with i_lock).
However, Meelis Roos reports that this results in unaligned exceptions
on sprc, and it turns out that the bit-locking primitives that we use
for the I_NEW bit want to use the bitops. Which want 'unsigned long',
not 'unsigned int'.
We really should fix the bit locking code to not have that kind of
requirement, but that's a much bigger change. So for now, revert that
field back to 'unsigned long' (but keep the other re-ordering changes
from the commit that caused this).
Andi points out that we have played games with this in 'struct page', so
it's solvable with other hacks too, but since right now the struct inode
size advantage only happens with some rare config options, it's not
worth fighting.
It _would_ be worth fixing the bitlocking code, though. Especially
since there is no type safety in the bitlocking code (this never caused
any warnings, and worked fine on x86-64, because the bitlocks take a
'void *' and x86-64 doesn't care that deeply about alignment). So it's
currently a very easy problem to trigger by mistake and never notice.
Reported-by: Meelis Roos <mroos@linux.ee> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 21 Jun 2011 03:12:48 +0000 (20:12 -0700)]
Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms/r6xx+: voltage fixes
drm/nouveau: drop leftover debugging
drm/radeon: avoid warnings from r600/eg irq handlers on powered off card.
drm/radeon/kms: add missing param for dce3.2 DP transmitter setup
drm/radeon/kms/atom: fix duallink on some early DCE3.2 cards
drm/nouveau: fix assumption that semaphore dmaobj is valid in x-chan sync
drm/nv50/disp: fix gamma with page flipping overlay turned on
drm/nouveau/pm: Prevent overflow in nouveau_perf_init()
drm/nouveau: fix big-endian switch
Linus Torvalds [Tue, 21 Jun 2011 03:10:52 +0000 (20:10 -0700)]
Merge branch 'for-2.6.40' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.40' of git://linux-nfs.org/~bfields/linux:
nfsd4: fix break_lease flags on nfsd open
nfsd: link returns nfserr_delay when breaking lease
nfsd: v4 support requires CRYPTO
nfsd: fix dependency of nfsd on auth_rpcgss
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (40 commits)
pxa168_eth: fix race in transmit path.
ipv4, ping: Remove duplicate icmp.h include
netxen: fix race in skb->len access
sgi-xp: fix a use after free
hp100: fix an skb->len race
netpoll: copy dev name of slaves to struct netpoll
ipv4: fix multicast losses
r8169: fix static initializers.
inet_diag: fix inet_diag_bc_audit()
gigaset: call module_put before restart of if_open()
farsync: add module_put to error path in fst_open()
net: rfs: enable RFS before first data packet is received
fs_enet: fix freescale FCC ethernet dp buffer alignment
netdev: bfin_mac: fix memory leak when freeing dma descriptors
vlan: don't call ndo_vlan_rx_register on hardware that doesn't have vlan support
caif: Bugfix - XOFF removed channel from caif-mux
tun: teach the tun/tap driver to support netpoll
dp83640: drop PHY status frames in the driver.
dp83640: fix phy status frame event parsing
phylib: Allow BCM63XX PHY to be selected only on BCM63XX.
...
Linus Torvalds [Tue, 21 Jun 2011 03:09:15 +0000 (20:09 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
devcgroup_inode_permission: take "is it a device node" checks to inlined wrapper
fix comment in generic_permission()
kill obsolete comment for follow_down()
proc_sys_permission() is OK in RCU mode
reiserfs_permission() doesn't need to bail out in RCU mode
proc_fd_permission() is doesn't need to bail out in RCU mode
nilfs2_permission() doesn't need to bail out in RCU mode
logfs doesn't need ->permission() at all
coda_ioctl_permission() is safe in RCU mode
cifs_permission() doesn't need to bail out in RCU mode
bad_inode_permission() is safe from RCU mode
ubifs: dereferencing an ERR_PTR in ubifs_mount()
Richard Cochran [Sun, 19 Jun 2011 21:48:06 +0000 (21:48 +0000)]
pxa168_eth: fix race in transmit path.
Because the socket buffer is freed in the completion interrupt, it is not
safe to access it after submitting it to the hardware.
Cc: stable@kernel.org Cc: Sachin Sanap <ssanap@marvell.com> Cc: Zhangfei Gao <zgao6@marvell.com> Cc: Philip Rakity <prakity@marvell.com> Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 19 Jun 2011 20:26:15 +0000 (20:26 +0000)]
netxen: fix race in skb->len access
As soon as skb is given to hardware, TX completion can free skb under
us.
Therefore, we should update dev stats before kicking the device.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 20 Jun 2011 16:01:33 +0000 (09:01 -0700)]
Merge branch 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/setup: Fix for incorrect xen_extra_mem_start.
xen: When calling power_off, don't call the halt function.
xen: Fix compile warning when CONFIG_SMP is not defined.
xen: support CONFIG_MAXSMP
xen: partially revert "xen: set max_pfn_mapped to the last pfn mapped"
Linus Torvalds [Mon, 20 Jun 2011 15:59:46 +0000 (08:59 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: sh_keysc - 8x8 MODE_6 fix
Input: omap-keypad - add missing input_sync()
Input: evdev - try to wake up readers only if we have full packet
Input: properly assign return value of clamp() macro.
Linus Torvalds [Mon, 20 Jun 2011 15:58:53 +0000 (08:58 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: avoid delayed metadata items during commits
btrfs: fix uninitialized return value
btrfs: fix wrong reservation when doing delayed inode operations
btrfs: Remove unused sysfs code
btrfs: fix dereference of ERR_PTR value
Btrfs: fix relocation races
Btrfs: set no_trans_join after trying to expand the transaction
Btrfs: protect the pending_snapshots list with trans_lock
Btrfs: fix path leakage on subvol deletion
Btrfs: drop the delalloc_bytes check in shrink_delalloc
Btrfs: check the return value from set_anon_super
Linus Torvalds [Mon, 20 Jun 2011 15:58:07 +0000 (08:58 -0700)]
Merge branch 'kvm-updates/3.0' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/3.0' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Fix register corruption in pvclock_scale_delta
KVM: MMU: fix opposite condition in mapping_level_dirty_bitmap
KVM: VMX: do not overwrite uptodate vcpu->arch.cr3 on KVM_SET_SREGS
KVM: MMU: Fix build warnings in walk_addr_generic()
Al Viro [Sun, 19 Jun 2011 17:01:04 +0000 (13:01 -0400)]
devcgroup_inode_permission: take "is it a device node" checks to inlined wrapper
inode_permission() calls devcgroup_inode_permission() and almost all such
calls are _not_ for device nodes; let's at least keep the common path
straight...
Dan Carpenter [Mon, 20 Jun 2011 07:10:24 +0000 (10:10 +0300)]
ubifs: dereferencing an ERR_PTR in ubifs_mount()
d251ed271d5 "ubifs: fix sget races" left out the goto from this
error path so the static checkers complain that we're dereferencing
"sb" when it's an ERR_PTR.
Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
J. Bruce Fields [Tue, 7 Jun 2011 15:50:23 +0000 (11:50 -0400)]
nfsd4: fix break_lease flags on nfsd open
Thanks to Casey Bodley for pointing out that on a read open we pass 0,
instead of O_RDONLY, to break_lease, with the result that a read open is
treated like a write open for the purposes of lease breaking!
Reported-by: Casey Bodley <cbodley@citi.umich.edu> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Dave Airlie [Mon, 20 Jun 2011 02:02:38 +0000 (12:02 +1000)]
Merge branch 'drm-nouveau-fixes' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes
* 'drm-nouveau-fixes' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
drm/nouveau: fix assumption that semaphore dmaobj is valid in x-chan sync
drm/nv50/disp: fix gamma with page flipping overlay turned on
drm/nouveau/pm: Prevent overflow in nouveau_perf_init()
drm/nouveau: fix big-endian switch
Dave Airlie [Sat, 18 Jun 2011 03:59:51 +0000 (03:59 +0000)]
drm/radeon: avoid warnings from r600/eg irq handlers on powered off card.
Since we were calling the wptr function before checking if the IH was
even enabled, or the GPU wasn't shutdown, we'd get spam in the logs when
the GPU readback 0xffffffff. This reorders things so we return early
in the no IH and GPU shutdown cases.
Reported-and-tested-by: ManDay on #radeon Reviewed-by: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Deucher [Fri, 17 Jun 2011 17:13:52 +0000 (13:13 -0400)]
drm/radeon/kms/atom: fix duallink on some early DCE3.2 cards
Certain revisions of the vbios on DCE3.2 cards have a bug
in the transmitter control table which prevents duallink from
being enabled properly on some cards. The action switch statement
jumps to the wrong offset for the OUTPUT_ENABLE action. The fix
is to use the ENABLE action rather than the OUTPUT_ENABLE action
on the affected cards. In fixed version of the vbios, both
actions jump to the same offset, so the change should be safe.
Reported-and-tested-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexdeucher@gmail.com> Cc: stable@kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
Eric Dumazet [Sun, 19 Jun 2011 12:43:33 +0000 (12:43 +0000)]
hp100: fix an skb->len race
As soon as skb is given to hardware and spinlock released, TX completion
can free skb under us. Therefore, we should update netdev stats before
spinlock release.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Zachary Amsden [Thu, 16 Jun 2011 03:50:04 +0000 (20:50 -0700)]
KVM: Fix register corruption in pvclock_scale_delta
The 128-bit multiply in pvclock.h was missing an output constraint for
EDX which caused a register corruption to appear. Thanks to Ulrich for
diagnosing the EDX corruption and Avi for providing this fix.
Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Borislav Petkov [Mon, 30 May 2011 20:11:17 +0000 (22:11 +0200)]
KVM: MMU: Fix build warnings in walk_addr_generic()
On 3.0-rc1 I get
In file included from arch/x86/kvm/mmu.c:2856:
arch/x86/kvm/paging_tmpl.h: In function ‘paging32_walk_addr_generic’:
arch/x86/kvm/paging_tmpl.h:124: warning: ‘ptep_user’ may be used uninitialized in this function
In file included from arch/x86/kvm/mmu.c:2852:
arch/x86/kvm/paging_tmpl.h: In function ‘paging64_walk_addr_generic’:
arch/x86/kvm/paging_tmpl.h:124: warning: ‘ptep_user’ may be used uninitialized in this function
Linus Torvalds [Sun, 19 Jun 2011 16:00:18 +0000 (09:00 -0700)]
Merge branches 'perf-urgent-for-linus', 'sched-urgent-for-linus', 'timers-urgent-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tools/perf: Fix static build of perf tool
tracing: Fix regression in printk_formats file
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
clocksource: Make watchdog robust vs. interruption
timerfd: Fix wakeup of processes when timer is cancelled on clock change
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, MAINTAINERS: Add x86 MCE people
x86, efi: Do not reserve boot services regions within reserved areas
Linus Torvalds [Sun, 19 Jun 2011 15:56:56 +0000 (08:56 -0700)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: Move RCU_BOOST #ifdefs to header file
rcu: use softirq instead of kthreads except when RCU_BOOST=y
rcu: Use softirq to address performance regression
rcu: Simplify curing of load woes
x86, efi: Do not reserve boot services regions within reserved areas
Commit 916f676f8dc started reserving boot service code since some systems
require you to keep that code around until SetVirtualAddressMap is called.
However, in some cases those areas will overlap with reserved regions.
The proper medium-term fix is to fix the bootloader to prevent the
conflicts from occurring by moving the kernel to a better position,
but the kernel should check for this possibility, and only reserve regions
which can be reserved.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com> Link: http://lkml.kernel.org/r/4DF7A005.1050407@gmail.com Acked-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Eric Dumazet [Sat, 18 Jun 2011 18:59:18 +0000 (11:59 -0700)]
ipv4: fix multicast losses
Knut Tidemann found that first packet of a multicast flow was not
correctly received, and bisected the regression to commit b23dd4fe42b4
(Make output route lookup return rtable directly.)
Special thanks to Knut, who provided a very nice bug report, including
sample programs to demonstrate the bug.
Reported-and-bisectedby: Knut Tidemann <knut.andre.tidemann@jotron.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise the updated evdev driver (commit cdda911c34006f1089f3c87b1a1f,
"Input: evdev - only signal polls on full packets") no longer works on
top of omap-keypad.
Tested on Amstrad Delta.
Signed-off-by: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl> Reviewed-by: Henrik Rydberg <rydberg@euromail.se> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Dmitry Torokhov [Sat, 18 Jun 2011 09:50:11 +0000 (02:50 -0700)]
Input: evdev - try to wake up readers only if we have full packet
We should only wake waiters on the event device when we actually post
an EV_SYN/SYN_REPORT to the queue. Otherwise we end up making waiting
threads runnable only to go right back to sleep because the device
still isn't readable.
Reported-by: Jeffrey Brown <jeffbrown@android.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Guenter Roeck [Tue, 24 May 2011 19:34:55 +0000 (12:34 -0700)]
hwmon: (s3c) Initialize sysfs attributes
Initialize dynamically allocated sysfs attributes before device_create_file()
call to suppress lockdep_init_map() warning if lockdep debugging is enabled.
Guenter Roeck [Tue, 24 May 2011 19:34:12 +0000 (12:34 -0700)]
hwmon: (ibmpex) Initialize sysfs attributes
Initialize dynamically allocated sysfs attributes before device_create_file()
call to suppress lockdep_init_map() warning if lockdep debugging is enabled.
Guenter Roeck [Tue, 24 May 2011 19:33:26 +0000 (12:33 -0700)]
hwmon: (ibmaem) Initialize sysfs attributes
Initialize dynamically allocated sysfs attributes before device_create_file()
call to suppress lockdep_init_map() warning if lockdep debugging is enabled.
Linus Torvalds [Sat, 18 Jun 2011 04:13:43 +0000 (21:13 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] powernow-k8: Don't try to transition if the pstate is incorrect
[CPUFREQ] powernow-k8: Don't notify of successful transition if we failed (vid case).
[CPUFREQ] Don't set stat->last_index to -1 if the pol->cur has incorrect value.
Linus Torvalds [Sat, 18 Jun 2011 02:05:36 +0000 (19:05 -0700)]
mm: avoid anon_vma_chain allocation under anon_vma lock
Hugh Dickins points out that lockdep (correctly) spots a potential
deadlock on the anon_vma lock, because we now do a GFP_KERNEL allocation
of anon_vma_chain while doing anon_vma_clone(). The problem is that
page reclaim will want to take the anon_vma lock of any anonymous pages
that it will try to reclaim.
So re-organize the code in anon_vma_clone() slightly: first do just a
GFP_NOWAIT allocation, which will usually work fine. But if that fails,
let's just drop the lock and re-do the allocation, now with GFP_KERNEL.
End result: not only do we avoid the locking problem, this also ends up
getting better concurrency in case the allocation does need to block.
Tim Chen reports that with all these anon_vma locking tweaks, we're now
almost back up to the spinlock performance.
Reported-and-tested-by: Hugh Dickins <hughd@google.com> Tested-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Zijlstra [Fri, 17 Jun 2011 11:54:23 +0000 (13:54 +0200)]
mm: avoid repeated anon_vma lock/unlock sequences in unlink_anon_vmas()
This matches the anon_vma_clone() case, and uses the same lock helper
functions. Because of the need to potentially release the anon_vma's,
it's a bit more complex, though.
We traverse the 'vma->anon_vma_chain' in two phases: the first loop gets
the anon_vma lock (with the helper function that only takes the lock
once for the whole loop), and removes any entries that don't need any
more processing.
The second phase just traverses the remaining list entries (without
holding the anon_vma lock), and does any actual freeing of the
anon_vma's that is required.
Signed-off-by: Peter Zijlstra <peterz@infradead.org> Tested-by: Hugh Dickins <hughd@google.com> Tested-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 17 Jun 2011 03:44:51 +0000 (20:44 -0700)]
mm: avoid repeated anon_vma lock/unlock sequences in anon_vma_clone()
In anon_vma_clone() we traverse the vma->anon_vma_chain of the source
vma, locking the anon_vma for each entry.
But they are all going to have the same root entry, which means that
we're locking and unlocking the same lock over and over again. Which is
expensive in locked operations, but can get _really_ expensive when that
root entry sees any kind of lock contention.
In fact, Tim Chen reports a big performance regression due to this: when
we switched to use a mutex instead of a spinlock, the contention case
gets much worse.
So to alleviate this all, this commit creates a small helper function
(lock_anon_vma_root()) that can be used to take the lock just once
rather than taking and releasing it over and over again.
We still have the same "take the lock and release" it behavior in the
exit path (in unlink_anon_vmas()), but that one is a bit harder to fix
since we're actually freeing the anon_vma entries as we go, and that
will touch the lock too.
Reported-and-tested-by: Tim Chen <tim.c.chen@linux.intel.com> Tested-by: Hugh Dickins <hughd@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel J Blueman [Fri, 17 Jun 2011 18:32:19 +0000 (11:32 -0700)]
drm/i915: Fix gen6 (SNB) missed BLT ring interrupts.
The failure appeared in dmesg as:
[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt
ring idle [waiting on 35064155, at 35064155], missed IRQ?
This works around that problem on by making the blitter command
streamer write interrupt state to the Hardware Status Page when a
MI_USER_INTERRUPT command is decoded, which appears to force the seqno
out to memory before the interrupt happens.
v1->v2: Moved to prior interrupt handler installation and RMW flags as
per feedback.
v2->v3: Removed RMW of flags (by anholt)
Cc: stable@kernel.org Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> Signed-off-by: Eric Anholt <eric@anholt.net> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> [v1] Tested-by: Eric Anholt <eric@anholt.net> [v1,v3]
(incidence of the bug with a testcase went from avg 2/1000 to
0/12651 in the latest test run (plus more for v1)) Tested-by: Kenneth Graunke <kenneth@whitecape.org> [v1] Tested-by: Robert Hooker <robert.hooker@canonical.com> [v1]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33394 Signed-off-by: Dave Airlie <airlied@redhat.com>
Jeff Ohlstein [Fri, 17 Jun 2011 20:55:38 +0000 (13:55 -0700)]
msm: timer: compensate for timer shift in msm_read_timer_count
Some msm targets have timers whose lower bits are unreliable. So, we
present our timers as lower frequency than they actually are, and ignore
the bottom 5 bits on such targets. This compensation was erroneously
removed from the msm_read_timer_count function, so restore it.
This was broken by 94790ec25 "msm: timer: SMP timer support for msm".
Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org>
Chris Mason [Fri, 17 Jun 2011 20:14:09 +0000 (16:14 -0400)]
Btrfs: avoid delayed metadata items during commits
Snapshot creation has two phases. One is the initial snapshot setup,
and the second is done during commit, while nobody is allowed to modify
the root we are snapshotting.
The delayed metadata insertion code can break that rule, it does a
delayed inode update on the inode of the parent of the snapshot,
and delayed directory item insertion.
This makes sure to run the pending delayed operations before we
record the snapshot root, which avoids corruptions.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Eric Dumazet [Fri, 17 Jun 2011 20:25:39 +0000 (16:25 -0400)]
inet_diag: fix inet_diag_bc_audit()
A malicious user or buggy application can inject code and trigger an
infinite loop in inet_diag_bc_audit()
Also make sure each instruction is aligned on 4 bytes boundary, to avoid
unaligned accesses.
Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Shved [Fri, 17 Jun 2011 06:25:11 +0000 (06:25 +0000)]
gigaset: call module_put before restart of if_open()
if_open() calls try_module_get(), and after an attempt to lock a mutex
the if_open() function may return -ERESTARTSYS without
putting the module. Then, when if_open() is executed again,
try_module_get() is called making the reference counter of THIS_MODULE
greater than one at successful exit from if_open(). The if_close()
function puts the module only once, and as a result it can't be
unloaded.
This patch adds module_put call before the return from if_open().
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Pavel Shved <shved@ispras.ru> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
Pavel Shved [Fri, 17 Jun 2011 06:25:10 +0000 (06:25 +0000)]
farsync: add module_put to error path in fst_open()
The fst_open() function, after a successful try_module_get() may return
an error code if hdlc_open() returns it. However, it does not put the
module on this error path.
This patch adds the necessary module_put() call.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Pavel Shved <shved@ispras.ru> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
Eric Dumazet [Fri, 17 Jun 2011 03:45:15 +0000 (03:45 +0000)]
net: rfs: enable RFS before first data packet is received
Le jeudi 16 juin 2011 à 23:38 -0400, David Miller a écrit :
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Fri, 17 Jun 2011 00:50:46 +0100
>
> > On Wed, 2011-06-15 at 04:15 +0200, Eric Dumazet wrote:
> >> @@ -1594,6 +1594,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
> >> goto discard;
> >>
> >> if (nsk != sk) {
> >> + sock_rps_save_rxhash(nsk, skb->rxhash);
> >> if (tcp_child_process(sk, nsk, skb)) {
> >> rsk = nsk;
> >> goto reset;
> >>
> >
> > I haven't tried this, but it looks reasonable to me.
> >
> > What about IPv6? The logic in tcp_v6_do_rcv() looks very similar.
>
> Indeed ipv6 side needs the same fix.
>
> Eric please add that part and resubmit. And in fact I might stick
> this into net-2.6 instead of net-next-2.6
>
OK, here is the net-2.6 based one then, thanks !
[PATCH v2] net: rfs: enable RFS before first data packet is received
First packet received on a passive tcp flow is not correctly RFS
steered.
One sock_rps_record_flow() call is missing in inet_accept()
But before that, we also must record rxhash when child socket is setup.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Tom Herbert <therbert@google.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
The RIPTR and TIPTR (receive/transmit internal temporary data pointer),
used by microcode as a temporary buffer for data, must be 32-byte aligned
according to the RM for MPC8247.
Tested on mgcoge.
Signed-off-by: Clive Stubbings <clive.stubbings@xentech.co.uk> Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
cc: Pantelis Antoniou <pantelis.antoniou@gmail.com>
cc: Vitaly Bordug <vbordug@ru.mvista.com>
cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@conan.davemloft.net>
Mitko Haralanov [Thu, 9 Jun 2011 20:27:26 +0000 (20:27 +0000)]
IB/qib: Ensure that LOS and DFE are being turned off
Due to timing, it is possible for the LOS and DFE to remain on. This
is due to the link progressing to LinkUP prior to the driver getting
the first Status Changed interrupt. By expanding the conditions under
which LOS is turned off and DFE timeout is being set, timing is no
longer an issue.
Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Steve Wise [Tue, 14 Jun 2011 20:59:27 +0000 (20:59 +0000)]
RDMA/cxgb4: Couple of abort fixes
- fix a race where the driver could end up sending a close_con_req
after an abort_rpl. In c4iw_ep_disconnect(), send abort or close
request with the ep mutex held.
- fix a hang where driver fails to wake up when a connection is reset
during a normal close. Wake up any waiters in the interrupt path,
and correctly cleanup after rdma_fini() failures.
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Miao Xie [Wed, 15 Jun 2011 10:47:30 +0000 (10:47 +0000)]
btrfs: fix wrong reservation when doing delayed inode operations
We have migrated the space for the delayed inode items from
trans_block_rsv to global_block_rsv, but we forgot to set trans->block_rsv to
global_block_rsv when we doing delayed inode operations, and the following Oops
happened:
Steve Wise [Wed, 1 Jun 2011 17:49:14 +0000 (17:49 +0000)]
RDMA/cxgb4: Don't exceed hw IQ depth limit for user CQs
Memory allocated for user CQs gets rounded up to the next page
boundary. And after rounding, we recalculate the resulting IQ depth
and we need to make sure we don't exceed the HW limits.
This bug can result a much smaller CQ allocated than was expected if
the HW size field is exceeded, resulting in CQ overflow failures.
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Chris Mason [Tue, 14 Jun 2011 00:00:16 +0000 (20:00 -0400)]
Btrfs: fix relocation races
The recent commit to get rid of our trans_mutex introduced
some races with block group relocation. The problem is that relocation
needs to do some record keeping about each root, and it was relying
on the transaction mutex to coordinate things in subtle ways.
This fix adds a mutex just for the relocation code and makes sure
it doesn't have a big impact on normal operations. The race is
really fixed in btrfs_record_root_in_trans, which is where we
step back and wait for the relocation code to finish accounting
setup.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Linus Torvalds [Fri, 17 Jun 2011 17:36:32 +0000 (10:36 -0700)]
Merge branches 'gpio/merge' and 'spi/merge' of git://git.secretlab.ca/git/linux-2.6
* 'gpio/merge' of git://git.secretlab.ca/git/linux-2.6:
gpio: add GPIOF_ values regardless on kconfig settings
gpio: include linux/gpio.h where needed
gpio/omap4: Fix missing interrupts during device wakeup due to IOPAD.
* 'spi/merge' of git://git.secretlab.ca/git/linux-2.6:
spi/bfin_spi: fix handling of default bits per word setting