ext4: free allocated and pre-allocated blocks when check_eofblocks_fl fails
Upon corrupted inode or disk failures, we may fail after we already
allocate some blocks from the inode or take some blocks from the
inode's preallocation list, but before we successfully insert the
corresponding extent to the extent tree. In this case, we should free
any allocated blocks and discard the inode's preallocated blocks
because the entries in the inode's preallocation list may be in an
inconsistent state.
ext4: fix i_blocks/quota accounting when extent insertion fails
The current implementation of ext4_free_blocks() always calls
dquot_free_block This looks quite sensible in the most cases: blocks
to be freed are associated with inode and were accounted in quota and
i_blocks some time ago.
However, there is a case when blocks to free were not accounted by the
time calling ext4_free_blocks() yet:
1. delalloc is on, write_begin pre-allocated some space in quota
2. write-back happens, ext4 allocates some blocks in ext4_ext_map_blocks()
3. then ext4_ext_map_blocks() gets an error (e.g. ENOSPC) from
ext4_ext_insert_extent() and calls ext4_free_blocks().
In this scenario, ext4_free_blocks() calls dquot_free_block() who, in
turn, decrements i_blocks for blocks which were not accounted yet (due
to delalloc) After clean umount, e2fsck reports something like:
> Inode 21, i_blocks is 5080, should be 5128. Fix<y>?
because i_blocks was erroneously decremented as explained above.
The patch fixes the problem by passing the new flag
EXT4_FREE_BLOCKS_NO_QUOT_UPDATE to ext4_free_blocks(), to request
that the dquot_free_block() call be skipped.
Theodore Ts'o [Thu, 30 Jun 2011 01:44:45 +0000 (21:44 -0400)]
ext4: remove loop around bio_alloc()
These days, bio_alloc() is guaranteed to never fail (as long as nvecs
is less than BIO_MAX_PAGES), so we don't need the loop around the
struct bio allocation.
Amir Goldstein [Mon, 27 Jun 2011 23:40:50 +0000 (19:40 -0400)]
ext4: move ext4_ind_* functions from inode.c to indirect.c
This patch moves functions from inode.c to indirect.c.
The moved functions are ext4_ind_* functions and their helpers.
Functions called from inode.c are declared extern.
Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 27 Jun 2011 23:16:04 +0000 (19:16 -0400)]
ext4: move common truncate functions to header file
Move two functions that will be needed by the indirect functions to be
moved to indirect.c as well as inode.c to truncate.h as inline
functions, so that we can avoid having duplicate copies of the
function (which can be a maintenance problem) without having to expose
them as globally functions.
Theodore Ts'o [Mon, 27 Jun 2011 23:16:02 +0000 (19:16 -0400)]
ext4: move __ext4_check_blockref to block_validity.c
In preparation for moving the indirect functions to a separate file,
move __ext4_check_blockref() to block_validity.c and rename it to
ext4_check_blockref() which is exported as globally visible function.
Also, rename the cpp macro ext4_check_inode_blockref() to
ext4_ind_check_inode(), to make it clear that it is only valid for use
with non-extent mapped inodes.
Amir Goldstein [Mon, 27 Jun 2011 21:10:28 +0000 (17:10 -0400)]
ext4: rename ext4_indirect_* funcs to ext4_ind_*
We are going to move all ext4_ind_* functions to indirect.c.
Before we do that, let's rename 2 functions called ext4_indirect_*
to ext4_ind_*, to keep to the naming convention.
Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Amir Goldstein [Mon, 27 Jun 2011 20:36:31 +0000 (16:36 -0400)]
ext4: split ext4_ind_truncate from ext4_truncate
We are about to move all indirect inode functions to a new file.
Before we do that, let's split ext4_ind_truncate() out of ext4_truncate()
leaving only generic code in the latter, so we will be able to move
ext4_ind_truncate() to the new file.
Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Robin Dong [Mon, 27 Jun 2011 19:35:53 +0000 (15:35 -0400)]
ext4: fix incorrect error msg in ext4_ext_insert_index
In function ext4_ext_insert_index when eh_entries of curp is
bigger than eh_max, error messages will be printed out, but the content
is about logical and ei_block, that's incorret.
Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Tao Ma [Mon, 27 Jun 2011 16:36:29 +0000 (12:36 -0400)]
jbd2: use WRITE_SYNC in journal checkpoint
In journal checkpoint, we write the buffer and wait for its finish.
But in cfq, the async queue has a very low priority, and in our test,
if there are too many sync queues and every queue is filled up with
requests, the write request will be delayed for quite a long time and
all the tasks which are waiting for journal space will end with errors like:
So this patch tries to use WRITE_SYNC in __flush_batch so that the request will
be moved into sync queue and handled by cfq timely. We also use the new plug,
sot that all the WRITE_SYNC requests can be given as a whole when we unplug it.
Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Reported-by: Robin Dong <sanbai@taobao.com>
Linus Torvalds [Tue, 21 Jun 2011 17:22:35 +0000 (10:22 -0700)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd2: Fix oops in jbd2_journal_remove_journal_head()
jbd2: Remove obsolete parameters in the comments for some jbd2 functions
ext4: fixed tracepoints cleanup
ext4: use FIEMAP_EXTENT_LAST flag for last extent in fiemap
ext4: Fix max file size and logical block counting of extent format file
ext4: correct comments for ext4_free_blocks()
Linus Torvalds [Tue, 21 Jun 2011 03:13:49 +0000 (20:13 -0700)]
vfs: i_state needs to be 'unsigned long' for now
Commit 13e12d14e2dc ("vfs: reorganize 'struct inode' layout a bit")
moved things around a bit changed i_state to be unsigned int instead of
unsigned long. That was to help structure layout for the 64-bit case,
and shrink 'struct inode' a bit (admittedly that only happened when
spinlock debugging was on and i_flags didn't pack with i_lock).
However, Meelis Roos reports that this results in unaligned exceptions
on sprc, and it turns out that the bit-locking primitives that we use
for the I_NEW bit want to use the bitops. Which want 'unsigned long',
not 'unsigned int'.
We really should fix the bit locking code to not have that kind of
requirement, but that's a much bigger change. So for now, revert that
field back to 'unsigned long' (but keep the other re-ordering changes
from the commit that caused this).
Andi points out that we have played games with this in 'struct page', so
it's solvable with other hacks too, but since right now the struct inode
size advantage only happens with some rare config options, it's not
worth fighting.
It _would_ be worth fixing the bitlocking code, though. Especially
since there is no type safety in the bitlocking code (this never caused
any warnings, and worked fine on x86-64, because the bitlocks take a
'void *' and x86-64 doesn't care that deeply about alignment). So it's
currently a very easy problem to trigger by mistake and never notice.
Reported-by: Meelis Roos <mroos@linux.ee> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 21 Jun 2011 03:12:48 +0000 (20:12 -0700)]
Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms/r6xx+: voltage fixes
drm/nouveau: drop leftover debugging
drm/radeon: avoid warnings from r600/eg irq handlers on powered off card.
drm/radeon/kms: add missing param for dce3.2 DP transmitter setup
drm/radeon/kms/atom: fix duallink on some early DCE3.2 cards
drm/nouveau: fix assumption that semaphore dmaobj is valid in x-chan sync
drm/nv50/disp: fix gamma with page flipping overlay turned on
drm/nouveau/pm: Prevent overflow in nouveau_perf_init()
drm/nouveau: fix big-endian switch
Linus Torvalds [Tue, 21 Jun 2011 03:10:52 +0000 (20:10 -0700)]
Merge branch 'for-2.6.40' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.40' of git://linux-nfs.org/~bfields/linux:
nfsd4: fix break_lease flags on nfsd open
nfsd: link returns nfserr_delay when breaking lease
nfsd: v4 support requires CRYPTO
nfsd: fix dependency of nfsd on auth_rpcgss
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (40 commits)
pxa168_eth: fix race in transmit path.
ipv4, ping: Remove duplicate icmp.h include
netxen: fix race in skb->len access
sgi-xp: fix a use after free
hp100: fix an skb->len race
netpoll: copy dev name of slaves to struct netpoll
ipv4: fix multicast losses
r8169: fix static initializers.
inet_diag: fix inet_diag_bc_audit()
gigaset: call module_put before restart of if_open()
farsync: add module_put to error path in fst_open()
net: rfs: enable RFS before first data packet is received
fs_enet: fix freescale FCC ethernet dp buffer alignment
netdev: bfin_mac: fix memory leak when freeing dma descriptors
vlan: don't call ndo_vlan_rx_register on hardware that doesn't have vlan support
caif: Bugfix - XOFF removed channel from caif-mux
tun: teach the tun/tap driver to support netpoll
dp83640: drop PHY status frames in the driver.
dp83640: fix phy status frame event parsing
phylib: Allow BCM63XX PHY to be selected only on BCM63XX.
...
Linus Torvalds [Tue, 21 Jun 2011 03:09:15 +0000 (20:09 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
devcgroup_inode_permission: take "is it a device node" checks to inlined wrapper
fix comment in generic_permission()
kill obsolete comment for follow_down()
proc_sys_permission() is OK in RCU mode
reiserfs_permission() doesn't need to bail out in RCU mode
proc_fd_permission() is doesn't need to bail out in RCU mode
nilfs2_permission() doesn't need to bail out in RCU mode
logfs doesn't need ->permission() at all
coda_ioctl_permission() is safe in RCU mode
cifs_permission() doesn't need to bail out in RCU mode
bad_inode_permission() is safe from RCU mode
ubifs: dereferencing an ERR_PTR in ubifs_mount()
Richard Cochran [Sun, 19 Jun 2011 21:48:06 +0000 (21:48 +0000)]
pxa168_eth: fix race in transmit path.
Because the socket buffer is freed in the completion interrupt, it is not
safe to access it after submitting it to the hardware.
Cc: stable@kernel.org Cc: Sachin Sanap <ssanap@marvell.com> Cc: Zhangfei Gao <zgao6@marvell.com> Cc: Philip Rakity <prakity@marvell.com> Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 19 Jun 2011 20:26:15 +0000 (20:26 +0000)]
netxen: fix race in skb->len access
As soon as skb is given to hardware, TX completion can free skb under
us.
Therefore, we should update dev stats before kicking the device.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 20 Jun 2011 16:01:33 +0000 (09:01 -0700)]
Merge branch 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/setup: Fix for incorrect xen_extra_mem_start.
xen: When calling power_off, don't call the halt function.
xen: Fix compile warning when CONFIG_SMP is not defined.
xen: support CONFIG_MAXSMP
xen: partially revert "xen: set max_pfn_mapped to the last pfn mapped"
Linus Torvalds [Mon, 20 Jun 2011 15:59:46 +0000 (08:59 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: sh_keysc - 8x8 MODE_6 fix
Input: omap-keypad - add missing input_sync()
Input: evdev - try to wake up readers only if we have full packet
Input: properly assign return value of clamp() macro.
Linus Torvalds [Mon, 20 Jun 2011 15:58:53 +0000 (08:58 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: avoid delayed metadata items during commits
btrfs: fix uninitialized return value
btrfs: fix wrong reservation when doing delayed inode operations
btrfs: Remove unused sysfs code
btrfs: fix dereference of ERR_PTR value
Btrfs: fix relocation races
Btrfs: set no_trans_join after trying to expand the transaction
Btrfs: protect the pending_snapshots list with trans_lock
Btrfs: fix path leakage on subvol deletion
Btrfs: drop the delalloc_bytes check in shrink_delalloc
Btrfs: check the return value from set_anon_super
Linus Torvalds [Mon, 20 Jun 2011 15:58:07 +0000 (08:58 -0700)]
Merge branch 'kvm-updates/3.0' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/3.0' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Fix register corruption in pvclock_scale_delta
KVM: MMU: fix opposite condition in mapping_level_dirty_bitmap
KVM: VMX: do not overwrite uptodate vcpu->arch.cr3 on KVM_SET_SREGS
KVM: MMU: Fix build warnings in walk_addr_generic()
Al Viro [Sun, 19 Jun 2011 17:01:04 +0000 (13:01 -0400)]
devcgroup_inode_permission: take "is it a device node" checks to inlined wrapper
inode_permission() calls devcgroup_inode_permission() and almost all such
calls are _not_ for device nodes; let's at least keep the common path
straight...
Dan Carpenter [Mon, 20 Jun 2011 07:10:24 +0000 (10:10 +0300)]
ubifs: dereferencing an ERR_PTR in ubifs_mount()
d251ed271d5 "ubifs: fix sget races" left out the goto from this
error path so the static checkers complain that we're dereferencing
"sb" when it's an ERR_PTR.
Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
J. Bruce Fields [Tue, 7 Jun 2011 15:50:23 +0000 (11:50 -0400)]
nfsd4: fix break_lease flags on nfsd open
Thanks to Casey Bodley for pointing out that on a read open we pass 0,
instead of O_RDONLY, to break_lease, with the result that a read open is
treated like a write open for the purposes of lease breaking!
Reported-by: Casey Bodley <cbodley@citi.umich.edu> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Dave Airlie [Mon, 20 Jun 2011 02:02:38 +0000 (12:02 +1000)]
Merge branch 'drm-nouveau-fixes' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes
* 'drm-nouveau-fixes' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
drm/nouveau: fix assumption that semaphore dmaobj is valid in x-chan sync
drm/nv50/disp: fix gamma with page flipping overlay turned on
drm/nouveau/pm: Prevent overflow in nouveau_perf_init()
drm/nouveau: fix big-endian switch
Dave Airlie [Sat, 18 Jun 2011 03:59:51 +0000 (03:59 +0000)]
drm/radeon: avoid warnings from r600/eg irq handlers on powered off card.
Since we were calling the wptr function before checking if the IH was
even enabled, or the GPU wasn't shutdown, we'd get spam in the logs when
the GPU readback 0xffffffff. This reorders things so we return early
in the no IH and GPU shutdown cases.
Reported-and-tested-by: ManDay on #radeon Reviewed-by: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Deucher [Fri, 17 Jun 2011 17:13:52 +0000 (13:13 -0400)]
drm/radeon/kms/atom: fix duallink on some early DCE3.2 cards
Certain revisions of the vbios on DCE3.2 cards have a bug
in the transmitter control table which prevents duallink from
being enabled properly on some cards. The action switch statement
jumps to the wrong offset for the OUTPUT_ENABLE action. The fix
is to use the ENABLE action rather than the OUTPUT_ENABLE action
on the affected cards. In fixed version of the vbios, both
actions jump to the same offset, so the change should be safe.
Reported-and-tested-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexdeucher@gmail.com> Cc: stable@kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
Eric Dumazet [Sun, 19 Jun 2011 12:43:33 +0000 (12:43 +0000)]
hp100: fix an skb->len race
As soon as skb is given to hardware and spinlock released, TX completion
can free skb under us. Therefore, we should update netdev stats before
spinlock release.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Zachary Amsden [Thu, 16 Jun 2011 03:50:04 +0000 (20:50 -0700)]
KVM: Fix register corruption in pvclock_scale_delta
The 128-bit multiply in pvclock.h was missing an output constraint for
EDX which caused a register corruption to appear. Thanks to Ulrich for
diagnosing the EDX corruption and Avi for providing this fix.
Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Borislav Petkov [Mon, 30 May 2011 20:11:17 +0000 (22:11 +0200)]
KVM: MMU: Fix build warnings in walk_addr_generic()
On 3.0-rc1 I get
In file included from arch/x86/kvm/mmu.c:2856:
arch/x86/kvm/paging_tmpl.h: In function ‘paging32_walk_addr_generic’:
arch/x86/kvm/paging_tmpl.h:124: warning: ‘ptep_user’ may be used uninitialized in this function
In file included from arch/x86/kvm/mmu.c:2852:
arch/x86/kvm/paging_tmpl.h: In function ‘paging64_walk_addr_generic’:
arch/x86/kvm/paging_tmpl.h:124: warning: ‘ptep_user’ may be used uninitialized in this function
Linus Torvalds [Sun, 19 Jun 2011 16:00:18 +0000 (09:00 -0700)]
Merge branches 'perf-urgent-for-linus', 'sched-urgent-for-linus', 'timers-urgent-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tools/perf: Fix static build of perf tool
tracing: Fix regression in printk_formats file
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
clocksource: Make watchdog robust vs. interruption
timerfd: Fix wakeup of processes when timer is cancelled on clock change
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, MAINTAINERS: Add x86 MCE people
x86, efi: Do not reserve boot services regions within reserved areas
Linus Torvalds [Sun, 19 Jun 2011 15:56:56 +0000 (08:56 -0700)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: Move RCU_BOOST #ifdefs to header file
rcu: use softirq instead of kthreads except when RCU_BOOST=y
rcu: Use softirq to address performance regression
rcu: Simplify curing of load woes
x86, efi: Do not reserve boot services regions within reserved areas
Commit 916f676f8dc started reserving boot service code since some systems
require you to keep that code around until SetVirtualAddressMap is called.
However, in some cases those areas will overlap with reserved regions.
The proper medium-term fix is to fix the bootloader to prevent the
conflicts from occurring by moving the kernel to a better position,
but the kernel should check for this possibility, and only reserve regions
which can be reserved.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com> Link: http://lkml.kernel.org/r/4DF7A005.1050407@gmail.com Acked-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Eric Dumazet [Sat, 18 Jun 2011 18:59:18 +0000 (11:59 -0700)]
ipv4: fix multicast losses
Knut Tidemann found that first packet of a multicast flow was not
correctly received, and bisected the regression to commit b23dd4fe42b4
(Make output route lookup return rtable directly.)
Special thanks to Knut, who provided a very nice bug report, including
sample programs to demonstrate the bug.
Reported-and-bisectedby: Knut Tidemann <knut.andre.tidemann@jotron.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise the updated evdev driver (commit cdda911c34006f1089f3c87b1a1f,
"Input: evdev - only signal polls on full packets") no longer works on
top of omap-keypad.
Tested on Amstrad Delta.
Signed-off-by: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl> Reviewed-by: Henrik Rydberg <rydberg@euromail.se> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Dmitry Torokhov [Sat, 18 Jun 2011 09:50:11 +0000 (02:50 -0700)]
Input: evdev - try to wake up readers only if we have full packet
We should only wake waiters on the event device when we actually post
an EV_SYN/SYN_REPORT to the queue. Otherwise we end up making waiting
threads runnable only to go right back to sleep because the device
still isn't readable.
Reported-by: Jeffrey Brown <jeffbrown@android.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Guenter Roeck [Tue, 24 May 2011 19:34:55 +0000 (12:34 -0700)]
hwmon: (s3c) Initialize sysfs attributes
Initialize dynamically allocated sysfs attributes before device_create_file()
call to suppress lockdep_init_map() warning if lockdep debugging is enabled.
Guenter Roeck [Tue, 24 May 2011 19:34:12 +0000 (12:34 -0700)]
hwmon: (ibmpex) Initialize sysfs attributes
Initialize dynamically allocated sysfs attributes before device_create_file()
call to suppress lockdep_init_map() warning if lockdep debugging is enabled.
Guenter Roeck [Tue, 24 May 2011 19:33:26 +0000 (12:33 -0700)]
hwmon: (ibmaem) Initialize sysfs attributes
Initialize dynamically allocated sysfs attributes before device_create_file()
call to suppress lockdep_init_map() warning if lockdep debugging is enabled.
Linus Torvalds [Sat, 18 Jun 2011 04:13:43 +0000 (21:13 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] powernow-k8: Don't try to transition if the pstate is incorrect
[CPUFREQ] powernow-k8: Don't notify of successful transition if we failed (vid case).
[CPUFREQ] Don't set stat->last_index to -1 if the pol->cur has incorrect value.
Linus Torvalds [Sat, 18 Jun 2011 02:05:36 +0000 (19:05 -0700)]
mm: avoid anon_vma_chain allocation under anon_vma lock
Hugh Dickins points out that lockdep (correctly) spots a potential
deadlock on the anon_vma lock, because we now do a GFP_KERNEL allocation
of anon_vma_chain while doing anon_vma_clone(). The problem is that
page reclaim will want to take the anon_vma lock of any anonymous pages
that it will try to reclaim.
So re-organize the code in anon_vma_clone() slightly: first do just a
GFP_NOWAIT allocation, which will usually work fine. But if that fails,
let's just drop the lock and re-do the allocation, now with GFP_KERNEL.
End result: not only do we avoid the locking problem, this also ends up
getting better concurrency in case the allocation does need to block.
Tim Chen reports that with all these anon_vma locking tweaks, we're now
almost back up to the spinlock performance.
Reported-and-tested-by: Hugh Dickins <hughd@google.com> Tested-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Zijlstra [Fri, 17 Jun 2011 11:54:23 +0000 (13:54 +0200)]
mm: avoid repeated anon_vma lock/unlock sequences in unlink_anon_vmas()
This matches the anon_vma_clone() case, and uses the same lock helper
functions. Because of the need to potentially release the anon_vma's,
it's a bit more complex, though.
We traverse the 'vma->anon_vma_chain' in two phases: the first loop gets
the anon_vma lock (with the helper function that only takes the lock
once for the whole loop), and removes any entries that don't need any
more processing.
The second phase just traverses the remaining list entries (without
holding the anon_vma lock), and does any actual freeing of the
anon_vma's that is required.
Signed-off-by: Peter Zijlstra <peterz@infradead.org> Tested-by: Hugh Dickins <hughd@google.com> Tested-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 17 Jun 2011 03:44:51 +0000 (20:44 -0700)]
mm: avoid repeated anon_vma lock/unlock sequences in anon_vma_clone()
In anon_vma_clone() we traverse the vma->anon_vma_chain of the source
vma, locking the anon_vma for each entry.
But they are all going to have the same root entry, which means that
we're locking and unlocking the same lock over and over again. Which is
expensive in locked operations, but can get _really_ expensive when that
root entry sees any kind of lock contention.
In fact, Tim Chen reports a big performance regression due to this: when
we switched to use a mutex instead of a spinlock, the contention case
gets much worse.
So to alleviate this all, this commit creates a small helper function
(lock_anon_vma_root()) that can be used to take the lock just once
rather than taking and releasing it over and over again.
We still have the same "take the lock and release" it behavior in the
exit path (in unlink_anon_vmas()), but that one is a bit harder to fix
since we're actually freeing the anon_vma entries as we go, and that
will touch the lock too.
Reported-and-tested-by: Tim Chen <tim.c.chen@linux.intel.com> Tested-by: Hugh Dickins <hughd@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel J Blueman [Fri, 17 Jun 2011 18:32:19 +0000 (11:32 -0700)]
drm/i915: Fix gen6 (SNB) missed BLT ring interrupts.
The failure appeared in dmesg as:
[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt
ring idle [waiting on 35064155, at 35064155], missed IRQ?
This works around that problem on by making the blitter command
streamer write interrupt state to the Hardware Status Page when a
MI_USER_INTERRUPT command is decoded, which appears to force the seqno
out to memory before the interrupt happens.
v1->v2: Moved to prior interrupt handler installation and RMW flags as
per feedback.
v2->v3: Removed RMW of flags (by anholt)
Cc: stable@kernel.org Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> Signed-off-by: Eric Anholt <eric@anholt.net> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> [v1] Tested-by: Eric Anholt <eric@anholt.net> [v1,v3]
(incidence of the bug with a testcase went from avg 2/1000 to
0/12651 in the latest test run (plus more for v1)) Tested-by: Kenneth Graunke <kenneth@whitecape.org> [v1] Tested-by: Robert Hooker <robert.hooker@canonical.com> [v1]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33394 Signed-off-by: Dave Airlie <airlied@redhat.com>
Jeff Ohlstein [Fri, 17 Jun 2011 20:55:38 +0000 (13:55 -0700)]
msm: timer: compensate for timer shift in msm_read_timer_count
Some msm targets have timers whose lower bits are unreliable. So, we
present our timers as lower frequency than they actually are, and ignore
the bottom 5 bits on such targets. This compensation was erroneously
removed from the msm_read_timer_count function, so restore it.
This was broken by 94790ec25 "msm: timer: SMP timer support for msm".
Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org>
Chris Mason [Fri, 17 Jun 2011 20:14:09 +0000 (16:14 -0400)]
Btrfs: avoid delayed metadata items during commits
Snapshot creation has two phases. One is the initial snapshot setup,
and the second is done during commit, while nobody is allowed to modify
the root we are snapshotting.
The delayed metadata insertion code can break that rule, it does a
delayed inode update on the inode of the parent of the snapshot,
and delayed directory item insertion.
This makes sure to run the pending delayed operations before we
record the snapshot root, which avoids corruptions.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Eric Dumazet [Fri, 17 Jun 2011 20:25:39 +0000 (16:25 -0400)]
inet_diag: fix inet_diag_bc_audit()
A malicious user or buggy application can inject code and trigger an
infinite loop in inet_diag_bc_audit()
Also make sure each instruction is aligned on 4 bytes boundary, to avoid
unaligned accesses.
Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Shved [Fri, 17 Jun 2011 06:25:11 +0000 (06:25 +0000)]
gigaset: call module_put before restart of if_open()
if_open() calls try_module_get(), and after an attempt to lock a mutex
the if_open() function may return -ERESTARTSYS without
putting the module. Then, when if_open() is executed again,
try_module_get() is called making the reference counter of THIS_MODULE
greater than one at successful exit from if_open(). The if_close()
function puts the module only once, and as a result it can't be
unloaded.
This patch adds module_put call before the return from if_open().
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Pavel Shved <shved@ispras.ru> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
Pavel Shved [Fri, 17 Jun 2011 06:25:10 +0000 (06:25 +0000)]
farsync: add module_put to error path in fst_open()
The fst_open() function, after a successful try_module_get() may return
an error code if hdlc_open() returns it. However, it does not put the
module on this error path.
This patch adds the necessary module_put() call.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Pavel Shved <shved@ispras.ru> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
Eric Dumazet [Fri, 17 Jun 2011 03:45:15 +0000 (03:45 +0000)]
net: rfs: enable RFS before first data packet is received
Le jeudi 16 juin 2011 à 23:38 -0400, David Miller a écrit :
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Fri, 17 Jun 2011 00:50:46 +0100
>
> > On Wed, 2011-06-15 at 04:15 +0200, Eric Dumazet wrote:
> >> @@ -1594,6 +1594,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
> >> goto discard;
> >>
> >> if (nsk != sk) {
> >> + sock_rps_save_rxhash(nsk, skb->rxhash);
> >> if (tcp_child_process(sk, nsk, skb)) {
> >> rsk = nsk;
> >> goto reset;
> >>
> >
> > I haven't tried this, but it looks reasonable to me.
> >
> > What about IPv6? The logic in tcp_v6_do_rcv() looks very similar.
>
> Indeed ipv6 side needs the same fix.
>
> Eric please add that part and resubmit. And in fact I might stick
> this into net-2.6 instead of net-next-2.6
>
OK, here is the net-2.6 based one then, thanks !
[PATCH v2] net: rfs: enable RFS before first data packet is received
First packet received on a passive tcp flow is not correctly RFS
steered.
One sock_rps_record_flow() call is missing in inet_accept()
But before that, we also must record rxhash when child socket is setup.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Tom Herbert <therbert@google.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
The RIPTR and TIPTR (receive/transmit internal temporary data pointer),
used by microcode as a temporary buffer for data, must be 32-byte aligned
according to the RM for MPC8247.
Tested on mgcoge.
Signed-off-by: Clive Stubbings <clive.stubbings@xentech.co.uk> Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
cc: Pantelis Antoniou <pantelis.antoniou@gmail.com>
cc: Vitaly Bordug <vbordug@ru.mvista.com>
cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@conan.davemloft.net>
Miao Xie [Wed, 15 Jun 2011 10:47:30 +0000 (10:47 +0000)]
btrfs: fix wrong reservation when doing delayed inode operations
We have migrated the space for the delayed inode items from
trans_block_rsv to global_block_rsv, but we forgot to set trans->block_rsv to
global_block_rsv when we doing delayed inode operations, and the following Oops
happened:
Chris Mason [Tue, 14 Jun 2011 00:00:16 +0000 (20:00 -0400)]
Btrfs: fix relocation races
The recent commit to get rid of our trans_mutex introduced
some races with block group relocation. The problem is that relocation
needs to do some record keeping about each root, and it was relying
on the transaction mutex to coordinate things in subtle ways.
This fix adds a mutex just for the relocation code and makes sure
it doesn't have a big impact on normal operations. The race is
really fixed in btrfs_record_root_in_trans, which is where we
step back and wait for the relocation code to finish accounting
setup.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Linus Torvalds [Fri, 17 Jun 2011 17:36:32 +0000 (10:36 -0700)]
Merge branches 'gpio/merge' and 'spi/merge' of git://git.secretlab.ca/git/linux-2.6
* 'gpio/merge' of git://git.secretlab.ca/git/linux-2.6:
gpio: add GPIOF_ values regardless on kconfig settings
gpio: include linux/gpio.h where needed
gpio/omap4: Fix missing interrupts during device wakeup due to IOPAD.
* 'spi/merge' of git://git.secretlab.ca/git/linux-2.6:
spi/bfin_spi: fix handling of default bits per word setting