Dave Chinner [Wed, 7 Mar 2012 04:50:24 +0000 (04:50 +0000)]
xfs: remove remaining scraps of struct xfs_iomap
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Wed, 7 Mar 2012 04:50:25 +0000 (04:50 +0000)]
xfs: fix inode lookup race
When we get concurrent lookups of the same inode that is not in the
per-AG inode cache, there is a race condition that triggers warnings
in unlock_new_inode() indicating that we are initialising an inode
that isn't in a the correct state for a new inode.
When we do an inode lookup via a file handle or a bulkstat, we don't
serialise lookups at a higher level through the dentry cache (i.e.
pathless lookup), and so we can get concurrent lookups of the same
inode.
The race condition is between the insertion of the inode into the
cache in the case of a cache miss and a concurrently lookup:
Thread 1 Thread 2
xfs_iget()
xfs_iget_cache_miss()
xfs_iread()
lock radix tree
radix_tree_insert()
rcu_read_lock
radix_tree_lookup
lock inode flags
XFS_INEW not set
igrab()
unlock inode flags
rcu_read_unlock
use uninitialised inode
.....
lock inode flags
set XFS_INEW
unlock inode flags
unlock radix tree
xfs_setup_inode()
inode flags = I_NEW
unlock_new_inode()
WARNING as inode flags != I_NEW
This can lead to inode corruption, inode list corruption, etc, and
is generally a bad thing to occur.
Fix this by setting XFS_INEW before inserting the inode into the
radix tree. This will ensure any concurrent lookup will find the new
inode with XFS_INEW set and that forces the lookup to wait until the
XFS_INEW flag is removed before allowing the lookup to succeed.
cc: <stable@vger.kernel.org> # for 3.0.x, 3.2.x Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
If we initialize the slab caches for the quota code when XFS is loaded there
is no need for a global and reference counted quota manager structure. Drop
all this overhead and also fix the error handling during quota initialization.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
Instead of keeping a separate per-filesystem list of dquots we can walk
the radix tree for the two places where we need to iterate all quota
structures.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
xfs: use per-filesystem radix trees for dquot lookup
Replace the global hash tables for looking up in-memory dquot structures
with per-filesystem radix trees to allow scaling to a large number of
in-memory dquot structures.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
Replace the global dquot lru lists with a per-filesystem one.
Note that the shrinker isn't wire up to the per-superblock VFS shrinker
infrastructure as would have problems summing up and splitting the counts
for inodes and dquots. I don't think this is a major problem as the quota
cache isn't as interwinded with the inode cache as the dentry cache is,
because an inode that is dropped from the cache will generally release
a dquot reference, but most of the time it won't be the last one.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Switch the quota code over to use the generic XFS statistics infrastructure.
While the legacy /proc/fs/xfs/xqm and /proc/fs/xfs/xqmstats interfaces are
preserved for now the statistics that still have a meaning with the current
code are now also available from /proc/fs/xfs/stats.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Add an in-memory only flag to say we logged timestamps only, and use it to
check if fdatasync can optimize away the log force.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
xfs: split in-core and on-disk inode log item fields
Add a new ili_fields member to the inode log item to isolate the in-memory
flags from the ones that actually go to the log. This will allow tracking
timestamp-only updates for fdatasync and O_DSYNC in the next patch and
prepares for divorcing the on-disk log format from the in-memory log item
a little further down the road.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Move all code messing with the inode log item flags into xfs_inode_item_format
to make sure xfs_inode_item_size really only calculates the the number of
vectors, but doesn't modify any state of the inode item.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Timestamps on regular files are the last metadata that XFS does not update
transactionally. Now that we use the delaylog mode exclusively and made
the log scode scale extremly well there is no need to bypass that code for
timestamp updates. Logging all updates allows to drop a lot of code, and
will allow for further performance improvements later on.
Note that this patch drops optimized handling of fdatasync - it will be
added back in a separate commit.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Do not use unlogged metadata updates and the VFS dirty bit for updating
the file size after writeback. In addition to causing various problems
with updates getting delayed for far too long this also drags in the
unscalable VFS dirty tracking, and is one of the few remaining unlogged
metadata updates.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
xfs: log file size updates as part of unwritten extent conversion
If we convert and unwritten extent past the current i_size log the size update
as part of the extent manipulation transactions instead of doing an unlogged
metadata update later.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
xfs: do not require an ioend for new EOF calculation
Replace xfs_ioend_new_eof with a new inline xfs_new_eof helper that
doesn't require and ioend, and is available also outside of xfs_aops.c.
Also make the code a bit more clear by using a normal if statement
instead of a slightly misleading MIN().
Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
The new concurrency managed workqueues are cheap enough that we can create
per-filesystem instead of global workqueues. This allows us to remove the
trylock or defer scheme on the ilock, which is not helpful once we have
outstanding log reservations until finishing a size update.
Also allow the default concurrency on this workqueues so that I/O completions
blocking on the ilock for one inode do not block process for another inode.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
Now that XFS takes quota reservations into account there is no need to flush
anything before reporting quotas - in addition to beeing fully transactional
all quota information is also 100% coherent with the rest of the filesystem
now.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
Report all quota usage including the currently pending reservations. This
avoids the need to flush delalloc space before gathering quota information,
and matches quota enforcement, which already takes the reservations into
account.
This fixes xfstests 270.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
xfs: merge xfs_qm_export_dquot into xfs_qm_scall_getquota
The is no good reason to have these two separate, and for the next change
we would need the full struct xfs_dquot in xfs_qm_export_dquot, so better
just fold the code now instead of changing it spuriously.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
Alex Elder [Thu, 16 Feb 2012 22:01:00 +0000 (22:01 +0000)]
xfs: only take the ILOCK in xfs_reclaim_inode()
At the end of xfs_reclaim_inode(), the inode is locked in order to
we wait for a possible concurrent lookup to complete before the
inode is freed. This synchronization step was taking both the ILOCK
and the IOLOCK, but the latter was causing lockdep to produce
reports of the possibility of deadlock.
It turns out that there's no need to acquire the IOLOCK at this
point anyway. It may have been required in some earlier version of
the code, but there should be no need to take the IOLOCK in
xfs_iget(), so there's no (longer) any need to get it here for
synchronization. Add an assertion in xfs_iget() as a reminder
of this assumption.
Dave Chinner diagnosed this on IRC, and Christoph Hellwig suggested
no longer including the IOLOCK. I just put together the patch.
Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
Split the log regrant case out of xfs_log_reserve into a separate function,
and merge xlog_grant_log_space and xlog_regrant_write_log_space into their
respective callers. Also replace the XFS_LOG_PERM_RESERV flag, which easily
got misused before the previous cleanups with a simple boolean parameter.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
xfs: share code for grant head availability checks
Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Add a new data structure to allow sharing code between the log grant and
regrant code.
Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
The tic->t_wait waitqueues can never have more than a single waiter
on them, so we can easily replace them with a task_struct pointer
and wake_up_process.
Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Remove the now unused opportunistic parameter, and use the the
xlog_writeq_wake and xlog_reserveq_wake helpers now that we don't have
to care about the opportunistic wakeups.
Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
There is no reason to wake up log space waiters when unlocking inodes or
dquots, and the commit log has no explanation for this function either.
Given that we now have exact log space wakeups everywhere we can assume
the reason for this function was to paper over log space races in earlier
XFS versions.
Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
xfs: do exact log space wakeups in xlog_ungrant_log_space
The only reason that xfs_log_space_wake had to do opportunistic wakeups
was that the old xfs_log_move_tail calling convention didn't allow for
exact wakeups when not updating the log tail LSN. Since this issue has
been fixed we can do exact wakeups now.
Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
xfs: split tail_lsn assignments from log space wakeups
Currently xfs_log_move_tail has a tail_lsn argument that is horribly
overloaded: it may contain either an actual lsn to assign to the log tail,
0 as a special case to use the last sync LSN, or 1 to indicate that no tail
LSN assignment should be performed, and we should opportunisticly wake up
at one task waiting for log space even if we did not move the LSN.
Remove the tail lsn assigned from xfs_log_move_tail and make the two callers
use xlog_assign_tail_lsn instead of the current variant of partially using
the code in xfs_log_move_tail and partially opencoding it. Note that means
we grow an addition lock roundtrip on the AIL lock for each bulk update
or delete, which is still far less than what we had before introducing the
bulk operations. If this proves to be a problem we can still add a variant
of xlog_assign_tail_lsn that expects the lock to be held already.
Also rename the remainder of xfs_log_move_tail to xfs_log_space_wake as
that name describes its functionality much better.
Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Mitsuo Hayasaka [Mon, 6 Feb 2012 12:51:05 +0000 (12:51 +0000)]
xfs: cleanup quota check on disk blocks and inodes reservations
This patch is a cleanup of quota check on disk blocks and inodes
reservations, and changes it as follows.
(1) add a total_count variable to store the total number of
current usages and new reservations for disk blocks and inodes,
respectively.
(2) make it more readable to check if the local variables softlimit
and hardlimit are positive. It has been changed as follows.
if (softlimit > 0ULL) -> if (softlimit)
if (hardlimit > 0ULL) -> if (hardlimit)
This is because they are defined as xfs_qcnt_t which is unsigned.
Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Ben Myers <bpm@sgi.com> Cc: Alex Elder <elder@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
Mitsuo Hayasaka [Mon, 6 Feb 2012 12:50:30 +0000 (12:50 +0000)]
xfs: make inode quota check more general
The xfs checks quota when reserving disk blocks and inodes. In the block
reservation, it checks if the total number of blocks including current
usage and new reservation exceed quota. In the inode reservation,
it checks using the total number of inodes including only current usage
without new reservation. However, this inode quota check works well
since the caller of xfs_trans_dquot() always sets the argument of the
number of new inode reservation to 1 or 0 and inode is reserved one by
one in current xfs.
To make it more general, this patch changes it to the same way as the
block quota check.
Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Ben Myers <bpm@sgi.com> Cc: Alex Elder <elder@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit c922bbc819324558e61402a7a76c10c550ca61bc)
Mitsuo Hayasaka [Mon, 6 Feb 2012 12:50:07 +0000 (12:50 +0000)]
xfs: change available ranges of softlimit and hardlimit in quota check
In general, quota allows us to use disk blocks and inodes up to each
limit, that is, they are available if they don't exceed their limitations.
Current xfs sets their available ranges to lower than them except disk
inode quota check. So, this patch changes the ranges to not beyond them.
Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Ben Myers <bpm@sgi.com> Cc: Alex Elder <elder@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 20f12d8ac01917d96860f352f67eddd912df0afb)
Jesper Juhl [Mon, 13 Feb 2012 20:51:05 +0000 (20:51 +0000)]
XFS: xfs_trans_add_item() - don't assign in ASSERT() when compare is intended
It looks to me like the two ASSERT()s in xfs_trans_add_item() really
want to do a compare (==) rather than assignment (=).
This patch changes it from the latter to the former.
Stop reusing dquots from the freelist when allocating new ones directly, and
implement a shrinker that actually follows the specifications for the
interface. The shrinker implementation is still highly suboptimal at this
point, but we can gradually work on it.
This also fixes an bug in the previous lock ordering, where we would take
the hash and dqlist locks inside of the freelist lock against the normal
lock ordering. This is only solvable by introducing the dispose list,
and thus not when using direct reclaim of unused dquots for new allocations.
As a side-effect the quota upper bound and used to free ratio values in
/proc/fs/xfs/xqm are set to 0 as these values don't make any sense in the
new world order.
Create a new function xfs_this_quota_on() that takes a xfs_mount
data structure and a disk quota type and returns true if the specified
type of quota is ON in the xfs_mount data structure.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
Amit Sahrawat [Mon, 16 Jan 2012 12:24:36 +0000 (12:24 +0000)]
xfs: kill the unused XFS_BB_FSB_OFFSET macro
Removing the macro, as this is no more needed in the code.
Tried to find the reference when it was last used - but the usage
for this seemed to have been dropped long time ago.
Signed-off-by: Amit Sahrawat <amit.sahrawat83@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
Mitsuo Hayasaka [Fri, 13 Jan 2012 05:58:39 +0000 (05:58 +0000)]
xfs: show uuid when mount fails due to duplicate uuid
When a system tries to mount a filesystem (FS) using UUID, the xfs
returns -EINVAL and shows a message if a FS with the same UUID has
been already mounted. It is useful to output the duplicate UUID
with it.
Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Ben Myers <bpm@sgi.com> Cc: Alex Elder <elder@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
Mitsuo Hayasaka [Fri, 27 Jan 2012 06:37:26 +0000 (06:37 +0000)]
xfs: pass KM_SLEEP flag to kmem_realloc() in xlog_recover_add_to_cnt_trans()
The kmem_realloc() in xfs is given KM_* memory allocation flags. And it
allocates memory using kmalloc() after they are converted to gfp_mask
flags. In xlog_recover_add_to_cont_trans(), 0u is passed to kmem_realloc(),
instead of them. I guess it is preferred to use them, and here memory must
be allocated but don't have to be done with GFP_ATOMIC. So, this patch
changes it to KM_SLEEP.
Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Ben Myers <bpm@sgi.com> Cc: Alex Elder <elder@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
Jan Kara [Wed, 11 Jan 2012 18:52:10 +0000 (18:52 +0000)]
xfs: Fix missing xfs_iunlock() on error recovery path in xfs_readlink()
Commit b52a360b forgot to call xfs_iunlock() when it detected corrupted
symplink and bailed out. Fix it by jumping to 'out' instead of doing return.
CC: stable@kernel.org CC: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Alex Elder <elder@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Linus Torvalds [Thu, 19 Jan 2012 22:53:06 +0000 (14:53 -0800)]
Merge branches 'sched-urgent-for-linus', 'perf-urgent-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/accounting, proc: Fix /proc/stat interrupts sum
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tracepoints/module: Fix disabling tracepoints with taint CRAP or OOT
x86/kprobes: Add arch/x86/tools/insn_sanity to .gitignore
x86/kprobes: Fix typo transferred from Intel manual
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, syscall: Need __ARCH_WANT_SYS_IPC for 32 bits
x86, tsc: Fix SMI induced variation in quick_pit_calibrate()
x86, opcode: ANDN and Group 17 in x86-opcode-map.txt
x86/kconfig: Move the ZONE_DMA entry under a menu
x86/UV2: Add accounting for BAU strong nacks
x86/UV2: Ack BAU interrupt earlier
x86/UV2: Remove stale no-resources test for UV2 BAU
x86/UV2: Work around BAU bug
x86/UV2: Fix BAU destination timeout initialization
x86/UV2: Fix new UV2 hardware by using native UV2 broadcast mode
x86: Get rid of dubious one-bit signed bitfield
Linus Torvalds [Thu, 19 Jan 2012 22:49:16 +0000 (14:49 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
qnx4: don't leak ->BitMap on late failure exits
qnx4: reduce the insane nesting in qnx4_checkroot()
qnx4: di_fname is an array, for crying out loud...
vfs: remove printk from set_nlink()
wake up s_wait_unfrozen when ->freeze_fs fails
H. Peter Anvin [Thu, 19 Jan 2012 20:41:25 +0000 (12:41 -0800)]
x86, syscall: Need __ARCH_WANT_SYS_IPC for 32 bits
In checkin
303395ac3bf3 x86: Generate system call tables and unistd_*.h from tables
the feature macros in <asm/unistd.h> were unified between 32 and 64
bits. Unfortunately 32 bits requires __ARCH_WANT_SYS_IPC and this was
inadvertently dropped.
Linus Torvalds [Thu, 19 Jan 2012 19:46:08 +0000 (11:46 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
KEYS: Permit key_serial() to be called with a const key pointer
keys: fix user_defined key sparse messages
ima: fix cred sparse warning
MPILIB: Add a missing ENOMEM check
Linus Torvalds [Thu, 19 Jan 2012 03:26:11 +0000 (19:26 -0800)]
uml: fix compile for x86-64
Randy Dunlap reports that we get
arch/x86/um/shared/sysdep/ptrace.h:7:20: error: redefinition of 'regs_return_value'
arch/x86/um/shared/sysdep/ptrace.h:7:20: note: previous definition of 'regs_return_value' was here
when compiling UML for x86-64.
Stephen Rothwell root-caused it and says:
"Caused by commit d7e7528bcd45 ("Audit: push audit success and retcode
into arch ptrace.h") (another patch that was never in linux-next :-().
This file now needs protection against double inclusion."
so let's do as the man says.
Reported-by: Randy Dunlap <rdunlap@xenotime.net> Analyzed-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 18 Jan 2012 23:59:18 +0000 (15:59 -0800)]
Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (26 commits)
target: Set additional sense length field in sense data
target: Remove legacy device status check from transport_execute_tasks
target: Remove __transport_execute_tasks() for each processing context
target: Remove extra se_device->execute_task_lock access in fast path
target: Drop se_device TCQ queue_depth usage from I/O path
target: Fix possible NULL pointer with __transport_execute_tasks
target: Remove TFO->check_release_cmd() fabric API caller
tcm_fc: Convert ft_send_work to use target_submit_cmd
target: Add target_submit_cmd() for process context fabric submission
target: Make target_put_sess_cmd use target_release_cmd_kref
target: Set response format in INQUIRY response
target: tcm_mod_builder: small fixups
Documentation/target: Fix tcm_mod_builder.py build breakage
target: remove overagressive ____cacheline_aligned annoations
tcm_loop: bump max_sectors
target/configs: remove trailing newline from udev_path and alias
iscsi-target: fix chap identifier simple_strtoul usage
target: remove useless casts
target: simplify target_check_cdb_and_preempt
target: Move core_scsi3_check_cdb_abort_and_preempt
...
Linus Torvalds [Wed, 18 Jan 2012 23:51:48 +0000 (15:51 -0800)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
This includes initial support for the recently published ACPI 5.0 spec.
In particular, support for the "hardware-reduced" bit that eliminates
the dependency on legacy hardware.
APEI has patches resulting from testing on real hardware.
Plus other random fixes.
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (52 commits)
acpi/apei/einj: Add extensions to EINJ from rev 5.0 of acpi spec
intel_idle: Split up and provide per CPU initialization func
ACPI processor: Remove unneeded variable passed by acpi_processor_hotadd_init V2
ACPI processor: Remove unneeded cpuidle_unregister_driver call
intel idle: Make idle driver more robust
intel_idle: Fix a cast to pointer from integer of different size warning in intel_idle
ACPI: kernel-parameters.txt : Add intel_idle.max_cstate
intel_idle: remove redundant local_irq_disable() call
ACPI processor: Fix error path, also remove sysdev link
ACPI: processor: fix acpi_get_cpuid for UP processor
intel_idle: fix API misuse
ACPI APEI: Convert atomicio routines
ACPI: Export interfaces for ioremapping/iounmapping ACPI registers
ACPI: Fix possible alignment issues with GAS 'address' references
ACPI, ia64: Use SRAT table rev to use 8bit or 16/32bit PXM fields (ia64)
ACPI, x86: Use SRAT table rev to use 8bit or 32bit PXM fields (x86/x86-64)
ACPI: Store SRAT table revision
ACPI, APEI, Resolve false conflict between ACPI NVS and APEI
ACPI, Record ACPI NVS regions
ACPI, APEI, EINJ, Refine the fix of resource conflict
...
Stefan Berger [Wed, 18 Jan 2012 03:07:30 +0000 (22:07 -0500)]
tpm: fix (ACPI S3) suspend regression
This patch fixes an (ACPI S3) suspend regression introduced in commit 68d6e6713fcb ("tpm: Introduce function to poll for result of self test")
and occurring with an Infineon TPM and tpm_tis and tpm_infineon drivers
active.
The suspend problem occurred if the TPM was disabled and/or deactivated
and therefore the TPM_PCRRead checking the result of the (asynchronous)
self test returned an error code which then caused the tpm_tis driver to
become inactive and this then seemed to have negatively influenced the
suspend support by the tpm_infineon driver... Besides that the tpm_tis
drive may stay active even if the TPM is disabled and/or deactivated.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Tested-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Rajiv Andrade <srajiv@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 18 Jan 2012 23:41:27 +0000 (15:41 -0800)]
nvme: fix merge error due to change of 'make_request_fn' fn type
The type of 'make_request_fn' changed in 5a7bbad27a4 ("block: remove
support for bio remapping from ->make_request"), but the merge of the
nvme driver didn't take that into account, and as a result the driver
would compile with a warning:
drivers/block/nvme.c: In function 'nvme_alloc_ns':
drivers/block/nvme.c:1336:2: warning: passing argument 2 of 'blk_queue_make_request' from incompatible pointer type [enabled by default]
include/linux/blkdev.h:830:13: note: expected 'void (*)(struct request_queue *, struct bio *)' but argument is of type 'int (*)(struct request_queue *, struct bio *)'
It's benign, but the warning is annoying.
Reported-by: Stephen Rothwell <sfr@canb.auug.org> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Rothwell [Wed, 18 Jan 2012 23:24:31 +0000 (10:24 +1100)]
xen: using EXPORT_SYMBOL requires including export.h
Fix these warnings:
drivers/xen/biomerge.c:14:1: warning: data definition has no type or storage class [enabled by default]
drivers/xen/biomerge.c:14:1: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL' [-Wimplicit-int]
drivers/xen/biomerge.c:14:1: warning: parameter names (without types) in function declaration [enabled by default]
Linus Torvalds [Wed, 18 Jan 2012 21:46:13 +0000 (13:46 -0800)]
Merge branch 'for-linus/i2c-33' of git://git.fluff.org/bjdooks/linux
* 'for-linus/i2c-33' of git://git.fluff.org/bjdooks/linux:
i2c-eg20t: Change-company-name-OKI-SEMICONDUCTOR to LAPIS Semiconductor
i2c-eg20t: Support new device LAPIS Semiconductor ML7831 IOH
i2c-eg20t: modified the setting of transfer rate.
i2c-eg20t: use i2c_add_numbered_adapter to get a fixed bus number
i2c: OMAP: Add DT support for i2c controller
I2C: OMAP: NACK without STP
I2C: OMAP: correct SYSC register offset for OMAP4
Linus Torvalds [Wed, 18 Jan 2012 20:53:54 +0000 (12:53 -0800)]
Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (57 commits)
[media] as3645a: Fix compilation by including slab.h
[media] s5p-fimc: Remove linux/version.h include from fimc-mdevice.c
[media] s5p-mfc: Remove linux/version.h include from s5p_mfc.c
[media] ds3000: using logical && instead of bitwise &
[media] v4l2-ctrls: make control names consistent
[media] DVB: dib0700, add support for Nova-TD LEDs
[media] DVB: dib0700, add corrected Nova-TD frontend_attach
[media] DVB: dib0700, separate stk7070pd initialization
[media] DVB: dib0700, move Nova-TD Stick to a separate set
[media] : add MODULE_FIRMWARE to dib0700
[media] DVB-CORE: remove superfluous DTV_CMDs
[media] s5p-jpeg: adapt to recent videobuf2 changes
[media] s5p-g2d: fixed a bug in controls setting function
[media] s5p-mfc: Fix volatile controls setup
[media] drivers/media/video/s5p-mfc/s5p_mfc.c: adjust double test
[media] drivers/media/video/s5p-fimc/fimc-capture.c: adjust double test
[media] s5p-fimc: Fix incorrect control ID assignment
[media] dvb_frontend: Don't call get_frontend() if idle
[media] DocBook/dvbproperty.xml: Remove DTV_MODULATION from ISDB-T
[media] DocBook/dvbproperty.xml: Fix ISDB-T delivery system parameters
...
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (49 commits)
[SCSI] libfc: remove redundant timer init for fcp
[SCSI] fcoe: Move fcoe_debug_logging from fcoe.h to fcoe.c
[SCSI] libfc: Declare local functions static
[SCSI] fcoe: fix regression on offload em matching function for initiator/target
[SCSI] qla4xxx: Update driver version to 5.02.00-k12
[SCSI] qla4xxx: Cleanup modinfo display
[SCSI] qla4xxx: Update license
[SCSI] qla4xxx: Clear the RISC interrupt bit during FW init
[SCSI] qla4xxx: Added error logging for firmware abort
[SCSI] qla4xxx: Disable generating pause frames in case of FW hung
[SCSI] qla4xxx: Temperature monitoring for ISP82XX core.
[SCSI] megaraid: fix sparse warnings
[SCSI] sg: convert to kstrtoul_from_user()
[SCSI] don't change sdev starvation list order without request dispatched
[SCSI] isci: fix, prevent port from getting stuck in the 'configuring' state
[SCSI] isci: fix start OOB
[SCSI] isci: fix io failures while wide port links are coming up
[SCSI] isci: allow more time for wide port targets
[SCSI] isci: enable wide port targets
[SCSI] isci: Fix IO fails when pull cable from phy in x4 wideport in MPC mode.
...
* git://git.infradead.org/users/willy/linux-nvme: (105 commits)
NVMe: Set number of queues correctly
NVMe: Version 0.8
NVMe: Set queue flags correctly
NVMe: Simplify nvme_unmap_user_pages
NVMe: Mark the end of the sg list
NVMe: Fix DMA mapping for admin commands
NVMe: Rename IO_TIMEOUT to NVME_IO_TIMEOUT
NVMe: Merge the nvme_bio and nvme_prp data structures
NVMe: Change nvme_completion_fn to take a dev
NVMe: Change get_nvmeq to take a dev instead of a namespace
NVMe: Simplify completion handling
NVMe: Update Identify Controller data structure
NVMe: Implement doorbell stride capability
NVMe: Version 0.7
NVMe: Don't probe namespace 0
Fix calculation of number of pages in a PRP List
NVMe: Create nvme_identify and nvme_get_features functions
NVMe: Fix memory leak in nvme_dev_add()
NVMe: Fix calls to dma_unmap_sg
NVMe: Correct sg list setup in nvme_map_user_pages
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
tg3: Fix single-vector MSI-X code
openvswitch: Fix multipart datapath dumps.
ipv6: fix per device IP snmp counters
inetpeer: initialize ->redirect_genid in inet_getpeer()
net: fix NULL-deref in WARN() in skb_gso_segment()
net: WARN if skb_checksum_help() is called on skb requiring segmentation
caif: Remove bad WARN_ON in caif_dev
caif: Fix typo in Vendor/Product-ID for CAIF modems
bnx2x: Disable AN KR work-around for BCM57810
bnx2x: Remove AutoGrEEEn for BCM84833
bnx2x: Remove 100Mb force speed for BCM84833
bnx2x: Fix PFC setting on BCM57840
bnx2x: Fix Super-Isolate mode for BCM84833
net: fix some sparse errors
net: kill duplicate included header
net: sh-eth: Fix build error by the value which is not defined
net: Use device model to get driver name in skb_gso_segment()
bridge: BH already disabled in br_fdb_cleanup()
net: move sock_update_memcg outside of CONFIG_INET
mwl8k: Fixing Sparse ENDIAN CHECK warning
...
Tony Luck [Tue, 17 Jan 2012 20:10:16 +0000 (12:10 -0800)]
acpi/apei/einj: Add extensions to EINJ from rev 5.0 of acpi spec
ACPI 5.0 provides extensions to the EINJ mechanism to specify the
target for the error injection - by APICID for cpu related errors,
by address for memory related errors, and by segment/bus/device/function
for PCIe related errors. Also extensions for vendor specific error
injections.
Tested-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Thomas Renninger [Tue, 17 Jan 2012 21:40:08 +0000 (22:40 +0100)]
intel_idle: Split up and provide per CPU initialization func
Function split up, should have no functional change.
Provides entry point for physically hotplugged CPUs
to initialize and activate cpuidle.
Signed-off-by: Thomas Renninger <trenn@suse.de> CC: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> CC: Shaohua Li <shaohua.li@intel.com> CC: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
This is a very small part taken from patches which afaik
are coming from Yunhong Jiang (for a Xen not a Linus repo?).
Cleanup only: no functional change.
Advantage (beside cleanup) is that other data of the pr (acpi_processor) struct
in the acpi_processor_hotadd_init() is needed later, for example a newly
introduced flag:
pr->flags.need_hotplug_init
Signed-off-by: Thomas Renninger <trenn@suse.de> CC: Bjorn Helgaas <bhelgaas@google.com> CC: Jiang, Yunhong <yunhong.jiang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Matt Carlson [Tue, 17 Jan 2012 15:27:23 +0000 (15:27 +0000)]
tg3: Fix single-vector MSI-X code
Kdump kernels leave MSI-X interrupts (as setup by the crashed kernel)
enabled. However, kdump only enables one CPU in the new environment,
thus causing tg3 to abort MSI-X setup. When the driver attempts to
enable INTA or MSI interrupt modes on a kdump kernel, interrupt
delivery fails.
This patch attempts to workaround the problem by forcing the driver
to enable a single MSI-X interrupt. In such a configuration, the
device's multivector interrupt mode must be disabled.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Pfaff [Tue, 17 Jan 2012 13:33:39 +0000 (13:33 +0000)]
openvswitch: Fix multipart datapath dumps.
The logic to split up the list of datapaths into multiple Netlink messages
was simply wrong, causing the list to be terminated after the first part.
Only about the first 50 datapaths would be dumped. This fixes the
problem.
Reported-by: Paul Ingram <paul@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 18 Jan 2012 02:55:56 +0000 (18:55 -0800)]
Merge tag 'arm-soc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
ARM: fixes for ARM platforms
Some fallout from the 3.3. merge window as well as a couple bug fixes
for older preexisting bugs that seem valid to include at this time:
* sched_clock changes broke picoxcell, fix included
* BSYM bugs causing issues with thumb2-built kernels on SMP
* Missing module.h include on msm.
* A collection of bugfixes for samsung platforms that didn't make it into
the first pull requests.
* tag 'arm-soc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: make BSYM macro assembly only
ARM: highbank: remove incorrect BSYM usage
ARM: imx: remove incorrect BSYM usage
ARM: exynos: remove incorrect BSYM usage
ARM: ux500: add missing ENDPROC to headsmp.S
ARM: msm: Add missing ENDPROC to headsmp.S
ARM: versatile: Add missing ENDPROC to headsmp.S
ARM: EXYNOS: Invert VCLK polarity for framebuffer on ORIGEN
ARM: S3C64XX: Fix interrupt configuration for PCA935x on Cragganmore
ARM: S3C64XX: Fix the memory mapped GPIOs on Cragganmore
ARM: S3C64XX: Remove hsmmc1 from Cragganmore
ARM: S3C64XX: Remove unconditional power domain disables
ARM: SAMSUNG: Declare struct platform_device in plat/s3c64xx-spi.h
ARM: SAMSUNG: dma-ops.h needs mach/dma.h
ARM: SAMSUNG: Guard against multiple inclusion of plat/dma.h
ARM: picoxcell: fix sched_clock() cleanup fallout
ARM: msm: vreg is a module and so needs module.h
Linus Torvalds [Wed, 18 Jan 2012 02:11:38 +0000 (18:11 -0800)]
Merge branch 'upstream-linus' of git://github.com/jgarzik/libata-dev
* 'upstream-linus' of git://github.com/jgarzik/libata-dev:
[libata] ata_piix: Add Toshiba Satellite Pro A120 to the quirks list due to broken suspend functionality.
[libata] add DVRTD08A and DVR-215 to NOSETXFER device quirk list
[libata] pata_bf54x: Support sg list in bmdma transfer.
[libata] sata_fsl: fix the controller operating mode
[libata] enable ata port async suspend
Al Viro [Wed, 18 Jan 2012 01:51:22 +0000 (01:51 +0000)]
x86-32: Fix build failure with AUDIT=y, AUDITSYSCALL=n
JONGMAN HEO reports:
With current linus git (commit a25a2b84), I got following build error,
arch/x86/kernel/vm86_32.c: In function 'do_sys_vm86':
arch/x86/kernel/vm86_32.c:340: error: implicit declaration of function '__audit_syscall_exit'
make[3]: *** [arch/x86/kernel/vm86_32.o] Error 1
OK, I can reproduce it (32bit allmodconfig with AUDIT=y, AUDITSYSCALL=n)
It's due to commit d7e7528bcd45: "Audit: push audit success and retcode
into arch ptrace.h".
Hans Verkuil [Mon, 16 Jan 2012 15:27:15 +0000 (12:27 -0300)]
[media] v4l2-ctrls: make control names consistent
Several control names used inconsistent capitalization or were inconsistent
in other ways. I also corrected a spelling mistake and fixed four strings
that were too long (>31 characters). Harmless, but the string is cut off when
it is returned with QUERYCTRL.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This means cut & paste from the former f. attach. But while at it write
to the right GPIO to turn on the right LED. Also turn the other two
off jsut for sure.
Signed-off-by: Jiri Slaby <jslaby@suse.cz> Acked-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: Patrick Boettcher <pboettcher@kernellabs.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Jiri Slaby [Tue, 10 Jan 2012 17:11:22 +0000 (14:11 -0300)]
[media] DVB: dib0700, move Nova-TD Stick to a separate set
To properly support the three LEDs which are on the stick, we need
a special handling in the ->frontend_attach function. Thus let's have
a separate ->frontend_attach instead of ifs in the common one.
The hadnling itself will be added in further patches.
Signed-off-by: Jiri Slaby <jslaby@suse.cz> Acked-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: Patrick Boettcher <pboettcher@kernellabs.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>