git.karo-electronics.de Git - mv-sheeva.git/log
xfs: minor odds and ends in xfs_log_recover.c
Alex Elder [Thu, 15 Apr 2010 18:17:34 +0000 (18:17 +0000)]

Odds and ends in "xfs_log_recover.c".  This patch just contains some
minor things that didn't seem to warrant their own individual
patches:
- In xlog_bread_noalign(), drop an assertion that a pointer is
  non-null (the crash will tell us it was a bad pointer).
- Add a more descriptive header comment for xlog_find_verify_cycle().
- Make a few additions to the comments in xlog_find_head().  Also
  rearrange some expressions in a few spots to produce the same
  result, but in a way that makes it clearer what's being computed.

(Updated in response to Dave's review comments.  Note I did not
split this patch like I said I would.)

Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: avoid repeated pointer dereferences
Alex Elder [Thu, 15 Apr 2010 18:17:30 +0000 (18:17 +0000)]

In xlog_find_cycle_start() use a local variable for some repeated
operations rather than constantly accessing the memory location
whose address is passed in.

(This version drops an assertion that a pointer is non-null.)

Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: change a few labels in xfs_log_recover.c
Alex Elder [Thu, 15 Apr 2010 18:17:26 +0000 (18:17 +0000)]

Rename a label used in xlog_find_head() that I thought was poorly
chosen.  Also combine two adjacent labels in xlog_find_tail() into a
single label, and give it a more generic name.

(Now using Dave's suggested "validate_head" name for first label.)

Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: enforce synchronous writes in xfs_bwrite
Christoph Hellwig [Fri, 12 Mar 2010 10:59:40 +0000 (10:59 +0000)]

xfs_bwrite is meant to write out buffers synchronously, but currently
it does not clear the async flag if one is left over from a previous
write; instead it implements async behaviour when it finds the flag
set.  Remove the code handling asynchronous writes, as we've got rid
of those entirely outside of the log and delwri buffers, and make
sure that we clear the async and read flags before writing the
buffer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
xfs: remove periodic superblock writeback
Christoph Hellwig [Fri, 12 Mar 2010 10:59:16 +0000 (10:59 +0000)]

All modifications to the superblock are done transactionally through
xfs_trans_log_buf, so there is no reason to initiate periodic
asynchronous writeback.  Doing so only removes the superblock from
the delwri list and leads to sub-optimal I/O scheduling.

Cut down xfs_sync_fsdata now that it's only used for synchronous
superblock writes and move the log coverage checks into the two
callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
xfs: make the log ticket transaction id random
Dave Chinner [Wed, 14 Apr 2010 05:47:55 +0000 (15:47 +1000)]

The transaction ID that is written to the log for a transaction is
currently set by taking the lower 32 bits of the memory address of
the ticket structure.  This is not guaranteed to be unique as
tickets come from a slab and slots can be reallocated immediately
after being freed. As a result, there is no guarantee of uniqueness
in the ticket ID value.

Fix this by assigning a random number to the ticket ID field so that
it is extremely unlikely that duplicates will occur and remove the
possibility of transactions being mixed up during recovery due to
duplicate IDs.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: nothing special about 1-block log sector
Alex Elder [Tue, 13 Apr 2010 05:21:13 +0000 (15:21 +1000)]

There are a number of places where a log sector size of 1 uses
special case code.  The round_up() and round_down() macros
produce the correct result even when the log sector size is 1, and
this eliminates the need for treating this as a special case.

Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: encapsulate bbcount validity checking
Alex Elder [Tue, 13 Apr 2010 05:22:58 +0000 (15:22 +1000)]

Define a function that encapsulates checking the validity of a log
block count.
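
A minimal sketch of what such a helper might look like. The commit text does not state the validity rule, so the rule used here (positive and no larger than the log) is an assumption, and the struct and function names are hypothetical stand-ins rather than the kernel's own.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the log structure; only the field used here. */
struct log_sketch {
	int l_logBBsize;	/* assumed: size of the log in 512-byte basic blocks */
};

/*
 * Assumed rule: a block count is valid when it is positive and does not
 * exceed the size of the log. No error reporting happens in here; the
 * caller decides what to do with a false result.
 */
bool bbcount_valid(const struct log_sketch *log, int bbcount)
{
	return bbcount > 0 && bbcount <= log->l_logBBsize;
}
```

Keeping the check free of error reporting lets each caller choose its own message and return code.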

(Updated from previous version--no longer includes error reporting in the
encapsulated validation function.)

Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
xfs: kill XLOG_SECTOR_ROUND*()
Alex Elder [Tue, 13 Apr 2010 05:22:48 +0000 (15:22 +1000)]

XLOG_SECTOR_ROUNDUP_BBCOUNT() and XLOG_SECTOR_ROUNDDOWN_BLKNO()
are now fairly simple macro translations.  Just get rid of them in
favor of the round_up() and round_down() macro calls they represent.

Also, in spots in xlog_get_bp() and xlog_write_log_records(),
round_up() was being called with value 1, which just evaluates
to the macro's second argument; so just use that instead.
In the latter case, make use of that value, as long as it's
already been computed.

Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
xfs: simplify XLOG_SECTOR_ROUND*()
Alex Elder [Tue, 13 Apr 2010 05:22:40 +0000 (15:22 +1000)]

XLOG_SECTOR_ROUNDUP_BBCOUNT() is defined in "fs/xfs/xfs_log_recover.c"
in an overly-complicated way.  It is basically roundup(), but that
is not at all clear from its definition.  (Actually, there is
another macro round_up() that applies for power-of-two-based masks
which I'll be using here.)

The operands in XLOG_SECTOR_ROUNDUP_BBCOUNT() are basically the
block number (bbs) and the log sector basic block mask
(log->l_sectbb_mask).  I'll call them B and M for this discussion.

The macro computes its value this way:
M && (B & M) ? (B + M + 1) & ~M : B

Put another way, we can break it into 3 cases:
1)  ! M          -> B # 0 mask, no effect
2)  ! (B & M)    -> B # sector aligned
3)  M && (B & M) -> (B + M + 1) & ~M # round up otherwise

The round_up() macro is cleverly defined using a value, v, and a
power-of-2, p, and the result is the nearest multiple of p greater
than or equal to v.  Its value is computed something like this:
((v - 1) | (p - 1)) + 1
Let's consider using this in the context of the 3 cases above.

When p = 2^0 = 1, the result boils down to ((v - 1) | 0) + 1, so it
just translates any value v to itself.  That handles case (1) above.

When p = 2^n, n > 0, we know that (p - 1) will be a mask with all n
bits 0..n-1 set.  The condition in this case occurs when none of
those mask bits is set in the value v provided.  If that is the
case, subtracting 1 from v will have 1's in all those lower bits (at
least).  Therefore, OR-ing the mask with that decremented value has
no effect, so adding the 1 back again will just translate the v to
itself.  This handles case (2).

Otherwise, the value v is greater than some multiple of p, and
decrementing it will produce a result greater than or equal to that
multiple.  OR-ing in the mask will produce a value 1 less than the
next multiple of p, so finally adding 1 back will result in the
desired rounded-up value.  This handles case (3).

Hopefully this is convincing.
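
The case analysis above can also be checked by brute force. The following is a userspace sketch, not kernel code: old_roundup() transcribes the B/M expression quoted above, round_up_pow2() transcribes the ((v - 1) | (p - 1)) + 1 form, and the function names and the tested range are choices made for this illustration.

```c
#include <stdint.h>

/* Old-style rounding: b is a block count, m a sector mask (sector size - 1). */
uint32_t old_roundup(uint32_t b, uint32_t m)
{
	return (m && (b & m)) ? (b + m + 1) & ~m : b;
}

/* The round_up() form described above; p must be a power of two. */
uint32_t round_up_pow2(uint32_t v, uint32_t p)
{
	return ((v - 1) | (p - 1)) + 1;
}

/*
 * Compare both forms for every v in [0, limit] and p = 1, 2, 4, ..., 128.
 * Returns 1 if they always agree, 0 at the first mismatch.
 */
int roundup_forms_agree(uint32_t limit)
{
	for (uint32_t p = 1; p <= 128; p <<= 1)
		for (uint32_t v = 0; v <= limit; v++)
			if (old_roundup(v, p - 1) != round_up_pow2(v, p))
				return 0;
	return 1;
}
```

Calling roundup_forms_agree() with a limit of a few thousand exercises all three cases, including the p = 1 case where the mask is zero.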

While I was at it, I converted XLOG_SECTOR_ROUNDDOWN_BLKNO() to use
the round_down() macro.

Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
xfs: fix min bufsize bugs in two places
Alex Elder [Tue, 13 Apr 2010 05:22:29 +0000 (15:22 +1000)]

This fixes a bug in two places that I found by inspection.  In
xlog_find_verify_cycle() and xlog_write_log_records(), the code
attempts to allocate a buffer to hold as many blocks as possible.
It gives up if the number of blocks to be allocated gets too small.
Right now it uses log->l_sectbb_log as that lower bound, but I'm
sure it's supposed to be the actual log sector size instead.  That
is, the lower bound should be (1 << log->l_sectbb_log).

Also define a simple macro xlog_sectbb(log) to represent the number
of basic blocks in a sector for the given log.
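
To illustrate why the lower bound matters, here is a small userspace sketch. It assumes the allocation loop halves its request until it succeeds, which is how the description above reads; shrink_to_fit(), its max_fit parameter, and the struct are hypothetical stand-ins, and sectbb() only mirrors the idea of xlog_sectbb(log).

```c
/* Hypothetical stand-in: l_sectbb_log is log2 of the sector size in basic blocks. */
struct log_sketch {
	int l_sectbb_log;
};

/* Number of basic blocks in one log sector, in the spirit of xlog_sectbb(log). */
int sectbb(const struct log_sketch *log)
{
	return 1 << log->l_sectbb_log;
}

/*
 * Halve the request until it is no larger than max_fit (a stand-in for
 * "the allocation succeeds"), giving up once it drops below lower_bound.
 * Returns the size obtained, or 0 if the loop gave up.
 */
int shrink_to_fit(int want, int max_fit, int lower_bound)
{
	while (want > max_fit) {
		want >>= 1;
		if (want < lower_bound)
			return 0;
	}
	return want;
}
```

With an 8-block sector (l_sectbb_log of 3), using the log2 value as the bound lets the loop settle on a 4-block buffer, which is smaller than one sector; using sectbb() as the bound makes it give up instead.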

(No change from original submission; I have implemented Christoph's
suggestion about storing l_sectsize rather than l_sectbb_log in
a new, separate patch in this series.)

Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
xfs: add const qualifiers to xfs error function args
Alex Elder [Tue, 13 Apr 2010 05:22:08 +0000 (15:22 +1000)]

Change the tag and file name arguments to xfs_error_report() and
xfs_corruption_error() to use a const qualifier.

Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
xfs: remove xfs_dqmarker
Christoph Hellwig [Tue, 13 Apr 2010 05:06:53 +0000 (15:06 +1000)]

The xfs_dqmarker structure does not need to exist anymore. Move the
remaining flags field out of it and remove the structure altogether.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
xfs: convert the dquot free list to use list heads
Dave Chinner [Tue, 13 Apr 2010 05:06:52 +0000 (15:06 +1000)]

Convert the dquot free list on the filesystem to use listhead
infrastructure rather than the roll-your-own in the quota code.
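
For readers unfamiliar with the list head pattern, the sketch below imitates it in userspace. It is not the kernel's list.h: the helpers are simplified re-implementations written for this note, and struct dquot_sketch with its freelist member is a hypothetical stand-in for the real dquot.

```c
#include <stddef.h>

/* Intrusive doubly-linked list node, embedded in the object it links. */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list is a head that points at itself. */
void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

/* Link item in just before head, i.e. at the tail of the list. */
void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Unlink item and leave it as an empty list of its own. */
void list_del_init(struct list_head *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	item->next = item->prev = item;
}

/* Recover the containing object from a pointer to its embedded node. */
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for a dquot that sits on a free list. */
struct dquot_sketch {
	int id;
	struct list_head freelist;
};

/* Count the entries on a list by walking it once. */
int count_entries(const struct list_head *head)
{
	int n = 0;

	for (const struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```

Because the node lives inside the object, the same generic helpers serve every list the object is on, which is what replaces the hand-rolled next/prev handling.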

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: convert the dquot hash list to use list heads
Dave Chinner [Tue, 13 Apr 2010 05:06:51 +0000 (15:06 +1000)]

Convert the dquot hash list on the filesystem to use listhead
infrastructure rather than the roll-your-own in the quota code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: remove duplicate code from dquot reclaim
Dave Chinner [Tue, 13 Apr 2010 05:06:50 +0000 (15:06 +1000)]

The dquot shaker and the free-list reclaim code use exactly the same
algorithm but the code is duplicated and slightly different in each
case. Make the shaker code use the single dquot reclaim code to
remove the code duplication.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: convert the per-mount dquot list to use list heads
Dave Chinner [Tue, 13 Apr 2010 05:06:48 +0000 (15:06 +1000)]

Convert the dquot list on the filesystem to use listhead
infrastructure rather than the roll-your-own in the quota code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: add log item recovery tracing
Dave Chinner [Tue, 13 Apr 2010 05:06:46 +0000 (15:06 +1000)]

Currently there is no tracing in log recovery, so it is difficult to
determine what is going on when something goes wrong.

Add tracing for log item recovery to provide visibility into the log
recovery process. The tracing added shows regions being extracted
from the log transactions and added to the transaction hash forming
recovery items, followed by the reordering, cancelling and finally
recovery of the items.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: clean up xlog_write_adv_cnt
Christoph Hellwig [Tue, 23 Mar 2010 00:47:38 +0000 (11:47 +1100)]

Replace the awkward xlog_write_adv_cnt with an inline helper that makes
it more obvious that it's modifying its parameters, and replace the use
of an integer type for "ptr" with a real void pointer.  Also move
xlog_write_adv_cnt to xfs_log_priv.h as it will be used outside of
xfs_log.c in the delayed logging series.
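
A sketch of the kind of inline helper described here, written for userspace. The name write_adv_cnt and the exact parameter types are assumptions; the point is only that passing addresses makes the modification visible at the call site, and that the cursor is a real void pointer rather than an integer.

```c
#include <stddef.h>

/*
 * Advance a write cursor by `bytes`: move the pointer forward, shrink the
 * remaining length and grow the offset. All three are passed by address,
 * so the call site shows that they are being modified.
 */
static inline void write_adv_cnt(void **ptr, int *len, int *off, size_t bytes)
{
	*ptr = (char *)*ptr + bytes;	/* portable form of void-pointer arithmetic */
	*len -= (int)bytes;
	*off += (int)bytes;
}
```

A call then reads as write_adv_cnt(&ptr, &len, &off, n), where the ampersands flag every variable that changes.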

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
xfs: introduce new internal log vector structure
Dave Chinner [Tue, 23 Mar 2010 00:43:17 +0000 (11:43 +1100)]

The current log IO vector structure is a flat array and not
extensible. To make it possible to keep separate log IO vectors for
individual log items, we need a method of chaining log IO vectors
together.

Introduce a new log vector type that can be used to wrap the
existing log IO vectors and use that internally to the log. This
means that the existing external interface (xfs_log_write) does not
change and hence no changes to the transaction commit code are
required.
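
The idea of wrapping flat arrays in a chainable type can be sketched as follows. The struct and field names are hypothetical and chosen for this illustration; the real structure is not shown in the commit text.

```c
#include <stddef.h>

/* One region to be written to the log (stand-in for a log IO vector). */
struct iovec_sketch {
	void	*addr;
	int	len;
};

/* Wrapper that lets existing flat iovec arrays be chained together. */
struct log_vec_sketch {
	struct log_vec_sketch	*next;		/* next vector in the chain, or NULL */
	int			niovecs;	/* number of entries in iovecs[] */
	struct iovec_sketch	*iovecs;	/* the existing flat array */
};

/* Walk the chain and total the bytes described by every iovec. */
int chain_bytes(const struct log_vec_sketch *lv)
{
	int total = 0;

	for (; lv != NULL; lv = lv->next)
		for (int i = 0; i < lv->niovecs; i++)
			total += lv->iovecs[i].len;
	return total;
}
```

A caller that still has a single flat array can wrap it in one log_vec_sketch with next set to NULL, which is why the external interface does not have to change.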

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
xfs: reindent xlog_write
Christoph Hellwig [Tue, 23 Mar 2010 00:35:45 +0000 (11:35 +1100)]

Reindent xlog_write to normal one tab indents and move all variable
declarations into the closest enclosing block.

Split from a bigger patch by Dave Chinner.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
xfs: factor xlog_write
Dave Chinner [Tue, 23 Mar 2010 00:29:44 +0000 (11:29 +1100)]

xlog_write is a mess that takes a lot of effort to understand. It is
a mass of nested loops with 4 space indents to get it to fit in 80 columns
and lots of funky variables whose meaning and purpose aren't obvious.

Break it down into understandable chunks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
xfs: log ticket reservation underestimates the number of iclogs
Dave Chinner [Tue, 23 Mar 2010 00:21:11 +0000 (11:21 +1100)]

When allocating a ticket for a transaction, the ticket is initialised with the
worst case log space usage based on the number of bytes the transaction may
consume. Part of this calculation is the number of log headers required for the
iclog space used up by the transaction.

This calculation makes an undocumented assumption that if the transaction uses
the log header space reservation on an iclog, then it consumes either the
entire iclog or it completes. That is - the transaction that is first in an
iclog is the transaction that the log header reservation is accounted to. If
the transaction is larger than the iclog, then it will use the entire iclog
itself. Document this assumption.

Further, the current calculation uses the rule that we can fit iclog_size bytes
of transaction data into an iclog. This is incorrect - the amount of space
available in an iclog for transaction data is the size of the iclog minus the
space used for log record headers. This means that the calculation is out by
512 bytes per 32k of log space the transaction can consume. This is rarely an
issue because maximally sized transactions are extremely uncommon, and for 4k
block size filesystems maximal transaction reservations are about 400kb. Hence
the error in this case is less than the size of an iclog, so that makes it even
harder to hit.

However, anyone using larger directory blocks (16k directory blocks push the
maximum transaction size to approx. 900k on a 4k block size filesystem) or
larger block size (e.g. 64k blocks push transactions to the 3-4MB size) could
see the error grow to more than an iclog and at this point the transaction is
guaranteed to get a reservation underrun and shut down the filesystem.

Fix this by adjusting the calculation to calculate the correct number of iclogs
required and account for them all up front.
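
The arithmetic can be illustrated with a simplified model. This is not the kernel's reservation calculation: the two functions below only contrast "a whole iclog holds data" with "an iclog holds data minus its header", and their names are made up for this note.

```c
/* Naive rule: assume a whole iclog is available for transaction data. */
int iclogs_naive(int bytes, int iclog_size)
{
	return (bytes + iclog_size - 1) / iclog_size;
}

/* Corrected rule: each iclog loses hdr_size bytes to its log record header. */
int iclogs_with_headers(int bytes, int iclog_size, int hdr_size)
{
	int payload = iclog_size - hdr_size;

	return (bytes + payload - 1) / payload;
}
```

With 32 KiB iclogs and 512-byte headers, a 400 KiB reservation needs 13 iclogs under both rules, while a 4 MiB reservation needs 128 under the naive rule and 131 once headers are counted.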

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: Clean up xfs_trans_committed code after factoring
Dave Chinner [Mon, 22 Mar 2010 23:11:05 +0000 (10:11 +1100)]

Now that the code has been factored, clean up all the remaining
style cruft, simplify the code and re-order functions so that it
doesn't need forward declarations.

Also move the remaining functions that require forward declarations
(xfs_trans_uncommit, xfs_trans_free) so that all the forward
declarations can be removed from the file.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: update and factor xfs_trans_committed()
Dave Chinner [Mon, 8 Mar 2010 04:06:22 +0000 (15:06 +1100)]

The function header to xfs_trans_committed has long had this
comment:

 * THIS SHOULD BE REWRITTEN TO USE xfs_trans_next_item()

To prepare for different methods of committing items, convert the
code to use xfs_trans_next_item() and factor the code into smaller,
more digestible chunks.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: clean up xfs_trans_commit logic even more
Christoph Hellwig [Mon, 15 Mar 2010 01:52:49 +0000 (12:52 +1100)]

> +shut_us_down:
> + shutdown = XFS_FORCED_SHUTDOWN(mp) ? EIO : 0;
> + if (!(tp->t_flags & XFS_TRANS_DIRTY) || shutdown) {
> + xfs_trans_unreserve_and_mod_sb(tp);
> + /*

This whole area in _xfs_trans_commit is still a complete mess.

So while touching this code, unravel this mess as well to make the
whole flow of the function simpler and clearer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
xfs: split out iclog writing from xfs_trans_commit()
Dave Chinner [Mon, 8 Mar 2010 00:28:28 +0000 (11:28 +1100)]

Split the part of xfs_trans_commit() that deals with writing the
transaction into the iclog into a separate function. This isolates the
physical commit process from the logical commit operation and makes
it easier to insert different transaction commit paths without affecting
the existing algorithm adversely.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: fix reservation release commit flag in xfs_bmap_add_attrfork()
Dave Chinner [Mon, 8 Mar 2010 00:26:23 +0000 (11:26 +1100)]

xfs_bmap_add_attrfork() passes XFS_TRANS_PERM_LOG_RES to xfs_trans_commit()
to indicate that the commit should release the permanent log reservation
as part of the commit. This is wrong - the correct flag is
XFS_TRANS_RELEASE_LOG_RES - and it is only by chance that both these
flags have the value of 0x4 that the code is doing the right thing.

Fix it by changing to use the correct flag.
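
A tiny sketch of why the wrong flag went unnoticed. The two values are the ones quoted above; wants_release() is a hypothetical stand-in for whatever test the commit path performs on its flags argument.

```c
/* Values as stated in the commit message: both flags happen to be 0x4. */
#define XFS_TRANS_PERM_LOG_RES		0x4
#define XFS_TRANS_RELEASE_LOG_RES	0x4

/* Hypothetical commit-side test: does the caller want the reservation released? */
int wants_release(unsigned int commit_flags)
{
	return (commit_flags & XFS_TRANS_RELEASE_LOG_RES) != 0;
}
```

Passing either name produces the same bit, so the check succeeds either way; if one of the two values were ever renumbered, the call using the wrong name would silently stop releasing the reservation.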

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: remove stale parameter from ->iop_unpin method
Dave Chinner [Mon, 8 Mar 2010 00:26:03 +0000 (11:26 +1100)]

The staleness of an object being unpinned can be directly derived
from the object itself - there is no need to extract it from the
object then pass it as a parameter into IOP_UNPIN().

This means we can kill the XFS_LID_BUF_STALE flag - it is set,
checked and cleared in the same places as the XFS_BLI_STALE flag in the
xfs_buf_log_item so it is now redundant and hence safe to remove.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: Add inode pin counts to traces
Dave Chinner [Mon, 8 Mar 2010 00:24:07 +0000 (11:24 +1100)]

We don't record pin counts in inode events right now, and this makes
it difficult to track down problems related to pinning inodes. Add
the pin count to the inode trace class and add trace events for
pinning and unpinning inodes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: factor log item initialisation
Dave Chinner [Mon, 22 Mar 2010 23:10:00 +0000 (10:10 +1100)]

Each log item type does manual initialisation of the log item.
Delayed logging introduces new fields that need initialisation, so
factor all the open coded initialisation into a common function
first.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: add blockdev name to kthreads
Jan Engelhardt [Mon, 22 Mar 2010 22:52:55 +0000 (09:52 +1100)]

This makes it possible to see in `ps` and similar tools which kthreads are
allotted to which block device/filesystem, similar to what jbd2
does. As the process name is a fixed 16-char array, no extra
space is needed in tasks.

  PID TTY      STAT   TIME COMMAND
    2 ?        S      0:00 [kthreadd]
  197 ?        S      0:00  \_ [jbd2/sda2-8]
  198 ?        S      0:00  \_ [ext4-dio-unwrit]
  204 ?        S      0:00  \_ [flush-8:0]
 2647 ?        S      0:00  \_ [xfs_mru_cache]
 2648 ?        S      0:00  \_ [xfslogd/0]
 2649 ?        S      0:00  \_ [xfsdatad/0]
 2650 ?        S      0:00  \_ [xfsconvertd/0]
 2651 ?        S      0:00  \_ [xfsbufd/ram0]
 2652 ?        S      0:00  \_ [xfsaild/ram0]
 2653 ?        S      0:00  \_ [xfssyncd/ram0]
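
The naming scheme in the listing above can be imitated in userspace. This sketch assumes the 16-char array includes the terminating NUL, so at most 15 characters are visible; make_kthread_name() and COMM_LEN are names invented for this illustration.

```c
#include <stdio.h>

#define COMM_LEN 16	/* fixed task-name size from the text; assumed to include the NUL */

/* Build "<prefix>/<blockdev>" into a COMM_LEN buffer; snprintf truncates if needed. */
void make_kthread_name(char out[COMM_LEN], const char *prefix, const char *bdev)
{
	snprintf(out, COMM_LEN, "%s/%s", prefix, bdev);
}
```

Short device names such as ram0 fit whole, while a long device name is cut off at 15 characters, so two devices with a long common prefix could end up with identical thread names.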

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
xfs: Fix integer overflow in fs/xfs/linux-2.6/xfs_ioctl*.c
Zhitong Wang [Mon, 22 Mar 2010 22:51:22 +0000 (09:51 +1100)]

The am_hreq.opcount field in the xfs_attrmulti_by_handle() interface
is not bounded correctly. The opcount is used to determine the size
of the buffer required. The size is bounded, but can overflow and so
the size checks may not be sufficient to catch invalid opcounts.
Fix it by catching opcount values that would cause overflows before
calculating the size.
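
The failure mode and the fix can be sketched in userspace with 32-bit arithmetic. Everything here is illustrative: OP_SIZE is an assumed record size, and the function names and the size limit passed by the caller are hypothetical rather than taken from the XFS source.

```c
#include <stdint.h>

#define OP_SIZE 32u	/* assumed size of one operation record, for illustration */

/* Unchecked form: the 32-bit product can wrap around to a small number. */
uint32_t unchecked_size(uint32_t opcount)
{
	return opcount * OP_SIZE;
}

/*
 * Checked form: reject counts that would overflow before multiplying,
 * then apply the ordinary upper bound on the buffer size.
 * Returns 0 and stores the size on success, -1 otherwise.
 */
int ops_buffer_size(uint32_t opcount, uint32_t max_size, uint32_t *size)
{
	if (opcount == 0 || opcount > UINT32_MAX / OP_SIZE)
		return -1;
	*size = opcount * OP_SIZE;
	if (*size > max_size)
		return -1;
	return 0;
}
```

An opcount of 0x08000001 shows the problem: the unchecked product wraps to 32, which would pass a size limit check, while the checked form rejects the count before any multiplication takes place.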

Signed-off-by: Zhitong Wang <zhitong.wangzt@alibaba-inc.com>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Linus 2.6.34
Linus Torvalds [Sun, 16 May 2010 21:17:36 +0000 (14:17 -0700)]

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Sun, 16 May 2010 18:11:53 +0000 (11:11 -0700)]

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  rtnetlink: make SR-IOV VF interface symmetric
  sctp: delete active ICMP proto unreachable timer when free transport
  tcp: fix MD5 (RFC2385) support

Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
Linus Torvalds [Sun, 16 May 2010 18:11:31 +0000 (11:11 -0700)]

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  MIPS: Oprofile: Fix Loongson irq handler
  MIPS: N32: Use compat version for sys_ppoll.
  MIPS FPU emulator: allow Cause bits of FCSR to be writeable by ctc1

rtnetlink: make SR-IOV VF interface symmetric
Chris Wright [Sun, 16 May 2010 08:05:45 +0000 (01:05 -0700)]

Now we have a set of nested attributes:

  IFLA_VFINFO_LIST (NESTED)
    IFLA_VF_INFO (NESTED)
      IFLA_VF_MAC
      IFLA_VF_VLAN
      IFLA_VF_TX_RATE

This allows a single set to operate on multiple attributes if desired.
Among other things, it means a dump can be replayed to set state.

The current interface has yet to be released, so this seems like
something to consider for 2.6.34.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp: delete active ICMP proto unreachable timer when free transport
Wei Yongjun [Sun, 9 May 2010 16:56:07 +0000 (16:56 +0000)]

The transport may be freed before the ICMP proto unreachable timer
expires, so we should delete the active ICMP proto unreachable timer
when the transport is going away.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: fix MD5 (RFC2385) support
Eric Dumazet [Sun, 16 May 2010 07:34:04 +0000 (00:34 -0700)]

TCP MD5 support uses percpu data for temporary storage. It currently
disables preemption so that the same storage cannot be reclaimed by
another thread on the same cpu.

We also have to make sure a softirq handler won't try to use the same
context. Various bug reports demonstrated corruptions.

The fix is to disable both preemption and BH.

Reported-by: Bhaskar Dutta <bhaskie@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MIPS: Oprofile: Fix Loongson irq handler
Wu Zhangjin [Thu, 6 May 2010 16:59:46 +0000 (00:59 +0800)]

    The interrupt enable bit for the performance counters is in the Control
    Register $24, not in the counter register.
    loongson2_perfcount_handler(), we need to use

Reported-by: Xu Hengyang <hengyang@mail.ustc.edu.cn>
Signed-off-by: Wu Zhangjin <wuzhangjin@gmail.com>
Cc: linux-mips@linux-mips.org
    Patchwork: http://patchwork.linux-mips.org/patch/1198/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
---

MIPS: N32: Use compat version for sys_ppoll.
Chandrakala Chavva [Tue, 11 May 2010 00:11:54 +0000 (17:11 -0700)]

    sys_ppoll() takes a 'struct timespec' argument. This is different for the
    N32 and N64 ABIs. Use the compat version to do the proper conversions.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
    To: linux-mips@linux-mips.org
    Patchwork: http://patchwork.linux-mips.org/patch/1210/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
---

MIPS FPU emulator: allow Cause bits of FCSR to be writeable by ctc1
Shane McDonald [Fri, 7 May 2010 05:26:57 +0000 (23:26 -0600)]

    In the FPU emulator code of the MIPS, the Cause bits of the FCSR register
    are not currently writeable by the ctc1 instruction.  In odd corner cases,
    this can cause problems.  For example, a case existed where a divide-by-zero
    exception was generated by the FPU, and the signal handler attempted to
    restore the FPU registers to their state before the exception occurred.  In
    this particular setup, writing the old value to the FCSR register would
    cause another divide-by-zero exception to occur immediately.  The solution
    is to change the ctc1 instruction emulator code to allow the Cause bits of
    the FCSR register to be writeable.  This is the behaviour of the hardware
    that the code is emulating.

    This problem was found by Shane McDonald, but the credit for the fix goes
    to Kevin Kissell.  In Kevin's words:

    I submit that the bug is indeed in that ctc_op:  case of the emulator.  The
    Cause bits (17:12) are supposed to be writable by that instruction, but the
    CTC1 emulation won't let them be updated by the instruction.  I think that
    actually if you just completely removed lines 387-388 [...] things would
    work a good deal better.  At least, it would be a more accurate emulation of
    the architecturally defined FPU.  If I wanted to be really, really pedantic
    (which I sometimes do), I'd also protect the reserved bits that aren't
    necessarily writable.

Signed-off-by: Shane McDonald <mcdonald.shane@gmail.com>
    To: anemo@mba.ocn.ne.jp
    To: kevink@paralogos.com
    To: sshtylyov@mvista.com
    Patchwork: http://patchwork.linux-mips.org/patch/1205/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
---

Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
Linus Torvalds [Sat, 15 May 2010 19:55:31 +0000 (12:55 -0700)]

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: check for read permission on src file in the clone ioctl

lib/btree: fix possible NULL pointer dereference
kirjanov@gmail.com [Sat, 15 May 2010 16:32:34 +0000 (12:32 -0400)]

mempool_alloc() can return NULL in the atomic case.

Signed-off-by: Denis Kirjanov <kirjanov@gmail.com>
Cc: Joern Engel <joern@logfs.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mmc: at91_mci: modify cache flush routines
Nicolas Ferre [Sat, 15 May 2010 16:32:31 +0000 (12:32 -0400)]

This patch replaces the internal DMA flushing routine we were using with
the DMA API flush_kernel_dcache_page().  The driver is able to compile now.

[akpm@linux-foundation.org: flush_kernel_dcache_page() comes before kunmap_atomic()]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Btrfs: check for read permission on src file in the clone ioctl
Dan Rosenberg [Sat, 15 May 2010 15:27:37 +0000 (11:27 -0400)]

The existing code would have allowed you to clone a file that was
only open for writing.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
Linus Torvalds [Sat, 15 May 2010 16:03:15 +0000 (09:03 -0700)]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  JFS: Free sbi memory in error path
  fs/sysv: dereferencing ERR_PTR()
  Fix double-free in logfs
  Fix the regression created by "set S_DEAD on unlink()..." commit

Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
Linus Torvalds [Sat, 15 May 2010 16:03:02 +0000 (09:03 -0700)]

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf record: Add a fallback to the reference relocation symbol

JFS: Free sbi memory in error path
Jan Blunck [Mon, 12 Apr 2010 23:44:08 +0000 (16:44 -0700)]

I spotted the missing kfree() while removing the BKL.

[akpm@linux-foundation.org: avoid multiple returns so it doesn't happen again]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fs/sysv: dereferencing ERR_PTR()
Dan Carpenter [Wed, 21 Apr 2010 10:30:32 +0000 (12:30 +0200)]

I moved the dir_put_page() inside the if condition so we don't dereference
"page", if it's an ERR_PTR().

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agoFix double-free in logfs
Al Viro [Thu, 29 Apr 2010 00:57:02 +0000 (20:57 -0400)]
Fix double-free in logfs

iput() is needed *until* we'd done successful d_alloc_root()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agoFix the regression created by "set S_DEAD on unlink()..." commit
Al Viro [Fri, 30 Apr 2010 21:17:09 +0000 (17:17 -0400)]
Fix the regression created by "set S_DEAD on unlink()..." commit

1) i_flags simply doesn't work for mount/unlink race prevention;
we may have many links to file and rm on one of those obviously
shouldn't prevent bind on top of another later on.  To fix it
the right way we need to mark _dentry_ as unsuitable for mounting
upon; new flag (DCACHE_CANT_MOUNT) is protected by d_flags and
i_mutex on the inode in question.  Set it (with dont_mount(dentry))
in unlink/rmdir/etc., check (with cant_mount(dentry)) in places
in namespace.c that used to check for S_DEAD.  Setting S_DEAD
is still needed in places where we used to set it (for directories
getting killed), since we rely on it for readdir/rmdir race
prevention.

2) rename()/mount() protection has another bogosity - we unhash
the target before we'd checked that it's not a mountpoint.  Fixed.

3) ancient bogosity in pivot_root() - we locked i_mutex on the
right directory, but checked S_DEAD on the different (and wrong)
one.  Noticed and fixed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
14 years agoMerge master.kernel.org:/home/rmk/linux-2.6-arm
Linus Torvalds [Sat, 15 May 2010 04:28:42 +0000 (21:28 -0700)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm

* master.kernel.org:/home/rmk/linux-2.6-arm:
  ARM: 6126/1: ARM mpcore_wdt: fix build failure and other fixes
  ARM: 6125/1: ARM TWD: move TWD registers to common header
  ARM: 6110/1: Fix Thumb-2 kernel builds when UACCESS_WITH_MEMCPY is enabled
  ARM: 6112/1: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP
  ARM: 6111/1: Implement read/write for ownership in the ARMv6 DMA cache ops
  ARM: 6106/1: Implement copy_to_user_page() for noMMU
  ARM: 6105/1: Fix the __arm_ioremap_caller() definition in nommu.c

14 years agoMerge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 15 May 2010 04:28:23 +0000 (21:28 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mrst: Don't blindly access extended config space

14 years agoprofile: fix stats and data leakage
Hugh Dickins [Sat, 15 May 2010 02:44:10 +0000 (19:44 -0700)]
profile: fix stats and data leakage

If the kernel is large or the profiling step small, /proc/profile
leaks data and readprofile shows silly stats, until readprofile -r
has reset the buffer: clear the prof_buffer when it is vmalloc()ed.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years agohughd: update email address
Hugh Dickins [Sat, 15 May 2010 02:40:35 +0000 (19:40 -0700)]
hughd: update email address

My old address will shut down in a couple of weeks: update the tree.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years agox86, mrst: Don't blindly access extended config space
H. Peter Anvin [Fri, 14 May 2010 20:55:57 +0000 (13:55 -0700)]
x86, mrst: Don't blindly access extended config space

Do not blindly access extended configuration space unless we actively
know we're on a Moorestown platform.  The fixed-size BAR capability
lives in the extended configuration space, and thus is not applicable
if the configuration space isn't appropriately sized.

This fixes booting certain VMware configurations with CONFIG_MRST=y.

Moorestown will add a fake PCI-X 266 capability to advertise the
presence of extended configuration space.

Reported-and-tested-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Jacob Pan <jacob.jun.pan@intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <AANLkTiltKUa3TrKR1M51eGw8FLNoQJSLT0k0_K5X3-OJ@mail.gmail.com>

14 years agoMerge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 14 May 2010 19:20:09 +0000 (12:20 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments
  x86, k8: Fix build error when K8_NB is disabled
  x86, amd: Check X86_FEATURE_OSVW bit before accessing OSVW MSRs
  x86: Fix fake apicid to node mapping for numa emulation

14 years agox86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments
Frank Arnold [Thu, 22 Apr 2010 14:06:59 +0000 (16:06 +0200)]
x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments

When running a guest kernel on Xen we get:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
PGD 0
Oops: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:

Pid: 0, comm: swapper Tainted: G        W  2.6.34-rc3 #1 /HVM domU
RIP: 0010:[<ffffffff8142f2fb>]  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
RSP: 0018:ffff880002203e08  EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000060
RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
RBP: ffff880002203ed8 R08: 00000000000017c0 R09: ffff880002203e38
R10: ffff8800023d5d40 R11: ffffffff81a01e28 R12: ffff880187e6f5c0
R13: ffff880002203e34 R14: ffff880002203e58 R15: ffff880002203e68
FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000038 CR3: 0000000001a3c000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a44020)
Stack:
 ffffffff810d7ecb ffff880002203e20 ffffffff81059140 ffff880002203e30
<0> ffffffff810d7ec9 0000000002203e40 000000000050d140 ffff880002203e70
<0> 0000000002008140 0000000000000086 ffff880040020140 ffffffff81068b8b
Call Trace:
 <IRQ>
 [<ffffffff810d7ecb>] ? sync_supers_timer_fn+0x0/0x1c
 [<ffffffff81059140>] ? mod_timer+0x23/0x25
 [<ffffffff810d7ec9>] ? arm_supers_timer+0x34/0x36
 [<ffffffff81068b8b>] ? hrtimer_get_next_event+0xa7/0xc3
 [<ffffffff81058e85>] ? get_next_timer_interrupt+0x19a/0x20d
 [<ffffffff8142fa23>] get_cpu_leaves+0x5c/0x232
 [<ffffffff8106a7b1>] ? sched_clock_local+0x1c/0x82
 [<ffffffff8106a9a0>] ? sched_clock_tick+0x75/0x7a
 [<ffffffff8107748c>] generic_smp_call_function_single_interrupt+0xae/0xd0
 [<ffffffff8101f6ef>] smp_call_function_single_interrupt+0x18/0x27
 [<ffffffff8100a773>] call_function_single_interrupt+0x13/0x20
 <EOI>
 [<ffffffff8143c468>] ? notifier_call_chain+0x14/0x63
 [<ffffffff810295c6>] ? native_safe_halt+0xc/0xd
 [<ffffffff810114eb>] ? default_idle+0x36/0x53
 [<ffffffff81008c22>] cpu_idle+0xaa/0xe4
 [<ffffffff81423a9a>] rest_init+0x7e/0x80
 [<ffffffff81b10dd2>] start_kernel+0x40e/0x419
 [<ffffffff81b102c8>] x86_64_start_reservations+0xb3/0xb7
 [<ffffffff81b103c4>] x86_64_start_kernel+0xf8/0x107
Code: 14 d5 40 ff ae 81 8b 14 02 31 c0 3b 15 47 1c 8b 00 7d 0e 48 8b 05 36 1c 8b
 00 48 63 d2 48 8b 04 d0 c7 85 5c ff ff ff 00 00 00 00 <8b> 70 38 48 8d 8d 5c ff
 ff ff 48 8b 78 10 ba c4 01 00 00 e8 eb
RIP  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
 RSP <ffff880002203e08>
CR2: 0000000000000038
---[ end trace a7919e7f17c0a726 ]---

The L3 cache index disable feature of AMD CPUs has to be disabled if the
kernel is running as guest on top of a hypervisor because northbridge
devices are not available to the guest. Currently, this fixes a boot
crash on top of Xen. In the future this will become an issue on KVM as
well.

Check if northbridge devices are present and do not enable the feature
if there are none.

[ hpa: backported to 2.6.34 ]

Signed-off-by: Frank Arnold <frank.arnold@amd.com>
LKML-Reference: <1271945222-5283-3-git-send-email-bp@amd64.org>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org>
14 years agox86, k8: Fix build error when K8_NB is disabled
Borislav Petkov [Sat, 24 Apr 2010 07:56:53 +0000 (09:56 +0200)]
x86, k8: Fix build error when K8_NB is disabled

K8_NB depends on PCI and when the latter is disabled (allnoconfig) we fail
at the final linking stage due to missing exported num_k8_northbridges.
Add a header stub for that.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100503183036.GJ26107@aftab>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org>
14 years agoMerge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
Linus Torvalds [Fri, 14 May 2010 18:49:42 +0000 (11:49 -0700)]
Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify

* 'for-linus' of git://git.infradead.org/users/eparis/notify:
  inotify: don't leak user struct on inotify release
  inotify: race use after free/double free in inotify inode marks
  inotify: clean up the inotify_add_watch out path
  Inotify: undefined reference to `anon_inode_getfd'

Manual merge to remove duplicate "select ANON_INODES" from Kconfig file

14 years agoMerge branch 'davinci-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 14 May 2010 18:43:52 +0000 (11:43 -0700)]
Merge branch 'davinci-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-davinci

* 'davinci-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-davinci:
  DA830: fix USB 2.0 clock entry

14 years agoDA830: fix USB 2.0 clock entry
Sergei Shtylyov [Thu, 13 May 2010 18:51:51 +0000 (22:51 +0400)]
DA830: fix USB 2.0 clock entry

DA8xx OHCI driver fails to load due to failing clk_get() call for the USB 2.0
clock. Arrange matching USB 2.0 clock by the clock name instead of the device.
(Adding another CLK() entry for "ohci.0" device won't do -- in the future I'll
also have to enable USB 2.0 clock to configure CPPI 4.1 module, in which case
I won't have any device at all.)

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
14 years agoinotify: don't leak user struct on inotify release
Pavel Emelyanov [Wed, 12 May 2010 22:34:07 +0000 (15:34 -0700)]
inotify: don't leak user struct on inotify release

inotify_new_group() receives a get_uid-ed user_struct and saves the
reference on group->inotify_data.user.  The problem is that free_uid() is
never called on it.

The issue seems to have been introduced by 63c882a0 (inotify: reimplement
inotify using fsnotify) after 2.6.30.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Eric Paris <eparis@parisplace.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
14 years agoinotify: race use after free/double free in inotify inode marks
Eric Paris [Tue, 11 May 2010 21:17:40 +0000 (17:17 -0400)]
inotify: race use after free/double free in inotify inode marks

There is a race in the inotify add/rm watch code.  A task can find and
remove a mark which doesn't have all of its references.  This can
result in a use after free/double free situation.

Task A                                  Task B
------------                            -----------
inotify_new_watch()
 allocate a mark (refcnt == 1)
 add it to the idr
                                        inotify_rm_watch()
                                         inotify_remove_from_idr()
                                          fsnotify_put_mark()
                                              refcnt hits 0, free
 take reference because we are on idr
 [at this point it is a use after free]
 [time goes on]
 refcnt may hit 0 again, double free

The fix is to take the reference BEFORE the object can be found in the
idr.
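
The ordering can be sketched in user space as follows. This is only a
single-threaded model under stated assumptions: struct mark, idr_slot and
the helper names are hypothetical stand-ins, not the fsnotify or idr API,
and "free" is recorded in a counter instead of releasing memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an inotify inode mark; not the kernel struct. */
struct mark {
	int refcnt;	/* outstanding references */
	int free_count;	/* how many times the "free" path ran */
};

/* Hypothetical one-slot lookup table standing in for the idr. */
static struct mark *idr_slot;

static void mark_get(struct mark *m)
{
	assert(m != NULL);
	m->refcnt++;
}

static void mark_put(struct mark *m)
{
	assert(m != NULL);
	if (--m->refcnt == 0)
		m->free_count++;	/* real code would free the object here */
}

/* Fixed ordering: take the idr's reference BEFORE the mark is findable. */
static void publish_mark(struct mark *m)
{
	mark_get(m);	/* reference owned by the lookup table */
	idr_slot = m;	/* only now can a remover find the mark */
}

/* Removal side: unpublish, then drop the lookup table's reference. */
static void remove_mark(void)
{
	struct mark *m = idr_slot;

	if (m == NULL)
		return;
	idr_slot = NULL;
	mark_put(m);
}
```

With this ordering a remover that runs right after publication only drops
the table's reference; the creator's own reference keeps the object alive.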

Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: <stable@kernel.org>
14 years agoinotify: clean up the inotify_add_watch out path
Eric Paris [Tue, 11 May 2010 21:16:23 +0000 (17:16 -0400)]
inotify: clean up the inotify_add_watch out path

inotify_add_watch explicitly frees the unused inode mark, but it can just
use the generic code.  Just do that.

Signed-off-by: Eric Paris <eparis@redhat.com>
14 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Fri, 14 May 2010 14:56:45 +0000 (07:56 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  vhost: fix barrier pairing

14 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Fri, 14 May 2010 14:55:42 +0000 (07:55 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  mmap_min_addr check CAP_SYS_RAWIO only for write

14 years agoMerge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze
Linus Torvalds [Fri, 14 May 2010 14:29:29 +0000 (07:29 -0700)]
Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze

* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: Fix module loading on system with WB cache
  microblaze: export assembly functions used by modules
  microblaze: Remove powerpc code from Microblaze port
  microblaze: Remove compilation warnings in cache macro
  microblaze: export assembly functions used by modules
  microblaze: fix get_user/put_user side-effects
  microblaze: re-enable interrupts before calling schedule

14 years agoMerge branch 'net-2.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
David S. Miller [Fri, 14 May 2010 10:42:49 +0000 (03:42 -0700)]
Merge branch 'net-2.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

14 years agommap_min_addr check CAP_SYS_RAWIO only for write
Kees Cook [Thu, 22 Apr 2010 19:19:17 +0000 (12:19 -0700)]
mmap_min_addr check CAP_SYS_RAWIO only for write

Redirecting directly to lsm, here's the patch discussed on lkml:
http://lkml.org/lkml/2010/4/22/219

The mmap_min_addr value is useful information for an admin to see without
being root ("is my system vulnerable to kernel NULL pointer attacks?") and
its setting is trivially easy for an attacker to determine by calling
mmap() in PAGE_SIZE increments starting at 0, so trying to keep it private
has no value.

Only require CAP_SYS_RAWIO if changing the value, not reading it.
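
A minimal sketch of the rule, assuming a handler that serves both reads
and writes. The function name, the has_rawio flag and the stored default
are hypothetical; this is a user-space model, not the kernel's sysctl
handler.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stored setting; the default value is an assumption. */
static unsigned long min_addr = 65536;

/*
 * Model of a read/write handler: the capability is demanded only when the
 * caller wants to change the value, never when it merely reads it.
 */
static int min_addr_handler(bool write, bool has_rawio, unsigned long *val)
{
	assert(val != NULL);

	if (write && !has_rawio)
		return -EPERM;

	if (write)
		min_addr = *val;
	else
		*val = min_addr;
	return 0;
}
```

An unprivileged read succeeds, an unprivileged write fails with -EPERM,
and a privileged write updates the value.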

Comment from Serge :

  Me, I like to write my passwords with light blue pen on dark blue
  paper, pasted on my window - if you're going to get my password, you're
  gonna get a headache.

Signed-off-by: Kees Cook <kees.cook@canonical.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
(cherry picked from commit 822cceec7248013821d655545ea45d1c6a9d15b3)

14 years agomicroblaze: Fix module loading on system with WB cache
Michal Simek [Fri, 14 May 2010 05:40:46 +0000 (07:40 +0200)]
microblaze: Fix module loading on system with WB cache

It is necessary to flush the whole dcache. Icache work should be
done in kernel/module.c.

Signed-off-by: Michal Simek <monstr@monstr.eu>
14 years agox86, amd: Check X86_FEATURE_OSVW bit before accessing OSVW MSRs
Andreas Herrmann [Tue, 27 Apr 2010 10:13:48 +0000 (12:13 +0200)]
x86, amd: Check X86_FEATURE_OSVW bit before accessing OSVW MSRs

If the host CPU is exposed to a guest, the OSVW MSRs are not guaranteed
to be present and a GP fault occurs. Thus checking the feature flag is
essential.

Cc: <stable@kernel.org> # .32.x .33.x
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100427101348.GC4489@alberich.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
14 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6
Linus Torvalds [Thu, 13 May 2010 21:48:10 +0000 (14:48 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
  mfd: Clean up after WM83xx AUXADC interrupt if it arrives late

14 years agoMerge branch 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Thu, 13 May 2010 21:36:19 +0000 (14:36 -0700)]
Merge branch 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm

* 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: PPC: Keep index within boundaries in kvmppc_44x_emul_tlbwe()
  KVM: VMX: blocked-by-sti must not defer NMI injections
  KVM: x86: Call vcpu_load and vcpu_put in cpuid_update
  KVM: SVM: Fix wrong intercept masks on 32 bit
  KVM: convert ioapic lock to spinlock

14 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6
Linus Torvalds [Thu, 13 May 2010 19:21:44 +0000 (12:21 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
  serial: imx.c: fix CTS trigger level lower to avoid lost chars
  tty: Fix unbalanced BKL handling in error path
  serial: mpc52xx_uart: fix null pointer dereference

14 years agoserial: imx.c: fix CTS trigger level lower to avoid lost chars
Valentin Longchamp [Wed, 5 May 2010 09:47:07 +0000 (11:47 +0200)]
serial: imx.c: fix CTS trigger level lower to avoid lost chars

The imx CTS trigger level is left at its reset value that is 32
chars. Since the RX FIFO has 32 entries, when CTS is raised, the
FIFO already is full. However, some serial port devices first empty
their TX FIFO before stopping when CTS is raised, resulting in lost
chars.

This patch sets the trigger level lower so that, if other chars arrive
after CTS is raised, there is still room for 16 of them.

Signed-off-by: Valentin Longchamp<valentin.longchamp@epfl.ch>
Tested-by: Philippe Rétornaz<philippe.retornaz@epfl.ch>
Acked-by: Wolfram Sang<w.sang@pengutronix.de>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agotty: Fix unbalanced BKL handling in error path
Alan Cox [Tue, 4 May 2010 19:42:36 +0000 (20:42 +0100)]
tty: Fix unbalanced BKL handling in error path

Arnd noted:

After the "retry_open:" label, we first get the tty_mutex
and then the BKL. However, at the end of tty_open, we jump
back to retry_open with the BKL still held. If we run into
this case, the tty_open function will be left with the BKL
still held.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoserial: mpc52xx_uart: fix null pointer dereference
Anatolij Gustschin [Tue, 4 May 2010 22:18:59 +0000 (00:18 +0200)]
serial: mpc52xx_uart: fix null pointer dereference

Commit 6acc6833510db8f72b5ef343296d97480555fda9
introduced NULL pointer dereference and kernel crash
on ppc32 machines while booting. Fix this bug now.

Reported-by: Leonardo Chiquitto <leonardo.lists@gmail.com>
Tested-by: Leonardo Chiquitto <leonardo.lists@gmail.com>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sfrench...
Linus Torvalds [Thu, 13 May 2010 17:36:16 +0000 (10:36 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: guard against hardlinking directories

14 years agovfs: Fix O_NOFOLLOW behavior for paths with trailing slashes
Jan Kara [Thu, 13 May 2010 10:52:57 +0000 (12:52 +0200)]
vfs: Fix O_NOFOLLOW behavior for paths with trailing slashes

According to specification

mkdir d; ln -s d a; open("a/", O_NOFOLLOW | O_RDONLY)

should return success but currently it returns ELOOP.  This is a
regression caused by path lookup cleanup patch series.

Fix the code to ignore O_NOFOLLOW in case the provided path has trailing
slashes.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de>
Acked-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
Linus Torvalds [Thu, 13 May 2010 14:35:26 +0000 (07:35 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: ice1724 - Fix ESI Maya44 capture source control
  ALSA: pcm - Use pgprot_noncached() for MIPS non-coherent archs
  ALSA: virtuoso: fix Xonar D1/DX front panel microphone
  ALSA: hda - Add hp-dv4 model for IDT 92HD71bx
  ALSA: hda - Fix mute-LED GPIO pin for HP dv series
  ALSA: hda: Fix 0 dB for Lenovo models using Conexant CX20549 (Venice)

14 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Thu, 13 May 2010 14:28:43 +0000 (07:28 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: ad7877 - keep dma rx buffers in seperate cache lines
  Input: psmouse - reset all types of mice before reconnecting
  Input: elantech - use all 3 bytes when checking version
  Input: iforce - fix Guillemot Jet Leader 3D entry
  Input: iforce - add Guillemot Jet Leader Force Feedback

14 years agomfd: Clean up after WM83xx AUXADC interrupt if it arrives late
Mark Brown [Fri, 2 Apr 2010 12:08:39 +0000 (13:08 +0100)]
mfd: Clean up after WM83xx AUXADC interrupt if it arrives late

In certain circumstances, especially under heavy load, the AUXADC
completion interrupt may be detected after we've timed out waiting for
it.  That conversion would still succeed but the next conversion will
see the completion that was signalled by the interrupt for the previous
conversion and therefore not wait for the AUXADC conversion to run,
causing it to report failure.

Provide a simple, non-invasive cleanup by using try_wait_for_completion()
to ensure that the completion is not signalled before we wait.  Since
the AUXADC is run within a mutex we know there can only have been at
most one AUXADC interrupt outstanding.  A more involved change should
follow for the next merge window.
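
The cleanup idea can be modelled in user space roughly as below. The
struct and helpers are hypothetical stand-ins that only mimic the counting
behaviour of a completion; they are not the kernel implementation and do
no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a completion: just a count of pending signals. */
struct completion_sketch {
	unsigned int done;
};

/* Interrupt side: signal that a conversion finished. */
static void complete_sketch(struct completion_sketch *c)
{
	assert(c != NULL);
	c->done++;
}

/* Non-blocking wait: consume one pending signal if there is one. */
static bool try_wait_sketch(struct completion_sketch *c)
{
	assert(c != NULL);
	if (c->done == 0)
		return false;
	c->done--;
	return true;
}

/*
 * Start of a new conversion: drop a stale signal left behind by a late
 * interrupt, so the following wait cannot return early. Returns true if
 * a stale signal was dropped.
 */
static bool begin_conversion(struct completion_sketch *c)
{
	return try_wait_sketch(c);
}
```

Because conversions are serialized, at most one stale signal can be
pending, so a single non-blocking wait is enough in this model.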

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
14 years agomicroblaze: export assembly functions used by modules
Michal Simek [Thu, 13 May 2010 10:11:42 +0000 (12:11 +0200)]
microblaze: export assembly functions used by modules

Export __strncpy_user, memory_size, ioremap_bot for modules.

Signed-off-by: Michal Simek <monstr@monstr.eu>
14 years agomicroblaze: Remove powerpc code from Microblaze port
Michal Simek [Thu, 13 May 2010 10:09:54 +0000 (12:09 +0200)]
microblaze: Remove powerpc code from Microblaze port

Remove eeh_add_device_tree_late which is powerpc specific code.

Signed-off-by: Michal Simek <monstr@monstr.eu>
14 years agomicroblaze: Remove compilation warnings in cache macro
Michal Simek [Thu, 13 May 2010 08:55:47 +0000 (10:55 +0200)]
microblaze: Remove compilation warnings in cache macro

CC      arch/microblaze/kernel/cpu/cache.o
arch/microblaze/kernel/cpu/cache.c: In function '__invalidate_dcache_range_wb':
arch/microblaze/kernel/cpu/cache.c:398: warning: ISO C90 forbids mixed declarations and code
arch/microblaze/kernel/cpu/cache.c: In function '__flush_dcache_range_wb':
arch/microblaze/kernel/cpu/cache.c:509: warning: ISO C90 forbids mixed declara

Signed-off-by: Michal Simek <monstr@monstr.eu>
14 years agomicroblaze: export assembly functions used by modules
Steven J. Magnani [Tue, 27 Apr 2010 18:00:35 +0000 (13:00 -0500)]
microblaze: export assembly functions used by modules

Modules that use copy_{to,from}_user(), memcpy(), and memset() fail to build
in certain circumstances.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
14 years agoMerge branch 'fix/hda' into for-linus
Takashi Iwai [Thu, 13 May 2010 08:07:15 +0000 (10:07 +0200)]
Merge branch 'fix/hda' into for-linus

14 years agoInput: ad7877 - keep dma rx buffers in seperate cache lines
Oskar Schirmer [Thu, 13 May 2010 07:42:23 +0000 (00:42 -0700)]
Input: ad7877 - keep dma rx buffers in seperate cache lines

With dma based spi transmission, data corruption is observed
occasionally. With dma buffers located right next to msg and
xfer fields, cache lines correctly flushed in preparation for
dma usage may be polluted again when writing to fields in the
same cache line.

Make sure cache fields used with dma do not share cache lines
with fields changed during dma handling. As both fields are part
of a struct that is allocated via kzalloc, thus cache aligned,
moving the fields to the 1st position and inserting padding for
alignment does the job.
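
The layout constraint can be illustrated with a small sketch. It assumes a
64-byte cache line and uses C11 _Alignas instead of manual padding; the
struct and field names are hypothetical and do not reproduce the driver's
structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64	/* assumed cache line size, for illustration only */

struct ser_req_sketch {
	/* CPU-written bookkeeping, touched while a transfer is in flight. */
	uint16_t command;
	uint16_t ref_on;
	/* DMA receive buffer starts on its own cache line... */
	_Alignas(CACHE_LINE) uint16_t sample;
	/* ...and the next CPU-written field starts on the following one. */
	_Alignas(CACHE_LINE) uint16_t after_dma;
};

/* Compile-time checks that the DMA buffer shares no line with neighbours. */
_Static_assert(offsetof(struct ser_req_sketch, sample) % CACHE_LINE == 0,
	       "DMA buffer must start on a cache line boundary");
_Static_assert(offsetof(struct ser_req_sketch, after_dma) -
	       offsetof(struct ser_req_sketch, sample) >= CACHE_LINE,
	       "DMA buffer must not share its line with later fields");
```

Writes to command, ref_on or after_dma then land in different cache lines
than sample, so they cannot dirty the line the DMA engine fills.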

Signed-off-by: Oskar Schirmer <os@emlix.com>
Signed-off-by: Daniel Glöckner <dg@emlix.com>
Signed-off-by: Oliver Schneidewind <osw@emlix.com>
Signed-off-by: Johannes Weiner <jw@emlix.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
[dtor@mail.ru - changed to use ___cacheline_aligned as suggested
 by akpm]
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
14 years agoInput: psmouse - reset all types of mice before reconnecting
Dmitry Torokhov [Thu, 13 May 2010 07:42:23 +0000 (00:42 -0700)]
Input: psmouse - reset all types of mice before reconnecting

Synaptics hardware requires resetting device after suspend to ram
in order for the device to be operational. The reset lives in
synaptics-specific reconnect handler, but it is not being invoked
if synaptics support is disabled and the device is handled as a
standard PS/2 device (bare or IntelliMouse protocol).

Let's add reset into generic reconnect handler as well.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
14 years agoInput: elantech - use all 3 bytes when checking version
Dmitry Torokhov [Thu, 13 May 2010 07:41:15 +0000 (00:41 -0700)]
Input: elantech - use all 3 bytes when checking version

Apparently all 3 bytes returned by ETP_FW_VERSION_QUERY are significant
and should be taken into account when matching hardware version/features.

Tested-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
14 years agomicroblaze: fix get_user/put_user side-effects
Steven J. Magnani [Thu, 6 May 2010 21:38:33 +0000 (16:38 -0500)]
microblaze: fix get_user/put_user side-effects

The Microblaze implementations of get_user() and (MMU) put_user() evaluate
the address argument more than once. This causes unexpected side-effects for
invocations that include increment operators, e.g. get_user(foo, bar++).
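
The effect of evaluating a macro argument twice can be shown with a small
stand-alone sketch. GET_TWICE and GET_ONCE are hypothetical macros written
for this illustration; they are not the Microblaze definitions.

```c
#include <assert.h>

/* Buggy shape: 'ptr' is expanded twice, so its side effects run twice. */
#define GET_TWICE(x, ptr) \
	((x) = *(ptr), (ptr) != 0 ? 0 : -1)

/* Fixed shape: 'ptr' is evaluated once into a local, then reused. */
#define GET_ONCE(x, ptr) \
	do { const int *p_ = (ptr); (x) = *p_; } while (0)

/* Returns how far 'p++' advanced the pointer through the buggy macro. */
static long advance_with_twice(void)
{
	static const int buf[3] = { 10, 20, 30 };
	const int *p = buf;
	int v = 0;
	int err = GET_TWICE(v, p++);

	assert(err == 0 && v == 10);
	return (long)(p - buf);
}

/* Returns how far 'p++' advanced the pointer through the fixed macro. */
static long advance_with_once(void)
{
	static const int buf[3] = { 10, 20, 30 };
	const int *p = buf;
	int v = 0;

	GET_ONCE(v, p++);
	assert(v == 10);
	return (long)(p - buf);
}
```

Both variants read the first element, but the buggy one leaves the pointer
two elements ahead instead of one, which is the kind of skipped data seen
in the core_pattern example.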

This patch also removes the distinction between MMU and noMMU put_user().

Without the patch:
  $ echo 1234567890 > /proc/sys/kernel/core_pattern
  $ cat /proc/sys/kernel/core_pattern
  12345

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
14 years agomicroblaze: re-enable interrupts before calling schedule
Steven J. Magnani [Tue, 27 Apr 2010 18:00:23 +0000 (13:00 -0500)]
microblaze: re-enable interrupts before calling schedule

schedule() should not be called with interrupts disabled.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
14 years agoperf record: Add a fallback to the reference relocation symbol
Arnaldo Carvalho de Melo [Tue, 30 Mar 2010 21:27:39 +0000 (18:27 -0300)]
perf record: Add a fallback to the reference relocation symbol

Usually "_text" is enough, but I received reports that it's not always
available, so fall back to "_stext" for the symbol we use to check if we
need to apply any relocation to all the symbols in the kernel symtab,
for when, for instance, kexec is being used.
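
The fallback amounts to a two-step lookup, sketched here under stated
assumptions: struct ksym, find_sym() and find_ref_reloc_sym() are
hypothetical names over a plain array, not the perf symbol API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol table entry. */
struct ksym {
	const char *name;
	unsigned long addr;
};

/* Linear search by name; returns NULL when the symbol is absent. */
static const struct ksym *find_sym(const struct ksym *tab, size_t n,
				   const char *name)
{
	assert(tab != NULL && name != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(tab[i].name, name) == 0)
			return &tab[i];
	return NULL;
}

/* Prefer "_text"; fall back to "_stext" only when "_text" is missing. */
static const struct ksym *find_ref_reloc_sym(const struct ksym *tab, size_t n)
{
	const struct ksym *s = find_sym(tab, n, "_text");

	if (s == NULL)
		s = find_sym(tab, n, "_stext");
	return s;
}
```

The caller still has to handle a NULL result for tables that contain
neither symbol.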

Reported-by: Darren Hart <dvhltc@us.ibm.com>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
14 years agoKVM: PPC: Keep index within boundaries in kvmppc_44x_emul_tlbwe()
Roel Kluin [Sun, 9 May 2010 15:26:47 +0000 (17:26 +0200)]
KVM: PPC: Keep index within boundaries in kvmppc_44x_emul_tlbwe()

An index of KVM44x_GUEST_TLB_SIZE is already one too large.
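
A sketch of the off-by-one, assuming the old check rejected only indices
greater than the table size. GUEST_TLB_SIZE, its value and the helpers
below are hypothetical stand-ins, not the KVM code.

```c
#include <assert.h>
#include <stdbool.h>

#define GUEST_TLB_SIZE 64u	/* hypothetical size of the guest TLB array */

static int guest_tlb[GUEST_TLB_SIZE];

/* Assumed shape of the bug: idx == GUEST_TLB_SIZE slips through. */
static bool index_ok_buggy(unsigned int idx)
{
	return idx <= GUEST_TLB_SIZE;
}

/* Valid indices are 0 .. GUEST_TLB_SIZE - 1. */
static bool index_ok_fixed(unsigned int idx)
{
	return idx < GUEST_TLB_SIZE;
}

/* Writes one entry, refusing any index outside the array. */
static int tlb_write(unsigned int idx, int val)
{
	if (!index_ok_fixed(idx))
		return -1;
	assert(idx < GUEST_TLB_SIZE);	/* invariant from here on */
	guest_tlb[idx] = val;
	return 0;
}
```

An array of N entries has no element N, so the comparison has to exclude
the size itself.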

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: VMX: blocked-by-sti must not defer NMI injections
Jan Kiszka [Tue, 11 May 2010 13:16:46 +0000 (15:16 +0200)]
KVM: VMX: blocked-by-sti must not defer NMI injections

As the processor may not consider GUEST_INTR_STATE_STI as a reason for
blocking NMI, it could return immediately with EXIT_REASON_NMI_WINDOW
when we asked for it. But as we consider this state as NMI-blocking, we
can run into an endless loop.

Resolve this by allowing NMI injection if just GUEST_INTR_STATE_STI is
active (originally suggested by Gleb). Intel confirmed that this is
safe, the processor will never complain about NMI injection in this
state.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
KVM-Stable-Tag
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: x86: Call vcpu_load and vcpu_put in cpuid_update
Dongxiao Xu [Tue, 11 May 2010 10:21:33 +0000 (18:21 +0800)]
KVM: x86: Call vcpu_load and vcpu_put in cpuid_update

cpuid_update may operate VMCS, so vcpu_load() and vcpu_put()
should be called to ensure correctness.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
14 years agoKVM: SVM: Fix wrong intercept masks on 32 bit
Joerg Roedel [Wed, 5 May 2010 14:04:43 +0000 (16:04 +0200)]
KVM: SVM: Fix wrong intercept masks on 32 bit

This patch makes KVM on 32 bit SVM work again by
correcting the masks used for iret interception. With the
wrong masks the upper 32 bits of the intercepts are masked
out which leaves vmrun unintercepted. This is not legal on
svm and the vmrun fails.
Bug was introduced by commits 95ba827313 and 3cfc3092.
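
How a too-narrow mask wipes the upper half of a 64-bit field can be shown
with a stand-alone sketch. The helper names and bit positions are
hypothetical; the 32-bit mask below imitates what a mask built from an
unsigned long yields where that type is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clears one bit using a mask computed at 32-bit width. When the mask is
 * widened for the AND it is zero-extended, so bits 32..63 are cleared too.
 */
static uint64_t clear_bit_narrow(uint64_t intercepts, unsigned int bit)
{
	assert(bit < 32);
	uint32_t mask = ~(UINT32_C(1) << bit);

	return intercepts & mask;
}

/* Clears one bit using a mask computed at the full 64-bit width. */
static uint64_t clear_bit_wide(uint64_t intercepts, unsigned int bit)
{
	assert(bit < 64);
	return intercepts & ~(UINT64_C(1) << bit);
}
```

Clearing a low bit with the narrow mask also drops every intercept stored
in the upper 32 bits, while the wide mask leaves them alone.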

Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
14 years agoKVM: convert ioapic lock to spinlock
Marcelo Tosatti [Fri, 23 Apr 2010 17:03:38 +0000 (14:03 -0300)]
KVM: convert ioapic lock to spinlock

kvm_set_irq is used from non sleepable contexts, so convert ioapic from
mutex to spinlock.

KVM-Stable-Tag.
Tested-by: Ralf Bonenkamp <ralf.bonenkamp@swyx.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>