git.karo-electronics.de Git - karo-tx-linux.git/log

The bio prison code will be useful for a new DM targets.

Prepare to move this code into a separate module, adding a dm prefix
to structures and functions that will be exported.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

Support discards when the pool's block size is not a power of 2.
The block layer assumes discard_granularity is a power of 2 (in
blkdev_issue_discard), so we set this to the largest power of 2 that is
a divides into the number of sectors in each block, but never less than
DATA_DEV_BLOCK_SIZE_MIN_SECTORS.

This patch eliminates the "Discard support must be disabled when the
block size is not a power of 2" constraint that was imposed in commit
55f2b8b ("dm thin: support for non power of 2 pool blocksize"). That
commit was incomplete: using a block size that is not a power of 2
shouldn't mean disabling discard support on the device completely.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

Convert cpu_to_le32(le32_to_cpu(E1) + E2) to use le32_add_cpu().

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

Use the ACCESS_ONCE macro in dm-bufio and dm-verity where a variable
can be modified asynchronously (through sysfs) and we want to prevent
compiler optimizations that assume that the variable hasn't changed.
(See Documentation/atomic_ops.txt.)

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

Use list_move() instead of list_del() + list_add().

spatch with a semantic match was used to find this.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

The mpio dereference should be moved below the BUG_ON NULL test
in multipath_end_io().

spatch with a semantic match was used to found this.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

This patch fixes sector_t overflow checking in dm-verity.

Without this patch, the code checks for overflow only if sector_t is
smaller than long long, not if sector_t and long long have the same size.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

The discard limits that get established for a thin-pool or thin device
may be incompatible with the pool's data device.  Avoid this by checking
the discard limits of the pool's data device.  If an incompatibility is
found then the pool's 'discard passdown' feature is disabled.

Change thin_io_hints to ensure that a thin device always uses the same
queue limits as its pool device.

Introduce requested_pf to track whether or not the table line originally
contained the no_discard_passdown flag and use this directly for table
output.  We prepare the correct setting for discard_passdown directly in
bind_control_target (called from pool_io_hints) and store it in
adjusted_pf rather than waiting until we have access to pool->pf in
pool_preresume.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

A little thin discard code refactoring to make the next patch (dm thin:
fix discard support for data devices) more readable.
Pull out a couple of functions (and uses bools instead of unsigned for
features).

No functional changes.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

Add a safety net that will re-use the DM device's existing limits in the
event that DM device has a temporary table that doesn't have any
component devices.  This is to reduce the chance that requests not
respecting the hardware limits will reach the device.

DM recalculates queue limits based only on devices which currently exist
in the table.  This creates a problem in the event all devices are
temporarily removed such as all paths being lost in multipath.  DM will
reset the limits to the maximum permissible, which can then assemble
requests which exceed the limits of the paths when the paths are
restored.  The request will fail the blk_rq_check_limits() test when
sent to a path with lower limits, and will be retried without end by
multipath.  This became a much bigger issue after v3.6 commit fe86cdcef
("block: do not artificially constrain max_sectors for stacking
drivers").

Reported-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

Always clear QUEUE_FLAG_ADD_RANDOM if any underlying device does not
have it set. Otherwise devices with predictable characteristics may
contribute entropy.

QUEUE_FLAG_ADD_RANDOM specifies whether or not queue IO timings
contribute to the random pool.

For bio-based targets this flag is always 0 because such devices have no
real queue.

For request-based devices this flag was always set to 1 by default.

Now set it according to the flags on underlying devices. If there is at
least one device which should not contribute, set the flag to zero: If a
device, such as fast SSD storage, is not suitable for supplying entropy,
a request-based queue stacked over it will not be either.

Because the checking logic is exactly same as for the rotational flag,
share the iteration function with device_is_nonrot().

Signed-off-by: Milan Broz <mbroz@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

The access beyond the end of device BUG_ON that was introduced to
dm_request_fn via commit 29e4013de7ad950280e4b2208 ("dm: implement
REQ_FLUSH/FUA support for request-based dm") was an overly
drastic (but simple) response to this situation.

I have received a report that this BUG_ON was hit and now think
it would be better to use dm_kill_unmapped_request() to fail the clone
and original request with -EIO.

map_request() will assign the valid target returned by
dm_table_find_target to tio->ti. But when the target
isn't valid tio->ti is never assigned (because map_request isn't
called); so add a check for tio->ti != NULL to dm_done().

Reported-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: stable@vger.kernel.org # v2.6.37+
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

When there are no paths and multipath receives an ioctl, it waits until
a path becomes available.  This behaviour is incorrect if the
"queue_if_no_path" setting was not specified, as then the ioctl should
be rejected immediately, which this patch now does.

commit 35991652b ("dm mpath: allow ioctls to trigger pg init") should
have checked if queue_if_no_path was configured before queueing IO.

Checking for the queue_if_no_path feature, like is done in map_io(),
allows the following table load to work without blocking in the
multipath_ioctl retry loop:

  echo "0 1024 multipath 0 0 0 0" | dmsetup create mpath_nodevs

Without this fix the multipath_ioctl will block with the following stack
trace:

  blkid           D 0000000000000002     0 23936      1 0x00000000
   ffff8802b89e5cd8 0000000000000082 ffff8802b89e5fd8 0000000000012440
   ffff8802b89e4010 0000000000012440 0000000000012440 0000000000012440
   ffff8802b89e5fd8 0000000000012440 ffff88030c2aab30 ffff880325794040
  Call Trace:
   [<ffffffff814ce099>] schedule+0x29/0x70
   [<ffffffff814cc312>] schedule_timeout+0x182/0x2e0
   [<ffffffff8104dee0>] ? lock_timer_base+0x70/0x70
   [<ffffffff814cc48e>] schedule_timeout_uninterruptible+0x1e/0x20
   [<ffffffff8104f840>] msleep+0x20/0x30
   [<ffffffffa0000839>] multipath_ioctl+0x109/0x170 [dm_multipath]
   [<ffffffffa06bfb9c>] dm_blk_ioctl+0xbc/0xd0 [dm_mod]
   [<ffffffff8122a408>] __blkdev_driver_ioctl+0x28/0x30
   [<ffffffff8122a79e>] blkdev_ioctl+0xce/0x730
   [<ffffffff811970ac>] block_ioctl+0x3c/0x40
   [<ffffffff8117321c>] do_vfs_ioctl+0x8c/0x340
   [<ffffffff81166293>] ? sys_newfstat+0x33/0x40
   [<ffffffff81173571>] sys_ioctl+0xa1/0xb0
   [<ffffffff814d70a9>] system_call_fastpath+0x16/0x1b

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.5+
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

The dm thin pool target claims to support the zeroing of discarded
data areas.  This turns out to be incorrect when processing discards
that do not exactly cover a complete number of blocks, so the target
must always set discard_zeroes_data_unsupported.

The thin pool target will zero blocks when they are allocated if the
skip_block_zeroing feature is not specified.  The block layer
may send a discard that only partly covers a block.  If a thin pool
block is partially discarded then there is no guarantee that the
discarded data will get zeroed before it is accessed again.
Due to this, thin devices cannot claim discards will always zero data.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

Merge branch 'for-3.7/drivers' into for-next

loop: Make explicit loop device destruction lazy

xfstests has always had random failures of tests due to loop devices
failing to be torn down and hence leaving filesytems that cannot be
unmounted. This causes test runs to immediately stop.

Over the past 6 or 7 years we've added hacks like explicit unmount
-d commands for loop mounts, losetup -d after unmount -d fails, etc,
but still the problems persist.  Recently, the frequency of loop
related failures increased again to the point that xfstests 259 will
reliably fail with a stray loop device that was not torn down.

That is despite the fact the test is above as simple as it gets -
loop 5 or 6 times running mkfs.xfs with different paramters:

        lofile=$(losetup -f)
        losetup $lofile "$testfile"
        "$MKFS_XFS_PROG" -b size=512 $lofile >/dev/null || echo "mkfs failed!"
        sync
        losetup -d $lofile

And losteup -d $lofile is failing with EBUSY on 1-3 of these loops
every time the test is run.

Turns out that blkid is running simultaneously with losetup -d, and
so it sees an elevated reference count and returns EBUSY.  But why
is blkid running? It's obvious, isn't it? udev has decided to try
and find out what is on the block device as a result of a creation
notification. And it is racing with mkfs, so might still be scanning
the device when mkfs finishes and we try to tear it down.

So, make losetup -d force autoremove behaviour. That is, when the
last reference goes away, tear down the device. xfstests wants it
*gone*, not causing random teardown failures when we know that all
the operations the tests have specifically run on the device have
completed and are no longer referencing the loop device.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'for-3.7/core' into for-next

block: makes bio_split support bio without data

discard bio hasn't data attached. We hit a BUG_ON with such bio. This makes
bio_split works for such bio.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

scatterlist: refactor the sg_nents

Replace 'while' with 'for' as suggested by Tejun Heo

Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'for-3.7/drivers' into for-next

mtip32xx:Added appropriate timeout value for secure erase

Added appropriate timeout value for secure erase based on identify device data

Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'for-3.7/drivers' into for-next

memstick: add support for legacy memorysticks

Based partially on MS standard spec quotes from Alex Dubov.

As any code that works with user data this driver isn't recommended to use
to write cards that contain valuable data.

It tries its best though to avoid data corruption
and possible damage to the card.

Tested on MS DUO 64 MB card on Ricoh R592 card reader.

Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'for-3.7/core' into for-3.7/drivers

We need the SG number-of-entries patch for the memory stick
block support.

scatterlist: add sg_nents

Useful helper to know the number of entries in scatterlist.

Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Alex Dubov <oakad@yahoo.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'for-3.7/core' into for-next

fs: fix include/percpu-rwsem.h export error

We get the following export error on the include file:

usr/include/linux/fs.h:13: included file 'linux/percpu-rwsem.h' is not exported

Move the include inside the __KERNEL__ section.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'for-3.7/core' into for-next

percpu-rw-semaphore: fix documentation typos

One more patch for this thing, fixing some typos in the documentation.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'for-3.7/drivers' into for-next

Merge branch 'stable/for-jens-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.7/drivers

Merge branch 'for-3.7/core' into for-next

fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared

blkdev_mmap() isn't used outside of fs/block_dev.c, mark it as
static.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blockdev: turn a rw semaphore into a percpu rw semaphore

This avoids cache line bouncing when many processes lock the semaphore
for read.

New percpu lock implementation

The lock consists of an array of percpu unsigned integers, a boolean
variable and a mutex.

When we take the lock for read, we enter rcu read section, check for a
"locked" variable. If it is false, we increase a percpu counter on the
current cpu and exit the rcu section. If "locked" is true, we exit the
rcu section, take the mutex and drop it (this waits until a writer
finished) and retry.

Unlocking for read just decreases percpu variable. Note that we can
unlock on a difference cpu than where we locked, in this case the
counter underflows. The sum of all percpu counters represents the number
of processes that hold the lock for read.

When we need to lock for write, we take the mutex, set "locked" variable
to true and synchronize rcu. Since RCU has been synchronized, no
processes can create new read locks. We wait until the sum of percpu
counters is zero - when it is, there are no readers in the critical
section.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Fix a crash when block device is read and block size is changed at the same time

The kernel may crash when block size is changed and I/O is issued
simultaneously.

Because some subsystems (udev or lvm) may read any block device anytime,
the bug actually puts any code that changes a block device size in
jeopardy.

The crash can be reproduced if you place "msleep(1000)" to
blkdev_get_blocks just before "bh->b_size = max_blocks <<
inode->i_blkbits;".
Then, run "dd if=/dev/ram0 of=/dev/null bs=4k count=1 iflag=direct"
While it is waiting in msleep, run "blockdev --setbsz 2048 /dev/ram0"
You get a BUG.

The direct and non-direct I/O is written with the assumption that block
size does not change. It doesn't seem practical to fix these crashes
one-by-one there may be many crash possibilities when block size changes
at a certain place and it is impossible to find them all and verify the
code.

This patch introduces a new rw-lock bd_block_size_semaphore. The lock is
taken for read during I/O. It is taken for write when changing block
size. Consequently, block size can't be changed while I/O is being
submitted.

For asynchronous I/O, the patch only prevents block size change while
the I/O is being submitted. The block size can change when the I/O is in
progress or when the I/O is being finished. This is acceptable because
there are no accesses to block size when asynchronous I/O is being
finished.

The patch prevents block size changing while the device is mapped with
mmap.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

xen/blkback: Change xen_vbd's flush_support and discard_secure to have type unsigned int, rather than bool

Changing the type of bdev parameters to be unsigned int :1, rather than bool.
This is more consistent with the types of other features in the block drivers.

Signed-off-by: Oliver Chick <oliver.chick@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Merge branch 'for-3.7/core' into for-next

block: fix request_queue->flags initialization

A queue newly allocated with blk_alloc_queue_node() has only
QUEUE_FLAG_BYPASS set. For request-based drivers,
blk_init_allocated_queue() is called and q->queue_flags is overwritten
with QUEUE_FLAG_DEFAULT which doesn't include BYPASS even though the
initial bypass is still in effect.

In blk_init_allocated_queue(), or QUEUE_FLAG_DEFAULT to q->queue_flags
instead of overwriting.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()

b82d4b197c ("blkcg: make request_queue bypassing on allocation") made
request_queues bypassed on allocation to avoid switching on and off
bypass mode on a queue being initialized.  Some drivers allocate and
then destroy a lot of queues without fully initializing them and
incurring bypass latency overhead on each of them could add upto
significant overhead.

Unfortunately, blk_init_allocated_queue() is never used by queues of
bio-based drivers, which means that all bio-based driver queues are in
bypass mode even after initialization and registration complete
successfully.

Due to the limited way request_queues are used by bio drivers, this
problem is hidden pretty well but it shows up when blk-throttle is
used in combination with a bio-based driver.  Trying to configure
(echoing to cgroupfs file) blk-throttle for a bio-based driver hangs
indefinitely in blkg_conf_prep() waiting for bypass mode to end.

This patch moves the initial blk_queue_bypass_end() call from
blk_init_allocated_queue() to blk_register_queue() which is called for
any userland-visible queues regardless of its type.

I believe this is correct because I don't think there is any block
driver which needs or wants working elevator and blk-cgroup on a queue
which isn't visible to userland.  If there are such users, we need a
different solution.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Joseph Glanville <joseph.glanville@orionvm.com.au>
Cc: stable@vger.kernel.org
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'for-3.7/drivers' into for-next

userns: Convert loop to use kuid_t instead of uid_t

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'for-3.7/core' into for-next

block: ioctl to zero block ranges

Introduce a BLKZEROOUT ioctl which can be used to clear block ranges by
way of blkdev_issue_zeroout().

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Make blkdev_issue_zeroout use WRITE SAME

If the device supports WRITE SAME, use that to optimize zeroing of
blocks. If the device does not support WRITE SAME or if the operation
fails, fall back to writing zeroes the old-fashioned way.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Implement support for WRITE SAME

The WRITE SAME command supported on some SCSI devices allows the same
block to be efficiently replicated throughout a block range. Only a
single logical block is transferred from the host and the storage device
writes the same data to all blocks described by the I/O.

This patch implements support for WRITE SAME in the block layer. The
blkdev_issue_write_same() function can be used by filesystems and block
drivers to replicate a buffer across a block range. This can be used to
efficiently initialize software RAID devices, etc.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Consolidate command flag and queue limit checks for merges

- blk_check_merge_flags() verifies that cmd_flags / bi_rw are
   compatible. This function is called for both req-req and req-bio
   merging.

- blk_rq_get_max_sectors() and blk_queue_get_max_sectors() can be used
   to query the maximum sector count for a given request or queue. The
   calls will return the right value from the queue limits given the
   type of command (RW, discard, write same, etc.)

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Clean up special command handling logic

Remove special-casing of non-rw fs style requests (discard). The nomerge
flags are consolidated in blk_types.h, and rq_mergeable() and
bio_mergeable() have been modified to use them.

bio_is_rw() is used in place of bio_has_data() a few places. This is
done to to distinguish true reads and writes from other fs type requests
that carry a payload (e.g. write same).

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block/blk-tag.c: Remove useless kfree

Remove useless kfree() and clean up code related to the removal.

The semantic patch that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r exists@
position p1,p2;
expression x;
@@

if (x@p1 == NULL) { ... kfree@p2(x); ... return ...; }

@unchanged exists@
position r.p1,r.p2;
expression e <= r.x,x,e1;
iterator I;
statement S;
@@

if (x@p1 == NULL) { ... when != I(x,...) S
                        when != e = e1
                        when != e += e1
                        when != e -= e1
                        when != ++e
                        when != --e
                        when != e++
                        when != e--
                        when != &e
   kfree@p2(x); ... return ...; }

@ok depends on unchanged exists@
position any r.p1;
position r.p2;
expression x;
@@

... when != true x@p1 == NULL
kfree@p2(x);

@depends on !ok && unchanged@
position r.p2;
expression x;
@@

*kfree@p2(x);
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'for-3.7/drivers' into for-next

mtip32xx: fix user_buffer check in exec_drive_command

Current user_buffer check is incorrect and causes hdparm to fail

# hdparm -I /dev/rssda
HDIO_DRIVE_CMD(identify) failed: Input/output error

/dev/rssda:

Patching linux-3.6-rc5 hdparm works as expected

# hdparm -I /dev/rssda
/dev/rssda:

ATA device, with non-removable media
Model Number:       DELL_P320h-MTFDGAL350SAH
Serial Number:      00000000121302025F01
Firmware Revision:  B1442808
<snip>

Reported-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: David Milburn <dmilburn@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'for-3.7/drivers' into for-next

cciss: select CONFIG_CHECK_SIGNATURE

The patch cciss-use-check_signature.patch in -mm tree introduced
a build error:

drivers/built-in.o: In function `CISS_signature_present':
drivers/block/cciss.c:4270: undefined reference to `check_signature'

Add missing CONFIG_CHECK_SIGNATURE to fix this issue.

Reported-by: Fengguang Wu <wfg@linux.intel.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Fengguang Wu <wfg@linux.intel.com>
Cc: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <axboe@kernel.dk>
Acked-by: "Stephen M. Cameron" <scameron@beardog.cce.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'for-3.7/drivers' into for-next

block: remove the duplicated setting for congestion_threshold

Before call the blk_queue_congestion_threshold(),
the blk_queue_congestion_threshold() is already called at blk_queue_make_rquest().
Because this code is the duplicated, it has removed.

Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'for-jens' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/linux-block into for-3.7/drivers

cciss: remove unneeded memset()

The memory return by kzalloc() or kmem_cache_zalloc() has already be set
to zero, so remove useless memset(0).

spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'stable/for-jens-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.7/drivers

block: reject invalid queue attribute values

Instead of using simple_strtoul which "converts" invalid numbers to 0,
use strict_strtoul and perform error checking to ensure that userspace
passes us a valid unsigned long. This addresses problems with functions
such as writev, which might want to write a trailing newline -- the
newline should rightfully be rejected, but the value preceeding it
should be preserved.

Fixes BZ#46981.

Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

mtip32xx: Remove dead code

Removed the dead code in mtip_hw_read_registers() and mtip_hw_read_flags().

Reported-by: Coverity
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

mtip32xx: Change printk to pr_xxxx

Changed printk to be compliant with latest style changes

Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

mtip32xx: Proper reporting of write protect status on big-endian

Proper reporting of write protect status on big-endian

Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

mtip32xx: Increase timeout for standby command

Increased timeout for standby command to work with larger capacity drives

Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

mtip32xx: Handle NCQ commands during the security locked state

Return error for NCQ commands when the drive is in security locked state

Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

mtip32xx: Add support for new devices

Added supported device IDs in pci table

Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Add bio_clone_bioset(), bio_clone_kmalloc()

Previously, there was bio_clone() but it only allocated from the fs bio
set; as a result various users were open coding it and using
__bio_clone().

This changes bio_clone() to become bio_clone_bioset(), and then we add
bio_clone() and bio_clone_kmalloc() as wrappers around it, making use of
the functionality the last patch adedd.

This will also help in a later patch changing how bio cloning works.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
CC: Alasdair Kergon <agk@redhat.com>
CC: Boaz Harrosh <bharrosh@panasas.com>
CC: Jeff Garzik <jeff@garzik.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Consolidate bio_alloc_bioset(), bio_kmalloc()

Previously, bio_kmalloc() and bio_alloc_bioset() behaved slightly
different because there was some almost-duplicated code - this fixes
some of that.

The important change is that previously bio_kmalloc() always set
bi_io_vec = bi_inline_vecs, even if nr_iovecs == 0 - unlike
bio_alloc_bioset(). This would cause bio_has_data() to return true; I
don't know if this resulted in any actual bugs but it was certainly
wrong.

bio_kmalloc() and bio_alloc_bioset() also have different arbitrary
limits on nr_iovecs - 1024 (UIO_MAXIOV) for bio_kmalloc(), 256
(BIO_MAX_PAGES) for bio_alloc_bioset(). This patch doesn't fix that, but
at least they're enforced closer together and hopefully they will be
fixed in a later patch.

This'll also help with some future cleanups - there are a fair number of
functions that allocate bios (e.g. bio_clone()), and now they don't have
to be duplicated for bio_alloc(), bio_alloc_bioset(), and bio_kmalloc().

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
v7: Re-add dropped comments, improv patch description
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Kill bi_destructor

Now that we've got generic code for freeing bios allocated from bio
pools, this isn't needed anymore.

This patch also makes bio_free() static, since without bi_destructor
there should be no need for it to be called anywhere else.

bio_free() is now only called from bio_put, so we can refactor those a
bit - move some code from bio_put() to bio_free() and kill the redundant
bio->bi_next = NULL.

v5: Switch to BIO_KMALLOC_POOL ((void *)~0), per Boaz
v6: BIO_KMALLOC_POOL now NULL, drop bio_free's EXPORT_SYMBOL
v7: No #define BIO_KMALLOC_POOL anymore

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

pktcdvd: Switch to bio_kmalloc()

This is prep work for killing bi_destructor - previously, pktcdvd had
its own pkt_bio_alloc which was basically duplication bio_kmalloc(),
necessitating its own bi_destructor implementation.

v5: Un-reorder some functions, to make the patch easier to review

Signed-off-by: Kent Overstreet <koverstreet@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Add bio_reset()

Reusing bios is something that's been highly frowned upon in the past,
but driver code keeps doing it anyways. If it's going to happen anyways,
we should provide a generic method.

This'll help with getting rid of bi_destructor - drivers/block/pktcdvd.c
was open coding it, by doing a bio_init() and resetting bi_destructor.

This required reordering struct bio, but the block layer is not yet
nearly fast enough for any cacheline effects to matter here.

v5: Add a define BIO_RESET_BITS, to be very explicit about what parts of
bio->bi_flags are saved.
v6: Further commenting verbosity, per Tejun
v9: Add a function comment

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dm: Use bioset's front_pad for dm_rq_clone_bio_info

Previously, dm_rq_clone_bio_info needed to be freed by the bio's
destructor to avoid a memory leak in the blk_rq_prep_clone() error path.
This gets rid of a memory allocation and means we can kill
dm_rq_bio_destructor.

The _rq_bio_info_cache kmem cache is unused now and needs to be deleted,
but due to the way io_pool is used and overloaded this looks not quite
trivial so I'm leaving it for a later patch.

v6: Fix comment on struct dm_rq_clone_bio_info, per Tejun

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Alasdair Kergon <agk@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Ues bi_pool for bio_integrity_alloc()

Now that bios keep track of where they were allocated from,
bio_integrity_alloc_bioset() becomes redundant.

Remove bio_integrity_alloc_bioset() and drop bio_set argument from the
related functions and make them use bio->bi_pool.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Generalized bio pool freeing

With the old code, when you allocate a bio from a bio pool you have to
implement your own destructor that knows how to find the bio pool the
bio was originally allocated from.

This adds a new field to struct bio (bi_pool) and changes
bio_alloc_bioset() to use it. This makes various bio destructors
unnecessary, so they're then deleted.

v6: Explain the temporary if statement in bio_put

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
CC: Alasdair Kergon <agk@redhat.com>
CC: Nicholas Bellinger <nab@linux-iscsi.org>
CC: Lars Ellenberg <lars.ellenberg@linbit.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC bug fixes from Olof Johansson:
"Mostly Renesas and Atmel bugfixes this time, targeting boot and build
  problems.  A couple of patches for gemini and kirkwood as well.  On a
  whole nothing very controversial."

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: gemini: fix the gemini build
  ARM: shmobile: armadillo800eva: enable rw rootfs mount
  ARM: Kirkwood: Fix 'SZ_1M' undeclared here for db88f6281-bp-setup.c
  ARM: shmobile: mackerel: fixup usb module order
  ARM: shmobile: armadillo800eva: fixup: sound card detection order
  ARM: shmobile: marzen: fixup smsc911x id for regulator
  ARM: at91/feature-removal-schedule: delay at91_mci removal
  ARM: mach-shmobile: armadillo800eva: Enable power button as wakeup source
  ARM: mach-shmobile: armadillo800eva: Fix GPIO buttons descriptions
  ARM: at91/dts: remove partial parameter in at91sam9g25ek.dts
  ARM: at91/clock: fix PLLA overclock warning
  ARM: at91: fix rtc-at91sam9 irq issue due to sparse irq support
  ARM: at91: fix system timer irq issue due to sparse irq support
  ARM: shmobile: sh73a0: fixup RELOC_BASE of intca_irq_pins_desc

Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull a hwmon fix from Guenter Roeck:
"One patch, fixing DIV_ROUND_CLOSEST to support negative dividends.

  While the changes are not in the drivers/hwmon directory, the problem
  primarily affects hwmon drivers, and it makes sense to push the patch
  through the hwmon tree."

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  linux/kernel.h: Fix DIV_ROUND_CLOSEST to support negative dividends

Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild

Pull kbuild fixes from Michal Marek:
"These are two fixes that should go into 3.6.  The link-vmlinux.sh one
  is obvious.

  The other one fixes make firmware_install with certain configurations,
  where a file in the toplevel firmware tree gets installed first, and
  $(INSTALL_FW_PATH)/$$(dir <file>) results in /lib/firmware/./, which
  confuses make 3.82 for some reason."

* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  firmware: fix directory creation rule matching with make 3.82
  link-vmlinux.sh: Fix stray "echo" in error message

Remove user-triggerable BUG from mpol_to_str

Trivially triggerable, found by trinity:

  kernel BUG at mm/mempolicy.c:2546!
  Process trinity-child2 (pid: 23988, threadinfo ffff88010197e000, task ffff88007821a670)
  Call Trace:
    show_numa_map+0xd5/0x450
    show_pid_numa_map+0x13/0x20
    traverse+0xf2/0x230
    seq_read+0x34b/0x3e0
    vfs_read+0xac/0x180
    sys_pread64+0xa2/0xc0
    system_call_fastpath+0x1a/0x1f
  RIP: mpol_to_str+0x156/0x360

Cc: stable@vger.kernel.org
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'mmc-fixes-for-3.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc

Pull MMC fixes from Chris Ball:
- a firmware bug on several Samsung MoviNAND eMMC models causes
   permanent corruption on the device when secure erase and secure trim
   requests are made, so we disable those requests on these eMMC devices.
- atmel-mci: fix a hang with some SD cards by waiting for not-busy flag.
- dw_mmc: low-power mode breaks SDIO interrupts; fix PIO error handling;
   fix handling of error interrupts.
- mxs-mmc: fix deadlocks; fix compile error due to dma.h arch change.
- omap: fix broken PIO mode causing memory corruption.
- sdhci-esdhc: fix card detection.

* tag 'mmc-fixes-for-3.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
  mmc: omap: fix broken PIO mode
  mmc: card: Skip secure erase on MoviNAND; causes unrecoverable corruption.
  mmc: dw_mmc: Disable low power mode if SDIO interrupts are used
  mmc: dw_mmc: fix error handling in PIO mode
  mmc: dw_mmc: correct mishandling error interrupt
  mmc: dw_mmc: amend using error interrupt status
  mmc: atmel-mci: not busy flag has also to be used for read operations
  mmc: sdhci-esdhc: break out early if clock is 0
  mmc: mxs-mmc: fix deadlock caused by recursion loop
  mmc: mxs-mmc: fix deadlock in SDIO IRQ case
  mmc: bfin_sdh: fix dma_desc_array build error

uml: fix compile error in deliver_alarm()

Fix the following compile error on UML.

  arch/um/os-Linux/time.c: In function 'deliver_alarm':
  arch/um/os-Linux/time.c:117:3: error: too few arguments to function 'alarm_handler'
  arch/um/os-Linux/internal.h:1:6: note: declared here

The error was introduced by commit d3c1cfcd ("um: pass siginfo to guest
process") in 3.6-rc1.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Martin Pärtel <martin.partel@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dj: memory scribble in logi_dj

Allocate a structure not a pointer to it !

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc

Pull powerpc fixes from Benjamin Herrenschmidt:
"Here are a few fixes for 3.6 that were piling up while I was away or
  busy (I was mostly MIA a week or two before San Diego).

  Some fixes from Anton fixing up issues with our relatively new DSCR
  control feature, and a few other fixes that are either regressions or
  bugs nasty enough to warrant not waiting."

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Don't use __put_user() in patch_instruction
  powerpc: Make sure IPI handlers see data written by IPI senders
  powerpc: Restore correct DSCR in context switch
  powerpc: Fix DSCR inheritance in copy_thread()
  powerpc: Keep thread.dscr and thread.dscr_inherit in sync
  powerpc: Update DSCR on all CPUs when writing sysfs dscr_default
  powerpc/powernv: Always go into nap mode when CPU is offline
  powerpc: Give hypervisor decrementer interrupts their own handler
  powerpc/vphn: Fix arch_update_cpu_topology() return value

Merge tag 'gpio-fixes-for-v3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
"These are some GPIO regression fixes for v3.6:
   - Erroneous debug message from of_get_named_gpio_flags()
   - Make sure the MC9S08DZ60 GPIO driver depend on I2C being compiled
     in (not module) or allmodconfig breaks.
   - Check return value from irq_alloc_descs() in the Emma Mobile GPIO
     driver.
   - Assign the owner field for the rdc321x driver so the module won't
     be removed if it has active GPIOs."

* tag 'gpio-fixes-for-v3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: rdc321x: Prevent removal of modules exporting active GPIOs
  gpio: em: Fix checking return value of irq_alloc_descs
  gpio: mc9s08dz60: Fix build error if I2C=m
  gpio: Fix debug message in of_get_named_gpio_flags()

Merge tag 'sound-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"There are nothing scaring, contains only small fixes for HD-audio and
  USB-audio:
   - EPSS regression fix and GPIO fix for HD-audio IDT codecs
   - A series of USB-audio regression fixes that are found since 3.5
     kernel"

* tag 'sound-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: snd-usb: fix cross-interface streaming devices
  ALSA: snd-usb: fix calls to next_packet_size
  ALSA: snd-usb: restore delay information
  ALSA: snd-usb: use list_for_each_safe for endpoint resources
  ALSA: snd-usb: Fix URB cancellation at stream start
  ALSA: hda - Don't trust codec EPSS bit for IDT 92HD83xx & co
  ALSA: hda - Avoid unnecessary parameter read for EPSS
  ALSA: hda - Do not set GPIOs for speakers on IDT if there are no speakers

Merge tag 'fbdev-fixes-for-3.6-1' of git://github.com/schandinat/linux-2.6

Pull fbdev fixes from Florian Tobias Schandinat:
- a fix by Paul Cercueil to prevent a possible buffer overflow
- a fix by Bruno Prémont to prevent a rare sleep in invalid context
- a fix by Julia Lawall for a double free in auo_k190x
- a fix by Dan Carpenter to prevent a division by zero in mb862xxfb
- a regression fix by Tomi Valkeinen for the SDI output in OMAP
- a fix by Grazvydas Ignotas to fix the console colors in OMAP

* tag 'fbdev-fixes-for-3.6-1' of git://github.com/schandinat/linux-2.6:
  OMAPFB: fix framebuffer console colors
  OMAPDSS: Fix SDI PLL locking
  video: mb862xxfb: prevent divide by zero bug
  drivers/video/auo_k190x.c: drop kfree of devm_kzalloc's data
  fbcon: Fix bit_putcs() call to kmalloc(s, GFP_KERNEL)
  fbcon: prevent possible buffer overflow.

Merge tag 'upstream-3.6-rc5' of git://git.infradead.org/linux-ubi

Pull ubi fix from Artem Bityutskiy:
"A single small fix for memory deallocation: we allocated memory using
  'kmem_cache_alloc()' but were freeing it using 'kfree()' in some
  cases.  Now we fix this by using 'kmem_cache_free()' instead."

* tag 'upstream-3.6-rc5' of git://git.infradead.org/linux-ubi:
  UBI: fix a horrible memory deallocation bug

Fix order of arguments to compat_put_time[spec|val]

Commit 644595f89620 ("compat: Handle COMPAT_USE_64BIT_TIME in
net/socket.c") introduced a bug where the helper functions to take
either a 64-bit or compat time[spec|val] got the arguments in the wrong
order, passing the kernel stack pointer off as a user pointer (and vice
versa).

Because of the user address range check, that in turn then causes an
EFAULT due to the user pointer range checking failing for the kernel
address. Incorrectly resuling in a failed system call for 32-bit
processes with a 64-bit kernel.

On odder architectures like HP-PA (with separate user/kernel address
spaces), it can be used read kernel memory.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

xen/blkback: use kmem_cache_zalloc instead of kmem_cache_alloc/memset

Using kmem_cache_zalloc() instead of kmem_cache_alloc() and memset().

spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

powerpc: Don't use __put_user() in patch_instruction

patch_instruction() can be called very early on ppc32, when the kernel
isn't yet running at it's linked address. That can cause the !
is_kernel_addr() test in __put_user() to trip and call might_sleep()
which is very bad at that point during boot.

Use a lower level function instead for now, at least until we get to
rework ppc32 boot process to do the code patching later, like ppc64
does.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Make sure IPI handlers see data written by IPI senders

We have been observing hangs, both of KVM guest vcpu tasks and more
generally, where a process that is woken doesn't properly wake up and
continue to run, but instead sticks in TASK_WAKING state.  This
happens because the update of rq->wake_list in ttwu_queue_remote()
is not ordered with the update of ipi_message in
smp_muxed_ipi_message_pass(), and the reading of rq->wake_list in
scheduler_ipi() is not ordered with the reading of ipi_message in
smp_ipi_demux().  Thus it is possible for the IPI receiver not to see
the updated rq->wake_list and therefore conclude that there is nothing
for it to do.

In order to make sure that anything done before smp_send_reschedule()
is ordered before anything done in the resulting call to scheduler_ipi(),
this adds barriers in smp_muxed_message_pass() and smp_ipi_demux().
The barrier in smp_muxed_message_pass() is a full barrier to ensure that
there is a full ordering between the smp_send_reschedule() caller and
scheduler_ipi().  In smp_ipi_demux(), we use xchg() rather than
xchg_local() because xchg() includes release and acquire barriers.
Using xchg() rather than xchg_local() makes sense given that
ipi_message is not just accessed locally.

This moves the barrier between setting the message and calling the
cause_ipi() function into the individual cause_ipi implementations.
Most of them -- those that used outb, out_8 or similar -- already had
a full barrier because out_8 etc. include a sync before the MMIO
store.  This adds an explicit barrier in the two remaining cases.

These changes made no measurable difference to the speed of IPIs as
measured using a simple ping-pong latency test across two CPUs on
different cores of a POWER7 machine.

The analysis of the reason why processes were not waking up properly
is due to Milton Miller.

Cc: stable@vger.kernel.org # v3.0+
Reported-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Restore correct DSCR in context switch

During a context switch we always restore the per thread DSCR value.
If we aren't doing explicit DSCR management
(ie thread.dscr_inherit == 0) and the default DSCR changed while
the process has been sleeping we end up with the wrong value.

Check thread.dscr_inherit and select the default DSCR or per thread
DSCR as required.

This was found with the following test case, when running with
more threads than CPUs (ie forcing context switching):

http://ozlabs.org/~anton/junkcode/dscr_default_test.c

With the four patches applied I can run a combination of all
test cases successfully at the same time:

http://ozlabs.org/~anton/junkcode/dscr_default_test.c
http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c
http://ozlabs.org/~anton/junkcode/dscr_inherit_test.c

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org> # 3.0+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Fix DSCR inheritance in copy_thread()

If the default DSCR is non zero we set thread.dscr_inherit in
copy_thread() meaning the new thread and all its children will ignore
future updates to the default DSCR. This is not intended and is
a change in behaviour that a number of our users have hit.

We just need to inherit thread.dscr and thread.dscr_inherit from
the parent which ends up being much simpler.

This was found with the following test case:

http://ozlabs.org/~anton/junkcode/dscr_default_test.c

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org> # 3.0+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Keep thread.dscr and thread.dscr_inherit in sync

When we update the DSCR either via emulation of mtspr(DSCR) or via
a change to dscr_default in sysfs we don't update thread.dscr.
We will eventually update it at context switch time but there is
a period where thread.dscr is incorrect.

If we fork at this point we will copy the old value of thread.dscr
into the child. To avoid this, always keep thread.dscr in sync with
reality.

This issue was found with the following testcase:

http://ozlabs.org/~anton/junkcode/dscr_inherit_test.c

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org> # 3.0+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Update DSCR on all CPUs when writing sysfs dscr_default

Writing to dscr_default in sysfs doesn't actually change the DSCR -
we rely on a context switch on each CPU to do the work. There is no
guarantee we will get a context switch in a reasonable amount of time
so fire off an IPI to force an immediate change.

This issue was found with the following test case:

http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org> # 3.0+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/powernv: Always go into nap mode when CPU is offline

The CPU hotplug code for the powernv platform currently only puts
offline CPUs into nap mode if the powersave_nap variable is set.
However, HV-style KVM on this platform requires secondary CPU threads
to be offline and in nap mode. Since we know nap mode works just
fine on all POWER7 machines, and the only machines that support the
powernv platform are POWER7 machines, this changes the code to
always put offline CPUs into nap mode, regardless of powersave_nap.
Powersave_nap still controls whether or not CPUs go into nap mode
when idle, as before.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Give hypervisor decrementer interrupts their own handler

At the moment the handler for hypervisor decrementer interrupts is
the same as for decrementer interrupts, i.e. timer_interrupt().
This is bogus; if we ever do get a hypervisor decrementer interrupt
it won't have anything to do with the next timer event.  In fact
the only time we get hypervisor decrementer interrupts is when one
is left pending on exit from a KVM guest.

When we get a hypervisor decrementer interrupt we don't need to do
anything special to clear it, since they are edge-triggered on the
transition of HDEC from 0 to -1.  Thus this adds an empty handler
function for them.  We don't need to have them masked when interrupts
are soft-disabled, so we use STD_EXCEPTION_HV instead of
MASKABLE_EXCEPTION_HV.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/vphn: Fix arch_update_cpu_topology() return value

arch_update_cpu_topology() should only return 1 when the topology has
actually changed, and should return 0 otherwise.

This patch fixes a potential bug where rebuild_sched_domains() would
reinitialize the sched domains even when the topology hasn't changed.

Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

ARM: gemini: fix the gemini build

Test-compiling obscure machines I notice that the gemini (which
by the way lacks a defconfig) is broken since some time back.
Adding a simple missing include makes it build again.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into fixes

Two regression fixes and one boot-loader compatibility fix from Simon Horman.

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
  ARM: shmobile: armadillo800eva: enable rw rootfs mount
  ARM: shmobile: mackerel: fixup usb module order
  ARM: shmobile: armadillo800eva: fixup: sound card detection order

mmc: omap: fix broken PIO mode

After commit 26b88520b80695a6fa5fd95b5d97c03f4daf87e0 ("mmc:
omap_hsmmc: remove private DMA API implementation"), the Nokia N800
here stopped booting:

[    2.086181] Waiting for root device /dev/mmcblk0p1...
[    2.324066] Unhandled fault: imprecise external abort (0x406) at 0x00000000
[    2.331451] Internal error: : 406 [#1] ARM
[    2.335784] Modules linked in:
[    2.339050] CPU: 0    Not tainted  (3.6.0-rc3 #60)
[    2.344146] PC is at default_idle+0x28/0x30
[    2.348602] LR is at trace_hardirqs_on_caller+0x15c/0x1b0

...

This turned out to be due to memory corruption caused by long-broken
PIO code in drivers/mmc/host/omap.c.  (Previously, this driver had
been using DMA; but the above commit caused the MMC driver to fall
back to PIO mode with an unmodified Kconfig.)

The PIO code, added with the rest of the driver in commit
730c9b7e6630f786fcec026fb11d2e6f2c90fdcb ("[MMC] Add OMAP MMC host
driver"), confused bytes with 16-bit words.  This bug caused memory
located after the PIO transfer buffer to be corrupted with transfers
larger than 32 bytes.  The driver also did not increment the buffer
pointer after the transfer occurred.  This bug resulted in data
corruption during any transfer larger than 64 bytes.

Signed-off-by: Paul Walmsley <paul@pwsan.com>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: card: Skip secure erase on MoviNAND; causes unrecoverable corruption.

For several MoviNAND eMMC parts, there are known issues with secure
erase and secure trim. For these specific MoviNAND devices, we skip
these operations.

Specifically, there is a bug in the eMMC firmware that causes
unrecoverable corruption when the MMC is erased with MMC_CAP_ERASE
enabled.

References:

http://forum.xda-developers.com/showthread.php?t=1644364
https://plus.google.com/111398485184813224730/posts/21pTYfTsCkB#111398485184813224730/posts/21pTYfTsCkB

Signed-off-by: Ian Chen <ian.cy.chen@samsung.com>
Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Acked-by: Jaehoon Chung <jh80.chung@samsung.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Cc: stable <stable@vger.kernel.org> [3.0+]
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: dw_mmc: Disable low power mode if SDIO interrupts are used

The documentation for the dw_mmc part says that the low power
mode should normally only be set for MMC and SD memory and should
be turned off for SDIO cards that need interrupts detected.

The best place I could find to do this is when the SDIO interrupt
was first enabled. I rely on the fact that dw_mci_setup_bus()
will be called when it's time to reenable.

Signed-off-by: Doug Anderson <dianders@chromium.org>
Acked-by: Seungwon Jeon <tgih.jun@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>