]> git.karo-electronics.de Git - karo-tx-linux.git/log
karo-tx-linux.git
12 years agomfd: wm831x: Feed the device UUID into device_add_randomness()
Mark Brown [Thu, 5 Jul 2012 20:23:21 +0000 (20:23 +0000)]
mfd: wm831x: Feed the device UUID into device_add_randomness()

commit 27130f0cc3ab97560384da437e4621fc4e94f21c upstream.

wm831x devices contain a unique ID value. Feed this into the newly added
device_add_randomness() to add some per device seed data to the pool.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agortc: wm831x: Feed the write counter into device_add_randomness()
Mark Brown [Thu, 5 Jul 2012 20:19:17 +0000 (20:19 +0000)]
rtc: wm831x: Feed the write counter into device_add_randomness()

commit 9dccf55f4cb011a7552a8a2749a580662f5ed8ed upstream.

The tamper evident features of the RTC include the "write counter" which
is a pseudo-random number regenerated whenever we set the RTC. Since this
value is unpredictable it should provide some useful seeding to the random
number generator.

Only do this on boot since the goal is to seed the pool rather than add
useful entropy.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoMAINTAINERS: Theodore Ts'o is taking over the random driver
Theodore Ts'o [Wed, 4 Jul 2012 15:32:48 +0000 (11:32 -0400)]
MAINTAINERS: Theodore Ts'o is taking over the random driver

commit 330e0a01d54c2b8606c56816f99af6ebc58ec92c upstream.

Matt Mackall stepped down as the /dev/random driver maintainer last
year, so Theodore Ts'o is taking back the /dev/random driver.

Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agorandom: add tracepoints for easier debugging and verification
Theodore Ts'o [Wed, 4 Jul 2012 20:19:30 +0000 (16:19 -0400)]
random: add tracepoints for easier debugging and verification

commit 00ce1db1a634746040ace24c09a4e3a7949a3145 upstream.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agorandom: add new get_random_bytes_arch() function
Theodore Ts'o [Thu, 5 Jul 2012 14:35:23 +0000 (10:35 -0400)]
random: add new get_random_bytes_arch() function

commit c2557a303ab6712bb6e09447df828c557c710ac9 upstream.

Create a new function, get_random_bytes_arch() which will use the
architecture-specific hardware random number generator if it is
present.  Change get_random_bytes() to not use the HW RNG, even if it
is avaiable.

The reason for this is that the hw random number generator is fast (if
it is present), but it requires that we trust the hardware
manufacturer to have not put in a back door.  (For example, an
increasing counter encrypted by an AES key known to the NSA.)

It's unlikely that Intel (for example) was paid off by the US
Government to do this, but it's impossible for them to prove otherwise
  --- especially since Bull Mountain is documented to use AES as a
whitener.  Hence, the output of an evil, trojan-horse version of
RDRAND is statistically indistinguishable from an RDRAND implemented
to the specifications claimed by Intel.  Short of using a tunnelling
electronic microscope to reverse engineer an Ivy Bridge chip and
disassembling and analyzing the CPU microcode, there's no way for us
to tell for sure.

Since users of get_random_bytes() in the Linux kernel need to be able
to support hardware systems where the HW RNG is not present, most
time-sensitive users of this interface have already created their own
cryptographic RNG interface which uses get_random_bytes() as a seed.
So it's much better to use the HW RNG to improve the existing random
number generator, by mixing in any entropy returned by the HW RNG into
/dev/random's entropy pool, but to always _use_ /dev/random's entropy
pool.

This way we get almost of the benefits of the HW RNG without any
potential liabilities.  The only benefits we forgo is the
speed/performance enhancements --- and generic kernel code can't
depend on depend on get_random_bytes() having the speed of a HW RNG
anyway.

For those places that really want access to the arch-specific HW RNG,
if it is available, we provide get_random_bytes_arch().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agorandom: use the arch-specific rng in xfer_secondary_pool
Theodore Ts'o [Thu, 5 Jul 2012 14:21:01 +0000 (10:21 -0400)]
random: use the arch-specific rng in xfer_secondary_pool

commit e6d4947b12e8ad947add1032dd754803c6004824 upstream.

If the CPU supports a hardware random number generator, use it in
xfer_secondary_pool(), where it will significantly improve things and
where we can afford it.

Also, remove the use of the arch-specific rng in
add_timer_randomness(), since the call is significantly slower than
get_cycles(), and we're much better off using it in
xfer_secondary_pool() anyway.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agonet: feed /dev/random with the MAC address when registering a device
Theodore Ts'o [Thu, 5 Jul 2012 01:23:25 +0000 (21:23 -0400)]
net: feed /dev/random with the MAC address when registering a device

commit 7bf2357524408b97fec58344caf7397f8140c3fd upstream.

Cc: David Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agousb: feed USB device information to the /dev/random driver
Theodore Ts'o [Wed, 4 Jul 2012 15:22:20 +0000 (11:22 -0400)]
usb: feed USB device information to the /dev/random driver

commit b04b3156a20d395a7faa8eed98698d1e17a36000 upstream.

Send the USB device's serial, product, and manufacturer strings to the
/dev/random driver to help seed its pools.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agorandom: create add_device_randomness() interface
Linus Torvalds [Wed, 4 Jul 2012 15:16:01 +0000 (11:16 -0400)]
random: create add_device_randomness() interface

commit a2080a67abe9e314f9e9c2cc3a4a176e8a8f8793 upstream.

Add a new interface, add_device_randomness() for adding data to the
random pool that is likely to differ between two devices (or possibly
even per boot).  This would be things like MAC addresses or serial
numbers, or the read-out of the RTC. This does *not* add any actual
entropy to the pool, but it initializes the pool to different values
for devices that might otherwise be identical and have very little
entropy available to them (particularly common in the embedded world).

[ Modified by tytso to mix in a timestamp, since there may be some
  variability caused by the time needed to detect/configure the hardware
  in question. ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agorandom: use lockless techniques in the interrupt path
Theodore Ts'o [Wed, 4 Jul 2012 14:38:30 +0000 (10:38 -0400)]
random: use lockless techniques in the interrupt path

commit 902c098a3663de3fa18639efbb71b6080f0bcd3c upstream.

The real-time Linux folks don't like add_interrupt_randomness() taking
a spinlock since it is called in the low-level interrupt routine.
This also allows us to reduce the overhead in the fast path, for the
random driver, which is the interrupt collection path.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agorandom: make 'add_interrupt_randomness()' do something sane
Theodore Ts'o [Mon, 2 Jul 2012 11:52:16 +0000 (07:52 -0400)]
random: make 'add_interrupt_randomness()' do something sane

commit 775f4b297b780601e61787b766f306ed3e1d23eb upstream.

We've been moving away from add_interrupt_randomness() for various
reasons: it's too expensive to do on every interrupt, and flooding the
CPU with interrupts could theoretically cause bogus floods of entropy
from a somewhat externally controllable source.

This solves both problems by limiting the actual randomness addition
to just once a second or after 64 interrupts, whicever comes first.
During that time, the interrupt cycle data is buffered up in a per-cpu
pool.  Also, we make sure the the nonblocking pool used by urandom is
initialized before we start feeding the normal input pool.  This
assures that /dev/urandom is returning unpredictable data as soon as
possible.

(Based on an original patch by Linus, but significantly modified by
tytso.)

Tested-by: Eric Wustrow <ewust@umich.edu>
Reported-by: Eric Wustrow <ewust@umich.edu>
Reported-by: Nadia Heninger <nadiah@cs.ucsd.edu>
Reported-by: Zakir Durumeric <zakir@umich.edu>
Reported-by: J. Alex Halderman <jhalderm@umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoInput: synaptics - handle out of bounds values from the hardware
Seth Forshee [Wed, 25 Jul 2012 06:54:11 +0000 (23:54 -0700)]
Input: synaptics - handle out of bounds values from the hardware

commit c0394506e69b37c47d391c2a7bbea3ea236d8ec8 upstream.

The touchpad on the Acer Aspire One D250 will report out of range values
in the extreme lower portion of the touchpad. These appear as abrupt
changes in the values reported by the hardware from very low values to
very high values, which can cause unexpected vertical jumps in the
position of the mouse pointer.

What seems to be happening is that the value is wrapping to a two's
compliment negative value of higher resolution than the 13-bit value
reported by the hardware, with the high-order bits being truncated. This
patch adds handling for these values by converting them to the
appropriate negative values.

The only tricky part about this is deciding when to treat a number as
negative. It stands to reason that if out of range values can be
reported on the low end then it could also happen on the high end, so
not all out of range values should be treated as negative. The approach
taken here is to split the difference between the maximum legitimate
value for the axis and the maximum possible value that the hardware can
report, treating values greater than this number as negative and all
other values as positive. This can be tweaked later if hardware is found
that operates outside of these parameters.

BugLink: http://bugs.launchpad.net/bugs/1001251
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Reviewed-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agox86-64, kcmp: The kcmp system call can be common
H. Peter Anvin [Wed, 1 Aug 2012 22:59:58 +0000 (15:59 -0700)]
x86-64, kcmp: The kcmp system call can be common

commit eaf4ce6c5fed6b4c55f7efcd5fc3477435cab5e9 upstream.

We already use the same system call handler for i386 and x86-64, there
is absolutely no reason x32 can't use the same system call, too.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Link: http://lkml.kernel.org/n/tip-vwzk3qbcr3yjyxjg2j38vgy9@git.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agox86, nops: Missing break resulting in incorrect selection on Intel
Alan Cox [Wed, 25 Jul 2012 15:28:19 +0000 (16:28 +0100)]
x86, nops: Missing break resulting in incorrect selection on Intel

commit d6250a3f12edb3a86db9598ffeca3de8b4a219e9 upstream.

The Intel case falls through into the generic case which then changes
the values.  For cases like the P6 it doesn't do the right thing so
this seems to be a screwup.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-lww2uirad4skzjlmrm0vru8o@git.kernel.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agowireless: reg: restore previous behaviour of chan->max_power calculations
Stanislaw Gruszka [Tue, 24 Jul 2012 06:35:39 +0000 (08:35 +0200)]
wireless: reg: restore previous behaviour of chan->max_power calculations

commit 5e31fc0815a4e2c72b1b495fe7a0d8f9bfb9e4b4 upstream.

commit eccc068e8e84c8fe997115629925e0422a98e4de
Author: Hong Wu <Hong.Wu@dspg.com>
Date:   Wed Jan 11 20:33:39 2012 +0200

    wireless: Save original maximum regulatory transmission power for the calucation of the local maximum transmit pow

changed the way we calculate chan->max_power as min(chan->max_power,
chan->max_reg_power). That broke rt2x00 (and perhaps some other
drivers) that do not set chan->max_power. It is not so easy to fix this
problem correctly in rt2x00.

According to commit eccc068e8 changelog, change claim only to save
maximum regulatory power - changing setting of chan->max_power was side
effect. This patch restore previous calculations of chan->max_power and
do not touch chan->max_reg_power.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoath9k: Add PID/VID support for AR1111
Mohammed Shafi Shajakhan [Thu, 2 Aug 2012 06:28:50 +0000 (11:58 +0530)]
ath9k: Add PID/VID support for AR1111

commit d4e5979c0da95791aa717c18e162540c7a596360 upstream.

AR1111 is same as AR9485. The h/w
difference between them is quite insignificant,
Felix suggests only very few baseband features
may not be available in AR1111. The h/w code for
AR9485 is already present, so AR1111 should
work fine with the addition of its PID/VID.

Reported-by: Tim Bentley <Tim.Bentley@Gmail.com>
Cc: Felix Bitterli <felixb@qca.qualcomm.com>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Tested-by: Tim Bentley <Tim.Bentley@Gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agomac80211: cancel mesh path timer
Johannes Berg [Wed, 1 Aug 2012 19:03:21 +0000 (21:03 +0200)]
mac80211: cancel mesh path timer

commit dd4c9260e7f23f2e951cbfb2726e468c6d30306c upstream.

The mesh path timer needs to be canceled when
leaving the mesh as otherwise it could fire
after the interface has been removed already.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agomISDN: Bugfix for layer2 fixed TEI mode
Karsten Keil [Sat, 4 Aug 2012 00:14:25 +0000 (00:14 +0000)]
mISDN: Bugfix for layer2 fixed TEI mode

commit 25099335944a23db75d4916644122c746684e093 upstream.

If a fixed TEI is used, the initial state of the layer 2 statmachine need to be
4 (TEI assigned). This was true only for Point to Point connections, but not
for the other fixed TEIs. It was not found before, because usually only the
TEI 0 is used as fixed TEI for PtP mode, but if you try X31 packet mode
connections with SAPI 16, TEI 1, it did fail.

Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoACPI processor: Fix tick_broadcast_mask online/offline regression
Feng Tang [Tue, 31 Jul 2012 04:44:43 +0000 (12:44 +0800)]
ACPI processor: Fix tick_broadcast_mask online/offline regression

commit b7db60f45d74497c723dc7ae1370cf0b37dfb0d8 upstream.

In commit 99b725084 "ACPI processor hotplug: Delay acpi_processor_start()
call for hotplugged cores", acpi_processor_hotplug(pr) was wrongly replaced
by acpi_processor_cst_has_changed() inside the acpi_cpu_soft_notify(). This
patch will restore it back, fixing the tick_broadcast_mask regression:
https://lkml.org/lkml/2012/7/30/169

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Thomas Renninger <trenn@suse.de>
Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoore: Fix out-of-bounds access in _ios_obj()
Boaz Harrosh [Wed, 1 Aug 2012 14:48:36 +0000 (17:48 +0300)]
ore: Fix out-of-bounds access in _ios_obj()

commit 9e62bb4458ad2cf28bd701aa5fab380b846db326 upstream.

_ios_obj() is accessed by group_index not device_table index.

The oc->comps array is only a group_full of devices at a time
it is not like ore_comp_dev() which is indexed by a global
device_table index.

This did not BUG until now because exofs only uses a single
COMP for all devices. But with other FSs like PanFS this is
not true.

This bug was only in the write_path, all other users were
using it correctly

[This is a bug since 3.2 Kernel]

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agosh: Fix up recursive fault in oops with unset TTB.
Paul Mundt [Tue, 24 Jul 2012 04:15:54 +0000 (13:15 +0900)]
sh: Fix up recursive fault in oops with unset TTB.

commit 90eed7d87b748f9c0d11b9bad64a4c41e31b78c4 upstream.

Presently the oops code looks for the pgd either from the mm context or
the cached TTB value. There are presently cases where the TTB can be
unset or otherwise cleared by hardware, which we weren't handling,
resulting in recursive faults on the NULL pgd. In these cases we can
simply reload from swapper_pg_dir and continue on as normal.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoblock: uninitialized ioc->nr_tasks triggers WARN_ON
Olof Johansson [Wed, 1 Aug 2012 10:17:27 +0000 (12:17 +0200)]
block: uninitialized ioc->nr_tasks triggers WARN_ON

commit 4638a83e8615de9c16c39dfed234951d0f468cf1 upstream.

Hi,

I'm using the old-fashioned 'dump' backup tool, and I noticed that it spews the
below warning as of 3.5-rc1 and later (3.4 is fine):

[   10.886893] ------------[ cut here ]------------
[   10.886904] WARNING: at include/linux/iocontext.h:140 copy_process+0x1488/0x1560()
[   10.886905] Hardware name: Bochs
[   10.886906] Modules linked in:
[   10.886908] Pid: 2430, comm: dump Not tainted 3.5.0-rc7+ #27
[   10.886908] Call Trace:
[   10.886911]  [<ffffffff8107ce8a>] warn_slowpath_common+0x7a/0xb0
[   10.886912]  [<ffffffff8107ced5>] warn_slowpath_null+0x15/0x20
[   10.886913]  [<ffffffff8107c088>] copy_process+0x1488/0x1560
[   10.886914]  [<ffffffff8107c244>] do_fork+0xb4/0x340
[   10.886918]  [<ffffffff8108effa>] ? recalc_sigpending+0x1a/0x50
[   10.886919]  [<ffffffff8108f6b2>] ? __set_task_blocked+0x32/0x80
[   10.886920]  [<ffffffff81091afa>] ? __set_current_blocked+0x3a/0x60
[   10.886923]  [<ffffffff81051db3>] sys_clone+0x23/0x30
[   10.886925]  [<ffffffff8179bd73>] stub_clone+0x13/0x20
[   10.886927]  [<ffffffff8179baa2>] ? system_call_fastpath+0x16/0x1b
[   10.886928] ---[ end trace 32a14af7ee6a590b ]---

Reproducing is easy, I can hit it on a KVM system with a very basic
config (x86_64 make defconfig + enable the drivers needed). To hit it,
just install dump (on debian/ubuntu, not sure what the package might be
called on Fedora), and:

dump -o -f /tmp/foo /

You'll see the warning in dmesg once it forks off the I/O process and
starts dumping filesystem contents.

I bisected it down to the following commit:

commit f6e8d01bee036460e03bd4f6a79d014f98ba712e
Author: Tejun Heo <tj@kernel.org>
Date:   Mon Mar 5 13:15:26 2012 -0800

    block: add io_context->active_ref

    Currently ioc->nr_tasks is used to decide two things - whether an ioc
    is done issuing IOs and whether it's shared by multiple tasks.  This
    patch separate out the first into ioc->active_ref, which is acquired
    and released using {get|put}_io_context_active() respectively.

    This will be used to associate bio's with a given task.  This patch
    doesn't introduce any visible behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It seems like the init of ioc->nr_tasks was removed in that patch,
so it starts out at 0 instead of 1.

Tejun, is the right thing here to add back the init, or should something else
be done?

The below patch removes the warning, but I haven't done any more extensive
testing on it.

Signed-off-by: Olof Johansson <olof@lixom.net>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agovideo/smscufx: fix line counting in fb_write
Alexander Holler [Fri, 20 Apr 2012 22:11:07 +0000 (00:11 +0200)]
video/smscufx: fix line counting in fb_write

commit 2fe2d9f47cfe1a3e66e7d087368b3d7155b04c15 upstream.

Line 0 and 1 were both written to line 0 (on the display) and all subsequent
lines had an offset of -1. The result was that the last line on the display
was never overwritten by writes to /dev/fbN.

The origin of this bug seems to have been udlfb.

Signed-off-by: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agomd/raid1: don't abort a resync on the first badblock.
NeilBrown [Tue, 31 Jul 2012 00:05:34 +0000 (10:05 +1000)]
md/raid1: don't abort a resync on the first badblock.

commit b7219ccb33aa0df9949a60c68b5e9f712615e56f upstream.

If a resync of a RAID1 array with 2 devices finds a known bad block
one device it will neither read from, or write to, that device for
this block offset.
So there will be one read_target (The other device) and zero write
targets.
This condition causes md/raid1 to abort the resync assuming that it
has finished - without known bad blocks this would be true.

When there are no write targets because of the presence of bad blocks
we should only skip over the area covered by the bad block.
RAID10 already gets this right, raid1 doesn't.  Or didn't.

As this can cause a 'sync' to abort early and appear to have succeeded
it could lead to some data corruption, so it suitable for -stable.

Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agomm: mmu_notifier: fix freed page still mapped in secondary MMU
Xiao Guangrong [Tue, 31 Jul 2012 23:45:52 +0000 (16:45 -0700)]
mm: mmu_notifier: fix freed page still mapped in secondary MMU

commit 3ad3d901bbcfb15a5e4690e55350db0899095a68 upstream.

mmu_notifier_release() is called when the process is exiting.  It will
delete all the mmu notifiers.  But at this time the page belonging to the
process is still present in page tables and is present on the LRU list, so
this race will happen:

      CPU 0                 CPU 1
mmu_notifier_release:    try_to_unmap:
   hlist_del_init_rcu(&mn->hlist);
                            ptep_clear_flush_notify:
                                  mmu nofifler not found
                            free page  !!!!!!
                            /*
                             * At the point, the page has been
                             * freed, but it is still mapped in
                             * the secondary MMU.
                             */

  mn->ops->release(mn, mm);

Then the box is not stable and sometimes we can get this bug:

[  738.075923] BUG: Bad page state in process migrate-perf  pfn:03bec
[  738.075931] page:ffffea00000efb00 count:0 mapcount:0 mapping:          (null) index:0x8076
[  738.075936] page flags: 0x20000000000014(referenced|dirty)

The same issue is present in mmu_notifier_unregister().

We can call ->release before deleting the notifier to ensure the page has
been unmapped from the secondary MMU before it is freed.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agomm: setup pageblock_order before it's used by sparsemem
Xishi Qiu [Tue, 31 Jul 2012 23:43:19 +0000 (16:43 -0700)]
mm: setup pageblock_order before it's used by sparsemem

commit ca57df79d4f64e1a4886606af4289d40636189c5 upstream.

On architectures with CONFIG_HUGETLB_PAGE_SIZE_VARIABLE set, such as
Itanium, pageblock_order is a variable with default value of 0.  It's set
to the right value by set_pageblock_order() in function
free_area_init_core().

But pageblock_order may be used by sparse_init() before free_area_init_core()
is called along path:
sparse_init()
    ->sparse_early_usemaps_alloc_node()
->usemap_size()
    ->SECTION_BLOCKFLAGS_BITS
->((1UL << (PFN_SECTION_SHIFT - pageblock_order)) *
NR_PAGEBLOCK_BITS)

The uninitialized pageblock_size will cause memory wasting because
usemap_size() returns a much bigger value then it's really needed.

For example, on an Itanium platform,
sparse_init() pageblock_order=0 usemap_size=24576
free_area_init_core() before pageblock_order=0, usemap_size=24576
free_area_init_core() after pageblock_order=12, usemap_size=8

That means 24K memory has been wasted for each section, so fix it by calling
set_pageblock_order() from sparse_init().

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Keping Chen <chenkeping@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: hda - Fix double quirk for Quanta FL1 / Lenovo Ideapad
David Henningsson [Wed, 8 Aug 2012 06:43:37 +0000 (08:43 +0200)]
ALSA: hda - Fix double quirk for Quanta FL1 / Lenovo Ideapad

commit 012e7eb1e501d0120e0383b81477f63091f5e365 upstream.

The same ID is twice in the quirk table, so the second one is not used.

Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: hda - remove quirk for Dell Vostro 1015
David Henningsson [Tue, 7 Aug 2012 12:03:29 +0000 (14:03 +0200)]
ALSA: hda - remove quirk for Dell Vostro 1015

commit e9fc83cb2e5877801a255a37ddbc5be996ea8046 upstream.

This computer is confirmed working with model=auto on kernel 3.2.
Also, parsing fails with hda-emu with the current model.

Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: hda - add dock support for Thinkpad X230
Felix Kaechele [Mon, 6 Aug 2012 21:02:01 +0000 (23:02 +0200)]
ALSA: hda - add dock support for Thinkpad X230

commit c8415a48fcb7a29889f4405d38c57db351e4b50a upstream.

As with the ThinkPad Models X230 Tablet and T530 the X230 needs a qurik to
correctly set up the pins for the dock port.

Signed-off-by: Felix Kaechele <felix@fetzig.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: hda - add dock support for Thinkpad T430s
Philipp A. Mohrenweiser [Mon, 6 Aug 2012 11:14:18 +0000 (13:14 +0200)]
ALSA: hda - add dock support for Thinkpad T430s

commit 4407be6ba217514b1bc01488f8b56467d309e416 upstream.

Add a model/fixup string "lenovo-dock", for Thinkpad T430s, to allow
sound in docking station.

Tested on Lenovo T430s with ThinkPad Mini Dock Plus Series 3

Signed-off-by: Philipp A. Mohrenweiser <phiamo@googlemail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoARM: Fix undefined instruction exception handling
Russell King [Mon, 30 Jul 2012 18:42:10 +0000 (19:42 +0100)]
ARM: Fix undefined instruction exception handling

commit 15ac49b65024f55c4371a53214879a9c77c4fbf9 upstream.

While trying to get a v3.5 kernel booted on the cubox, I noticed that
VFP does not work correctly with VFP bounce handling.  This is because
of the confusion over 16-bit vs 32-bit instructions, and where PC is
supposed to point to.

The rule is that FP handlers are entered with regs->ARM_pc pointing at
the _next_ instruction to be executed.  However, if the exception is
not handled, regs->ARM_pc points at the faulting instruction.

This is easy for ARM mode, because we know that the next instruction and
previous instructions are separated by four bytes.  This is not true of
Thumb2 though.

Since all FP instructions are 32-bit in Thumb2, it makes things easy.
We just need to select the appropriate adjustment.  Do this by moving
the adjustment out of do_undefinstr() into the assembly code, as only
the assembly code knows whether it's dealing with a 32-bit or 16-bit
instruction.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoARM: 7480/1: only call smp_send_stop() on SMP
Javier Martinez Canillas [Sat, 28 Jul 2012 14:19:55 +0000 (15:19 +0100)]
ARM: 7480/1: only call smp_send_stop() on SMP

commit c5dff4ffd327088d85035bec535b7d0c9ea03151 upstream.

On reboot or poweroff (machine_shutdown()) a call to smp_send_stop() is
made (to stop the others CPU's) when CONFIG_SMP=y.

arch/arm/kernel/process.c:

void machine_shutdown(void)
{
 #ifdef CONFIG_SMP
       smp_send_stop();
 #endif
}

smp_send_stop() calls the function pointer smp_cross_call(), which is set
on the smp_init_cpus() function for OMAP processors.

arch/arm/mach-omap2/omap-smp.c:

void __init smp_init_cpus(void)
{
...
set_smp_cross_call(gic_raise_softirq);
...
}

But the ARM setup_arch() function only calls smp_init_cpus()
if CONFIG_SMP=y && is_smp().

arm/kernel/setup.c:

void __init setup_arch(char **cmdline_p)
{
...
 #ifdef CONFIG_SMP
if (is_smp())
smp_init_cpus();
 #endif
...
}

Newer OMAP CPU's are SMP machines so omap2plus_defconfig sets
CONFIG_SMP=y. Unfortunately on an OMAP UP machine is_smp()
returns false and smp_init_cpus() is never called and the
smp_cross_call() function remains NULL.

If the machine is rebooted or powered off, smp_send_stop() will
be called (since CONFIG_SMP=y) leading to the following error:

[   42.815551] Restarting system.
[   42.819030] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[   42.827667] pgd = d7a74000
[   42.830566] [00000000] *pgd=96ce7831, *pte=00000000, *ppte=00000000
[   42.837249] Internal error: Oops: 80000007 [#1] SMP ARM
[   42.842773] Modules linked in:
[   42.846008] CPU: 0    Not tainted  (3.5.0-rc3-next-20120622-00002-g62e87ba-dirty #44)
[   42.854278] PC is at 0x0
[   42.856994] LR is at smp_send_stop+0x4c/0xe4
[   42.861511] pc : [<00000000>]    lr : [<c00183a4>]    psr: 60000013
[   42.861511] sp : d6c85e70  ip : 00000000  fp : 00000000
[   42.873626] r10: 00000000  r9 : d6c84000  r8 : 00000002
[   42.879150] r7 : c07235a0  r6 : c06dd2d0  r5 : 000f4241  r4 : d6c85e74
[   42.886047] r3 : 00000000  r2 : 00000000  r1 : 00000006  r0 : d6c85e74
[   42.892944] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   42.900482] Control: 10c5387d  Table: 97a74019  DAC: 00000015
[   42.906555] Process reboot (pid: 1166, stack limit = 0xd6c842f8)
[   42.912902] Stack: (0xd6c85e70 to 0xd6c86000)
[   42.917510] 5e60:                                     c07235a0 00000000 00000000 d6c84000
[   42.926177] 5e80: 01234567 c00143d0 4321fedc c00511bc d6c85ebc 00000168 00000460 00000000
[   42.934814] 5ea0: c1017950 a0000013 c1017900 d8014390 d7ec3858 c0498e48 c1017950 00000000
[   42.943481] 5ec0: d6ddde10 d6c85f78 00000003 00000000 d6ddde10 d6c84000 00000000 00000000
[   42.952117] 5ee0: 00000002 00000000 00000000 c0088c88 00000002 00000000 00000000 c00f4b90
[   42.960784] 5f00: 00000000 d6c85ebc d8014390 d7e311c8 60000013 00000103 00000002 d6c84000
[   42.969421] 5f20: c00f3274 d6e00a00 00000001 60000013 d6c84000 00000000 00000000 c00895d4
[   42.978057] 5f40: 00000002 d8007c80 d781f000 c00f6150 d8010cc0 c00f3274 d781f000 d6c84000
[   42.986694] 5f60: c0013020 d6e00a00 00000001 20000010 0001257c ef000000 00000000 c00895d4
[   42.995361] 5f80: 00000002 00000001 00000003 00000000 00000001 00000003 00000000 00000058
[   43.003997] 5fa0: c00130c8 c0012f00 00000001 00000003 fee1dead 28121969 01234567 00000002
[   43.012634] 5fc0: 00000001 00000003 00000000 00000058 00012584 0001257c 00000001 00000000
[   43.021270] 5fe0: 000124bc bec5cc6c 00008f9c 4a2f7c40 20000010 fee1dead 00000000 00000000
[   43.029968] [<c00183a4>] (smp_send_stop+0x4c/0xe4) from [<c00143d0>] (machine_restart+0xc/0x4c)
[   43.039154] [<c00143d0>] (machine_restart+0xc/0x4c) from [<c00511bc>] (sys_reboot+0x144/0x1f0)
[   43.048278] [<c00511bc>] (sys_reboot+0x144/0x1f0) from [<c0012f00>] (ret_fast_syscall+0x0/0x3c)
[   43.057464] Code: bad PC value
[   43.060760] ---[ end trace c3988d1dd0b8f0fb ]---

Add a check so smp_cross_call() is only called when there is more than one CPU on-line.

Signed-off-by: Javier Martinez Canillas <javier at dowhile0.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoARM: 7479/1: mm: avoid NULL dereference when flushing gate_vma with VIVT caches
Will Deacon [Mon, 23 Jul 2012 13:18:13 +0000 (14:18 +0100)]
ARM: 7479/1: mm: avoid NULL dereference when flushing gate_vma with VIVT caches

commit b74253f78400f9a4b42da84bb1de7540b88ce7c4 upstream.

The vivt_flush_cache_{range,page} functions check that the mm_struct
of the VMA being flushed has been active on the current CPU before
performing the cache maintenance.

The gate_vma has a NULL mm_struct pointer and, as such, will cause a
kernel fault if we try to flush it with the above operations. This
happens during ELF core dumps, which include the gate_vma as it may be
useful for debugging purposes.

This patch adds checks to the VIVT cache flushing functions so that VMAs
with a NULL mm_struct are flushed unconditionally (the vectors page may
be dirty if we use it to store the current TLS pointer).

Reported-by: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
Tested-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoARM: 7478/1: errata: extend workaround for erratum #720789
Will Deacon [Fri, 20 Jul 2012 17:24:55 +0000 (18:24 +0100)]
ARM: 7478/1: errata: extend workaround for erratum #720789

commit 5a783cbc48367cfc7b65afc75430953dfe60098f upstream.

Commit cdf357f1 ("ARM: 6299/1: errata: TLBIASIDIS and TLBIMVAIS
operations can broadcast a faulty ASID") replaced by-ASID TLB flushing
operations with all-ASID variants to workaround A9 erratum #720789.

This patch extends the workaround to include the tlb_range operations,
which were overlooked by the original patch.

Tested-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoARM: 7477/1: vfp: Always save VFP state in vfp_pm_suspend on UP
Colin Cross [Fri, 20 Jul 2012 01:03:42 +0000 (02:03 +0100)]
ARM: 7477/1: vfp: Always save VFP state in vfp_pm_suspend on UP

commit 24b35521b8ddf088531258f06f681bb7b227bf47 upstream.

vfp_pm_suspend should save the VFP state in suspend after
any lazy context switch.  If it only saves when the VFP is enabled,
the state can get lost when, on a UP system:
  Thread 1 uses the VFP
  Context switch occurs to thread 2, VFP is disabled but the
     VFP context is not saved
  Thread 2 initiates suspend
  vfp_pm_suspend is called with the VFP disabled, and the unsaved
     VFP context of Thread 1 in the registers

Modify vfp_pm_suspend to save the VFP context whenever
vfp_current_hw_state is not NULL.

Includes a fix from Ido Yariv <ido@wizery.com>, who pointed out that on
SMP systems, the state pointer can be pointing to a freed task struct if
a task exited on another cpu, fixed by using #ifndef CONFIG_SMP in the
new if clause.

Signed-off-by: Colin Cross <ccross@android.com>
Cc: Barry Song <bs14@csr.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ido Yariv <ido@wizery.com>
Cc: Daniel Drake <dsd@laptop.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoARM: 7476/1: vfp: only clear vfp state for current cpu in vfp_pm_suspend
Colin Cross [Fri, 20 Jul 2012 01:03:43 +0000 (02:03 +0100)]
ARM: 7476/1: vfp: only clear vfp state for current cpu in vfp_pm_suspend

commit a84b895a2348f0dbff31b71ddf954f70a6cde368 upstream.

vfp_pm_suspend runs on each cpu, only clear the hardware state
pointer for the current cpu.  Prevents a possible crash if one
cpu clears the hw state pointer when another cpu has already
checked if it is valid.

Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoARM: 7466/1: disable interrupt before spinning endlessly
Shawn Guo [Fri, 13 Jul 2012 07:19:34 +0000 (08:19 +0100)]
ARM: 7466/1: disable interrupt before spinning endlessly

commit 98bd8b96b26db3399a48202318dca4aaa2515355 upstream.

The CPU will endlessly spin at the end of machine_halt and
machine_restart calls.  However, this will lead to a soft lockup
warning after about 20 seconds, if CONFIG_LOCKUP_DETECTOR is enabled,
as system timer is still alive.

Disable interrupt before going to spin endlessly, so that the lockup
warning will never be seen.

Reported-by: Marek Vasut <marex@denx.de>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agomm: fix wrong argument of migrate_huge_pages() in soft_offline_huge_page()
Joonsoo Kim [Mon, 30 Jul 2012 21:39:04 +0000 (14:39 -0700)]
mm: fix wrong argument of migrate_huge_pages() in soft_offline_huge_page()

commit dc32f63453f56d07a1073a697dcd843dd3098c09 upstream.

Commit a6bc32b89922 ("mm: compaction: introduce sync-light migration for
use by compaction") changed the declaration of migrate_pages() and
migrate_huge_pages().

But it missed changing the argument of migrate_huge_pages() in
soft_offline_huge_page().  In this case, we should call
migrate_huge_pages() with MIGRATE_SYNC.

Additionally, there is a mismatch between type the of argument and the
function declaration for migrate_pages().

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agomemcg: further prevent OOM with too many dirty pages
Hugh Dickins [Tue, 31 Jul 2012 23:45:59 +0000 (16:45 -0700)]
memcg: further prevent OOM with too many dirty pages

commit c3b94f44fcb0725471ecebb701c077a0ed67bd07 upstream.

The may_enter_fs test turns out to be too restrictive: though I saw no
problem with it when testing on 3.5-rc6, it very soon OOMed when I tested
on 3.5-rc6-mm1.  I don't know what the difference there is, perhaps I just
slightly changed the way I started off the testing: dd if=/dev/zero
of=/mnt/temp bs=1M count=1024; rm -f /mnt/temp; sync repeatedly, in 20M
memory.limit_in_bytes cgroup to ext4 on USB stick.

ext4 (and gfs2 and xfs) turn out to allocate new pages for writing with
AOP_FLAG_NOFS: that seems a little worrying, and it's unclear to me why
the transaction needs to be started even before allocating pagecache
memory.  But it may not be worth worrying about these days: if direct
reclaim avoids FS writeback, does __GFP_FS now mean anything?

Anyway, we insisted on the may_enter_fs test to avoid hangs with the loop
device; but since that also masks off __GFP_IO, we can test for __GFP_IO
directly, ignoring may_enter_fs and __GFP_FS.

But even so, the test still OOMs sometimes: when originally testing on
3.5-rc6, it OOMed about one time in five or ten; when testing just now on
3.5-rc6-mm1, it OOMed on the first iteration.

This residual problem comes from an accumulation of pages under ordinary
writeback, not marked PageReclaim, so rightly not causing the memcg check
to wait on their writeback: these too can prevent shrink_page_list() from
freeing any pages, so many times that memcg reclaim fails and OOMs.

Deal with these in the same way as direct reclaim now deals with dirty FS
pages: mark them PageReclaim.  It is appropriate to rotate these to tail
of list when writepage completes, but more importantly, the PageReclaim
flag makes memcg reclaim wait on them if encountered again.  Increment
NR_VMSCAN_IMMEDIATE?  That's arguable: I chose not.

Setting PageReclaim here may occasionally race with end_page_writeback()
clearing it: lru_deactivate_fn() already faced the same race, and
correctly concluded that the window is small and the issue non-critical.

With these changes, the test runs indefinitely without OOMing on ext4,
ext3 and ext2: I'll move on to test with other filesystems later.

Trivia: invert conditions for a clearer block without an else, and goto
keep_locked to do the unlock_page.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ying Han <yinghan@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agomemcg: prevent OOM with too many dirty pages
Michal Hocko [Tue, 31 Jul 2012 23:45:55 +0000 (16:45 -0700)]
memcg: prevent OOM with too many dirty pages

commit e62e384e9da8d9a0c599795464a7e76fd490931c upstream.

The current implementation of dirty pages throttling is not memcg aware
which makes it easy to have memcg LRUs full of dirty pages.  Without
throttling, these LRUs can be scanned faster than the rate of writeback,
leading to memcg OOM conditions when the hard limit is small.

This patch fixes the problem by throttling the allocating process
(possibly a writer) during the hard limit reclaim by waiting on
PageReclaim pages.  We are waiting only for PageReclaim pages because
those are the pages that made one full round over LRU and that means that
the writeback is much slower than scanning.

The solution is far from being ideal - long term solution is memcg aware
dirty throttling - but it is meant to be a band aid until we have a real
fix.  We are seeing this happening during nightly backups which are placed
into containers to prevent from eviction of the real working set.

The change affects only memcg reclaim and only when we encounter
PageReclaim pages which is a signal that the reclaim doesn't catch up on
with the writers so somebody should be throttled.  This could be
potentially unfair because it could be somebody else from the group who
gets throttled on behalf of the writer but as writers need to allocate as
well and they allocate in higher rate the probability that only innocent
processes would be penalized is not that high.

I have tested this change by a simple dd copying /dev/zero to tmpfs or
ext3 running under small memcg (1G copy under 5M, 60M, 300M and 2G
containers) and dd got killed by OOM killer every time.  With the patch I
could run the dd with the same size under 5M controller without any OOM.
The issue is more visible with slower devices for output.

* With the patch
================
* tmpfs size=2G
---------------
$ vim cgroup_cache_oom_test.sh
$ ./cgroup_cache_oom_test.sh 5M
using Limit 5M for group
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 30.4049 s, 34.5 MB/s
$ ./cgroup_cache_oom_test.sh 60M
using Limit 60M for group
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 31.4561 s, 33.3 MB/s
$ ./cgroup_cache_oom_test.sh 300M
using Limit 300M for group
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 20.4618 s, 51.2 MB/s
$ ./cgroup_cache_oom_test.sh 2G
using Limit 2G for group
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 1.42172 s, 738 MB/s

* ext3
------
$ ./cgroup_cache_oom_test.sh 5M
using Limit 5M for group
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 27.9547 s, 37.5 MB/s
$ ./cgroup_cache_oom_test.sh 60M
using Limit 60M for group
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 30.3221 s, 34.6 MB/s
$ ./cgroup_cache_oom_test.sh 300M
using Limit 300M for group
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 24.5764 s, 42.7 MB/s
$ ./cgroup_cache_oom_test.sh 2G
using Limit 2G for group
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 3.35828 s, 312 MB/s

* Without the patch
===================
* tmpfs size=2G
---------------
$ ./cgroup_cache_oom_test.sh 5M
using Limit 5M for group
./cgroup_cache_oom_test.sh: line 46:  4668 Killed                  dd if=/dev/zero of=$OUT/zero bs=1M count=$count
$ ./cgroup_cache_oom_test.sh 60M
using Limit 60M for group
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 25.4989 s, 41.1 MB/s
$ ./cgroup_cache_oom_test.sh 300M
using Limit 300M for group
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 24.3928 s, 43.0 MB/s
$ ./cgroup_cache_oom_test.sh 2G
using Limit 2G for group
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 1.49797 s, 700 MB/s

* ext3
------
$ ./cgroup_cache_oom_test.sh 5M
using Limit 5M for group
./cgroup_cache_oom_test.sh: line 46:  4689 Killed                  dd if=/dev/zero of=$OUT/zero bs=1M count=$count
$ ./cgroup_cache_oom_test.sh 60M
using Limit 60M for group
./cgroup_cache_oom_test.sh: line 46:  4692 Killed                  dd if=/dev/zero of=$OUT/zero bs=1M count=$count
$ ./cgroup_cache_oom_test.sh 300M
using Limit 300M for group
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 20.248 s, 51.8 MB/s
$ ./cgroup_cache_oom_test.sh 2G
using Limit 2G for group
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 2.85201 s, 368 MB/s

[akpm@linux-foundation.org: tweak changelog, reordered the test to optimize for CONFIG_CGROUP_MEM_RES_CTLR=n]
[hughd@google.com: fix deadlock with loop driver]
Reviewed-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ying Han <yinghan@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agopcdp: use early_ioremap/early_iounmap to access pcdp table
Greg Pearson [Mon, 30 Jul 2012 21:39:05 +0000 (14:39 -0700)]
pcdp: use early_ioremap/early_iounmap to access pcdp table

commit 6c4088ac3a4d82779903433bcd5f048c58fb1aca upstream.

efi_setup_pcdp_console() is called during boot to parse the HCDP/PCDP
EFI system table and setup an early console for printk output.  The
routine uses ioremap/iounmap to setup access to the HCDP/PCDP table
information.

The call to ioremap is happening early in the boot process which leads
to a panic on x86_64 systems:

    panic+0x01ca
    do_exit+0x043c
    oops_end+0x00a7
    no_context+0x0119
    __bad_area_nosemaphore+0x0138
    bad_area_nosemaphore+0x000e
    do_page_fault+0x0321
    page_fault+0x0020
    reserve_memtype+0x02a1
    __ioremap_caller+0x0123
    ioremap_nocache+0x0012
    efi_setup_pcdp_console+0x002b
    setup_arch+0x03a9
    start_kernel+0x00d4
    x86_64_start_reservations+0x012c
    x86_64_start_kernel+0x00fe

This replaces the calls to ioremap/iounmap in efi_setup_pcdp_console()
with calls to early_ioremap/early_iounmap which can be called during
early boot.

This patch was tested on an x86_64 prototype system which uses the
HCDP/PCDP table for early console setup.

Signed-off-by: Greg Pearson <greg.pearson@hp.com>
Acked-by: Khalid Aziz <khalid.aziz@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agomedia: videobuf-dma-contig: restore buffer mapping for uncached bufers
Lad, Prabhakar [Fri, 22 Jun 2012 09:19:28 +0000 (06:19 -0300)]
media: videobuf-dma-contig: restore buffer mapping for uncached bufers

commit 4099040eaaa4fe543c4e915b8cab51b1d843edee upstream.

from commit a8f3c203e19b702fa5e8e83a9b6fb3c5a6d1cce4
restore the mapping scheme for uncached buffers,
which was changed in a common scheme for cached and uncached.
This apparently was wrong, and was probably intended only for cached buffers.
the fix fixes the crash observed while mapping uncached buffers.

Signed-off-by: Lad, Prabhakar <prabhakar.lad@ti.com>
Signed-off-by: Hadli, Manjunath <manjunath.hadli@ti.com>
Acked-by: Federico Vaga <federico.vaga@gmail.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agomedia: m5mols: Correct reported ISO values
Sylwester Nawrocki [Tue, 24 Jul 2012 15:12:07 +0000 (12:12 -0300)]
media: m5mols: Correct reported ISO values

commit 6126b912c84240692e26c1b820a7097610eddf34 upstream.

The V4L2_CID_ISO_SENSITIVITY control menu values should be
standard ISO values multiplied by 1000. Multiply all menu
items by 1000 so ISO is properly reported as 50...3200 range.

This applies to kernels 3.5+.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agomedia: ene_ir: Fix driver initialisation
Luis Henriques [Tue, 19 Jun 2012 14:29:49 +0000 (11:29 -0300)]
media: ene_ir: Fix driver initialisation

commit b31b021988fed9e3741a46918f14ba9b063811db upstream.

commit 9ef449c6b31bb6a8e6dedc24de475a3b8c79be20 ("[media] rc: Postpone ISR
registration") fixed an early ISR registration on several drivers.  It did
however also introduced a bug by moving the invocation of pnp_port_start()
to the end of the probe function.

This patch fixes this issue by moving the invocation of pnp_port_start() to
an earlier stage in the probe function.

Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Cc: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agonilfs2: fix deadlock issue between chcp and thaw ioctls
Ryusuke Konishi [Mon, 30 Jul 2012 21:42:07 +0000 (14:42 -0700)]
nilfs2: fix deadlock issue between chcp and thaw ioctls

commit 572d8b3945a31bee7c40d21556803e4807fd9141 upstream.

An fs-thaw ioctl causes deadlock with a chcp or mkcp -s command:

 chcp            D ffff88013870f3d0     0  1325   1324 0x00000004
 ...
 Call Trace:
   nilfs_transaction_begin+0x11c/0x1a0 [nilfs2]
   wake_up_bit+0x20/0x20
   copy_from_user+0x18/0x30 [nilfs2]
   nilfs_ioctl_change_cpmode+0x7d/0xcf [nilfs2]
   nilfs_ioctl+0x252/0x61a [nilfs2]
   do_page_fault+0x311/0x34c
   get_unmapped_area+0x132/0x14e
   do_vfs_ioctl+0x44b/0x490
   __set_task_blocked+0x5a/0x61
   vm_mmap_pgoff+0x76/0x87
   __set_current_blocked+0x30/0x4a
   sys_ioctl+0x4b/0x6f
   system_call_fastpath+0x16/0x1b
 thaw            D ffff88013870d890     0  1352   1351 0x00000004
 ...
 Call Trace:
   rwsem_down_failed_common+0xdb/0x10f
   call_rwsem_down_write_failed+0x13/0x20
   down_write+0x25/0x27
   thaw_super+0x13/0x9e
   do_vfs_ioctl+0x1f5/0x490
   vm_mmap_pgoff+0x76/0x87
   sys_ioctl+0x4b/0x6f
   filp_close+0x64/0x6c
   system_call_fastpath+0x16/0x1b

where the thaw ioctl deadlocked at thaw_super() when called while chcp was
waiting at nilfs_transaction_begin() called from
nilfs_ioctl_change_cpmode().  This deadlock is 100% reproducible.

This is because nilfs_ioctl_change_cpmode() first locks sb->s_umount in
read mode and then waits for unfreezing in nilfs_transaction_begin(),
whereas thaw_super() locks sb->s_umount in write mode.  The locking of
sb->s_umount here was intended to make snapshot mounts and the downgrade
of snapshots to checkpoints exclusive.

This fixes the deadlock issue by replacing the sb->s_umount usage in
nilfs_ioctl_change_cpmode() with a dedicated mutex which protects snapshot
mounts.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agomISDN: Bugfix only few bytes are transfered on a connection
Karsten Keil [Sun, 29 Jul 2012 07:15:13 +0000 (07:15 +0000)]
mISDN: Bugfix only few bytes are transfered on a connection

commit b41a9a66f67817f8acd85bd650e012a14da39faa upstream.

The test for the fillempty condition was wrong in one place.
Changed the variable to the right boolean type.

Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoSUNRPC: return negative value in case rpcbind client creation error
Stanislav Kinsbursky [Fri, 20 Jul 2012 11:57:48 +0000 (15:57 +0400)]
SUNRPC: return negative value in case rpcbind client creation error

commit caea33da898e4e14f0ba58173e3b7689981d2c0b upstream.

Without this patch kernel will panic on LockD start, because lockd_up() checks
lockd_up_net() result for negative value.
From my pow it's better to return negative value from rpcbind routines instead
of replacing all such checks like in lockd_up().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agosunrpc: clnt: Add missing braces
Joe Perches [Wed, 18 Jul 2012 18:17:11 +0000 (11:17 -0700)]
sunrpc: clnt: Add missing braces

commit cac5d07e3ca696dcacfb123553cf6c722111cfd3 upstream.

Add a missing set of braces that commit 4e0038b6b24
("SUNRPC: Move clnt->cl_server into struct rpc_xprt")
forgot.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agolib/vsprintf.c: kptr_restrict: fix pK-error in SysRq show-all-timers(Q)
Dan Rosenberg [Mon, 30 Jul 2012 21:40:26 +0000 (14:40 -0700)]
lib/vsprintf.c: kptr_restrict: fix pK-error in SysRq show-all-timers(Q)

commit 3715c5309f6d175c3053672b73fd4f73be16fd07 upstream.

When using ALT+SysRq+Q all the pointers are replaced with "pK-error" like
this:

[23153.208033]   .base:               pK-error

with echo h > /proc/sysrq-trigger it works:

[23107.776363]   .base:       ffff88023e60d540

The intent behind this behavior was to return "pK-error" in cases where
the %pK format specifier was used in interrupt context, because the
CAP_SYSLOG check wouldn't be meaningful.  Clearly this should only apply
when kptr_restrict is actually enabled though.

Reported-by: Stevie Trujillo <stevie.trujillo@gmail.com>
Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoselinux: fix selinux_inode_setxattr oops
Al Viro [Sat, 9 Jun 2012 07:15:16 +0000 (08:15 +0100)]
selinux: fix selinux_inode_setxattr oops

commit e3fea3f70fd68af0574a5f24246cdb4ed07f2b74 upstream.

OK, what we have so far is e.g.
setxattr(path, name, whatever, 0, XATTR_REPLACE)
with name being good enough to get through xattr_permission().
Then we reach security_inode_setxattr() with the desired value and size.
Aha.  name should begin with "security.selinux", or we won't get that
far in selinux_inode_setxattr().  Suppose we got there and have enough
permissions to relabel that sucker.  We call security_context_to_sid()
with value == NULL, size == 0.  OK, we want ss_initialized to be non-zero.
I.e. after everything had been set up and running.  No problem...

We do 1-byte kmalloc(), zero-length memcpy() (which doesn't oops, even
thought the source is NULL) and put a NUL there.  I.e. form an empty
string.  string_to_context_struct() is called and looks for the first
':' in there.  Not found, -EINVAL we get.  OK, security_context_to_sid_core()
has rc == -EINVAL, force == 0, so it silently returns -EINVAL.
All it takes now is not having CAP_MAC_ADMIN and we are fucked.

All right, it might be a different bug (modulo strange code quoted in the
report), but it's real.  Easily fixed, AFAICS:

Deal with size == 0, value == NULL case in selinux_inode_setxattr()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Dave Jones <davej@redhat.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoasus-wmi: use ASUS_WMI_METHODID_DSTS2 as default DSTS ID.
Alex Hung [Wed, 20 Jun 2012 03:47:35 +0000 (11:47 +0800)]
asus-wmi: use ASUS_WMI_METHODID_DSTS2 as default DSTS ID.

commit 63a78bb1051b240417daad3a3fa9c1bb10646dca upstream.

According to responses from the BIOS team, ASUS_WMI_METHODID_DSTS2
(0x53545344) will be used as future DSTS ID. In addition, calling
asus_wmi_evaluate_method(ASUS_WMI_METHODID_DSTS2, 0, 0, NULL) returns
ASUS_WMI_UNSUPPORTED_METHOD in new ASUS laptop PCs. This patch fixes
no DSTS ID will be assigned in this case.

Signed-off-by: Alex Hung <alex.hung@canonical.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoRedefine ATOMIC_INIT and ATOMIC64_INIT to drop the casts
Tony Luck [Thu, 26 Jul 2012 17:55:26 +0000 (10:55 -0700)]
Redefine ATOMIC_INIT and ATOMIC64_INIT to drop the casts

commit a119365586b0130dfea06457f584953e0ff6481d upstream.

The following build error occured during a ia64 build with
swap-over-NFS patches applied.

net/core/sock.c:274:36: error: initializer element is not constant
net/core/sock.c:274:36: error: (near initialization for 'memalloc_socks')
net/core/sock.c:274:36: error: initializer element is not constant

This is identical to a parisc build error. Fengguang Wu, Mel Gorman
and James Bottomley did all the legwork to track the root cause of
the problem. This fix and entire commit log is shamelessly copied
from them with one extra detail to change a dubious runtime use of
ATOMIC_INIT() to atomic_set() in drivers/char/mspec.c

Dave Anglin says:
> Here is the line in sock.i:
>
> struct static_key memalloc_socks = ((struct static_key) { .enabled =
> ((atomic_t) { (0) }) });

The above line contains two compound literals.  It also uses a designated
initializer to initialize the field enabled.  A compound literal is not a
constant expression.

The location of the above statement isn't fully clear, but if a compound
literal occurs outside the body of a function, the initializer list must
consist of constant expressions.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agovirtio-blk: Use block layer provided spinlock
Asias He [Fri, 25 May 2012 08:03:27 +0000 (16:03 +0800)]
virtio-blk: Use block layer provided spinlock

commit 2c95a3290919541b846bee3e0fbaa75860929f53 upstream.

Block layer will allocate a spinlock for the queue if the driver does
not provide one in blk_init_queue().

The reason to use the internal spinlock is that blk_cleanup_queue() will
switch to use the internal spinlock in the cleanup code path.

        if (q->queue_lock != &q->__queue_lock)
                q->queue_lock = &q->__queue_lock;

However, processes which are in D state might have taken the driver
provided spinlock, when the processes wake up, they would release the
block provided spinlock.

=====================================
[ BUG: bad unlock balance detected! ]
3.4.0-rc7+ #238 Not tainted
-------------------------------------
fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at:
[<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380
but there are no more locks to release!

other info that might help us debug this:
1 lock held by fio/3587:
 #0:  (&(&vblk->lock)->rlock){......}, at:
[<ffffffff8132661a>] get_request_wait+0x19a/0x250

Other drivers use block layer provided spinlock as well, e.g. SCSI.

Switching to the block layer provided spinlock saves a bit of memory and
does not increase lock contention. Performance test shows no real
difference is observed before and after this patch.

Changes in v2: Improve commit log as Michael suggested.

Signed-off-by: Asias He <asias@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agovirtio-blk: Reset device after blk_cleanup_queue()
Asias He [Fri, 25 May 2012 02:34:48 +0000 (10:34 +0800)]
virtio-blk: Reset device after blk_cleanup_queue()

commit 483001c765af6892b3fc3726576cb42f17d1d6b5 upstream.

blk_cleanup_queue() will call blk_drian_queue() to drain all the
requests before queue DEAD marking. If we reset the device before
blk_cleanup_queue() the drain would fail.

1) if the queue is stopped in do_virtblk_request() because device is
full, the q->request_fn() will not be called.

blk_drain_queue() {
   while(true) {
      ...
      if (!list_empty(&q->queue_head))
        __blk_run_queue(q) {
    if (queue is not stoped)
q->request_fn()
}
      ...
   }
}

Do no reset the device before blk_cleanup_queue() gives the chance to
start the queue in interrupt handler blk_done().

2) In commit b79d866c8b7014a51f611a64c40546109beaf24a, We abort requests
dispatched to driver before blk_cleanup_queue(). There is a race if
requests are dispatched to driver after the abort and before the queue
DEAD mark. To fix this, instead of aborting the requests explicitly, we
can just reset the device after after blk_cleanup_queue so that the
device can complete all the requests before queue DEAD marking in the
drain process.

Signed-off-by: Asias He <asias@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agovirtio-blk: Call del_gendisk() before disable guest kick
Asias He [Fri, 25 May 2012 02:34:47 +0000 (10:34 +0800)]
virtio-blk: Call del_gendisk() before disable guest kick

commit 02e2b124943648fba0a2ccee5c3656a5653e0151 upstream.

del_gendisk() might not return due to failing to remove the
/sys/block/vda/serial sysfs entry when another thread (udev) is
trying to read it.

virtblk_remove()
  vdev->config->reset() : guest will not kick us through interrupt
    del_gendisk()
      device_del()
        kobject_del(): got stuck, sysfs entry ref count non zero

sysfs_open_file(): user space process read /sys/block/vda/serial
   sysfs_get_active() : got sysfs entry ref count
      dev_attr_show()
        virtblk_serial_show()
           blk_execute_rq() : got stuck, interrupt is disabled
                              request cannot be finished

This patch fixes it by calling del_gendisk() before we disable guest's
interrupt so that the request sent in virtblk_serial_show() will be
finished and del_gendisk() will success.

This fixes another race in hot-unplug process.

It is save to call del_gendisk(vblk->disk) before
flush_work(&vblk->config_work) which might access vblk->disk, because
vblk->disk is not freed until put_disk(vblk->disk).

Signed-off-by: Asias He <asias@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoLinux 3.5.1 v3.5.1
Greg Kroah-Hartman [Thu, 9 Aug 2012 15:23:56 +0000 (08:23 -0700)]
Linux 3.5.1

12 years agofutex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()
Darren Hart [Fri, 20 Jul 2012 18:53:31 +0000 (11:53 -0700)]
futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()

commit 6f7b0a2a5c0fb03be7c25bd1745baa50582348ef upstream.

If uaddr == uaddr2, then we have broken the rule of only requeueing
from a non-pi futex to a pi futex with this call. If we attempt this,
as the trinity test suite manages to do, we miss early wakeups as
q.key is equal to key2 (because they are the same uaddr). We will then
attempt to dereference the pi_mutex (which would exist had the futex_q
been properly requeued to a pi futex) and trigger a NULL pointer
dereference.

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/ad82bfe7f7d130247fbe2b5b4275654807774227.1342809673.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agofutex: Fix bug in WARN_ON for NULL q.pi_state
Darren Hart [Fri, 20 Jul 2012 18:53:30 +0000 (11:53 -0700)]
futex: Fix bug in WARN_ON for NULL q.pi_state

commit f27071cb7fe3e1d37a9dbe6c0dfc5395cd40fa43 upstream.

The WARN_ON in futex_wait_requeue_pi() for a NULL q.pi_state was testing
the address (&q.pi_state) of the pointer instead of the value
(q.pi_state) of the pointer. Correct it accordingly.

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/1c85d97f6e5f79ec389a4ead3e367363c74bd09a.1342809673.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agofutex: Test for pi_mutex on fault in futex_wait_requeue_pi()
Darren Hart [Fri, 20 Jul 2012 18:53:29 +0000 (11:53 -0700)]
futex: Test for pi_mutex on fault in futex_wait_requeue_pi()

commit b6070a8d9853eda010a549fa9a09eb8d7269b929 upstream.

If fixup_pi_state_owner() faults, pi_mutex may be NULL. Test
for pi_mutex != NULL before testing the owner against current
and possibly unlocking it.

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Link: http://lkml.kernel.org/r/dc59890338fc413606f04e5c5b131530734dae3d.1342809673.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agom68k: Make sys_atomic_cmpxchg_32 work on classic m68k
Andreas Schwab [Fri, 27 Jul 2012 22:20:34 +0000 (00:20 +0200)]
m68k: Make sys_atomic_cmpxchg_32 work on classic m68k

commit 9e2760d18b3cf179534bbc27692c84879c61b97c upstream.

User space access must always go through uaccess accessors, since on
classic m68k user space and kernel space are completely separate.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Tested-by: Thorsten Glaser <tg@debian.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoASoC: wm8994: Ensure there are enough BCLKs for four channels
Mark Brown [Fri, 22 Jun 2012 16:21:17 +0000 (17:21 +0100)]
ASoC: wm8994: Ensure there are enough BCLKs for four channels

commit b8edf3e5522735c8ce78b81845f7a1a2d4a08626 upstream.

Otherwise if someone tries to use all four channels on AIF1 with the
device in master mode we won't be able to clock out all the data.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoASoC: wm8962: Allow VMID time to fully ramp
Mark Brown [Mon, 30 Jul 2012 17:24:19 +0000 (18:24 +0100)]
ASoC: wm8962: Allow VMID time to fully ramp

commit 9d40e5582c9c4cfb6977ba2a0ca9c2ed82c56f21 upstream.

Required for reliable power up from cold.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: hda - Support dock on Lenovo Thinkpad T530 with ALC269VC
Takashi Iwai [Thu, 2 Aug 2012 07:04:39 +0000 (09:04 +0200)]
ALSA: hda - Support dock on Lenovo Thinkpad T530 with ALC269VC

commit 707fba3fa76a4c8855552f5d4c1a12430c09bce8 upstream.

Lenovo Thinkpad T530 with ALC269VC codec has a dock port but BIOS
doesn't set up the pins properly.  Enable the pins as well as on
Thinkpad X230 Tablet.

Reported-and-tested-by: Mario <anyc@hadiko.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: hda - Fix mute-LED GPIO initialization for IDT codecs
Takashi Iwai [Tue, 31 Jul 2012 08:40:05 +0000 (10:40 +0200)]
ALSA: hda - Fix mute-LED GPIO initialization for IDT codecs

commit 1f43f6c1bc8d740e75b4177eb29110858bb5fea2 upstream.

The IDT codecs initializes the GPIO setup for mute LEDs via
snd_hda_sync_vmaster_hook().  This works in most cases except for the
very first call, which is called before PCM and control creations.
Thus before Master switch is set manually via alsactl, the mute LED
may show the wrong state, depending on the polarity.

Now it's fixed by calling the LED-status update function manually when
no vmaster is set yet.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: hda - Fix polarity of mute LED on HP Mini 210
Takashi Iwai [Tue, 31 Jul 2012 08:16:59 +0000 (10:16 +0200)]
ALSA: hda - Fix polarity of mute LED on HP Mini 210

commit ff8a1e274cbc11da6b57849f925b895a212b56c9 upstream.

The commit a3e199732b made the LED working again on HP Mini 210 but
with a wrong polarity.  This patch fixes the polarity for this
machine, and also introduce a new model string "hp-inv-led".

Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=772923

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: hda - Fix mute-LED GPIO setup for HP Mini 210
Takashi Iwai [Thu, 26 Jul 2012 06:17:20 +0000 (08:17 +0200)]
ALSA: hda - Fix mute-LED GPIO setup for HP Mini 210

commit a3e199732b8e2b272e82cc1ccc49c35239ed6c5a upstream.

BIOS on HP Mini 210 doesn't provide the proper "HP_Mute_LED" DMI
string, thus the driver doesn't initialize the GPIO, too.  In the
earlier kernel, the driver falls back to GPIO1, but since 3.3 we've
stopped this due to other wrongly advertised machines.

For fixing this particular case, add a new model type to specify the
default polarity explicitly so that the fallback to GPIO1 is handled.

Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=772923

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: hda - Fix invalid D3 of headphone DAC on VT202x codecs
Takashi Iwai [Wed, 25 Jul 2012 11:54:55 +0000 (13:54 +0200)]
ALSA: hda - Fix invalid D3 of headphone DAC on VT202x codecs

commit 6162552b0de6ba80937c3dd53e084967851cd199 upstream.

We've got a bug report about the silent output from the headphone on a
mobo with VT2021, and spotted out that this was because of the wrong
D3 state on the DAC for the headphone output.  The bug is triggered by
the incomplete check for this DAC in set_widgets_power_state_vt1718S().
It checks only the connectivity of the primary output (0x27) but
doesn't consider the path from the headphone pin (0x28).

Now this patch fixes the problem by checking both pins for DAC 0x0b.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: mpu401: Fix missing initialization of irq field
Takashi Iwai [Mon, 23 Jul 2012 09:35:55 +0000 (11:35 +0200)]
ALSA: mpu401: Fix missing initialization of irq field

commit bc733d495267a23ef8660220d696c6e549ce30b3 upstream.

The irq field of struct snd_mpu401 is supposed to be initialized to -1.
Since it's set to zero as of now, a probing error before the irq
installation results in a kernel warning "Trying to free already-free
IRQ 0".

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=44821
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoALSA: snd-usb: fix clock source validity index
Daniel Mack [Wed, 1 Aug 2012 08:16:53 +0000 (10:16 +0200)]
ALSA: snd-usb: fix clock source validity index

commit aff252a848ce21b431ba822de3dab9c4c94571cb upstream.

uac_clock_source_is_valid() uses the control selector value to access
the bmControls bitmap of the clock source unit. This is wrong, as
control selector values start from 1, while the bitmap uses all
available bits.

In other words, "Clock Validity Control" is stored in D3..2, not D5..4
of the clock selector unit's bmControls.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Reported-by: Andreas Koch <andreas@akdesigninc.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoUSB: echi-dbgp: increase the controller wait time to come out of halt.
Colin Ian King [Mon, 30 Jul 2012 15:06:42 +0000 (16:06 +0100)]
USB: echi-dbgp: increase the controller wait time to come out of halt.

commit f96a4216e85050c0a9d41a41ecb0ae9d8e39b509 upstream.

The default 10 microsecond delay for the controller to come out of
halt in dbgp_ehci_startup is too short, so increase it to 1 millisecond.

This is based on emperical testing on various USB debug ports on
modern machines such as a Lenovo X220i and an Ivybridge development
platform that needed to wait ~450-950 microseconds.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agonet/tun: fix ioctl() based info leaks
Mathias Krause [Sun, 29 Jul 2012 19:45:14 +0000 (19:45 +0000)]
net/tun: fix ioctl() based info leaks

[ Upstream commits a117dacde0288f3ec60b6e5bcedae8fa37ee0dfc
  and 8bbb181308bc348e02bfdbebdedd4e4ec9d452ce ]

The tun module leaks up to 36 bytes of memory by not fully initializing
a structure located on the stack that gets copied to user memory by the
TUNGETIFF and SIOCGIFHWADDR ioctl()s.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agotcp: perform DMA to userspace only if there is a task waiting for it
Jiri Kosina [Fri, 27 Jul 2012 10:38:50 +0000 (10:38 +0000)]
tcp: perform DMA to userspace only if there is a task waiting for it

[ Upstream commit 59ea33a68a9083ac98515e4861c00e71efdc49a1 ]

Back in 2006, commit 1a2449a87b ("[I/OAT]: TCP recv offload to I/OAT")
added support for receive offloading to IOAT dma engine if available.

The code in tcp_rcv_established() tries to perform early DMA copy if
applicable. It however does so without checking whether the userspace
task is actually expecting the data in the buffer.

This is not a problem under normal circumstances, but there is a corner
case where this doesn't work -- and that's when MSG_TRUNC flag to
recvmsg() is used.

If the IOAT dma engine is not used, the code properly checks whether
there is a valid ucopy.task and the socket is owned by userspace, but
misses the check in the dmaengine case.

This problem can be observed in real trivially -- for example 'tbench' is a
good reproducer, as it makes a heavy use of MSG_TRUNC. On systems utilizing
IOAT, you will soon find tbench waiting indefinitely in sk_wait_data(), as they
have been already early-copied in tcp_rcv_established() using dma engine.

This patch introduces the same check we are performing in the simple
iovec copy case to the IOAT case as well. It fixes the indefinite
recvmsg(MSG_TRUNC) hangs.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agonet: fix rtnetlink IFF_PROMISC and IFF_ALLMULTI handling
Jiri Benc [Fri, 27 Jul 2012 02:58:22 +0000 (02:58 +0000)]
net: fix rtnetlink IFF_PROMISC and IFF_ALLMULTI handling

[ Upstream commit b1beb681cba5358f62e6187340660ade226a5fcc ]

When device flags are set using rtnetlink, IFF_PROMISC and IFF_ALLMULTI
flags are handled specially. Function dev_change_flags sets IFF_PROMISC and
IFF_ALLMULTI bits in dev->gflags according to the passed value but
do_setlink passes a result of rtnl_dev_combine_flags which takes those bits
from dev->flags.

This can be easily trigerred by doing:

tcpdump -i eth0 &
ip l s up eth0

ip sets IFF_UP flag in ifi_flags and ifi_change, which is combined with
IFF_PROMISC by rtnl_dev_combine_flags, causing __dev_change_flags to set
IFF_PROMISC in gflags.

Reported-by: Max Matveev <makc@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoUSB: kaweth.c: use GFP_ATOMIC under spin_lock
Dan Carpenter [Fri, 27 Jul 2012 01:46:51 +0000 (01:46 +0000)]
USB: kaweth.c: use GFP_ATOMIC under spin_lock

[ Upstream commit e4c7f259c5be99dcfc3d98f913590663b0305bf8 ]

The problem is that we call this with a spin lock held.  The call tree
is:
kaweth_start_xmit() holds kaweth->device_lock.
-> kaweth_async_set_rx_mode()
   -> kaweth_control()
      -> kaweth_internal_control_msg()

The kaweth_internal_control_msg() function is only called from
kaweth_control() which used GFP_ATOMIC for its allocations.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agotcp: Add TCP_USER_TIMEOUT negative value check
Hangbin Liu [Thu, 26 Jul 2012 22:52:21 +0000 (22:52 +0000)]
tcp: Add TCP_USER_TIMEOUT negative value check

[ Upstream commit 42493570100b91ef663c4c6f0c0fdab238f9d3c2 ]

TCP_USER_TIMEOUT is a TCP level socket option that takes an unsigned int. But
patch "tcp: Add TCP_USER_TIMEOUT socket option"(dca43c75) didn't check the negative
values. If a user assign -1 to it, the socket will set successfully and wait
for 4294967295 miliseconds. This patch add a negative value check to avoid
this issue.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agowanmain: comparing array with NULL
Alan Cox [Tue, 24 Jul 2012 08:16:25 +0000 (08:16 +0000)]
wanmain: comparing array with NULL

[ Upstream commit 8b72ff6484fe303e01498b58621810a114f3cf09 ]

gcc really should warn about these !

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agocaif: fix NULL pointer check
Alan Cox [Tue, 24 Jul 2012 02:42:14 +0000 (02:42 +0000)]
caif: fix NULL pointer check

[ Upstream commit c66b9b7d365444b433307ebb18734757cb668a02 ]

Reported-by: <rucsoftsec@gmail.com>
Resolves-bug: http://bugzilla.kernel.org/show_bug?44441
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agor8169: revert "add byte queue limit support".
Francois Romieu [Mon, 23 Jul 2012 20:55:55 +0000 (22:55 +0200)]
r8169: revert "add byte queue limit support".

[ Upstream commit 17bcb684f08649a2ab6a7dcd8288332e72d208f1 ]

This reverts commit 036dafa28da1e2565a8529de2ae663c37b7a0060.

First it appears in bisection, then reverting it solves the usual
netdev watchdog problem for different people. I don't have a proper
fix yet so get rid of it.

Bisected-and-reported-by: Alex Villacís Lasso <a_villacis@palosanto.com>
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agonet: Fix references to out-of-scope variables in put_cmsg_compat()
Jesper Juhl [Sun, 22 Jul 2012 11:37:20 +0000 (11:37 +0000)]
net: Fix references to out-of-scope variables in put_cmsg_compat()

[ Upstream commit 818810472b129004c16fc51bf0a570b60776bfb7 ]

In net/compat.c::put_cmsg_compat() we may assign 'data' the address of
either the 'ctv' or 'cts' local variables inside the 'if
(!COMPAT_USE_64BIT_TIME)' branch.

Those variables go out of scope at the end of the 'if' statement, so
when we use 'data' further down in 'copy_to_user(CMSG_COMPAT_DATA(cm),
data, cmlen - sizeof(struct compat_cmsghdr))' there's no telling what
it may be refering to - not good.

Fix the problem by simply giving 'ctv' and 'cts' function scope.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoext4: use s_csum_seed instead of i_csum_seed for xattr block
Tao Ma [Mon, 9 Jul 2012 20:29:27 +0000 (16:29 -0400)]
ext4: use s_csum_seed instead of i_csum_seed for xattr block

commit 41eb70dde42b2360074a559a6f1fc49860a50179 upstream.

In xattr block operation, we use h_refcount to indicate whether the
xattr block is shared among many inodes. And xattr block csum uses
s_csum_seed if it is shared and i_csum_seed if it belongs to
one inode. But this has a problem. So consider the block is shared
first bewteen inode A and B, and B has some xattr update and CoW
the xattr block. When it updates the *old* xattr block(because
of the h_refcount change) and calls ext4_xattr_release_block, we
has no idea that inode A is the real owner of the *old* xattr
block and we can't use the i_csum_seed of inode A either in xattr
block csum calculation. And I don't think we have an easy way to
find inode A.

So this patch just removes the tricky i_csum_seed and we now uses
s_csum_seed every time for the xattr block csum. The corresponding
patch for the e2fsprogs will be sent in another patch.

This is spotted by xfstests 117.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoext4: use proper csum calculation in ext4_rename
Tao Ma [Mon, 9 Jul 2012 20:29:05 +0000 (16:29 -0400)]
ext4: use proper csum calculation in ext4_rename

commit ef58f69c3c34f6377f1e21d3533c806dbd980ad0 upstream.

In ext4_rename, when the old name is a dir, we need to
change ".." to its new parent and journal the change, so
with metadata_csum enabled, we have to re-calc the csum.

As the first block of the dir can be either a htree root
or a normal directory block and we have different csum
calculation for these 2 types, we have to choose the right
one in ext4_rename.

btw, it is found by xfstests 013.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoext4: undo ext4_calc_metadata_amount if we fail to claim space
Theodore Ts'o [Mon, 23 Jul 2012 04:00:20 +0000 (00:00 -0400)]
ext4: undo ext4_calc_metadata_amount if we fail to claim space

commit 03179fe92318e7934c180d96f12eff2cb36ef7b6 upstream.

The function ext4_calc_metadata_amount() has side effects, although
it's not obvious from its function name.  So if we fail to claim
space, regardless of whether we retry to claim the space again, or
return an error, we need to undo these side effects.

Otherwise we can end up incorrectly calculating the number of metadata
blocks needed for the operation, which was responsible for an xfstests
failure for test #271 when using an ext2 file system with delalloc
enabled.

Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoext4: don't let i_reserved_meta_blocks go negative
Brian Foster [Mon, 23 Jul 2012 03:59:40 +0000 (23:59 -0400)]
ext4: don't let i_reserved_meta_blocks go negative

commit 97795d2a5b8d3c8dc4365d4bd3404191840453ba upstream.

If we hit a condition where we have allocated metadata blocks that
were not appropriately reserved, we risk underflow of
ei->i_reserved_meta_blocks.  In turn, this can throw
sbi->s_dirtyclusters_counter significantly out of whack and undermine
the nondelalloc fallback logic in ext4_nonda_switch().  Warn if this
occurs and set i_allocated_meta_blocks to avoid this problem.

This condition is reproduced by xfstests 270 against ext2 with
delalloc enabled:

Mar 28 08:58:02 localhost kernel: [  171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28
Mar 28 08:58:02 localhost kernel: [  171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost

270 ultimately fails with an inconsistent filesystem and requires an
fsck to repair.  The cause of the error is an underflow in
ext4_da_update_reserve_space() due to an unreserved meta block
allocation.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoext4: fix hole punch failure when depth is greater than 0
Ashish Sangwan [Mon, 23 Jul 2012 02:49:08 +0000 (22:49 -0400)]
ext4: fix hole punch failure when depth is greater than 0

commit 968dee77220768a5f52cf8b21d0bdb73486febef upstream.

Whether to continue removing extents or not is decided by the return
value of function ext4_ext_more_to_rm() which checks 2 conditions:
a) if there are no more indexes to process.
b) if the number of entries are decreased in the header of "depth -1".

In case of hole punch, if the last block to be removed is not part of
the last extent index than this index will not be deleted, hence the
number of valid entries in the extent header of "depth - 1" will
remain as it is and ext4_ext_more_to_rm will return 0 although the
required blocks are not yet removed.

This patch fixes the above mentioned problem as instead of removing
the extents from the end of file, it starts removing the blocks from
the particular extent from which removing blocks is actually required
and continue backward until done.

Signed-off-by: Ashish Sangwan <ashish.sangwan2@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoext4: fix overhead calculation used by ext4_statfs()
Theodore Ts'o [Mon, 9 Jul 2012 20:27:05 +0000 (16:27 -0400)]
ext4: fix overhead calculation used by ext4_statfs()

commit 952fc18ef9ec707ebdc16c0786ec360295e5ff15 upstream.

Commit f975d6bcc7a introduced bug which caused ext4_statfs() to
miscalculate the number of file system overhead blocks.  This causes
the f_blocks field in the statfs structure to be larger than it should
be.  This would in turn cause the "df" output to show the number of
data blocks in the file system and the number of data blocks used to
be larger than they should be.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoext4: pass a char * to ext4_count_free() instead of a buffer_head ptr
Theodore Ts'o [Sat, 30 Jun 2012 23:14:57 +0000 (19:14 -0400)]
ext4: pass a char * to ext4_count_free() instead of a buffer_head ptr

commit f6fb99cadcd44660c68e13f6eab28333653621e6 upstream.

Make it possible for ext4_count_free to operate on buffers and not
just data in buffer_heads.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agonouveau: Fix alignment requirements on src and dst addresses
Maarten Lankhorst [Mon, 4 Jun 2012 10:00:31 +0000 (12:00 +0200)]
nouveau: Fix alignment requirements on src and dst addresses

commit ce806a30470bcd846d148bf39d46de3ad7748228 upstream.

Linear copy works by adding the offset to the buffer address,
which may end up not being 16-byte aligned.

Some tests I've written for prime_pcopy show that the engine
allows this correctly, so the restriction on lowest 4 bits of
address can be lifted safely.

The comments added were by envyas, I think because I used
a newer version.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoNFS: Fix a number of bugs in the idmapper
David Howells [Wed, 25 Jul 2012 15:53:36 +0000 (16:53 +0100)]
NFS: Fix a number of bugs in the idmapper

commit a427b9ec4eda8cd6e641ea24541d30b641fc3140 upstream.

Fix a number of bugs in the NFS idmapper code:

 (1) Only registered key types can be passed to the core keys code, so
     register the legacy idmapper key type.

     This is a requirement because the unregister function cleans up keys
     belonging to that key type so that there aren't dangling pointers to the
     module left behind - including the key->type pointer.

 (2) Rename the legacy key type.  You can't have two key types with the same
     name, and (1) would otherwise require that.

 (3) complete_request_key() must be called in the error path of
     nfs_idmap_legacy_upcall().

 (4) There is one idmap struct for each nfs_client struct.  This means that
     idmap->idmap_key_cons is shared without the use of a lock.  This is a
     problem because key_instantiate_and_link() - as called indirectly by
     idmap_pipe_downcall() - releases anyone waiting for the key to be
     instantiated.

     What happens is that idmap_pipe_downcall() running in the rpc.idmapd
     thread, releases the NFS filesystem in whatever thread that is running in
     to continue.  This may then make another idmapper call, overwriting
     idmap_key_cons before idmap_pipe_downcall() gets the chance to call
     complete_request_key().

     I *think* that reading idmap_key_cons only once, before
     key_instantiate_and_link() is called, and then caching the result in a
     variable is sufficient.

Bug (4) is the cause of:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<          (null)>]           (null)
PGD 0
Oops: 0010 [#1] SMP
CPU 1
Modules linked in: ppdev parport_pc lp parport ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack nfs fscache xt_CHECKSUM auth_rpcgss iptable_mangle nfs_acl bridge stp llc lockd be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_realtek snd_usb_audio snd_hda_intel snd_hda_codec snd_seq snd_pcm snd_hwdep snd_usbmidi_lib snd_rawmidi snd_timer uvcvideo videobuf2_core videodev media videobuf2_vmalloc snd_seq_device videobuf2_memops e1000e vhost_net iTCO_wdt joydev coretemp snd soundcore macvtap macvlan i2c_i801 snd_page_alloc tun iTCO_vendor_support microcode kvm_intel kvm sunrpc hid_logitech_dj usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
Pid: 1229, comm: rpc.idmapd Not tainted 3.4.2-1.fc16.x86_64 #1 Gateway DX4710-UB801A/G33M05G1
RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
RSP: 0018:ffff8801a3645d40  EFLAGS: 00010246
RAX: ffff880077707e30 RBX: ffff880077707f50 RCX: ffff8801a18ccd80
RDX: 0000000000000006 RSI: ffff8801a3645e75 RDI: ffff880077707f50
RBP: ffff8801a3645d88 R08: ffff8801a430f9c0 R09: ffff8801a3645db0
R10: 000000000000000a R11: 0000000000000246 R12: ffff8801a18ccd80
R13: ffff8801a3645e75 R14: ffff8801a430f9c0 R15: 0000000000000006
FS:  00007fb6fb51a700(0000) GS:ffff8801afc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001a49b0000 CR4: 00000000000027e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rpc.idmapd (pid: 1229, threadinfo ffff8801a3644000, task ffff8801a3bf9710)
Stack:
 ffffffff81260878 ffff8801a3645db0 ffff8801a3645db0 ffff880077707a90
 ffff880077707f50 ffff8801a18ccd80 0000000000000006 ffff8801a3645e75
 ffff8801a430f9c0 ffff8801a3645dd8 ffffffff81260983 ffff8801a3645de8
Call Trace:
 [<ffffffff81260878>] ? __key_instantiate_and_link+0x58/0x100
 [<ffffffff81260983>] key_instantiate_and_link+0x63/0xa0
 [<ffffffffa057062b>] idmap_pipe_downcall+0x1cb/0x1e0 [nfs]
 [<ffffffffa0107f57>] rpc_pipe_write+0x67/0x90 [sunrpc]
 [<ffffffff8117f833>] vfs_write+0xb3/0x180
 [<ffffffff8117fb5a>] sys_write+0x4a/0x90
 [<ffffffff81600329>] system_call_fastpath+0x16/0x1b
Code:  Bad RIP value.
RIP  [<          (null)>]           (null)
 RSP <ffff8801a3645d40>
CR2: 0000000000000000

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agonfs: skip commit in releasepage if we're freeing memory for fs-related reasons
Jeff Layton [Mon, 23 Jul 2012 17:58:51 +0000 (13:58 -0400)]
nfs: skip commit in releasepage if we're freeing memory for fs-related reasons

commit 5cf02d09b50b1ee1c2d536c9cf64af5a7d433f56 upstream.

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agonfsd4: fix cr_principal comparison check in same_creds
Vivek Trivedi [Tue, 24 Jul 2012 15:48:20 +0000 (21:18 +0530)]
nfsd4: fix cr_principal comparison check in same_creds

commit 5559b50acdcdcad7e362882d3261bf934c9436f6 upstream.

This fixes a wrong check for same cr_principal in same_creds

Introduced by 8fbba96e5b327665265ad02b7f331b68536828bf "nfsd4: stricter
cred comparison for setclientid/exchange_id".

Signed-off-by: Vivek Trivedi <vtrivedi018@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agonfsd4: our filesystems are normally case sensitive
J. Bruce Fields [Tue, 5 Jun 2012 20:52:06 +0000 (16:52 -0400)]
nfsd4: our filesystems are normally case sensitive

commit 2930d381d22b9c56f40dd4c63a8fa59719ca2c3c upstream.

Actually, xfs and jfs can optionally be case insensitive; we'll handle
that case in later patches.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agodm thin: fix memory leak in process_prepared_mapping error paths
Joe Thornber [Fri, 27 Jul 2012 14:08:05 +0000 (15:08 +0100)]
dm thin: fix memory leak in process_prepared_mapping error paths

commit 905386f82d08f66726912f303f3e6605248c60a3 upstream.

Fix memory leak in process_prepared_mapping by always freeing
the dm_thin_new_mapping structs from the mapping_pool mempool on
the error paths.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agodm thin: reduce endio_hook pool size
Alasdair G Kergon [Fri, 27 Jul 2012 14:07:57 +0000 (15:07 +0100)]
dm thin: reduce endio_hook pool size

commit 7768ed33ccdc02801c4483fc5682dc66ace14aea upstream.

Reduce the slab size used for the dm_thin_endio_hook mempool.

Allocation has been seen to fail on machines with smaller amounts
of memory due to fragmentation.

  lvm: page allocation failure. order:5, mode:0xd0
  device-mapper: table: 253:38: thin-pool: Error creating pool's endio_hook mempool

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agoposix_types.h: Cleanup stale __NFDBITS and related definitions
Josh Boyer [Wed, 25 Jul 2012 14:40:34 +0000 (10:40 -0400)]
posix_types.h: Cleanup stale __NFDBITS and related definitions

commit 8ded2bbc1845e19c771eb55209aab166ef011243 upstream.

Recently, glibc made a change to suppress sign-conversion warnings in
FD_SET (glibc commit ceb9e56b3d1).  This uncovered an issue with the
kernel's definition of __NFDBITS if applications #include
<linux/types.h> after including <sys/select.h>.  A build failure would
be seen when passing the -Werror=sign-compare and -D_FORTIFY_SOURCE=2
flags to gcc.

It was suggested that the kernel should either match the glibc
definition of __NFDBITS or remove that entirely.  The current in-kernel
uses of __NFDBITS can be replaced with BITS_PER_LONG, and there are no
uses of the related __FDELT and __FDMASK defines.  Given that, we'll
continue the cleanup that was started with commit 8b3d1cda4f5f
("posix_types: Remove fd_set macros") and drop the remaining unused
macros.

Additionally, linux/time.h has similar macros defined that expand to
nothing so we'll remove those at the same time.

Reported-by: Jeff Law <law@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
[ .. and fix up whitespace as per akpm ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agodrm/nouveau: init vblank requests list
Marcin Slusarz [Wed, 25 Jul 2012 18:42:05 +0000 (20:42 +0200)]
drm/nouveau: init vblank requests list

commit 715855457e6bc93e148caf8cb3b5dcabbf605b0d upstream.

Fixes kernel panic when vblank interrupt triggers before first sync to
vblank request.

(Besides init, remove some relevant leftovers from vblank rework)

Reported-by: Ortwin Glück <odi@odi.ch>
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agodrm/radeon: fix dpms on/off on trinity/aruba v2
Jerome Glisse [Tue, 24 Jul 2012 21:06:11 +0000 (17:06 -0400)]
drm/radeon: fix dpms on/off on trinity/aruba v2

commit fcedac670c3da0d17aaa5db1708694971e8024a9 upstream.

The external encoder need to be setup again before enabling the
transmiter. This seems to be only needed on some trinity/aruba
to fix dpms on.

v2: Add comment, only setup again on dce6 ie aruba or newer.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agodrm/radeon: on hotplug force link training to happen (v2)
Jerome Glisse [Thu, 19 Jul 2012 21:25:55 +0000 (17:25 -0400)]
drm/radeon: on hotplug force link training to happen (v2)

commit ca2ccde5e2f24a792caa4cca919fc5c6f65d1887 upstream.

To have DP behave like VGA/DVI we need to retrain the link
on hotplug. For this to happen we need to force link
training to happen by setting connector dpms to off
before asking it turning it on again.

v2: agd5f
- drop the dp_get_link_status() change in atombios_dp.c
  for now.  We still need the dpms OFF change.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agodrm/radeon: fix hotplug of DP to DVI|HDMI passive adapters (v2)
Jerome Glisse [Thu, 19 Jul 2012 21:15:56 +0000 (17:15 -0400)]
drm/radeon: fix hotplug of DP to DVI|HDMI passive adapters (v2)

commit 266dcba541a1ef7e5d82d9e67c67fde2910636e8 upstream.

No need to retrain the link for passive adapters.

v2: agd5f
- no passive DP to VGA adapters, update comments
- assign radeon_connector_atom_dig after we are sure
  we have a digital connector as analog connectors
  have different private data.
- get new sink type before checking for retrain.  No
  need to check if it's no longer a DP connection.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agodrm/radeon: fix non revealent error message
Jerome Glisse [Tue, 17 Jul 2012 21:17:16 +0000 (17:17 -0400)]
drm/radeon: fix non revealent error message

commit 8d1c702aa0b2c4b22b0742b72a1149d91690674b upstream.

We want to print link status query failed only if it's
an unexepected fail. If we query to see if we need
link training it might be because there is nothing
connected and thus link status query have the right
to fail in that case.

To avoid printing failure when it's expected, move the
failure message to proper place.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 years agodrm/radeon: Try harder to avoid HW cursor ending on a multiple of 128 columns.
Michel Dänzer [Tue, 17 Jul 2012 17:02:09 +0000 (19:02 +0200)]
drm/radeon: Try harder to avoid HW cursor ending on a multiple of 128 columns.

commit f60ec4c7df043df81e62891ac45383d012afe0da upstream.

This could previously fail if either of the enabled displays was using a
horizontal resolution that is a multiple of 128, and only the leftmost column
of the cursor was (supposed to be) visible at the right edge of that display.

The solution is to move the cursor one pixel to the left in that case.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33183

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>