dm snapshot: fix header corruption race on invalidation
If a persistent snapshot fills up, a race can corrupt the on-disk header
which causes a crash on any future attempt to activate the snapshot
(typically while booting). This patch fixes the race.
When the snapshot overflows, __invalidate_snapshot is called, which calls
snapshot store method drop_snapshot. It goes to persistent_drop_snapshot that
calls write_header. write_header constructs the new header in the "area"
location.
Concurrently, an existing kcopyd job may finish, call copy_callback
and commit_exception method, that goes to persistent_commit_exception.
persistent_commit_exception doesn't do locking, relying on the fact that
callbacks are single-threaded, but it can race with snapshot invalidation and
overwrite the header that is just being written while the snapshot is being
invalidated.
The result of this race is a corrupted header being written that can
lead to a crash on further reactivation (if chunk_size is zero in the
corrupted header).
The fix is to use separate memory areas for each.
See the bug: https://bugzilla.redhat.com/show_bug.cgi?id=461506
dm log: userspace add luid to distinguish between concurrent log instances
Device-mapper userspace logs (like the clustered log) are
identified by a universally unique identifier (UUID). This
identifier is used to associate requests from the kernel to
a specific log in userspace. The UUID must be unique everywhere,
since multiple machines may use this identifier when communicating
about a particular log, as is the case for cluster logs.
Sometimes, device-mapper/LVM may re-use a UUID. This is the
case during pvmoves, when moving from one segment of an LV
to another, or when resizing a mirror, etc. In these cases,
a new log is created with the same UUID and loaded in the
"inactive" slot. When a device-mapper "resume" is issued,
the "live" table is deactivated and the new "inactive" table
becomes "live". (The "inactive" table can also be removed
via a device-mapper 'clear' command.)
The above two issues were colliding. More than one log was being
created with the same UUID, and there was no way to distinguish
between them. So, sometimes the wrong log would be swapped
out during the exchange.
The solution is to create a locally unique identifier,
'luid', to go along with the UUID. This new identifier is used
to determine exactly which log is being referenced by the kernel
when the log exchange is made. The identifier is not
universally safe, but it does not need to be, since
create/destroy/suspend/resume operations are bound to a specific
machine; and these are the operations that make up the exchange.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
dm raid1: do not allow log_failure variable to unset after being set
This patch fixes a bug which was triggering a case where the primary leg
could not be changed on failure even when the mirror was in-sync.
The case involves the failure of the primary device along with
the transient failure of the log device. The problem is that
bios can be put on the 'failures' list (due to log failure)
before 'fail_mirror' is called due to the primary device failure.
Normally, this is fine, but if the log device failure is transient,
a subsequent iteration of the work thread, 'do_mirror', will
reset 'log_failure'. The 'do_failures' function then resets
the 'in_sync' variable when processing bios on the failures list.
The 'in_sync' variable is what is used to determine if the
primary device can be switched in the event of a failure. Since
this has been reset, the primary device is incorrectly assumed
to be not switchable.
The case has been seen in the cluster mirror context, where one
machine realizes the log device is dead before the other machines.
As the responsibilities of the server migrate from one node to
another (because the mirror is being reconfigured due to the failure),
the new server may think for a moment that the log device is fine -
thus resetting the 'log_failure' variable.
In any case, it is inappropiate for us to reset the 'log_failure'
variable. The above bug simply illustrates that it can actually
hurt us.
Cc: stable@kernel.org Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
dm log: remove incorrect field from userspace table output
The output of 'dmsetup table' includes an internal field that should not
be there. This patch removes it. To make the fix simpler, we first
reorder a constructor argument
The 'device size' argument is generated internally. Currently it is
placed as the last space-separated word of the constructor string.
However, we need to use a version of the string without this word, so we
move it to the beginning instead so it is trivial to skip past it.
We keep a copy of the arguments passed to userspace for creating a log,
just in case we need to resend them. These are the same arguments that
are desired in the STATUSTYPE_TABLE request, except for one. When
creating the userspace log, the userspace daemon must know the size of
the mirror, so that is added to the arguments given in the constructor
table. We were printing this extra argument out as well, which is a
mistake.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Fri, 4 Sep 2009 19:40:25 +0000 (20:40 +0100)]
dm stripe: expose correct io hints
Set sensible I/O hints for striped DM devices in the topology
infrastructure added for 2.6.31 for userspace tools to
obtain via sysfs.
Add .io_hints to 'struct target_type' to allow the I/O hints portion
(io_min and io_opt) of the 'struct queue_limits' to be set by each
target and implement this for dm-stripe.
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Fri, 4 Sep 2009 19:40:24 +0000 (20:40 +0100)]
dm table: add more context to terse warning messages
A couple of recent warning messages make it difficult for the reader to
determine exactly what is wrong. This patch adds more information to
those messages.
The logic to check for valid device areas is inverted relative to proper
use with iterate_devices.
The iterate_devices method calls its callback for every underlying
device in the target. If any callback returns non-zero, iterate_devices
exits immediately. But the callback device_area_is_valid() returns 0 on
error and 1 on success. The overall effect without is that an error is
issued only if every device is invalid.
This patch renames device_area_is_valid to device_area_is_invalid and
inverts the logic so that one invalid device is sufficient to raise
an error.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Fri, 4 Sep 2009 19:40:19 +0000 (20:40 +0100)]
dm snapshot: implement iterate devices
Implement the .iterate_devices for the origin and snapshot targets.
dm-snapshot's lack of .iterate_devices resulted in the inability to
properly establish queue_limits for both targets.
With 4K sector drives: an unfortunate side-effect of not establishing
proper limits in either targets' DM device was that IO to the devices
would fail even though both had been created without error.
Commit af4874e03ed82f050d5872d8c39ce64bf16b5c38 ("dm target:s introduce
iterate devices fn") in 2.6.31-rc1 should have implemented .iterate_devices
for dm-snap.c's origin and snapshot targets.
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
If map_request() calls dm_kill_unmapped_request() to complete a cloned
bio without dispatching it, clone->bio is still set when
dm_end_request() is called and the BUG_ON(clone->bio) is incorrect.
The patch fixes this bug by freeing bio in dm_end_request() if the clone
has bio. I've redone my tests to cover all I/O paths and confirmed
there's no other regression.
Here is the oops I hit in request-based dm when I do I/O to a multipath
device which doesn't have any active path nor queue_if_no_path setting:
Ian Kent [Tue, 1 Sep 2009 03:26:22 +0000 (11:26 +0800)]
autofs4 - fix missed case when changing to use struct path
In the recent change by Al Viro that changes verious subsystems
to use "struct path" one case was missed in the autofs4 module
which causes mounts to no longer expire.
Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
lmb: Also remove __init from lmb_end_of_RAM() declaration in lmb.h
My previous patch (commit 4f8ee2c9cc: "lmb: Remove __init from
lmb_end_of_DRAM()") removed __init in lmb.c but missed the fact that it
was also marked as such in the .h
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch turns on parallel scanning for the ata_piix driver.
This driver is used on most netbooks (no AHCI for cheap storage it seems).
The scan is the dominating time factor in the kernel boot for these
devices; with this flag it gets cut in half for the device I used
for testing (eeepc).
Alan took a look at the driver source and concluded that it ought to be safe
to do for this driver. Alan has also checked with the hardware team.
and it is all true but once we put all things together additional
constraints for PATA controllers show up (some hardware registers
have per-host not per-port atomicity) and we risk misprogramming
the controller.
I used the following test to check whether the issue is real:
/* Ensure the UDMA bit is off - it will be turned back on if
UDMA is selected */
and it indeed triggered the error message.
Lets fix all such races by adding an extra locking to ->set_piomode
and ->set_dmamode methods for PATA controllers.
[ Alan: would be better to take the host lock in libata-core for these
cases so that we fix all the adapters in one swoop. "Looks fine as a
temproary quickfix tho" ]
Cc: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Alan Cox <alan@linux.intel.com> Cc: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel:
drm/i915: Improve CRTDDC mapping by using VBT info
drm/i915: Fix CPU-spinning hangs related to fence usage by using an LRU.
drm/i915: Set crtc/clone mask in different output devices
drm/i915: Always use SDVO_B detect bit for SDVO output detection.
drm/i915: Fix typo that broke SVID1 in intel_sdvo_multifunc_encoder()
drm/i915: Check if BIOS enabled dual-channel LVDS on 8xx, not only on 9xx
drm/i915: Set the multiplier for SDVO on G33 platform
Takashi Iwai [Mon, 31 Aug 2009 06:15:26 +0000 (08:15 +0200)]
ALSA: hda - Fix MacBookPro 3,1/4,1 quirk with ALC889A
This patch fixes the wrong headphone output routing for MacBookPro 3,1/4,1
quirk with ALC889A codec, which caused the silent headphone output.
Also, this gives the individual Headphone and Speaker volume controls.
Takashi Iwai [Mon, 31 Aug 2009 06:12:29 +0000 (08:12 +0200)]
ALSA: hda - Add missing mux check for VT1708
In patch_vt1708(), the check of MUX nids is missing and this results in
the -EINVAL error in accessing Input Source mixer element. Simpliy
adding the call of get_mux_nids() fixes the problem.
Joe Perches [Sun, 16 Aug 2009 23:03:51 +0000 (20:03 -0300)]
V4L/DVB (12564a): MAINTAINERS: Update gspca sn9c20x name style
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
V4L/DVB (12449): adds webcam for Micron device MT9M111 0x143A to em28xx
[mchehab@redhat.com: fix merge conflict and a few CodingStyle issues] Signed-off-by: Steve Gotthardt <gotthardt@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
drm/i915: Improve CRTDDC mapping by using VBT info
Use VBT information to determine which DDC bus to use for CRTDCC.
Fall back to GPIOA if VBT info is not available.
Signed-off-by: David Müller <d.mueller@elsoft.ch> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Eric Anholt <eric@anholt.net>
Tested on: 855 (David), and 945GM, 965GM, GM45, and G45 (anholt)
Eric Anholt [Sat, 29 Aug 2009 19:49:51 +0000 (12:49 -0700)]
drm/i915: Fix CPU-spinning hangs related to fence usage by using an LRU.
The lack of a proper LRU was partially worked around by taking the fence
from the object containing the oldest seqno. But if there are multiple
objects inactive, then they don't have seqnos and the first fence reg
among them would be chosen. If you were trying to copy data between two
mappings, this could result in each page fault stealing the fence from
the other argument, and your application hanging.
Cc: Stable Team <stable@kernel.org> Signed-off-by: Eric Anholt <eric@anholt.net> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Linus Torvalds [Sat, 29 Aug 2009 05:41:05 +0000 (19:41 -1000)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI: don't free non-existent backlight in acpi video module
toshiba_acpi: return on a fail path
ACPICA: Windows compatibility fix: same buffer/string store
Linus Torvalds [Sat, 29 Aug 2009 05:39:44 +0000 (19:39 -1000)]
Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
inotify: update the group mask on mark addition
inotify: fix length reporting and size checking
inotify: do not send a block of zeros when no pathname is available
Grant Grundler [Fri, 28 Aug 2009 19:00:36 +0000 (15:00 -0400)]
parisc: fix warning in traps.c
On Tue, Aug 18, 2009 at 01:45:17PM -0400, John David Anglin wrote:
> CC arch/parisc/kernel/traps.o
> arch/parisc/kernel/traps.c: In function 'handle_interruption':
> arch/parisc/kernel/traps.c:535:18: warning: operation on 'regs->iasq[0]'
> may be undefined
Yes - Line 535 should use both [0] and [1].
Reported-by: John David Anglin <dave@hiauly1.hia.nrc.ca> Signed-off-by: Grant Grundler <grundler@parisc-linux.org> Signed-off-by: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Trond Myklebust [Fri, 28 Aug 2009 15:12:12 +0000 (11:12 -0400)]
SUNRPC: Fix rpc_task_force_reencode
This patch fixes the bug that was reported in
http://bugzilla.kernel.org/show_bug.cgi?id=14053
If we're in the case where we need to force a reencode and then resend of
the RPC request, due to xprt_transmit failing with a networking error, then
we _must_ retransmit the entire request.
Ingo Molnar [Fri, 28 Aug 2009 08:44:56 +0000 (10:44 +0200)]
modules: Fix build error in the !CONFIG_KALLSYMS case
> James Bottomley (1):
> module: workaround duplicate section names
-tip testing found that this patch breaks the build on x86 if
CONFIG_KALLSYMS is disabled:
kernel/module.c: In function ‘load_module’:
kernel/module.c:2367: error: ‘struct module’ has no member named ‘sect_attrs’
distcc[8269] ERROR: compile kernel/module.c on ph/32 failed
make[1]: *** [kernel/module.o] Error 1
make: *** [kernel] Error 2
make: *** Waiting for unfinished jobs....
Commit 1b364bf misses the fact that section attributes are only
built and dealt with if kallsyms is enabled. The patch below fixes
this.
( note, technically speaking this should depend on CONFIG_SYSFS as
well but this patch is correct too and keeps the #ifdef less
intrusive - in the KALLSYMS && !SYSFS case the code is a NOP. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Replaced patch with a slightly cleaner variation by James Bottomley ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Keith Packard [Thu, 6 Aug 2009 22:57:54 +0000 (15:57 -0700)]
ACPI: don't free non-existent backlight in acpi video module
acpi_video_put_one_device was attempting to remove sysfs entries and
unregister a backlight device without first checking that said backlight
device structure had been created.
Signed-off-by: Keith Packard <keithp@keithp.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
Jiri Slaby [Thu, 6 Aug 2009 22:57:51 +0000 (15:57 -0700)]
toshiba_acpi: return on a fail path
Return from bt_rfkill_poll() when hci_get_radio_state() fails.
value is invalid in that case and should not be assigned to the rfkill
state.
This also fixes a double unlock bug.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: John W. Linville <linville@tuxdriver.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
Lin Ming [Wed, 26 Aug 2009 01:01:34 +0000 (09:01 +0800)]
ACPICA: Windows compatibility fix: same buffer/string store
Fix a compatibility issue when the same buffer or string is
stored to itself. This has been seen in the field. Previously,
ACPICA would zero out the buffer/string. Now, the operation is
treated as a NOP.
http://bugzilla.acpica.org/show_bug.cgi?id=803
Reported-by: Rezwanul Kabir <Rezwanul_Kabir@Dell.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Eric Paris [Fri, 28 Aug 2009 16:50:47 +0000 (12:50 -0400)]
inotify: update the group mask on mark addition
Seperating the addition and update of marks in inotify resulted in a
regression in that inotify never gets events. The inotify group mask is
always 0. This mask should be updated any time a new mark is added.
Eric Paris [Fri, 28 Aug 2009 15:57:55 +0000 (11:57 -0400)]
inotify: fix length reporting and size checking
0db501bd0610ee0c0 introduced a regresion in that it now sends a nul
terminator but the length accounting when checking for space or
reporting to userspace did not take this into account. This corrects
all of the rounding logic.
Brian Rogers [Fri, 28 Aug 2009 14:00:05 +0000 (10:00 -0400)]
inotify: do not send a block of zeros when no pathname is available
When an event has no pathname, there's no need to pad it with a null byte and
therefore generate an inotify_event sized block of zeros. This fixes a
regression introduced by commit 0db501bd0610ee0c0aca84d927f90bcccd09e2bd where
my system wouldn't finish booting because some process was being confused by
this.
Signed-off-by: Brian Rogers <brian@xyzw.org> Signed-off-by: Eric Paris <eparis@redhat.com>
James Bottomley [Wed, 26 Aug 2009 12:34:12 +0000 (22:04 +0930)]
module: workaround duplicate section names
The root cause is a duplicate section name (.text); is this legal?
[ Amerigo Wang: "AFAIK, yes." ]
However, there's a problem with commit 6d76013381ed28979cd122eb4b249a88b5e384fa in that if you fail to allocate
a mod->sect_attrs (in this case it's null because of the duplication),
it still gets used without checking in add_notes_attrs()
This should fix it
[ This patch leaves other problems, particularly the sections directory,
but recent parisc toolchains seem to produce these modules and this
prevents a crash and is a minimal change -- RR ]
Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Tested-by: Helge Deller <deller@gmx.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As soon as the framebuffer is registered, our methods may be called by the
kernel. This leads to a crash as xenfb_refresh() gets called before we have
the irq.
Connect to the backend before registering our framebuffer with the kernel.
Linus Torvalds [Thu, 27 Aug 2009 19:26:02 +0000 (12:26 -0700)]
Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
inotify: Ensure we alwasy write the terminating NULL.
inotify: fix locking around inotify watching in the idr
inotify: do not BUG on idr entries at inotify destruction
inotify: seperate new watch creation updating existing watches
We call lmb_end_of_DRAM() to test whether a DMA mask is ok on a machine
without IOMMU, but this function is marked as __init.
I don't think there's a clean way to get the top of RAM max_pfn doesn't
appear to include highmem or I missed (or we have a bug :-) so for now,
let's just avoid having a broken 2.6.31 by making this function
non-__init and we can revisit later.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 27 Aug 2009 19:24:08 +0000 (12:24 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: update documentation pointers
9p: remove unnecessary v9fses->options which duplicates the mount string
net/9p: insulate the client against an invalid error code sent by a 9p server
9p: Add missing cast for the error return value in v9fs_get_inode
9p: Remove redundant inode uid/gid assignment
9p: Fix possible regressions when ->get_sb fails.
9p: Fix v9fs show_options
9p: Fix possible memleak in v9fs_inode_from fid.
9p: minor comment fixes
9p: Fix possible inode leak in v9fs_get_inode.
9p: Check for error in return value of v9fs_fid_add
David Howells [Thu, 27 Aug 2009 12:09:06 +0000 (13:09 +0100)]
AFS: Stop readlink() on AFS crashing due to NULL 'file' ptr
kAFS crashes when asked to read a symbolic link because page_getlink()
passes a NULL file pointer to read_mapping_page(), but afs_readpage()
expects a file pointer from which to extract a key.
Modify afs_readpage() to request the appropriate key from the calling
process's keyrings if a file struct is not supplied with one attached.
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
inotify: Ensure we alwasy write the terminating NULL.
Before the rewrite copy_event_to_user always wrote a terqminating '\0'
byte to user space after the filename. Since the rewrite that
terminating byte was skipped if your filename is exactly a multiple of
event_size. Ouch!
So add one byte to name_size before we round up and use clear_user to
set userspace to zero like /dev/zero does instead of copying the
strange nul_inotify_event. I can't quite convince myself len_to_zero
will never exceed 16 and even if it doesn't clear_user should be more
efficient and a more accurate reflection of what the code is trying to
do.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Mon, 24 Aug 2009 20:03:35 +0000 (16:03 -0400)]
inotify: fix locking around inotify watching in the idr
The are races around the idr storage of inotify watches. It's possible
that a watch could be found from sys_inotify_rm_watch() in the idr, but it
could be removed from the idr before that code does it's removal. Move the
locking and the refcnt'ing so that these have to happen atomically.
Eric Paris [Mon, 24 Aug 2009 20:03:35 +0000 (16:03 -0400)]
inotify: do not BUG on idr entries at inotify destruction
If an inotify watch is left in the idr when an fsnotify group is destroyed
this will lead to a BUG. This is not a dangerous situation and really
indicates a programming bug and leak of memory. This patch changes it to
use a WARN and a printk rather than killing people's boxes.
Eric Paris [Mon, 24 Aug 2009 20:03:35 +0000 (16:03 -0400)]
inotify: seperate new watch creation updating existing watches
There is nothing known wrong with the inotify watch addition/modification
but this patch seperates the two code paths to make them each easy to
verify as correct.
Linus Torvalds [Thu, 27 Aug 2009 03:16:38 +0000 (20:16 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k,m68knommu: Wire up rt_tgsigqueueinfo and perf_counter_open
m68k: Fix redefinition of pgprot_noncached
arch/m68k/include/asm/motorola_pgalloc.h: fix kunmap arg
m68k: cnt reaches -1, not 0
m68k: count can reach 51, not 50
leds: after setting inverted attribute, we must update the LED
If we change the inverted attribute to another value, the LED will not be
inverted until we change the GPIO state.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Cc: Samuel R. C. Vale <srcvale@holoscopio.com> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
leds: fix multiple requests and releases of IRQ for GPIO LED Trigger
When setting the same GPIO number, multiple IRQ shared requests will be
done without freing the previous request. It will also try to free a
failed request or an already freed IRQ if 0 was written to the gpio file.
All these oops and leaks were fixed with the following solution: keep the
previous allocated GPIO (if any) still allocated in case the new request
fails. The alternative solution would desallocate the previous allocated
GPIO and set gpio as 0.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Signed-off-by: Samuel R. C. Vale <srcvale@holoscopio.com> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This failure is very common on many platforms. Handling it in the ACPI
processor driver is enough, and we don't need a warning message unless
CONFIG_ACPI_DEBUG is set.
Signed-off-by: Frans Pop <elendil@planet.nl> Acked-by: Zhang Rui <rui.zhang@intel.com> Cc: Len Brown <lenb@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Frans Pop [Wed, 26 Aug 2009 21:29:29 +0000 (14:29 -0700)]
ACPI processor: force throttling state when BIOS returns incorrect value
If the BIOS reports an invalid throttling state (which seems to be
fairly common after system boot), a reset is done to state T0.
Because of a check in acpi_processor_get_throttling_ptc(), the reset
never actually gets executed, which results in the error reoccurring
on every access of for example /proc/acpi/processor/CPU0/throttling.
Add a 'force' option to acpi_processor_set_throttling() to ensure
the reset really takes effect.
This patch, together with the next one, fixes a regression introduced in
2.6.30, listed on the regression list. They have been available for 2.5
months now in bugzilla, but have not been picked up, despite various
reminders and without any reason given.
Google shows that numerous people are hitting this issue. The issue is in
itself relatively minor, but the bug in the code is clear.
The patches have been in all my kernels and today testing has shown that
throttling works correctly with the patches applied when the system
overheats (http://bugzilla.kernel.org/show_bug.cgi?id=13918#c14).
Signed-off-by: Frans Pop <elendil@planet.nl> Acked-by: Zhang Rui <rui.zhang@intel.com> Cc: Len Brown <lenb@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
wmi: fix kernel panic when stack protection enabled.
Summary:
Kernel panic arise when stack protection is enabled, since strncat will
add a null terminating byte '\0'; So in functions
like this one (wmi_query_block):
char wc[4]="WC";
....
strncat(method, block->object_id, 2);
...
the length of wc should be n+1 (wc[5]) or stack protection
fault will arise. This is not noticeable when stack protection is
disabled,but , isn't good either.
Config used: [CONFIG_CC_STACKPROTECTOR_ALL=y,
CONFIG_CC_STACKPROTECTOR=y]
Michael Brunner [Wed, 26 Aug 2009 21:29:25 +0000 (14:29 -0700)]
thermal_sys: check get_temp return value
The return value of the get_temp function is not checked when doing a
thermal zone update. This may lead to a critical shutdown if get_temp
fails and the content of the temp variable is incorrectly set higher than
the critical trip point.
This has been observed on a system with incorrect ACPI implementation
where the corresponding methods were not serialized and therefore
sometimes triggered ACPI errors (AE_ALREADY_EXISTS). The following
critical shutdowns indicated a temperature of 2097 C, which was obviously
wrong.
The patch adds a return value check that jumps over all trip point
evaluations printing a warning if get_temp fails. The trip points are
evaluated again on the next polling interval with successful get_temp
execution.
Signed-off-by: Michael Brunner <mibru@gmx.de> Acked-by: Zhang Rui <rui.zhang@intel.com> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
do {
ret = waitpid(pid, &status, 0);
} while (ret == -1 && errno == EINTR);
}
return 0;
}
quickly creates an unkillable task.
If copy_process(CLONE_THREAD) races with de_thread()
copy_signal()->atomic(signal->count) breaks the signal->notify_count
logic, and the execing thread can hang forever in kernel space.
Change copy_process() to increment count/live only when we know for sure
we can't fail. In this case the forked thread will take care of its
reference to signal correctly.
If copy_process() fails, check CLONE_THREAD flag. If it it set - do
nothing, the counters were not changed and current belongs to the same
thread group. If it is not set, ->signal must be released in any case
(and ->count must be == 1), the forked child is the only thread in the
thread group.
We need more cleanups here, in particular signal->count should not be used
by de_thread/__exit_signal at all. This patch only fixes the bug.
Minchan Kim [Wed, 26 Aug 2009 21:29:23 +0000 (14:29 -0700)]
mm: fix for infinite churning of mlocked pages
An mlocked page might lose the isolatation race. This causes the page to
clear PG_mlocked while it remains in a VM_LOCKED vma. This means it can
be put onto the [in]active list. We can rescue it by using try_to_unmap()
in shrink_page_list().
But now, As Wu Fengguang pointed out, vmscan has a bug. If the page has
PG_referenced, it can't reach try_to_unmap() in shrink_page_list() but is
put into the active list. If the page is referenced repeatedly, it can
remain on the [in]active list without being moving to the unevictable
list.
This patch fixes it.
Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KOSAKI Motohiro <<kosaki.motohiro@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Wed, 26 Aug 2009 21:29:22 +0000 (14:29 -0700)]
flex_array: convert element_nr formals to unsigned
It's problematic to allow signed element_nr's or total's to be passed as
part of the flex array API.
flex_array_alloc() allows total_nr_elements to be set to a negative
quantity, which is obviously erroneous.
flex_array_get() and flex_array_put() allows negative array indices in
dereferencing an array part, which could address memory mapped before
struct flex_array.
The fix is to convert all existing element_nr formals to be qualified as
unsigned. Existing checks to compare it to total_nr_elements or the max
array size based on element_size need not be changed.
Signed-off-by: David Rientjes <rientjes@google.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Wed, 26 Aug 2009 21:29:21 +0000 (14:29 -0700)]
flex_array: declare parts member to have incomplete type
The `parts' member of struct flex_array should evaluate to an incomplete
type so that sizeof() cannot be used and C99 does not require the
zero-length specification.
Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Wed, 26 Aug 2009 21:29:20 +0000 (14:29 -0700)]
flex_array: fix get function for elements in base starting at non-zero
If all array elements fit into the base structure and data is copied using
flex_array_put() starting at a non-zero index, flex_array_get() will fail
to return the data.
This fixes the bug by only checking for NULL parts when all elements do
not fit in the base structure when flex_array_get() is used. Otherwise,
fa_element_to_part_nr() will always be 0 since there are no parts
structures needed and such element may never have been put. Thus, it will
remain NULL due to the kzalloc() of the base.
Additionally, flex_array_put() now only checks for a NULL part when all
elements do not fit in the base structure. This is otherwise unnecessary
since the base structure is guaranteed to exist (or we would have already
hit a NULL pointer).
Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Paris [Wed, 26 Aug 2009 18:56:48 +0000 (14:56 -0400)]
IMA: iint put in ima_counts_get and put
ima_counts_get() calls ima_iint_find_insert_get() which takes a reference
to the iint in question, but does not put that reference at the end of the
function. This can lead to a nasty memory leak. Easy enough to reproduce:
#include <sys/mman.h>
#include <stdio.h>
int main (void)
{
int i;
void *ptr;
for (i=0; i < 100000; i++) {
ptr = mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_SHARED|MAP_ANONYMOUS, -1, 0);
if (ptr == MAP_FAILED)
return 2;
munmap(ptr, 4096);
}
return 0;
}
Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
arch/m68k/include/asm/pgtable_mm.h:148:1: warning: "pgprot_noncached" redefined
In file included from arch/m68k/include/asm/pgtable_mm.h:138,
from arch/m68k/include/asm/pgtable.h:4,
from include/linux/mm.h:40,
from include/linux/pagemap.h:7,
from include/linux/blkdev.h:12,
from arch/m68k/emu/nfblock.c:17:
include/asm-generic/pgtable.h:133:1: warning: this is the location of the previous definition
pgprot_noncached() should be defined _before_ including asm-generic/pgtable.h
arch/m68k/include/asm/motorola_pgalloc.h: In function 'pte_alloc_one':
arch/m68k/include/asm/motorola_pgalloc.h:44: warning: passing argument 1 of 'kunmap' from incompatible pointer type
Also, remove unneeded test for kmap() failure.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Rusty Russell [Wed, 26 Aug 2009 19:22:32 +0000 (12:22 -0700)]
virtio: net refill on out-of-memory
If we run out of memory, use keventd to fill the buffer. There's a
report of this happening: "Page allocation failures in guest",
Message-ID: <20090713115158.0a4892b0@mjolnir.ossman.eu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
The problem is that on vSMP systems the CPUID derived
initial-APICIDs are overlapping - so we need to fall
back on hard_smp_processor_id() which reads the local
APIC.
Both come from the hardware (influenced by firmware
though) so it's a tough call which one to trust.
Doing the quirk expresses the vSMP property properly
and also does not affect other systems, so we go for
this solution instead of a revert.
Xen always runs on CPUs which properly support WP enforcement in
privileged mode, so there's no need to test for it.
This also works around a crash reported by Arnd Hannemann, though I
think its just a band-aid for that case.
Reported-by: Arnd Hannemann <hannemann@nets.rwth-aachen.de> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
David S. Miller [Tue, 25 Aug 2009 23:47:46 +0000 (16:47 -0700)]
sparc64: Validate linear D-TLB misses.
When page alloc debugging is not enabled, we essentially accept any
virtual address for linear kernel TLB misses. But with kgdb, kernel
address probing, and other facilities we can try to access arbitrary
crap.
So, make sure the address we miss on will translate to physical memory
that actually exists.
In order to make this work we have to embed the valid address bitmap
into the kernel image. And in order to make that less expensive we
make an adjustment, in that the max physical memory address is
decreased to "1 << 41", even on the chips that support a 42-bit
physical address space. We can do this because bit 41 indicates
"I/O space" and thus covers non-memory ranges.
The result of this is that:
1) kpte_linear_bitmap shrinks from 2K to 1K in size
2) we need 64K more for the valid address bitmap
We can't let the valid address bitmap be dynamically allocated
once we start using it to validate TLB misses, otherwise we have
crazy issues to deal with wrt. recursive TLB misses and such.
If we're in a TLB miss it could be the deepest trap level that's legal
inside of the cpu. So if we TLB miss referencing the bitmap, the cpu
will be out of trap levels and enter RED state.
To guard against out-of-range accesses to the bitmap, we have to check
to make sure no bits in the physical address above bit 40 are set. We
could export and use last_valid_pfn for this check, but that's just an
unnecessary extra memory reference.
On the plus side of all this, since we load all of these translations
into the special 4MB mapping TSB, and we check the TSB first for TLB
misses, there should be absolutely no real cost for these new checks
in the TLB miss path.
Reported-by: heyongli@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 25 Aug 2009 18:24:04 +0000 (11:24 -0700)]
Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
clockevent: Prevent dead lock on clockevents_lock
timers: Drop write permission on /proc/timer_list
Linus Torvalds [Tue, 25 Aug 2009 18:23:43 +0000 (11:23 -0700)]
Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: Fix too large stack usage in do_one_initcall()
tracing: handle broken names in ftrace filter
ftrace: Unify effect of writing to trace_options and option/*
Linus Torvalds [Tue, 25 Aug 2009 18:23:25 +0000 (11:23 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix build with older binutils and consolidate linker script
x86: Fix an incorrect argument of reserve_bootmem()
x86: add vmlinux.lds to targets in arch/x86/boot/compressed/Makefile
xen: rearrange things to fix stackprotector
x86: make sure load_percpu_segment has no stackprotector
i386: Fix section mismatches for init code with !HOTPLUG_CPU
x86, pat: Allow ISA memory range uncacheable mapping requests
Linus Torvalds [Tue, 25 Aug 2009 16:47:36 +0000 (09:47 -0700)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
ext3: Improve error message that changing journaling mode on remount is not possible
ext3: Update Kconfig description of EXT3_DEFAULTS_TO_ORDERED
Linus Torvalds [Tue, 25 Aug 2009 16:12:43 +0000 (09:12 -0700)]
tty: make sure to flush any pending work when halting the ldisc
When I rewrote tty ldisc code to use proper reference counts (commits 65b770468e98 and cbe9352fa08f) in order to avoid a race with hangup, the
test-program that Eric Biederman used to trigger the original problem
seems to have exposed another long-standing bug: the hangup code did the
'tty_ldisc_halt()' to stop any buffer flushing activity, but unlike the
other call sites it never actually flushed any pending work.
As a result, if you get just the right timing, the pending work may be
just about to execute (ie the timer has already triggered and thus
cancel_delayed_work() was a no-op), when we then re-initialize the ldisc
from under it.
That, in turn, results in various random problems, usually seen as a
NULL pointer dereference in run_timer_softirq() or a BUG() in
worker_thread (but it can be almost anything).
Fix it by adding the required 'flush_scheduled_work()' after doing the
tty_ldisc_halt() (this also requires us to move the ldisc halt to before
taking the ldisc mutex in order to avoid a deadlock with the workqueue
executing do_tty_hangup, which requires the mutex).
The locking should be cleaned up one day (the requirement to do this
outside the ldisc_mutex is very annoying, and weakens the lock), but
that's a larger and separate undertaking.
Reported-by: Eric W. Biederman <ebiederm@xmission.com> Tested-by: Xiaotian Feng <xtfeng@gmail.com> Tested-by: Yanmin Zhang <yanmin_zhang@linux.intel.com> Tested-by: Dave Young <hidave.darkstar@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>