There is a race between dev_create() and find_device().
If the mdptr has not yet been stored against a device, find_device() needs to
behave as though no device was found. It already returns NULL, but there is a
dm_put() missing: it must drop the reference dm_get_md() took.
The bug was introduced by dm-fix-mapped-device-ref-counting.patch.
It manifests itself if another dm ioctl attempts to reference a newly-created
device while the device creation ioctl is still running. The consequence is
that the device cannot be removed until the machine is rebooted. Certain udev
configurations can lead to this happening.
Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: <dm-devel@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Vivek Goyal [Thu, 9 Nov 2006 01:44:41 +0000 (17:44 -0800)]
[PATCH] i386: Force data segment to be 4K aligned
o Currently there is no specific alignment restriction in linker script
and in some cases it can be placed non 4K aligned addresses. This fails
kexec which checks that segment to be loaded is page aligned.
o I guess, it does not harm data segment to be 4K aligned.
J. Bruce Fields [Thu, 9 Nov 2006 01:44:40 +0000 (17:44 -0800)]
[PATCH] nfsd4: fix open-create permissions
In the case where an open creates the file, we shouldn't be rechecking
permissions to open the file; the open succeeds regardless of what the new
file's mode bits say.
This patch fixes the problem, but only by introducing yet another parameter
to nfsd_create_v3. This is ugly. This will be fixed by later patches.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: Neil Brown <neilb@suse.de> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
J. Bruce Fields [Thu, 9 Nov 2006 01:44:39 +0000 (17:44 -0800)]
[PATCH] nfsd4: reindent do_open_lookup()
Minor rearrangement, cleanup of do_open_lookup(). No change in behavior.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: Neil Brown <neilb@suse.de> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Thu, 9 Nov 2006 01:44:38 +0000 (17:44 -0800)]
[PATCH] A minor fix for set_mb() in Documentation/memory-barriers.txt
set_mb() is used by set_current_state() which needs mb(), not wmb(). I
think it would be right to assume that set_mb() implies mb(), all arches
seem to do just this.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If the microcode driver is built in (rather than module) there are some,
ehm, interesting effects happening due to the new "call out to userspace"
behavior that is introduced.. and which runs too early. The result is a
boot hang; which is really nasty.
The patch below is a minimally safe patch to fix this regression for 2.6.19
by just not requesting actual microcode updates during early boot. (That
is a good idea in general anyway)
The "real" fix is a lot more complex given the entire cpu hotplug scenario
(during cpu hotplug you normally need to load the microcode as well); but
the interactions for that are just really messy at this point; this fix at
least makes it work and avoids a full detangle of hotplug.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since the "mask" bit is in the low word, when we write a new entry, we
need to write the high word first, before we potentially unmask it.
The exception is when we actually want to mask the interrupt, in which
case we want to write the low word first to make sure that the high word
doesn't change while the interrupt routing is still active.
Linus Torvalds [Wed, 8 Nov 2006 18:23:03 +0000 (10:23 -0800)]
x86-64: clean up io-apic accesses
This is just commit 130fe05dbc0114609cfef9815c0c5580b42decfa ported to
x86-64, for all the same reasons. It cleans up the IO-APIC accesses in
order to then fix the ordering issues.
We move the accessor functions (that were only used by io_apic.c) out of
a header file, and use proper memory-mapped accesses rather than making
up our own "volatile" pointers.
Linus Torvalds [Wed, 8 Nov 2006 18:09:28 +0000 (10:09 -0800)]
Revert "[PATCH] i386: Add MMCFG resources to i386 too"
This reverts commit de09bddb9d6f96785be470c832b881e6d72d589f. It tried
to reserve the MMCONFIG mmio memory ranges, but since the MMCONFIG
information is broken and often bogus (which is why we don't dare use it
most of the time _anyway_), it does more harm than good.
Cc: Jeff Chua <jeff.chua.linux@gmail.com> Cc: Adrian Bunk <bunk@stusta.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
John Heffner [Tue, 7 Nov 2006 07:10:51 +0000 (23:10 -0800)]
[TCP]: Don't use highmem in tcp hash size calculation.
This patch removes consideration of high memory when determining TCP
hash table sizes. Taking into account high memory results in tcp_mem
values that are too large.
Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Slaby [Mon, 6 Nov 2006 22:34:48 +0000 (14:34 -0800)]
[NET]: kconfig, correct traffic shaper
As Patrick McHardy <kaber@trash.net> suggested, Traffic Shaper is now
obsolete and alternative to it is no longer CBQ, since its problems with
virtual devices, alter Kconfig text to reflect this -- put a link to the
traffic schedulers as a whole.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ray Lehtiniemi [Tue, 7 Nov 2006 02:19:15 +0000 (03:19 +0100)]
[ARM] 3927/1: Allow show_mem() to work with holes in memory map.
show_mem() was not correctly handling holes in the memory
map. It was treating the freed sections of the map as
though they contained valid struct page entries. This
could cause incorrect debugging output or even a kernel
panic.
This patch keeps the struct meminfo around after system
initialization so that show_mem() can use it when
scanning memory. show_mem() now walks over each bank
of each online node, rather than assuming that each node
contains a single contiguous bank.
Signed-off-by: Ray Lehtiniemi <rayl@mail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
David Brownell [Mon, 6 Nov 2006 18:29:16 +0000 (19:29 +0100)]
[ARM] 3926/1: make timer led handle HZ != 100
The timer LED is unusable at HZ=large, since it's got
a hard-wired value of 100 ticks per cycle; when HZ=1024
(for example) it's essentially always-on. This patch
just makes that be HZ ticks per cycle.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Auke Kok [Mon, 6 Nov 2006 16:57:12 +0000 (08:57 -0800)]
[PATCH] e1000: Fix regression: garbled stats and irq allocation during swsusp
e1000: Fix suspend/resume powerup and irq allocation
From: Auke Kok <auke-jan.h.kok@intel.com>
After 7.0.33/2.6.16, e1000 suspend/resume left the user with an enabled
device showing garbled statistics and undetermined irq allocation state,
where `ifconfig eth0 down` would display `trying to free already freed irq`.
Explicitly free and allocate irq as well as powerup the PHY during resume
fixes when needed.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Johannes Berg [Mon, 6 Nov 2006 22:17:20 +0000 (23:17 +0100)]
[PATCH] b44: change comment about irq mask register
Through some experimentation with the similarly built bcm43xx I came to
the conclusion that if the hw/firmware sets a bit in the interrupt
register, an interrupt will only be raised if that bit is included in
the interrupt mask. Hence, the interrupt mask is more like an interrupt
control mask.
This patch changes the comment to reflect that.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Linus Torvalds [Tue, 7 Nov 2006 03:53:12 +0000 (19:53 -0800)]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] Fix EV64120 and Ocelot builds by providing a plat_timer_setup().
[MIPS] EV64120: Fix PCI interrupt allocation.
[MIPS] Make irq number allocator generally available for fixing EV64120.
[MIPS] EV64120: Fix timer initialization for HZ != 100.
[MIPS] Ocelot 3: Fix MAC address detection after platform_device conversion.
[MIPS] Ocelot C: Fix MAC address detection after platform_device conversion.
[MIPS] SB1: On bootup only flush cache on local CPU.
[MIPS] Ocelot 3: Fix large number of warnings.
[MIPS] Ocelot C: Fix mapping of ioport address range.
[MIPS] Ocelot C: Fix warning about missmatching format string.
[MIPS] Ocelot C: fix eth registration after conversion to platform_device
[MIPS] Ocelot C: Fix large number of warnings.
Linus Torvalds [Tue, 7 Nov 2006 03:52:29 +0000 (19:52 -0800)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb:
V4L/DVB (4751): Fix DBV_FE_CUSTOMISE for card drivers compiled into kernel
V4L/DVB (4784): [saa7146_i2c] short_delay mode fixed for fast machines
V4L/DVB (4770): Fix mode switch of Compro Videomate T300
V4L/DVB (4787): Budget-ci: Inversion setting fixed for Technotrend 1500 T
V4L/DVB (4786): Pvrusb2: use NULL instead of 0
V4L/DVB (4785): Budget-ci: Change DEBIADDR_IR to a safer default
V4L/DVB (4752): DVB: Add DVB_FE_CUSTOMISE support for MT2060
Ralf Baechle [Fri, 3 Nov 2006 17:48:17 +0000 (17:48 +0000)]
[MIPS] Ocelot C: Fix warning about missmatching format string.
CC arch/mips/momentum/ocelot_c/setup.o
arch/mips/momentum/ocelot_c/setup.c: In function 'momenco_time_init':
arch/mips/momentum/ocelot_c/setup.c:223: warning: format '%d' expects type 'int', but argument 2 has type 'long unsigned int'
Change data type to match format string; a 32-bit type better suits our
needs.
Linus Torvalds [Mon, 6 Nov 2006 17:07:19 +0000 (09:07 -0800)]
Merge branch 'for-linus' of git://www.atmel.no/~hskinnemoen/linux/kernel/avr32
* 'for-linus' of git://www.atmel.no/~hskinnemoen/linux/kernel/avr32:
AVR32: Add missing return instruction in __raw_writesb
AVR32: Wire up sys_epoll_pwait
AVR32: Fix thinko in generic_find_next_zero_le_bit()
AVR32: Get rid of board_early_init
[DLM] fix oops in kref_put when removing a lockspace
Now that the lockspace struct is freed when the last sysfs object is released
this patch prevents use of that lockspace by sysfs. We attempt to re-get the
lockspace from the lockspace list and fail the request if it has been removed.
Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch fixes the recounting on the lockspace kobject. Previously the lockspace was freed while userspace could have had a
reference to one of its sysfs files, causing an oops in kref_put.
Now the lockspace kfree is moved into the kobject release() function
Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This adds a sync_fs superblock operation for GFS2 and removes
the journal flush from write_super in favour of sync_fs where it
ought to be. This is more or less identical to the way in which ext3
does this.
This bug was pointed out by Russell Cattelan <cattelan@redhat.com>
Cc: Russell Cattelan <cattelan@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Heiko Carstens [Mon, 6 Nov 2006 09:49:02 +0000 (10:49 +0100)]
[S390] IRQs too early enabled.
setup_lowcore() calls ctl_set_bit() which returns withs interrupts
enabled. The setup arch code is not supposed to enable interrupts that
early. Therefore use the __ctl_set_bit() variant.
This fixes the not working lock dependency validator on non 64 bit
systems.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Mon, 6 Nov 2006 09:49:00 +0000 (10:49 +0100)]
[S390] revert add_active_range() usage patch.
Commit 7676bef9c183fd573822cac9992927ef596d584c breaks DCSS support on
s390. DCSS needs initialized struct pages to work. With the usage of
add_active_range() only the struct pages for physically present pages
are initialized.
This could be fixed if the DCSS driver would initiliaze the struct pages
itself, but this doesn't work too. This is because the mem_map array
does not include holes after the last present memory area and therefore
there is nothing that could be initialized.
To fix this and to avoid some dirty hacks revert this patch for now.
Will be added later when we move to a virtual mem_map.
Cc: Carsten Otte <cotte@de.ibm.com> Cc: Adrian Bunk <bunk@stusta.de> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Akinobu Mita [Mon, 6 Nov 2006 07:52:13 +0000 (23:52 -0800)]
[PATCH] sunrpc: add missing spin_unlock
auth_domain_put() forgot to unlock acquired spinlock.
Cc: Olaf Kirch <okir@monad.swb.de> Cc: Andy Adamson <andros@citi.umich.edu> Cc: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Neil Brown <neilb@suse.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch takes the CTL_UNNUMBERD concept from NFS and makes it available to
all new sysctl users.
At the same time the sysctl binary interface maintenance documentation is
updated to mention and to describe what is needed to successfully maintain the
sysctl binary interface.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] sysctl: allow a zero ctl_name in the middle of a sysctl table
Since it is becoming clear that there are just enough users of the binary
sysctl interface that completely removing the binary interface from the kernel
will not be an option for foreseeable future, we need to find a way to address
the sysctl maintenance issues.
The basic problem is that sysctl requires one central authority to allocate
sysctl numbers, or else conflicts and ABI breakage occur. The proc interface
to sysctl does not have that problem, as names are not densely allocated.
By not terminating a sysctl table until I have neither a ctl_name nor a
procname, it becomes simple to add sysctl entries that don't show up in the
binary sysctl interface. Which allows people to avoid allocating a binary
sysctl value when not needed.
I have audited the kernel code and in my reading I have not found a single
sysctl table that wasn't terminated by a completely zero filled entry. So
this change in behavior should not affect anything.
I think this mechanism eases the pain enough that combined with a little
disciple we can solve the reoccurring sysctl ABI breakage.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't warn about libpthread's access to kernel.version. When it receives
-ENOSYS it will read /proc/sys/kernel/version.
If anything else shows up print the sysctl number string.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Cal Peake <cp@absolutedigital.net> Cc: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Peter Zijlstra [Mon, 6 Nov 2006 07:52:10 +0000 (23:52 -0800)]
[PATCH] lockdep: fix delayacct locking bug
Make the delayacct lock irqsave; this avoids the possible deadlock where
an interrupt is taken while holding the delayacct lock which needs to
take the delayacct lock.
Ankita Garg [Mon, 6 Nov 2006 07:52:07 +0000 (23:52 -0800)]
[PATCH] Fix for LKDTM MEM_SWAPOUT crashpoint
The MEM_SWAPOUT crashpoint in LKDTM could be broken as some compilers
inline the call to shrink_page_list() and symbol lookup for this function
name fails. Replacing it with the function shrink_inactive_list(), which
is the only function calling shrink_page_list().
[PATCH] Fix the spurious unlock_cpu_hotplug false warnings
Cpu-hotplug locking has a minor race case caused because of setting the
variable "recursive" to NULL *after* releasing the cpu_bitmask_lock in the
function unlock_cpu_hotplug,instead of doing so before releasing the
cpu_bitmask_lock.
This was the cause of most of the recent false spurious lock_cpu_unlock
warnings.
This should fix the problem reported by Martin Lorenz reported in
http://lkml.org/lkml/2006/10/29/127.
Thanks to Srinivasa DS for pointing it out.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
AVR32: Fix thinko in generic_find_next_zero_le_bit()
The existing implementation of this function seems to be looking for
a one although it should be looking for a zero. This causes trouble
for the ext2 filesystem, which tends to report -ENOSPC without this
patch.
Fix this by complementing each word before scanning.
board_early_init() is left over from some early prototyping work
where we had to initialize the SDRAM controller ourselves. This
depends on the kernel being loaded into static RAM, which just
isn't possible on any commercially available products today.
In order to run without a boot loader, we need to create a zImage
stub or have the debugger initialize the SDRAM for us (for really
low-level debugging)
David S. Miller [Mon, 6 Nov 2006 00:51:03 +0000 (16:51 -0800)]
[SPARC]: Fix robust futex syscalls and wire up migrate_pages.
When I added the entries for the robust futex syscall entries, I
forgot to bump NR_SYSCALLS. The current situation is error-prone
because NR_SYSCALLS lives in entry.S where the system call limit
checks are enforced. Move the definition to asm/unistd.h in order to
make this mistake much more difficult to make.
And wire up sys_migrate_pages since the powerpc folks implemented the
compat wrapper for us.
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Moore [Mon, 6 Nov 2006 00:44:06 +0000 (16:44 -0800)]
[NETLABEL]: Fix build failure.
> the build with the attached .config failed, make ends with:
> ...
> : undefined reference to `cipso_v4_sock_getattr'
> net/built-in.o: In function `netlbl_socket_getattr':
...
It looks like I was stupid and made NetLabel depend on CONFIG_NET and not
CONFIG_INET, the patch below should fix this by making NetLabel depend on
CONFIG_INET and CONFIG_SECURITY. Please review and apply for 2.6.19.
Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 5 Nov 2006 23:47:04 +0000 (15:47 -0800)]
[IPV6]: Give sit driver an appropriate module alias.
It would be nice to keep things working even with this built as a
module, it took me some time to realize my IPv6 tunnel was broken
because of the missing sit module. This module alias fixes things
until distributions have added an appropriate alias to modprobe.conf.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Mishin [Sat, 4 Nov 2006 00:08:19 +0000 (16:08 -0800)]
[IPV6]: Add ndisc_netdev_notifier unregister.
If inet6_init() fails later than ndisc_init() call, or IPv6 module is
unloaded, ndisc_netdev_notifier call remains in the list and will follows in
oops later.
Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Larry Woodman [Sat, 4 Nov 2006 00:05:45 +0000 (16:05 -0800)]
[NET]: __alloc_pages() failures reported due to fragmentation
We have seen a couple of __alloc_pages() failures due to
fragmentation, there is plenty of free memory but no large order pages
available. I think the problem is in sock_alloc_send_pskb(), the
gfp_mask includes __GFP_REPEAT but its never used/passed to the page
allocator. Shouldnt the gfp_mask be passed to alloc_skb() ?
Signed-off-by: Larry Woodman <lwoodman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 3 Nov 2006 09:01:03 +0000 (01:01 -0800)]
[TG3]: Fix 2nd ifup failure on 5752M.
This fixes a bug reported in:
http://bugzilla.kernel.org/show_bug.cgi?id=7438
tg3_close() turns off the PHY if WoL and ASF are both disabled. On
the next tg3_open(), some devices such as the 5752M will not be
brought up correctly without a PHY reset early in the reset sequence.
The PHY clock is needed for some internal MAC blocks to function
correctly.
This problem is fixed by always resetting the PHY early in
tg3_reset_hw() when it is called from tg3_open() or tg3_resume().
tg3_setup_phy() can then be called later in the sequence without the
reset_phy parameter set to 1, since the PHY reset is already done.
Update version to 3.68.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 3 Nov 2006 08:28:23 +0000 (00:28 -0800)]
[IPX]: Annotate and fix IPX checksum
Calculation of IPX checksum got buggered about 2.4.0. The old variant
mangled the packet; that got fixed, but calculation itself got buggered.
Restored the correct logics, fixed a subtle breakage we used to have even
back then: if the sum is 0 mod 0xffff, we want to return 0, not 0xffff.
The latter has special meaning for IPX (cheksum disabled). Observation
(and obvious fix) nicked from history of FreeBSD ipx_cksum.c...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 4 Nov 2006 21:03:00 +0000 (13:03 -0800)]
Make sure "user->sigpending" count is in sync
The previous commit (45c18b0bb579b5c1b89f8c99f1b6ffa4c586ba08, aka "Fix
unlikely (but possible) race condition on task->user access") fixed a
potential oops due to __sigqueue_alloc() getting its "user" pointer out
of sync with switch_user(), and accessing a user pointer that had been
de-allocated on another CPU.
It still left another (much less serious) problem, where a concurrent
__sigqueue_alloc and swich_user could cause sigqueue_alloc to do signal
pending reference counting for a _different_ user than the one it then
actually ended up using. No oops, but we'd end up with the wrong signal
accounting.
Another case of Oleg's eagle-eyes picking up the problem.
This is trivially fixed by just making sure we load whichever "user"
structure we decide to use (it doesn't matter _which_ one we pick, we
just need to pick one) just once.
Linus Torvalds [Sat, 4 Nov 2006 18:06:02 +0000 (10:06 -0800)]
Fix unlikely (but possible) race condition on task->user access
There's a possible race condition when doing a "switch_uid()" from one
user to another, which could race with another thread doing a signal
allocation and looking at the old thread ->user pointer as it is freed.
This explains an oops reported by Lukasz Trabinski:
http://permalink.gmane.org/gmane.linux.kernel/462241
We fix this by delaying the (reference-counted) freeing of the user
structure until the thread signal handler lock has been released, so
that we know that the signal allocation has either seen the new value or
has properly incremented the reference count of the old one.
Linus Torvalds [Sat, 4 Nov 2006 17:55:00 +0000 (09:55 -0800)]
Revert unintentional "volatile" changes in ipc/msg.c
Commit 5a06a363ef48444186f18095ae1b932dddbbfa89 ("[PATCH] ipc/msg.c:
clean up coding style") breaks fakeroot on Alpha (variously hangs or
oopses), according to a report by Falk Hueffner.
The fact that the code seems to rely on compiler access ordering through
the use of "volatile" is a pretty certain sign that the code has locking
problems, and we should fix those properly and then remove the whole
"volatile" entirely.
But in the meantime, the movement of "volatile" was unintentional, and
should be reverted.
Jens Axboe [Sat, 4 Nov 2006 11:49:32 +0000 (12:49 +0100)]
[PATCH] splice: fix problem introduced with inode diet
After the inode slimming patch that unionised i_pipe/i_bdev/i_cdev, it's
no longer enough to check for existance of ->i_pipe to verify that this
is a pipe.
Original patch from Eric Dumazet <dada1@cosmosbay.com>
Final solution suggested by Linus.
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6:
USB: use MII hooks only if CONFIG_MII is enabled
USB Storage: unusual_devs.h entry for Sony Ericsson P990i
USB: xpad: additional USB id's added
USB: fix compiler issues with newer gcc versions
USB: HID: add blacklist AIRcable USB, little beautification
USB: usblp: fix system suspend for some systems
USB: failure in usblp's error path
usbtouchscreen: use endpoint address from endpoint descriptor
USB: sierra: Fix id for Sierra Wireless MC8755 in new table
USB: new VID/PID-combos for cp2101
hid-core: big-endian fix fix
USB: usb-storage: Unusual_dev update
USB: add another sierra wireless device id
[PATCH] Fix user.* xattr permission check for sticky dirs
The user.* extended attributes are only allowed on regular files and
directories. Sticky directories further restrict write access to the owner
and privileged users. (See the attr(5) man page for an explanation.)
The original check in ext2/ext3 when user.* xattrs were merged was more
restrictive than intended, and when the xattr permission checks were moved
into the VFS, read access to user.* attributes on sticky directores ended
up being denied in addition.
Originally-from: Gerard Neil <xyzzy@devferret.org> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] Fix sys_move_pages when a NULL node list is passed
sys_move_pages() uses vmalloc() to allocate an array of structures that is
fills with information passed from user mode and then passes to
do_stat_pages() (in the case the node list is NULL). do_stat_pages()
depends on a marker in the node field of the structure to decide how large
the array is and this marker is correctly inserted into the last element of
the array. However, vmalloc() doesn't zero the memory it allocates and if
the user passes NULL for the node list, then the node fields are not filled
in (except for the end marker). If the memory the vmalloc() returned
happend to have a word with the marker value in it in just the right place,
do_pages_stat will fail to fill the status field of part of the array and
we will return (random) kernel data to user mode.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Dike [Fri, 3 Nov 2006 06:07:23 +0000 (22:07 -0800)]
[PATCH] uml: include tidying
In order to get the __NR_* constants, we need sys/syscall.h.
linux/unistd.h works as well since it includes syscall.h, however syscall.h
is more parsimonious. We were inconsistent in this, and this patch adds
syscall.h includes where necessary and removes linux/unistd.h includes
where they are not needed.
asm/unistd.h also includes the __NR_* constants, but these are not the
glibc-sanctioned ones, so this also removes one such inclusion.
Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jeff Dike [Fri, 3 Nov 2006 06:07:22 +0000 (22:07 -0800)]
[PATCH] uml: fix I/O hang
Fix a UML hang in which everything would just stop until some I/O happened
- a ping, someone whacking the keyboard - at which point everything would
start up again as though nothing had happened.
The cause was gcc reordering some code which absolutely needed to be
executed in the order in the source. When unblock_signals switches signals
from off to on, it needs to see if any interrupts had happened in the
critical section. The interrupt handlers check signals_enabled - if it is
zero, then the handler adds a bit to the "pending" bitmask and returns.
unblock_signals checks this mask to see if any signals need to be
delivered.
The crucial part is this:
signals_enabled = 1;
save_pending = pending;
if(save_pending == 0)
return;
pending = 0;
In order to avoid an interrupt arriving between reading pending and setting
it to zero, in which case, the record of the interrupt would be erased,
signals are enabled.
What happened was that gcc reordered this so that 'save_pending = pending'
came before 'signals_enabled = 1', creating a one-instruction window within
which an interrupt could arrive, set its bit in pending, and have it be
immediately erased.
When the I/O workload is purely disk-based, the loss of a block device
interrupt stops the entire I/O system because the next block request will
wait for the current one to finish. Thus the system hangs until something
else causes some I/O to arrive, such as a network packet or console input.
The fix to this particular problem is a memory barrier between enabling
signals and reading the pending signal mask. An xchg would also probably
work.
Looking over this code for similar problems led me to do a few more
things:
- make signals_enabled and pending volatile so that they don't get cached
in registers
- add an mb() to the return paths of block_signals and unblock_signals so
that the modification of signals_enabled doesn't get shuffled into the
caller in the event that these are inlined in the future.
Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>