Adrian Bunk [Wed, 18 Jul 2007 09:37:05 +0000 (02:37 -0700)]
[PATCH] Missing header include in ipt_iprange.h
[NETFILTER]: ipt_iprange.h must #include <linux/types.h>
ipt_iprange.h must #include <linux/types.h> since it uses __be32.
This patch fixes kernel Bugzilla #7604.
Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Willy Tarreau <w@1wt.eu>
Patrick McHardy [Wed, 18 Jul 2007 09:26:27 +0000 (02:26 -0700)]
[PATCH] Fix IPCOMP crashes.
[XFRM]: Fix crash introduced by struct dst_entry reordering
XFRM expects xfrm_dst->u.next to be same pointer as dst->next, which
was broken by the dst_entry reordering in commit 1e19e02c~, causing
an oops in xfrm_bundle_ok when walking the bundle upwards.
Kill xfrm_dst->u.next and change the only user to use dst->next instead.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Willy Tarreau <w@1wt.eu>
This patch restores a couple of workarounds from 2.6.16:
* restart transmit moderation timer in case it expires during IRQ routine
* default to having 10 HZ watchdog timer.
At this point it more important not to hang than to worry about the
power cost.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jason Wessel [Mon, 2 Jul 2007 20:53:44 +0000 (15:53 -0500)]
[PATCH] i386: fix infinite loop with singlestep int80 syscalls
The commit 635cf99a80f4ebee59d70eb64bb85ce829e4591f introduced a
regression. Executing a ptrace single step after certain int80
accesses will infinitely loop and never advance the PC.
The TIF_SINGLESTEP check should be done on the return from the syscall
and not before it.
The new test case is below:
/* Test whether singlestep through an int80 syscall works.
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/mman.h>
#include <asm/user.h>
#include <string.h>
static int child, status;
static struct user_regs_struct regs;
Jay Lubomirski [Wed, 27 Jun 2007 21:10:09 +0000 (14:10 -0700)]
[PATCH] serial: clear proper MPSC interrupt cause bits
The interrupt clearing code in mpsc_sdma_intr_ack() mistakenly clears the
interrupt for both controllers instead of just the one its supposed to.
This can result in the other controller appearing to hang because its
interrupt was effectively lost.
So, don't clear the interrupt cause bits for both MPSC controllers when
clearing the interrupt for one of them. Just clear the one that is
supposed to be cleared.
Signed-off-by: Jay Lubomirski <jaylubo@motorola.com> Acked-by: Mark A. Greer <mgreer@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jeff Mahoney [Wed, 27 Jun 2007 21:09:58 +0000 (14:09 -0700)]
[PATCH] saa7134: fix thread shutdown handling
This patch changes the test for the thread pid from >= 0 to > 0.
When the saa7134 driver initialization fails after a certain point, it goes
through the complete shutdown process for the driver. Part of shutting it
down includes tearing down the thread for tv audio.
The test for tearing down the thread tests for >= 0. Since the dev
structure is kzalloc'd, the test will always be true if we haven't tried to
start the thread yet. We end up waiting on pid 0 to complete, which will
never happen, so we lock up.
This bug was observed in Novell Bugzilla 284718, when request_irq() failed.
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Hugh Dickins [Wed, 27 Jun 2007 21:09:53 +0000 (14:09 -0700)]
[PATCH] mm: kill validate_anon_vma to avoid mapcount BUG
validate_anon_vma gave a useful check on the integrity of the anon_vma list
when Andrea was developing obj rmap; but it was not enabled in SLES9
itself, nor in mainline, until Nick changed commented-out RMAP_DEBUG to
configurable CONFIG_DEBUG_VM in 2.6.17. Now Petr Vandrovec reports that
its BUG_ON(mapcount > 100000) can easily crash a CONFIG_DEBUG_VM=y system.
That limit was just an arbitrary number to protect against an infinite
loop. We could raise it to something enormous (depending on sizeof struct
vma and size of memory?); but I rather think validate_anon_vma has outlived
its usefulness, and is better just removed - which gives a magnificent
performance boost to anything like Petr's test program ;)
Of course, a very long anon_vma list is bad news for preemption latency,
and I believe there has been one recent report of such: let's not forget
that, but validate_anon_vma only makes it worse not better.
Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Petr Vandrovec <petr@vmware.com> Acked-by: Nick Piggin <npiggin@suse.de> Cc: Andrea Arcangeli <andrea@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Paul Mackerras [Tue, 26 Jun 2007 10:10:12 +0000 (20:10 +1000)]
[PATCH] POWERPC: Fix subtle FP state corruption bug in signal return on SMP
This fixes a bug which can cause corruption of the floating-point state
on return from a signal handler. If we have a signal handler that has
used the floating-point registers, and it happens to context-switch to
another task while copying the interrupted floating-point state from the
user stack into the thread struct (e.g. because of a page fault, or
because it gets preempted), the context switch code will think that the
FP registers contain valid FP state that needs to be copied into the
thread_struct, and will thus overwrite the values that the signal return
code has put into the thread_struct.
This can occur because we clear the MSR bits that indicate the presence
of valid FP state after copying the state into the thread_struct. To fix
this we just move the clearing of the MSR bits to before the copy. A
similar potential problem also occurs with the Altivec state, and this
fixes that in the same way.
Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Tony Jones <tonyj@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Thomas Gleixner [Sat, 23 Jun 2007 09:48:40 +0000 (11:48 +0200)]
[PATCH] FUTEX: Restore the dropped ERSCH fix
The return value of futex_find_get_task() needs to be -ESRCH in case
that the search fails. This was part of the original futex fixes and
got accidentally dropped, when the futex-tidy-up patch was split out.
Results in a NULL pointer dereference in case the search fails.
Restore it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[PATCH] sched: fix next_interval determination in idle_balance()
Fix massive SMP imbalance on NUMA nodes observed on 2.6.21.5 with CFS.
(and later on reproduced without CFS as well).
The intervals of domains that do not have SD_BALANCE_NEWIDLE must be
considered for the calculation of the time of the next balance.
Otherwise we may defer rebalancing forever and nodes might stay idle for
very long times.
Siddha also spotted that the conversion of the balance interval to
jiffies is missing. Fix that to.
also continue the loop if !(sd->flags & SD_LOAD_BALANCE).
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
It did in fact trigger under all three of mainline, CFS, and -rt
including CFS -- see below for a couple of emails from last Friday
giving results for these three on the AMD box (where it happened) and on
a single-quad NUMA-Q system (where it did not, at least not with such
severity).
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Olaf Kirch [Wed, 13 Jun 2007 20:00:30 +0000 (16:00 -0400)]
[PATCH] dm crypt: fix remove first_clone
Get rid of first_clone in dm-crypt
This gets rid of first_clone, which is not really needed. Apparently, cloned
bios used to share their bvec some time way in the past - this is no longer
the case. Contrarily, this even hurts us if we try to create a clone off
first_clone after it has completed, and crypt_endio has destroyed its bvec.
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Olaf Kirch [Wed, 13 Jun 2007 19:57:50 +0000 (15:57 -0400)]
[PATCH] dm crypt: fix call to clone_init
Call clone_init early
We need to call clone_init as early as possible - at least before call
bio_put(clone) in any error path. Otherwise, the destructor will try to
dereference bi_private, which may still be NULL.
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Mike Accetta [Tue, 12 Jun 2007 01:09:35 +0000 (11:09 +1000)]
[PATCH] md: Fix bug in error handling during raid1 repair.
If raid1/repair (which reads all block and fixes any differences
it finds) hits a read error, it doesn't reset the bio for writing
before writing correct data back, so the read error isn't fixed,
and the device probably gets a zero-length write which it might
complain about.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[PATCH] pi-futex: Fix exit races and locking problems
1. New entries can be added to tsk->pi_state_list after task completed
exit_pi_state_list(). The result is memory leakage and deadlocks.
2. handle_mm_fault() is called under spinlock. The result is obvious.
3. results in self-inflicted deadlock inside glibc.
Sometimes futex_lock_pi returns -ESRCH, when it is not expected
and glibc enters to for(;;) sleep() to simulate deadlock. This problem
is quite obvious and I think the patch is right. Though it looks like
each "if" in futex_lock_pi() got some stupid special case "else if". :-)
4. sometimes futex_lock_pi() returns -EDEADLK,
when nobody has the lock. The reason is also obvious (see comment
in the patch), but correct fix is far beyond my comprehension.
I guess someone already saw this, the chunk:
if (rt_mutex_trylock(&q.pi_state->pi_mutex))
ret = 0;
is obviously from the same opera. But it does not work, because the
rtmutex is really taken at this point: wake_futex_pi() of previous
owner reassigned it to us. My fix works. But it looks very stupid.
I would think about removal of shift of ownership in wake_futex_pi()
and making all the work in context of process taking lock.
From: Thomas Gleixner <tglx@linutronix.de>
Fix 1) Avoid the tasklist lock variant of the exit race fix by adding
an additional state transition to the exit code.
This fixes also the issue, when a task with recursive segfaults
is not able to release the futexes.
Fix 2) Cleanup the lookup_pi_state() failure path and solve the -ESRCH
problem finally.
Fix 3) Solve the fixup_pi_state_owner() problem which needs to do the fixup
in the lock protected section by using the in_atomic userspace access
functions.
This removes also the ugly lock drop / unqueue inside of fixup_pi_state()
Fix 4) Fix a stale lock in the error path of futex_wake_pi()
Added some error checks for verification.
The -EDEADLK problem is solved by the rtmutex fixups.
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Thomas Gleixner [Fri, 8 Jun 2007 10:29:28 +0000 (10:29 +0000)]
[PATCH] rt-mutex: Fix stale return value
Alexey Kuznetsov found some problems in the pi-futex code.
The major problem is a stale return value in rt_mutex_slowlock():
When the pi chain walk returns -EDEADLK, but the waiter was woken up
during the phases where the locks were dropped, the rtmutex could be
acquired, but due to the stale return value -EDEADLK returned to the
caller.
Reset the return value in the woken up path.
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Bob Picco [Fri, 8 Jun 2007 01:01:35 +0000 (21:01 -0400)]
[PATCH] sparsemem: fix oops in x86_64 show_mem
We aren't sampling for holes in memory. Thus we encounter a section hole with
empty section map pointer for SPARSEMEM and OOPs for show_mem. This issue
has been seen in 2.6.21, current git and current mm. This patch is for
2.6.21 stable. It was tested against sparsemem.
Previous to commit f0a5a58aa812b31fd9f197c4ba48245942364eae memory_present
was called for node_start_pfn to node_end_pfn. This would cover the hole(s)
with reserved pages and valid sections. Most SPARSEMEM supported arches
do a pfn_valid check in show_mem before computing the page structure address.
This issue was brought to my attention on IRC by Arnaldo Carvalho de Melo at
acme@redhat.com. Thanks to Arnaldo for testing.
Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On systems with huge amount of physical memory, VFS cache and memory memmap
may eat all available system memory under 4G, then the system may fail to
allocate swiotlb bounce buffer.
There was a fix for this issue in arch/x86_64/mm/numa.c, but that fix dose
not cover sparsemem model.
This patch add fix to sparsemem model by first try to allocate memmap above
4G.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[chrisw: trivial backport] Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Auke Kok [Fri, 1 Jun 2007 17:22:39 +0000 (10:22 -0700)]
[PATCH] e1000: disable polling before registering netdevice
To assure the symmetry of poll enable/disable in up/down, we should
initialize the netdevice to be poll_disabled at load time. Doing
this after register_netdevice leaves us open to another race, so
lets move all the netif_* calls above register_netdevice so the
stack starts out how we expect it to be.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Doug Chapman <doug.chapman@hp.com> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
NeilBrown [Mon, 21 May 2007 01:33:10 +0000 (11:33 +1000)]
[PATCH] md: Don't write more than is required of the last page of a bitmap
It is possible that real data or metadata follows the bitmap
without full page alignment.
So limit the last write to be only the required number of bytes,
rounded up to the hard sector size of the device.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
NeilBrown [Mon, 21 May 2007 01:33:03 +0000 (11:33 +1000)]
[PATCH] md: Avoid overflow in raid0 calculation with large components.
If a raid0 has a component device larger than 4TB, and is accessed on
a 32bit machines, then as 'chunk' is unsigned lock,
chunk << chunksize_bits
can overflow (this can be as high as the size of the device in KB).
chunk itself will not overflow (without triggering a BUG).
So change 'chunk' to be 'sector_t, and get rid of the 'BUG' as it becomes
impossible to hit.
Cc: "Jeff Zheng" <Jeff.Zheng@endace.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Andi Kleen [Mon, 21 May 2007 12:31:45 +0000 (14:31 +0200)]
[PATCH] i386: Fix K8/core2 oprofile on multiple CPUs
Only try to allocate MSRs once instead of for every CPU.
This assumes the MSRs are the same on all CPUs which is currently
true. P4-HT is a special case for different SMT threads, but the code
always saves/restores all MSRs so it works identical.
nf_conntrack_h323: add checking of out-of-range on choices' index values
[NETFILTER]: nf_conntrack_h323: add checking of out-of-range on choices' index values
Choices' index values may be out of range while still encoded in the fixed
length bit-field. This bug may cause access to undefined types (NULL
pointers) and thus crashes (Reported by Zhongling Wen).
This patch also adds checking of decode flag when decoding SEQUENCEs.
Signed-off-by: Jing Min Zhao <zhaojingmin@vivecode.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[PATCH] Input: i8042 - fix AUX port detection with some chips
The i8042 driver fails detection of the AUX port with some chips,
because they apparently do not change the I8042_CTR_AUXDIS bit
immediately. This is known to affect at least HP500/HP510 notebooks,
consequently the built-in touchpad will not work. The patch will simply
reread the value until it gets the expected value or a retry limit is
hit, without touching other workaround code in the same area.
Signed-off-by: Roland Scheidegger <sroland@tungstengraphics.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[PATCH] IPV6 ROUTE: No longer handle ::/0 specially.
We do not need to handle ::/0 routes specially any longer.
This should fix BUG #8349.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Yuji Sekiya <sekiya@wide.ad.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
[chrisw: backport to 2.6.20] Signed-off-by: Chris Wright <chrisw@sous-sol.org>
The unix_state_*() locking macros imply that there is some
rwlock kind of thing going on, but the implementation is
actually a spinlock which makes the code more confusing than
it needs to be.
So use plain unix_state_lock and unix_state_unlock.
Based upon an excellent bug report and initial patch by
Frederik Deweerdt.
The UNIX datagram connect code blindly dereferences other->sk_socket
via the call down to the security_unix_may_send() function.
Without locking 'other' that pointer can go NULL via unix_release_sock()
which does sock_orphan() which also marks the socket SOCK_DEAD.
So we have to lock both 'sk' and 'other' yet avoid all kinds of
potential deadlocks (connect to self is OK for datagram sockets and it
is possible for two datagram sockets to perform a simultaneous connect
to each other). So what we do is have a "double lock" function similar
to how we handle this situation in other areas of the kernel. We take
the lock of the socket pointer with the smallest address first in
order to avoid ABBA style deadlocks.
Once we have them both locked, we check to see if SOCK_DEAD is set
for 'other' and if so, drop everything and retry the lookup.
Signed-off-by: David S. Miller <davem@davemloft.net>
[chrisw: backport to 2.6.20] Signed-off-by: Chris Wright <chrisw@sous-sol.org>
[PATCH] NET: Fix race condition about network device name allocation.
Kenji Kaneshige found this race between device removal and
registration. On unregister it is possible for the old device to
exist, because sysfs file is still open. A new device with 'eth%d'
will select the same name, but sysfs kobject register will fial.
The following changes the shutdown order slightly. It hold a removes
the sysfs entries earlier (on unregister_netdevice), but holds a
kobject reference. Then when todo runs the actual last put free
happens.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[chrisw: backport to 2.6.20] Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Mark Glines [Thu, 7 Jun 2007 06:01:05 +0000 (23:01 -0700)]
[PATCH] TCP: Use default 32768-61000 outgoing port range in all cases.
This diff changes the default port range used for outgoing connections,
from "use 32768-61000 in most cases, but use N-4999 on small boxes
(where N is a multiple of 1024, depending on just *how* small the box
is)" to just "use 32768-61000 in all cases".
I don't believe there are any drawbacks to this change, and it keeps
outgoing connection ports farther away from the mess of
IANA-registered ports.
Signed-off-by: Mark Glines <mark@glines.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
David Miller [Thu, 7 Jun 2007 05:56:19 +0000 (22:56 -0700)]
[PATCH] SPARC64: Fix _PAGE_EXEC_4U check in sun4u I-TLB miss handler.
It was using an immediate _PAGE_EXEC_4U value in an 'and'
instruction to perform the test. This doesn't work because
the immediate field is signed 13-bit, this the mask being
tested against the PTE was 0x1000 sign-extended to 32-bits
instead of just plain 0x1000.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Vasily Averin [Thu, 7 Jun 2007 05:51:03 +0000 (22:51 -0700)]
[PATCH] NET: "wrong timeout value" in sk_wait_data() v2
sys_setsockopt() do not check properly timeout values for
SO_RCVTIMEO/SO_SNDTIMEO, for example it's possible to set negative timeout
values. POSIX do not defines behaviour for sys_setsockopt in case negative
timeouts, but requires that setsockopt() shall fail with -EDOM if the send and
receive timeout values are too big to fit into the timeout fields in the socket
structure.
In current implementation negative timeout can lead to error messages like
"schedule_timeout: wrong timeout value".
Proposed patch:
- checks tv_usec and returns -EDOM if it is wrong
- do not allows to set negative timeout values (sets 0 instead) and outputs
ratelimited information message about such attempts.
Signed-off-By: Vasily Averin <vvs@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Jan Engelhardt [Thu, 7 Jun 2007 05:49:14 +0000 (22:49 -0700)]
[PATCH] SPARC: Linux always started with 9600 8N1
The Linux kernel ignored the PROM's serial settings (115200,n,8,1 in
my case). This was because mode_prop remained "ttyX-mode" (expected:
"ttya-mode") due to the constness of string literals when used with
"char *". Since there is no "ttyX-mode" property in the PROM, Linux
always used the default 9600.
[ Investigation of the suncore.s assembler reveals that gcc optimizied
away the stores, yet did not emit a warning, which is a pretty
anti-social thing to do and is the only reason this bug lived for
so long -DaveM ]
Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Dave Jones [Thu, 7 Jun 2007 05:48:09 +0000 (22:48 -0700)]
[PATCH] IPV4: Correct rp_filter help text.
As mentioned in http://bugzilla.kernel.org/show_bug.cgi?id=5015
The helptext implies that this is on by default.
This may be true on some distros (Fedora/RHEL have it enabled
in /etc/sysctl.conf), but the kernel defaults to it off.
Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
[PATCH] IPSEC: Fix panic when using inter address familiy IPsec on loopback.
Signed-off-by: Kazunori MIYAZAWA <kazunori@miyazawa.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Jerome Borsboom [Thu, 7 Jun 2007 05:40:27 +0000 (22:40 -0700)]
[PATCH] NET: parse ip:port strings correctly in in4_pton
in4_pton converts a textual representation of an ip4 address
into an integer representation. However, when the textual representation
is of in the form ip:port, e.g. 192.168.1.1:5060, and 'delim' is set to
-1, the function bails out with an error when reading the colon.
It makes sense to allow the colon as a delimiting character without
explicitly having to set it through the 'delim' variable as there can be
no ambiguity in the point where the ip address is completely parsed. This
function is indeed called from nf_conntrack_sip.c in this way to parse
textual ip:port combinations which fails due to the reason stated above.
Signed-off-by: Jerome Borsboom <j.borsboom@erasmusmc.nl> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Currently when icmp_errors_use_inbound_ifaddr is set and an ICMP error is
sent after the packet passed through ip_output(), an address from the
outgoing interface is chosen as ICMP source address since skb->dev doesn't
point to the incoming interface anymore.
Fix this by doing an interface lookup on rt->dst.iif and using that device.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Andy Green [Wed, 2 May 2007 19:48:37 +0000 (21:48 +0200)]
[PATCH] kbuild: fixdep segfault on pathological string-o-death
build scripts: fixdep blows segfault on string CONFIG_MODULE seen
The string "CONFIG_MODULE" appearing anywhere in a source file causes
fixdep to segfault. This string appeared in the wild in the current
mISDN sources (I think they meant CONFIG_MODULES). But it shouldn't
segfault (esp as CONFIG_MODULE appeared in a quoted string).
Signed-off-by: Andy Green <andy@warmcat.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Petri Helin found that this changeset broke tuning:
'Well, after going through the changes that might have had effect on
tuning, I found out the one which had caused this problem. I do not know
the actual reason behind the change, but the changelog says that it
was meant to "Fix TD1316 tuner for DVBC". But at least in my case it
seams to have broken the tuner instead.'
Signed-off-by: Oliver Endriss <o.endriss@gmx.de> Thanks-to: Petri Helin <phelin@googlemail.com> Acked-by: e9hack <e9hack@googlemail.com> Acked-by: Thomas Kaiser <linux-dvb@kaiser-linux.li> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org> Acked-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
The effect of the two changes is that for every call to
clear_page_dirty_for_io a page_test_and_clear_dirty is done. If
the per page dirty bit is set set_page_dirty is called. Strangly
clear_page_dirty_for_io is called for not-uptodate pages, e.g.
over this call-chain:
The bad news now is that page_test_and_clear_dirty might claim
that a not-uptodate page is dirty since SetPageUptodate which
resets the per page dirty bit has not yet been called. The page
writeback that follows clobbers the data on disk.
The simplest solution to this problem is to move the call to
page_test_and_clear_dirty under the "if (page_mapped(page))".
If a file backed page is mapped it is uptodate.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
git commit f994aae1bd8e4813d59a2ed64d17585fe42d03fc changed the
function declaration of csum_tcpudp_nofold. Argument types were
changed from unsigned long to __be32 (unsigned int). Therefore we
lost the implicit type conversion that zeroed the upper half of the
registers that are used to pass parameters. Since the inline assembly
relied on this we ended up adding random values and wrong checksums
were created.
Showed only up on machines with more than 4GB since gcc produced code
where the registers that are used to pass 'saddr' and 'daddr' previously
contained addresses before calling this function.
Fix this by using 32 bit arithmetics and convert code to C, since gcc
produces better code than these hand-optimized versions.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Daniel Drake [Thu, 24 May 2007 13:35:38 +0000 (09:35 -0400)]
[PATCH] ALSA: usb-audio: explicitly match Logitech QuickCam
Commit 93c8bf45e083b89dffe3a708363c15c1b220c723 modified the USB device
matching behaviour to ignore interface class matches if the device class
is vendor-specific.
This patch adds explicit ID matches for Logitech QuickCam devices, which
have a vendor specific device class (but standards-compliant audio
interfaces).
This fixes a 2.6.20 regression where the audio component of these
devices was no longer usable.
http://bugs.gentoo.org/show_bug.cgi?id=175715
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/93822
https://bugtrack.alsa-project.org/alsa-bug/view.php?id=3040
Based on a patch from sergiom
Signed-off-by: Daniel Drake <dsd@gentoo.org> Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Jaroslav Kysela <perex@suse.cz> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Andrew Morton [Wed, 23 May 2007 22:43:24 +0000 (18:43 -0400)]
[PATCH] acpi-thermal: fix mod_timer() interval
Use relative time, not absolute. Discovered by Jung-Ik (John) Lee
<jilee@google.com>.
Cc: Jung-Ik (John) Lee <jilee@google.com> Acked-by: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Herbert Xu [Sat, 19 May 2007 04:57:38 +0000 (14:57 +1000)]
[PATCH] CRYPTO: api: Read module pointer before freeing algorithm
The function crypto_mod_put first frees the algorithm and then drops
the reference to its module. Unfortunately we read the module pointer
which after freeing the algorithm and that pointer sits inside the
object that we just freed.
So this patch reads the module pointer out before we free the object.
Thanks to Luca Tettamanti for reporting this.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dave Kleikamp [Wed, 16 May 2007 03:53:36 +0000 (22:53 -0500)]
[PATCH] JFS: Fix race waking up jfsIO kernel thread
It's possible for a journal I/O request to be added to the log_redrive
queue and the jfsIO thread to be awakened after the thread releases
log_redrive_lock but before it sets its state to TASK_INTERRUPTIBLE.
The jfsIO thread should set the state before giving up the spinlock, so
the waking thread will really wake it.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Tejun Heo [Thu, 10 May 2007 14:45:17 +0000 (16:45 +0200)]
[PATCH] driver-core: don't free devt_attr till the device is released
Currently, devt_attr for the "dev" file is freed immediately on device
removal, but if the "dev" sysfs file is open when a device is removed,
sysfs will access its attribute structure for further access including
close resulting in jumping to garbled address. Fix it by postponing
freeing devt_attr to device release time.
Note that devt_attr for class_device is already freed on release.
This bug is reported by Chris Rankin as bugzilla bug#8198.
Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Chris Rankin <rankincj@yahoo.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jorge Boncompte [Thu, 3 May 2007 01:14:27 +0000 (03:14 +0200)]
[PATCH] {ip, nf}_nat_proto_gre: do not modify/corrupt GREv0 packets through NAT
While porting some changes of the 2.6.21-rc7 pptp/proto_gre conntrack
and nat modules to a 2.4.32 kernel I noticed that the gre_key function
returns a wrong pointer to the GRE key of a version 0 packet thus
corrupting the packet payload.
The intended behaviour for GREv0 packets is to act like
nf_conntrack_proto_generic/nf_nat_proto_unknown so I have ripped the
offending functions (not used anymore) and modified the
nf_nat_proto_gre modules to not touch version 0 (non PPTP) packets.
Signed-off-by: Jorge Boncompte <jorge@dti2.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dan Williams [Wed, 2 May 2007 18:43:19 +0000 (11:43 -0700)]
[PATCH] iop13xx: fix i/o address translation
PCI devices were being programmed with an incorrect base address value.
This patch moves I/O space into a 16-bit addressable region and corrects
the i/o offset.
Much thanks to Martin Michlmayr for tracking this issue and testing
debug patches.
Cc: Martin Michlmayr <tbm@cyrius.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Rientjes [Fri, 27 Apr 2007 16:11:10 +0000 (12:11 -0400)]
[PATCH] oom: kill all threads that share mm with killed task
oom_kill_task() calls __oom_kill_task() to OOM kill a selected task.
When finding other threads that share an mm with that task, we need to
kill those individual threads and not the same one.
When creating a new connection by sending an unknown chunk type, we
don't transition to a valid state, causing a NULL pointer dereference in
sctp_packet when accessing sctp_timeouts[SCTP_CONNTRACK_NONE].
Fix by don't creating new conntrack entry if initial state is invalid.
Noticed by Vilmos Nebehaj <vilmos.nebehaj@ramsys.hu>
CC: Kiran Kumar Immidi <immidi_kiran@yahoo.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Allow in-place crypto operations. Also remove the coherent user flag
(we use it automagically now), and by default use the user written
key rather then the HW hidden key - this makes crypto just work without
any special considerations, and thats OK, since its our only usage
model.
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
There's a really rare and obscure bug in CFQ, that causes a crash in
cfq_dispatch_insert() due to rq == NULL. One example of that is seen
here:
http://lkml.org/lkml/2007/4/15/41
Neil correctly diagnosed the situation for how this can happen, read
that analysis here:
http://lkml.org/lkml/2007/4/25/57
This looks like it requires md to trigger, even though it should
potentially be possible to due with O_DIRECT (at least if you edit the
kernel and doctor some of the unplug calls).
The fix is to move the ->next_rq update to when we add a request to the
rbtree. Then we remove the possibility for a request to exist in the
rbtree code, but not have ->next_rq correctly updated.
Jean Delvare [Wed, 25 Apr 2007 07:51:01 +0000 (09:51 +0200)]
hwmon/w83627ehf: Fix the fan5 clock divider write
Users have been complaining about the w83627ehf driver flooding their
logs with debug messages like:
w83627ehf 9191-0a10: Increasing fan 4 clock divider from 64 to 128
or:
w83627ehf 9191-0290: Increasing fan 4 clock divider from 4 to 8
The reason is that we failed to actually write the LSB of the encoded
clock divider value for that fan, causing the next read to report the
same old value again and again.
Additionally, the fan number was improperly reported, making the bug
harder to find.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jeff Mahoney [Mon, 23 Apr 2007 21:41:17 +0000 (14:41 -0700)]
reiserfs: fix xattr root locking/refcount bug
The listxattr() and getxattr() operations are only protected by a read
lock. As a result, if either of these operations run in parallel, a race
condition exists where the xattr_root will end up being cached twice, which
results in the leaking of a reference and a BUG() on umount.
This patch refactors get_xa_root(), __get_xa_root(), and create_xa_root(),
into one get_xa_root() function that takes the appropriate locking around
the entire critical section.
Reported, diagnosed and tested by Andrea Righi <a.righi@cineca.it>
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Andrea Righi <a.righi@cineca.it> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Edward Shishkin <edward@namesys.com> Cc: Alex Zarochentsev <zam@namesys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Taskstats fix the structure members alignment issue
We broke the the alignment of members of taskstats to the 8 byte boundary
with the CSA patches. In the current kernel, the taskstats structure is
not suitable for use by 32 bit applications in a 64 bit kernel.
On x86_64
Offsets of taskstats' members (64 bit kernel, 64 bit application)
This is one way to solve the problem without re-arranging structure members
is to pack the structure. The patch adds an __attribute__((aligned(8))) to
the taskstats structure members so that 32 bit applications using taskstats
can work with a 64 bit kernel.
Using __attribute__((packed)) would break the 64 bit alignment of members.
The fix was tested on x86_64. After the fix, we got
Offsets of taskstats' members (64 bit kernel, 64 bit application)
NR_FILE_PAGES must be accounted for depending on the zone that the page
belongs to. If we replace the page in the radix tree then we may have to
shift the count to another zone.
Suggested-by: Ethan Solomita <solo@google.com> Cc: Martin Bligh <mbligh@mbligh.org> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix possible NULL pointer access in 8250 serial driver
I encountered the following kernel panic. The cause of this problem was
NULL pointer access in check_modem_status() in 8250.c. I confirmed this
problem is fixed by the attached patch, but I don't know this is the
correct fix.
Fix the possible NULL pointer access in check_modem_status() in 8250.c. The
check_modem_status() would access 'info' member of uart_port structure, but it
is not initialized before uart_open() is called. The check_modem_status() can
be called through /proc/tty/driver/serial before uart_open() is called.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Taku Izumi <izumi2005@soft.fujitsu.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
fix OOM killing processes wrongly thought MPOL_BIND
I only have CONFIG_NUMA=y for build testing: surprised when trying a memhog
to see lots of other processes killed with "No available memory
(MPOL_BIND)". memhog is killed correctly once we initialize nodemask in
constrained_alloc().
Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: William Irwin <bill.irwin@oracle.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
While digging through my MAP_FIXED changes, I found that rather obvious
bug in /dev/mem mmap implementation for nommu archs. get_unmapped_area()
is expected to return an address, not a pfn.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
James Bottomley [Fri, 6 Apr 2007 16:14:56 +0000 (11:14 -0500)]
3w-xxxx: fix oops caused by incorrect REQUEST_SENSE handling
3w-xxxx emulates a REQUEST_SENSE response by simply returning nothing.
Unfortunately, it's assuming that the REQUEST_SENSE command is
implemented with use_sg == 0, which is no longer the case. The oops
occurs because it's clearing the scatterlist in request_buffer instead
of the memory region.
This is fixed by using tw_transfer_internal() to transfer correctly to
the scatterlist.
Acked-by: adam radford <aradford@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[PATCH] vt: fix potential race in VT_WAITACTIVE handler
On a multiprocessor machine the VT_WAITACTIVE ioctl call may return 0 if
fg_console has already been updated in redraw_screen() but the console
switch itself hasn't been completed. Fix this by checking fg_console in
vt_waitactive() with the console sem held.
Zwane Mwaikambo [Thu, 19 Apr 2007 20:33:13 +0000 (16:33 -0400)]
x86: Don't probe for DDC on VBE1.2
[PATCH] x86: Don't probe for DDC on VBE1.2
VBE1.2 doesn't support function 15h (DDC) resulting in a 'hang' whilst
uncompressing kernel with some video cards. Make sure we check VBE version
before fiddling around with DDC.
http://bugzilla.kernel.org/show_bug.cgi?id=1458
Opened: 2003-10-30 09:12 Last update: 2007-02-13 22:03
Much thanks to Tobias Hain for help in testing and investigating the bug.
Tested on;
i386, Chips & Technologies 65548 VESA VBE 1.2
CONFIG_VIDEO_SELECT=Y
CONFIG_FIRMWARE_EDID=Y
Alan Cox [Tue, 17 Apr 2007 23:59:01 +0000 (23:59 +0000)]
exec.c: fix coredump to pipe problem and obscure "security hole"
exec.c: fix coredump to pipe problem and obscure "security hole"
The patch checks for "|" in the pattern not the output and doesn't nail a
pid on to a piped name (as it is a program name not a file)
Also fixes a very very obscure security corner case. If you happen to have
decided on a core pattern that starts with the program name then the user
can run a program called "|myevilhack" as it stands. I doubt anyone does
this.
Signed-off-by: Alan Cox <alan@redhat.com> Confirmed-by: Christopher S. Aker <caker@theshore.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
cache_k8_northbridges() is storing config values to incorrect locations
(in flush_words) and also its overflowing beyond the allocation, causing
slab verification failures.
Olaf Kirch [Wed, 18 Apr 2007 22:14:14 +0000 (15:14 -0700)]
Fix IRDA oops'er
This fixes and OOPS due to incorrect socket orpahning in the
IRDA stack.
[IrDA]: Correctly handling socket error
This patch fixes an oops first reported in mid 2006 - see
http://lkml.org/lkml/2006/8/29/358 The cause of this bug report is that
when an error is signalled on the socket, irda_recvmsg_stream returns
without removing a local wait_queue variable from the socket's sk_sleep
queue. This causes havoc further down the road.
In response to this problem, a patch was made that invoked sock_orphan on
the socket when receiving a disconnect indication. This is not a good fix,
as this sets sk_sleep to NULL, causing applications sleeping in recvmsg
(and other places) to oops.
This is against the latest net-2.6 and should be considered for -stable
inclusion.
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Signed-off-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Netpoll UDP input handler needs to pull up the UDP headers
and handle receive checksum offloading properly just like
the normal UDP input path does else we get corrupted
checksums.
[NET]: Fix UDP checksum issue in net poll mode.
In net poll mode, the current checksum function doesn't consider the
kind of packet which is padded to reach a specific minimum length. I
believe that's the problem causing my test case failed. The following
patch fixed this issue.
Signed-off-by: Aubrey.Li <aubreylee@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
While building a test kernel for the new esp driver (against
git-current), I hit this bug. Trivial fix, put the inline declaration
in the right place. :)
Signed-off-by: Tom "spot" Callaway <tcallawa@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Miller [Tue, 17 Apr 2007 21:40:46 +0000 (14:40 -0700)]
Fix compat sys_ipc() on sparc64
The 32-bit syscall trampoline for sys_ipc() on sparc64
was sign extending various arguments, which is bogus when
using compat_sys_ipc() since that function expects zero
extended copies of all the arguments.
This bug breaks the sparc64 kernel when built with gcc-4.2.x
among other things.
[SPARC64]: Fix arg passing to compat_sys_ipc().
Do not sign extend args using the sys32_ipc stub, that is
buggy and unnecessary.
Based upon an excellent report by Mikael Pettersson.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Miller [Tue, 17 Apr 2007 21:37:25 +0000 (14:37 -0700)]
Fix sparc64 SBUS IOMMU allocator
[SPARC64]: Fix SBUS IOMMU allocation code.
There are several IOMMU allocator bugs. Instead of trying to fix this
overly complicated code, just mirror the PCI IOMMU arena allocator
which is very stable and well stress tested.
I tried to make the code as identical as possible so we can switch
sun4u PCI and SBUS over to a common piece of IOMMU code. All that
will be need are two callbacks, one to do a full IOMMU flush and one
to do a streaming buffer flush.
This patch gets rid of a lot of hangs and mysterious crashes on SBUS
sparc64 systems, at least for me.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sys_madvise has down_write of mmap_sem, then madvise_remove calls
vmtruncate_range which takes i_mutex and i_alloc_sem: no, we can
easily devise deadlocks from that ordering.
madvise_remove drop mmap_sem while calling vmtruncate_range: luckily,
since madvise_remove doesn't split or merge vmas, it's easy to handle
this case with a NULL prev, without restructuring sys_madvise. (Though
sad to retake mmap_sem when it's unlikely to be needed, and certainly
down_read is sufficient for MADV_REMOVE, unlike the other madvices.)
holepunch: fix disconnected pages after second truncate
shmem_truncate_range has its own truncate_inode_pages_range, to free any
pages racily instantiated while it was in progress: a SHMEM_PAGEIN flag
is set when this might have happened. But holepunching gets no chance
to clear that flag at the start of vmtruncate_range, so it's always set
(unless a truncate came just before), so holepunch almost always does
this second truncate_inode_pages_range.
shmem holepunch has unlikely swap<->file races hereabouts whatever we do
(without a fuller rework than is fit for this release): I was going to
skip the second truncate in the punch_hole case, but Miklos points out
that would make holepunch correctness more vulnerable to swapoff. So
keep the second truncate, but follow it by an unmap_mapping_range to
eliminate the disconnected pages (freed from pagecache while still
mapped in userspace) that it might have left behind.