Mike Accetta [Tue, 12 Jun 2007 01:09:35 +0000 (11:09 +1000)]
[PATCH] md: Fix bug in error handling during raid1 repair.
If raid1/repair (which reads all block and fixes any differences
it finds) hits a read error, it doesn't reset the bio for writing
before writing correct data back, so the read error isn't fixed,
and the device probably gets a zero-length write which it might
complain about.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[PATCH] pi-futex: Fix exit races and locking problems
1. New entries can be added to tsk->pi_state_list after task completed
exit_pi_state_list(). The result is memory leakage and deadlocks.
2. handle_mm_fault() is called under spinlock. The result is obvious.
3. results in self-inflicted deadlock inside glibc.
Sometimes futex_lock_pi returns -ESRCH, when it is not expected
and glibc enters to for(;;) sleep() to simulate deadlock. This problem
is quite obvious and I think the patch is right. Though it looks like
each "if" in futex_lock_pi() got some stupid special case "else if". :-)
4. sometimes futex_lock_pi() returns -EDEADLK,
when nobody has the lock. The reason is also obvious (see comment
in the patch), but correct fix is far beyond my comprehension.
I guess someone already saw this, the chunk:
if (rt_mutex_trylock(&q.pi_state->pi_mutex))
ret = 0;
is obviously from the same opera. But it does not work, because the
rtmutex is really taken at this point: wake_futex_pi() of previous
owner reassigned it to us. My fix works. But it looks very stupid.
I would think about removal of shift of ownership in wake_futex_pi()
and making all the work in context of process taking lock.
From: Thomas Gleixner <tglx@linutronix.de>
Fix 1) Avoid the tasklist lock variant of the exit race fix by adding
an additional state transition to the exit code.
This fixes also the issue, when a task with recursive segfaults
is not able to release the futexes.
Fix 2) Cleanup the lookup_pi_state() failure path and solve the -ESRCH
problem finally.
Fix 3) Solve the fixup_pi_state_owner() problem which needs to do the fixup
in the lock protected section by using the in_atomic userspace access
functions.
This removes also the ugly lock drop / unqueue inside of fixup_pi_state()
Fix 4) Fix a stale lock in the error path of futex_wake_pi()
Added some error checks for verification.
The -EDEADLK problem is solved by the rtmutex fixups.
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Thomas Gleixner [Fri, 8 Jun 2007 10:29:28 +0000 (10:29 +0000)]
[PATCH] rt-mutex: Fix stale return value
Alexey Kuznetsov found some problems in the pi-futex code.
The major problem is a stale return value in rt_mutex_slowlock():
When the pi chain walk returns -EDEADLK, but the waiter was woken up
during the phases where the locks were dropped, the rtmutex could be
acquired, but due to the stale return value -EDEADLK returned to the
caller.
Reset the return value in the woken up path.
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Bob Picco [Fri, 8 Jun 2007 01:01:35 +0000 (21:01 -0400)]
[PATCH] sparsemem: fix oops in x86_64 show_mem
We aren't sampling for holes in memory. Thus we encounter a section hole with
empty section map pointer for SPARSEMEM and OOPs for show_mem. This issue
has been seen in 2.6.21, current git and current mm. This patch is for
2.6.21 stable. It was tested against sparsemem.
Previous to commit f0a5a58aa812b31fd9f197c4ba48245942364eae memory_present
was called for node_start_pfn to node_end_pfn. This would cover the hole(s)
with reserved pages and valid sections. Most SPARSEMEM supported arches
do a pfn_valid check in show_mem before computing the page structure address.
This issue was brought to my attention on IRC by Arnaldo Carvalho de Melo at
acme@redhat.com. Thanks to Arnaldo for testing.
Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On systems with huge amount of physical memory, VFS cache and memory memmap
may eat all available system memory under 4G, then the system may fail to
allocate swiotlb bounce buffer.
There was a fix for this issue in arch/x86_64/mm/numa.c, but that fix dose
not cover sparsemem model.
This patch add fix to sparsemem model by first try to allocate memmap above
4G.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[chrisw: trivial backport] Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Auke Kok [Fri, 1 Jun 2007 17:22:39 +0000 (10:22 -0700)]
[PATCH] e1000: disable polling before registering netdevice
To assure the symmetry of poll enable/disable in up/down, we should
initialize the netdevice to be poll_disabled at load time. Doing
this after register_netdevice leaves us open to another race, so
lets move all the netif_* calls above register_netdevice so the
stack starts out how we expect it to be.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Doug Chapman <doug.chapman@hp.com> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
NeilBrown [Mon, 21 May 2007 01:33:10 +0000 (11:33 +1000)]
[PATCH] md: Don't write more than is required of the last page of a bitmap
It is possible that real data or metadata follows the bitmap
without full page alignment.
So limit the last write to be only the required number of bytes,
rounded up to the hard sector size of the device.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
NeilBrown [Mon, 21 May 2007 01:33:03 +0000 (11:33 +1000)]
[PATCH] md: Avoid overflow in raid0 calculation with large components.
If a raid0 has a component device larger than 4TB, and is accessed on
a 32bit machines, then as 'chunk' is unsigned lock,
chunk << chunksize_bits
can overflow (this can be as high as the size of the device in KB).
chunk itself will not overflow (without triggering a BUG).
So change 'chunk' to be 'sector_t, and get rid of the 'BUG' as it becomes
impossible to hit.
Cc: "Jeff Zheng" <Jeff.Zheng@endace.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Andi Kleen [Mon, 21 May 2007 12:31:45 +0000 (14:31 +0200)]
[PATCH] i386: Fix K8/core2 oprofile on multiple CPUs
Only try to allocate MSRs once instead of for every CPU.
This assumes the MSRs are the same on all CPUs which is currently
true. P4-HT is a special case for different SMT threads, but the code
always saves/restores all MSRs so it works identical.
nf_conntrack_h323: add checking of out-of-range on choices' index values
[NETFILTER]: nf_conntrack_h323: add checking of out-of-range on choices' index values
Choices' index values may be out of range while still encoded in the fixed
length bit-field. This bug may cause access to undefined types (NULL
pointers) and thus crashes (Reported by Zhongling Wen).
This patch also adds checking of decode flag when decoding SEQUENCEs.
Signed-off-by: Jing Min Zhao <zhaojingmin@vivecode.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[PATCH] Input: i8042 - fix AUX port detection with some chips
The i8042 driver fails detection of the AUX port with some chips,
because they apparently do not change the I8042_CTR_AUXDIS bit
immediately. This is known to affect at least HP500/HP510 notebooks,
consequently the built-in touchpad will not work. The patch will simply
reread the value until it gets the expected value or a retry limit is
hit, without touching other workaround code in the same area.
Signed-off-by: Roland Scheidegger <sroland@tungstengraphics.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[PATCH] IPV6 ROUTE: No longer handle ::/0 specially.
We do not need to handle ::/0 routes specially any longer.
This should fix BUG #8349.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Yuji Sekiya <sekiya@wide.ad.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
[chrisw: backport to 2.6.20] Signed-off-by: Chris Wright <chrisw@sous-sol.org>
The unix_state_*() locking macros imply that there is some
rwlock kind of thing going on, but the implementation is
actually a spinlock which makes the code more confusing than
it needs to be.
So use plain unix_state_lock and unix_state_unlock.
Based upon an excellent bug report and initial patch by
Frederik Deweerdt.
The UNIX datagram connect code blindly dereferences other->sk_socket
via the call down to the security_unix_may_send() function.
Without locking 'other' that pointer can go NULL via unix_release_sock()
which does sock_orphan() which also marks the socket SOCK_DEAD.
So we have to lock both 'sk' and 'other' yet avoid all kinds of
potential deadlocks (connect to self is OK for datagram sockets and it
is possible for two datagram sockets to perform a simultaneous connect
to each other). So what we do is have a "double lock" function similar
to how we handle this situation in other areas of the kernel. We take
the lock of the socket pointer with the smallest address first in
order to avoid ABBA style deadlocks.
Once we have them both locked, we check to see if SOCK_DEAD is set
for 'other' and if so, drop everything and retry the lookup.
Signed-off-by: David S. Miller <davem@davemloft.net>
[chrisw: backport to 2.6.20] Signed-off-by: Chris Wright <chrisw@sous-sol.org>
[PATCH] NET: Fix race condition about network device name allocation.
Kenji Kaneshige found this race between device removal and
registration. On unregister it is possible for the old device to
exist, because sysfs file is still open. A new device with 'eth%d'
will select the same name, but sysfs kobject register will fial.
The following changes the shutdown order slightly. It hold a removes
the sysfs entries earlier (on unregister_netdevice), but holds a
kobject reference. Then when todo runs the actual last put free
happens.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[chrisw: backport to 2.6.20] Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Mark Glines [Thu, 7 Jun 2007 06:01:05 +0000 (23:01 -0700)]
[PATCH] TCP: Use default 32768-61000 outgoing port range in all cases.
This diff changes the default port range used for outgoing connections,
from "use 32768-61000 in most cases, but use N-4999 on small boxes
(where N is a multiple of 1024, depending on just *how* small the box
is)" to just "use 32768-61000 in all cases".
I don't believe there are any drawbacks to this change, and it keeps
outgoing connection ports farther away from the mess of
IANA-registered ports.
Signed-off-by: Mark Glines <mark@glines.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
David Miller [Thu, 7 Jun 2007 05:56:19 +0000 (22:56 -0700)]
[PATCH] SPARC64: Fix _PAGE_EXEC_4U check in sun4u I-TLB miss handler.
It was using an immediate _PAGE_EXEC_4U value in an 'and'
instruction to perform the test. This doesn't work because
the immediate field is signed 13-bit, this the mask being
tested against the PTE was 0x1000 sign-extended to 32-bits
instead of just plain 0x1000.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Vasily Averin [Thu, 7 Jun 2007 05:51:03 +0000 (22:51 -0700)]
[PATCH] NET: "wrong timeout value" in sk_wait_data() v2
sys_setsockopt() do not check properly timeout values for
SO_RCVTIMEO/SO_SNDTIMEO, for example it's possible to set negative timeout
values. POSIX do not defines behaviour for sys_setsockopt in case negative
timeouts, but requires that setsockopt() shall fail with -EDOM if the send and
receive timeout values are too big to fit into the timeout fields in the socket
structure.
In current implementation negative timeout can lead to error messages like
"schedule_timeout: wrong timeout value".
Proposed patch:
- checks tv_usec and returns -EDOM if it is wrong
- do not allows to set negative timeout values (sets 0 instead) and outputs
ratelimited information message about such attempts.
Signed-off-By: Vasily Averin <vvs@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Jan Engelhardt [Thu, 7 Jun 2007 05:49:14 +0000 (22:49 -0700)]
[PATCH] SPARC: Linux always started with 9600 8N1
The Linux kernel ignored the PROM's serial settings (115200,n,8,1 in
my case). This was because mode_prop remained "ttyX-mode" (expected:
"ttya-mode") due to the constness of string literals when used with
"char *". Since there is no "ttyX-mode" property in the PROM, Linux
always used the default 9600.
[ Investigation of the suncore.s assembler reveals that gcc optimizied
away the stores, yet did not emit a warning, which is a pretty
anti-social thing to do and is the only reason this bug lived for
so long -DaveM ]
Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Dave Jones [Thu, 7 Jun 2007 05:48:09 +0000 (22:48 -0700)]
[PATCH] IPV4: Correct rp_filter help text.
As mentioned in http://bugzilla.kernel.org/show_bug.cgi?id=5015
The helptext implies that this is on by default.
This may be true on some distros (Fedora/RHEL have it enabled
in /etc/sysctl.conf), but the kernel defaults to it off.
Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
[PATCH] IPSEC: Fix panic when using inter address familiy IPsec on loopback.
Signed-off-by: Kazunori MIYAZAWA <kazunori@miyazawa.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Jerome Borsboom [Thu, 7 Jun 2007 05:40:27 +0000 (22:40 -0700)]
[PATCH] NET: parse ip:port strings correctly in in4_pton
in4_pton converts a textual representation of an ip4 address
into an integer representation. However, when the textual representation
is of in the form ip:port, e.g. 192.168.1.1:5060, and 'delim' is set to
-1, the function bails out with an error when reading the colon.
It makes sense to allow the colon as a delimiting character without
explicitly having to set it through the 'delim' variable as there can be
no ambiguity in the point where the ip address is completely parsed. This
function is indeed called from nf_conntrack_sip.c in this way to parse
textual ip:port combinations which fails due to the reason stated above.
Signed-off-by: Jerome Borsboom <j.borsboom@erasmusmc.nl> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Currently when icmp_errors_use_inbound_ifaddr is set and an ICMP error is
sent after the packet passed through ip_output(), an address from the
outgoing interface is chosen as ICMP source address since skb->dev doesn't
point to the incoming interface anymore.
Fix this by doing an interface lookup on rt->dst.iif and using that device.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Andy Green [Wed, 2 May 2007 19:48:37 +0000 (21:48 +0200)]
[PATCH] kbuild: fixdep segfault on pathological string-o-death
build scripts: fixdep blows segfault on string CONFIG_MODULE seen
The string "CONFIG_MODULE" appearing anywhere in a source file causes
fixdep to segfault. This string appeared in the wild in the current
mISDN sources (I think they meant CONFIG_MODULES). But it shouldn't
segfault (esp as CONFIG_MODULE appeared in a quoted string).
Signed-off-by: Andy Green <andy@warmcat.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Petri Helin found that this changeset broke tuning:
'Well, after going through the changes that might have had effect on
tuning, I found out the one which had caused this problem. I do not know
the actual reason behind the change, but the changelog says that it
was meant to "Fix TD1316 tuner for DVBC". But at least in my case it
seams to have broken the tuner instead.'
Signed-off-by: Oliver Endriss <o.endriss@gmx.de> Thanks-to: Petri Helin <phelin@googlemail.com> Acked-by: e9hack <e9hack@googlemail.com> Acked-by: Thomas Kaiser <linux-dvb@kaiser-linux.li> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org> Acked-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
The effect of the two changes is that for every call to
clear_page_dirty_for_io a page_test_and_clear_dirty is done. If
the per page dirty bit is set set_page_dirty is called. Strangly
clear_page_dirty_for_io is called for not-uptodate pages, e.g.
over this call-chain:
The bad news now is that page_test_and_clear_dirty might claim
that a not-uptodate page is dirty since SetPageUptodate which
resets the per page dirty bit has not yet been called. The page
writeback that follows clobbers the data on disk.
The simplest solution to this problem is to move the call to
page_test_and_clear_dirty under the "if (page_mapped(page))".
If a file backed page is mapped it is uptodate.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
git commit f994aae1bd8e4813d59a2ed64d17585fe42d03fc changed the
function declaration of csum_tcpudp_nofold. Argument types were
changed from unsigned long to __be32 (unsigned int). Therefore we
lost the implicit type conversion that zeroed the upper half of the
registers that are used to pass parameters. Since the inline assembly
relied on this we ended up adding random values and wrong checksums
were created.
Showed only up on machines with more than 4GB since gcc produced code
where the registers that are used to pass 'saddr' and 'daddr' previously
contained addresses before calling this function.
Fix this by using 32 bit arithmetics and convert code to C, since gcc
produces better code than these hand-optimized versions.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Daniel Drake [Thu, 24 May 2007 13:35:38 +0000 (09:35 -0400)]
[PATCH] ALSA: usb-audio: explicitly match Logitech QuickCam
Commit 93c8bf45e083b89dffe3a708363c15c1b220c723 modified the USB device
matching behaviour to ignore interface class matches if the device class
is vendor-specific.
This patch adds explicit ID matches for Logitech QuickCam devices, which
have a vendor specific device class (but standards-compliant audio
interfaces).
This fixes a 2.6.20 regression where the audio component of these
devices was no longer usable.
http://bugs.gentoo.org/show_bug.cgi?id=175715
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/93822
https://bugtrack.alsa-project.org/alsa-bug/view.php?id=3040
Based on a patch from sergiom
Signed-off-by: Daniel Drake <dsd@gentoo.org> Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Jaroslav Kysela <perex@suse.cz> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Andrew Morton [Wed, 23 May 2007 22:43:24 +0000 (18:43 -0400)]
[PATCH] acpi-thermal: fix mod_timer() interval
Use relative time, not absolute. Discovered by Jung-Ik (John) Lee
<jilee@google.com>.
Cc: Jung-Ik (John) Lee <jilee@google.com> Acked-by: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Herbert Xu [Sat, 19 May 2007 04:57:38 +0000 (14:57 +1000)]
[PATCH] CRYPTO: api: Read module pointer before freeing algorithm
The function crypto_mod_put first frees the algorithm and then drops
the reference to its module. Unfortunately we read the module pointer
which after freeing the algorithm and that pointer sits inside the
object that we just freed.
So this patch reads the module pointer out before we free the object.
Thanks to Luca Tettamanti for reporting this.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dave Kleikamp [Wed, 16 May 2007 03:53:36 +0000 (22:53 -0500)]
[PATCH] JFS: Fix race waking up jfsIO kernel thread
It's possible for a journal I/O request to be added to the log_redrive
queue and the jfsIO thread to be awakened after the thread releases
log_redrive_lock but before it sets its state to TASK_INTERRUPTIBLE.
The jfsIO thread should set the state before giving up the spinlock, so
the waking thread will really wake it.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Tejun Heo [Thu, 10 May 2007 14:45:17 +0000 (16:45 +0200)]
[PATCH] driver-core: don't free devt_attr till the device is released
Currently, devt_attr for the "dev" file is freed immediately on device
removal, but if the "dev" sysfs file is open when a device is removed,
sysfs will access its attribute structure for further access including
close resulting in jumping to garbled address. Fix it by postponing
freeing devt_attr to device release time.
Note that devt_attr for class_device is already freed on release.
This bug is reported by Chris Rankin as bugzilla bug#8198.
Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Chris Rankin <rankincj@yahoo.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jorge Boncompte [Thu, 3 May 2007 01:14:27 +0000 (03:14 +0200)]
[PATCH] {ip, nf}_nat_proto_gre: do not modify/corrupt GREv0 packets through NAT
While porting some changes of the 2.6.21-rc7 pptp/proto_gre conntrack
and nat modules to a 2.4.32 kernel I noticed that the gre_key function
returns a wrong pointer to the GRE key of a version 0 packet thus
corrupting the packet payload.
The intended behaviour for GREv0 packets is to act like
nf_conntrack_proto_generic/nf_nat_proto_unknown so I have ripped the
offending functions (not used anymore) and modified the
nf_nat_proto_gre modules to not touch version 0 (non PPTP) packets.
Signed-off-by: Jorge Boncompte <jorge@dti2.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dan Williams [Wed, 2 May 2007 18:43:19 +0000 (11:43 -0700)]
[PATCH] iop13xx: fix i/o address translation
PCI devices were being programmed with an incorrect base address value.
This patch moves I/O space into a 16-bit addressable region and corrects
the i/o offset.
Much thanks to Martin Michlmayr for tracking this issue and testing
debug patches.
Cc: Martin Michlmayr <tbm@cyrius.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Rientjes [Fri, 27 Apr 2007 16:11:10 +0000 (12:11 -0400)]
[PATCH] oom: kill all threads that share mm with killed task
oom_kill_task() calls __oom_kill_task() to OOM kill a selected task.
When finding other threads that share an mm with that task, we need to
kill those individual threads and not the same one.
When creating a new connection by sending an unknown chunk type, we
don't transition to a valid state, causing a NULL pointer dereference in
sctp_packet when accessing sctp_timeouts[SCTP_CONNTRACK_NONE].
Fix by don't creating new conntrack entry if initial state is invalid.
Noticed by Vilmos Nebehaj <vilmos.nebehaj@ramsys.hu>
CC: Kiran Kumar Immidi <immidi_kiran@yahoo.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Allow in-place crypto operations. Also remove the coherent user flag
(we use it automagically now), and by default use the user written
key rather then the HW hidden key - this makes crypto just work without
any special considerations, and thats OK, since its our only usage
model.
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
There's a really rare and obscure bug in CFQ, that causes a crash in
cfq_dispatch_insert() due to rq == NULL. One example of that is seen
here:
http://lkml.org/lkml/2007/4/15/41
Neil correctly diagnosed the situation for how this can happen, read
that analysis here:
http://lkml.org/lkml/2007/4/25/57
This looks like it requires md to trigger, even though it should
potentially be possible to due with O_DIRECT (at least if you edit the
kernel and doctor some of the unplug calls).
The fix is to move the ->next_rq update to when we add a request to the
rbtree. Then we remove the possibility for a request to exist in the
rbtree code, but not have ->next_rq correctly updated.
Jean Delvare [Wed, 25 Apr 2007 07:51:01 +0000 (09:51 +0200)]
hwmon/w83627ehf: Fix the fan5 clock divider write
Users have been complaining about the w83627ehf driver flooding their
logs with debug messages like:
w83627ehf 9191-0a10: Increasing fan 4 clock divider from 64 to 128
or:
w83627ehf 9191-0290: Increasing fan 4 clock divider from 4 to 8
The reason is that we failed to actually write the LSB of the encoded
clock divider value for that fan, causing the next read to report the
same old value again and again.
Additionally, the fan number was improperly reported, making the bug
harder to find.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jeff Mahoney [Mon, 23 Apr 2007 21:41:17 +0000 (14:41 -0700)]
reiserfs: fix xattr root locking/refcount bug
The listxattr() and getxattr() operations are only protected by a read
lock. As a result, if either of these operations run in parallel, a race
condition exists where the xattr_root will end up being cached twice, which
results in the leaking of a reference and a BUG() on umount.
This patch refactors get_xa_root(), __get_xa_root(), and create_xa_root(),
into one get_xa_root() function that takes the appropriate locking around
the entire critical section.
Reported, diagnosed and tested by Andrea Righi <a.righi@cineca.it>
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Andrea Righi <a.righi@cineca.it> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Edward Shishkin <edward@namesys.com> Cc: Alex Zarochentsev <zam@namesys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Taskstats fix the structure members alignment issue
We broke the the alignment of members of taskstats to the 8 byte boundary
with the CSA patches. In the current kernel, the taskstats structure is
not suitable for use by 32 bit applications in a 64 bit kernel.
On x86_64
Offsets of taskstats' members (64 bit kernel, 64 bit application)
This is one way to solve the problem without re-arranging structure members
is to pack the structure. The patch adds an __attribute__((aligned(8))) to
the taskstats structure members so that 32 bit applications using taskstats
can work with a 64 bit kernel.
Using __attribute__((packed)) would break the 64 bit alignment of members.
The fix was tested on x86_64. After the fix, we got
Offsets of taskstats' members (64 bit kernel, 64 bit application)
NR_FILE_PAGES must be accounted for depending on the zone that the page
belongs to. If we replace the page in the radix tree then we may have to
shift the count to another zone.
Suggested-by: Ethan Solomita <solo@google.com> Cc: Martin Bligh <mbligh@mbligh.org> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix possible NULL pointer access in 8250 serial driver
I encountered the following kernel panic. The cause of this problem was
NULL pointer access in check_modem_status() in 8250.c. I confirmed this
problem is fixed by the attached patch, but I don't know this is the
correct fix.
Fix the possible NULL pointer access in check_modem_status() in 8250.c. The
check_modem_status() would access 'info' member of uart_port structure, but it
is not initialized before uart_open() is called. The check_modem_status() can
be called through /proc/tty/driver/serial before uart_open() is called.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Taku Izumi <izumi2005@soft.fujitsu.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
fix OOM killing processes wrongly thought MPOL_BIND
I only have CONFIG_NUMA=y for build testing: surprised when trying a memhog
to see lots of other processes killed with "No available memory
(MPOL_BIND)". memhog is killed correctly once we initialize nodemask in
constrained_alloc().
Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: William Irwin <bill.irwin@oracle.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
While digging through my MAP_FIXED changes, I found that rather obvious
bug in /dev/mem mmap implementation for nommu archs. get_unmapped_area()
is expected to return an address, not a pfn.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
James Bottomley [Fri, 6 Apr 2007 16:14:56 +0000 (11:14 -0500)]
3w-xxxx: fix oops caused by incorrect REQUEST_SENSE handling
3w-xxxx emulates a REQUEST_SENSE response by simply returning nothing.
Unfortunately, it's assuming that the REQUEST_SENSE command is
implemented with use_sg == 0, which is no longer the case. The oops
occurs because it's clearing the scatterlist in request_buffer instead
of the memory region.
This is fixed by using tw_transfer_internal() to transfer correctly to
the scatterlist.
Acked-by: adam radford <aradford@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[PATCH] vt: fix potential race in VT_WAITACTIVE handler
On a multiprocessor machine the VT_WAITACTIVE ioctl call may return 0 if
fg_console has already been updated in redraw_screen() but the console
switch itself hasn't been completed. Fix this by checking fg_console in
vt_waitactive() with the console sem held.
Zwane Mwaikambo [Thu, 19 Apr 2007 20:33:13 +0000 (16:33 -0400)]
x86: Don't probe for DDC on VBE1.2
[PATCH] x86: Don't probe for DDC on VBE1.2
VBE1.2 doesn't support function 15h (DDC) resulting in a 'hang' whilst
uncompressing kernel with some video cards. Make sure we check VBE version
before fiddling around with DDC.
http://bugzilla.kernel.org/show_bug.cgi?id=1458
Opened: 2003-10-30 09:12 Last update: 2007-02-13 22:03
Much thanks to Tobias Hain for help in testing and investigating the bug.
Tested on;
i386, Chips & Technologies 65548 VESA VBE 1.2
CONFIG_VIDEO_SELECT=Y
CONFIG_FIRMWARE_EDID=Y
Alan Cox [Tue, 17 Apr 2007 23:59:01 +0000 (23:59 +0000)]
exec.c: fix coredump to pipe problem and obscure "security hole"
exec.c: fix coredump to pipe problem and obscure "security hole"
The patch checks for "|" in the pattern not the output and doesn't nail a
pid on to a piped name (as it is a program name not a file)
Also fixes a very very obscure security corner case. If you happen to have
decided on a core pattern that starts with the program name then the user
can run a program called "|myevilhack" as it stands. I doubt anyone does
this.
Signed-off-by: Alan Cox <alan@redhat.com> Confirmed-by: Christopher S. Aker <caker@theshore.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
cache_k8_northbridges() is storing config values to incorrect locations
(in flush_words) and also its overflowing beyond the allocation, causing
slab verification failures.
Olaf Kirch [Wed, 18 Apr 2007 22:14:14 +0000 (15:14 -0700)]
Fix IRDA oops'er
This fixes and OOPS due to incorrect socket orpahning in the
IRDA stack.
[IrDA]: Correctly handling socket error
This patch fixes an oops first reported in mid 2006 - see
http://lkml.org/lkml/2006/8/29/358 The cause of this bug report is that
when an error is signalled on the socket, irda_recvmsg_stream returns
without removing a local wait_queue variable from the socket's sk_sleep
queue. This causes havoc further down the road.
In response to this problem, a patch was made that invoked sock_orphan on
the socket when receiving a disconnect indication. This is not a good fix,
as this sets sk_sleep to NULL, causing applications sleeping in recvmsg
(and other places) to oops.
This is against the latest net-2.6 and should be considered for -stable
inclusion.
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Signed-off-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Netpoll UDP input handler needs to pull up the UDP headers
and handle receive checksum offloading properly just like
the normal UDP input path does else we get corrupted
checksums.
[NET]: Fix UDP checksum issue in net poll mode.
In net poll mode, the current checksum function doesn't consider the
kind of packet which is padded to reach a specific minimum length. I
believe that's the problem causing my test case failed. The following
patch fixed this issue.
Signed-off-by: Aubrey.Li <aubreylee@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
While building a test kernel for the new esp driver (against
git-current), I hit this bug. Trivial fix, put the inline declaration
in the right place. :)
Signed-off-by: Tom "spot" Callaway <tcallawa@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Miller [Tue, 17 Apr 2007 21:40:46 +0000 (14:40 -0700)]
Fix compat sys_ipc() on sparc64
The 32-bit syscall trampoline for sys_ipc() on sparc64
was sign extending various arguments, which is bogus when
using compat_sys_ipc() since that function expects zero
extended copies of all the arguments.
This bug breaks the sparc64 kernel when built with gcc-4.2.x
among other things.
[SPARC64]: Fix arg passing to compat_sys_ipc().
Do not sign extend args using the sys32_ipc stub, that is
buggy and unnecessary.
Based upon an excellent report by Mikael Pettersson.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Miller [Tue, 17 Apr 2007 21:37:25 +0000 (14:37 -0700)]
Fix sparc64 SBUS IOMMU allocator
[SPARC64]: Fix SBUS IOMMU allocation code.
There are several IOMMU allocator bugs. Instead of trying to fix this
overly complicated code, just mirror the PCI IOMMU arena allocator
which is very stable and well stress tested.
I tried to make the code as identical as possible so we can switch
sun4u PCI and SBUS over to a common piece of IOMMU code. All that
will be need are two callbacks, one to do a full IOMMU flush and one
to do a streaming buffer flush.
This patch gets rid of a lot of hangs and mysterious crashes on SBUS
sparc64 systems, at least for me.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sys_madvise has down_write of mmap_sem, then madvise_remove calls
vmtruncate_range which takes i_mutex and i_alloc_sem: no, we can
easily devise deadlocks from that ordering.
madvise_remove drop mmap_sem while calling vmtruncate_range: luckily,
since madvise_remove doesn't split or merge vmas, it's easy to handle
this case with a NULL prev, without restructuring sys_madvise. (Though
sad to retake mmap_sem when it's unlikely to be needed, and certainly
down_read is sufficient for MADV_REMOVE, unlike the other madvices.)
holepunch: fix disconnected pages after second truncate
shmem_truncate_range has its own truncate_inode_pages_range, to free any
pages racily instantiated while it was in progress: a SHMEM_PAGEIN flag
is set when this might have happened. But holepunching gets no chance
to clear that flag at the start of vmtruncate_range, so it's always set
(unless a truncate came just before), so holepunch almost always does
this second truncate_inode_pages_range.
shmem holepunch has unlikely swap<->file races hereabouts whatever we do
(without a fuller rework than is fit for this release): I was going to
skip the second truncate in the punch_hole case, but Miklos points out
that would make holepunch correctness more vulnerable to swapoff. So
keep the second truncate, but follow it by an unmap_mapping_range to
eliminate the disconnected pages (freed from pagecache while still
mapped in userspace) that it might have left behind.
Miklos Szeredi observes that during truncation of shmem page directories,
info->lock is released to improve latency (after lowering i_size and
next_index to exclude races); but this is quite wrong for holepunching,
which receives no such protection from i_size or next_index, and is left
vulnerable to races with shmem_unuse, shmem_getpage and shmem_writepage.
Hold info->lock throughout when holepunching? No, any user could prevent
rescheduling for far too long. Instead take info->lock just when needed:
in shmem_free_swp when removing the swap entries, and whenever removing
a directory page from the level above. But so long as we remove before
scanning, we can safely skip taking the lock at the lower levels, except
at misaligned start and end of the hole.
holepunch: fix shmem_truncate_range punching too far
Miklos Szeredi observes BUG_ON(!entry) in shmem_writepage() triggered
in rare circumstances, because shmem_truncate_range() erroneously
removes partially truncated directory pages at the end of the range:
later reclaim on pages pointing to these removed directories triggers
the BUG. Indeed, and it can also cause data loss beyond the hole.
Fix this as in the patch proposed by Miklos, but distinguish between
"limit" (how far we need to search: ignore truncation's next_index
optimization in the holepunch case - if there are races it's more
consistent to act on the whole range specified) and "upper_limit"
(how far we can free directory pages: generally we must be careful
to keep partially punched pages, but can relax at end of file -
i_size being held stable by i_mutex).
Avi Kivity [Sun, 22 Apr 2007 09:28:49 +0000 (12:28 +0300)]
KVM: MMU: Fix host memory corruption on i386 with >= 4GB ram
PAGE_MASK is an unsigned long, so using it to mask physical addresses on
i386 (which are 64-bit wide) leads to truncation. This can result in
page->private of unrelated memory pages being modified, with disasterous
results.
Fix by not using PAGE_MASK for physical addresses; instead calculate
the correct value directly from PAGE_SIZE. Also fix a similar BUG_ON().
Avi Kivity [Sun, 22 Apr 2007 09:28:05 +0000 (12:28 +0300)]
KVM: MMU: Fix guest writes to nonpae pde
KVM shadow page tables are always in pae mode, regardless of the guest
setting. This means that a guest pde (mapping 4MB of memory) is mapped
to two shadow pdes (mapping 2MB each).
When the guest writes to a pte or pde, we intercept the write and emulate it.
We also remove any shadowed mappings corresponding to the write. Since the
mmu did not account for the doubling in the number of pdes, it removed the
wrong entry, resulting in a mismatch between shadow page tables and guest
page tables, followed shortly by guest memory corruption.
This patch fixes the problem by detecting the special case of writing to
a non-pae pde and adjusting the address and number of shadow pdes zapped
accordingly.
This patch removes bogus zeroing of unused bits in output reports,
introduced in Simon's patch in commit d4ae650a.
According to the specification, any sane device should not care
about values of unused bits.
What is worse, the zeroing is done in a way which is broken and
might clear certain bits in output reports which are actually
_used_ - a device that has multiple fields with one value of
the size 1 bit each might serve as an example of why this is
bogus - the second call of hid_output_report() would clear the
first bit of report, which has already been set up previously.
This patch will break LEDs on SpaceNavigator, because this device
is broken and takes into account the bits which it shouldn't touch.
The quirk for this particular device will be provided in a separate
patch.
IB/mthca: Fix data corruption after FMR unmap on Sinai
In mthca_arbel_fmr_unmap(), the high bits of the key are masked off.
This gets rid of the effect of adjust_key(), which makes sure that
bits 3 and 23 of the key are equal when the Sinai throughput
optimization is enabled, and so it may happen that an FMR will end up
with bits 3 and 23 in the key being different. This causes data
corruption, because when enabling the throughput optimization, the
driver promises the HCA firmware that bits 3 and 23 of all memory keys
will always be equal.
Fix by re-applying adjust_key() after masking the key.
Thanks to Or Gerlitz for reproducing the problem, and Ariel Shahar for
help in debug.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[IPV4] nl_fib_lookup: Initialise res.r before fib_res_put(&res)
When CONFIG_IP_MULTIPLE_TABLES is enabled, the code in nl_fib_lookup()
needs to initialize the res.r field before fib_res_put(&res) - unlike
fib_lookup(), a direct call to ->tb_lookup does not set this field.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A security issue is emerging. Disallow Routing Header Type 0 by default
as we have been doing for IPv4.
Note: We allow RH2 by default because it is harmless.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Reply to NETLINK_FIB_LOOKUP messages were misrouted back to kernel,
which resulted in infinite recursion and stack overflow.
The bug is present in all kernel versions since the feature appeared.
The patch also makes some minimal cleanup:
1. Return something consistent (-ENOENT) when fib table is missing
2. Do not crash when queue is empty (does not happen, but yet)
3. Put result of lookup
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>