Alexey Dobriyan [Sat, 17 Mar 2007 01:32:09 +0000 (18:32 -0700)]
Copy over mac_len when cloning an skb
[NET]: Copy mac_len in skb_clone() as well
ANK says: "It is rarely used, that's wy it was not noticed.
But in the places, where it is used, it should be disaster."
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The ipv6_fl_socklist from listening socket is inadvertently shared
with new socket created for connection. This leads to a variety of
interesting, but fatal, bugs. For example, removing one of the
sockets may lead to the other socket's encountering a page fault
when the now freed list is referenced.
The fix is to not share the flow label list with the new socket.
Signed-off-by: Masayuki Nakagawa <nakagawa.msy@ncos.nec.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Robert Olsson [Sat, 17 Mar 2007 01:30:13 +0000 (18:30 -0700)]
Fix GFP_KERNEL with preemption disabled in fib_trie
[IPV4]: Do not disable preemption in trie_leaf_remove().
Hello, Just discussed this Patrick...
We have two users of trie_leaf_remove, fn_trie_flush and fn_trie_delete
both are holding RTNL. So there shouldn't be need for this preempt stuff.
This is assumed to a leftover from an older RCU-take.
> Mhh .. I think I just remembered something - me incorrectly suggesting
> to add it there while we were talking about this at OLS :) IIRC the
> idea was to make sure tnode_free (which at that time didn't use
> call_rcu) wouldn't free memory while still in use in a rcu read-side
> critical section. It should have been synchronize_rcu of course,
> but with tnode_free using call_rcu it seems to be completely
> unnecessary. So I guess we can simply remove it.
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Joy Latten [Sat, 17 Mar 2007 01:27:51 +0000 (18:27 -0700)]
Fix extraneous IPSEC larval SA creation
[XFRM]: Fix missing protocol comparison of larval SAs.
I noticed that in xfrm_state_add we look for the larval SA in a few
places without checking for protocol match. So when using both
AH and ESP, whichever one gets added first, deletes the larval SA.
It seems AH always gets added first and ESP is always the larval
SA's protocol since the xfrm->tmpl has it first. Thus causing the
additional km_query()
Adding the check eliminates accidental double SA creation.
Signed-off-by: Joy Latten <latten@austin.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Gautham R Shenoy <ego@in.ibm.com> Acked-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Thomas Gleixner [Fri, 16 Mar 2007 21:38:20 +0000 (13:38 -0800)]
hrtimer: prevent overrun DoS in hrtimer_forward()
hrtimer_forward() does not check for the possible overflow of
timer->expires. This can happen on 64 bit machines with large interval
values and results currently in an endless loop in the softirq because the
expiry value becomes negative and therefor the timer is expired all the
time.
Check for this condition and set the expiry value to the max. expiry time
in the future. The fix should be applied to stable kernel series as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Trond Myklebust [Fri, 16 Mar 2007 21:38:28 +0000 (13:38 -0800)]
nfs: nfs_getattr() can't call nfs_sync_mapping_range() for non-regular files
Looks like we need a check in nfs_getattr() for a regular file. It makes
no sense to call nfs_sync_mapping_range() on anything else. I think that
should fix your problem: it will stop the NFS client from interfering
with dirty pages on that inode's mapping.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alan Stern [Fri, 16 Mar 2007 13:34:29 +0000 (09:34 -0400)]
EHCI: add delay to bus_resume before accessing ports
This patch (as870) adds a delay to ehci-hcd's bus_resume routine.
Apparently there are controllers and/or BIOSes out there which need
such a delay to get the ports back into their correct state. This
fixes Bugzilla #8190.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jan Beulich [Tue, 13 Mar 2007 18:04:11 +0000 (14:04 -0400)]
adjust legacy IDE resource setting (v2)
adjust legacy IDE resource setting (v2)
The change to force legacy mode IDE channels' resources to fixed non-zero
values confuses (at least some versions of) X, because the values reported
by the kernel and those readable from PCI config space aren't consistent
anymore. Therefore, this patch arranges for the respective BARs to also
get updated if possible.
Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
IA64: fix NULL pointer in ia64/irq_chip-mask/unmask function
[IA64] fix NULL pointer in ia64/irq_chip-mask/unmask function
This patch fixes boot failure because irq_desc->mask() is NULL.
- Added mask/unmask functions to ia64's irq desc function table.
- rename hw_interrupt_type to irq_chip. hw_interrupt_type is old name.
- Tony: Added same change to arch/ia64/sn/kernel/irq.c as pointed out
by Eric Biederman ... mask/unmask functions there can be no-op.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Paul Moore [Mon, 12 Mar 2007 14:33:12 +0000 (09:33 -0500)]
NetLabel: Verify sensitivity level has a valid CIPSO mapping
The current CIPSO engine has a problem where it does not verify that the given
sensitivity level has a valid CIPSO mapping when the "std" CIPSO DOI type is
used. The end result is that bad packets are sent on the wire which should
have never been sent in the first place. This patch corrects this problem by
verifying the sensitivity level mapping similar to what is done with the
category mapping. This patch also changes the returned error code in this case
to -EPERM to better match what the category mapping verification code returns.
Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Joerg Dorchain [Tue, 6 Mar 2007 10:46:54 +0000 (02:46 -0800)]
gdth: fix oops in gdth_copy_cmd()
Recent alterations to the gdth_fill_raw_cmd() path no longer set the
sg_ranz field for zero transfer commands. However, this field is used
lower down in the function to initialise ha->cmd_len to the size of
the firmware packet. If this uninitialised field contains a bogus
value, ha->cmd_len can become much larger than the actual firmware
packet and end up oopsing in gdth_copy_cmd() as it tries to copy this
huge packet to the device (usually because it runs into an unallocated
page).
The fix is to initialise the sg_ranz field to zero at the start of
gdth_fill_raw_cmd().
Signed-off-by: Joerg Dorchain <joerg@dorchain.net> Acked-by: "Achim Leubner" <Achim_Leubner@adaptec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Evgeniy Polyakov [Sat, 10 Mar 2007 07:04:42 +0000 (23:04 -0800)]
Fix rtm_to_ifaddr() error return.
[IPV4]: Fix rtm_to_ifaddr() error handling.
Return negative error value (embedded in the pointer) instead of
returning NULL.
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Olaf Kirch [Sat, 10 Mar 2007 07:03:53 +0000 (23:03 -0800)]
Fix another NULL pointer deref in ipv6_sockglue.c
[IPV6]: Fix for ipv6_setsockopt NULL dereference
I came across this bug in http://bugzilla.kernel.org/show_bug.cgi?id=8155
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Herbert Xu [Thu, 8 Mar 2007 02:50:54 +0000 (18:50 -0800)]
Fix UDP header pointer after pskb_trim_rcsum()
[UDP]: Reread uh pointer after pskb_trim
The header may have moved when trimming.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Eric Dumazet [Thu, 8 Mar 2007 02:48:44 +0000 (18:48 -0800)]
Fix timewait jiffies
[INET]: twcal_jiffie should be unsigned long, not int
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Philipp Reisner [Thu, 8 Mar 2007 02:45:12 +0000 (18:45 -0800)]
Fix callback bug in connector
[CONNECTOR]: Bugfix for cn_call_callback()
When system under heavy stress and must allocate new work
instead of reusing old one, new work must use correct
completion callback.
Patch is based on Philipp's and Lars' work.
I only cleaned small stuff (and removed spaces instead of tabs).
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Rainer Weikusat [Wed, 3 Jan 2007 14:36:25 +0000 (15:36 +0100)]
fix for bugzilla #7544 (keyspan USB-to-serial converter)
At least the Keyspan USA-19HS USB-to-serial converter supports
two different configurations, one where the input endpoints
have interrupt transfer type and one where they are bulk endpoints.
The default UHCI configuration uses the interrupt input endpoints.
The keyspan driver, OTOH, assumes that the device has only bulk
endpoints (all URBs are initialized by calling usb_fill_bulk_urb
in keyspan.c/ keyspan_setup_urb). This causes the interval field
of the input URBs to have a value of zero instead of one, which
'accidentally' worked with Linux at least up to 2.6.17.11 but
stopped to with 2.6.18, which changed the UHCI support code handling
URBs for interrupt endpoints. The patch below modifies to driver to
initialize its input URBs either as interrupt or as bulk URBs,
depending on the transfertype contained in the associated endpoint
descriptor (only tested with the default configuration) enabling
the driver to again receive data from the serial converter.
Johannes Berg [Thu, 8 Mar 2007 02:42:52 +0000 (18:42 -0800)]
Fix compat_getsockopt
[NET]: Fix compat_sock_common_getsockopt typo.
This patch fixes a typo in compat_sock_common_getsockopt.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Larry Finger [Wed, 7 Mar 2007 18:05:58 +0000 (13:05 -0500)]
bcm43xx: Fix problem with >1 GB RAM
Some versions of the bcm43xx chips only support 30-bit DMA, which means
that the descriptors and buffers must be in the first 1 GB of RAM. On
the i386 and x86_64 architectures with more than 1 GB RAM, an incorrect
assignment may occur. This patch ensures that the various DMA addresses
are within the capability of the chip. Testing has been limited to x86_64
as no one has an i386 system with more than 1 GB RAM.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Douglas Gilbert [Wed, 7 Mar 2007 19:33:38 +0000 (14:33 -0500)]
Fix bug 7994 sleeping function called from invalid context
- addresses the reported bug (with GFP_KERNEL -> GFP_ATOMIC)
- improves error checking, and
- is a subset of the changes to scsi_debug in lk 2.6.21-rc*
Compiled and lightly tested (in lk 2.6.21-rc2 environment).
Signed-off-by: Douglas Gilbert <dougg@torque.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Patrick McHardy [Wed, 7 Mar 2007 21:34:45 +0000 (22:34 +0100)]
nfnetlink_log: fix crash on bridged packet
[NETFILTER]: nfnetlink_log: fix crash on bridged packet
physoutdev is only set on purely bridged packet, when nfnetlink_log is used
in the OUTPUT/FORWARD/POSTROUTING hooks on packets forwarded from or to a
bridge it crashes when trying to dereference skb->nf_bridge->physoutdev.
Reported by Holger Eitzenberger <heitzenberger@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Patrick McHardy [Wed, 7 Mar 2007 21:34:42 +0000 (22:34 +0100)]
nf_conntrack: fix incorrect classification of IPv6 fragments as ESTABLISHED
[NETFILTER]: nf_conntrack: fix incorrect classification of IPv6 fragments as ESTABLISHED
The individual fragments of a packet reassembled by conntrack have the
conntrack reference from the reassembled packet attached, but nfctinfo
is not copied. This leaves it initialized to 0, which unfortunately is
the value of IP_CT_ESTABLISHED.
The result is that all IPv6 fragments are tracked as ESTABLISHED,
allowing them to bypass a usual ruleset which accepts ESTABLISHED
packets early.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The nf_conntrack_netlink config option is named CONFIG_NF_CT_NETLINK,
but multiple files use CONFIG_IP_NF_CONNTRACK_NETLINK or
CONFIG_NF_CONNTRACK_NETLINK for ifdefs.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix {nf,ip}_ct_iterate_cleanup unconfirmed list handling:
- unconfirmed entries can not be killed manually, they are removed on
confirmation or final destruction of the conntrack entry, which means
we might iterate forever without making forward progress.
This can happen in combination with the conntrack event cache, which
holds a reference to the conntrack entry, which is only released when
the packet makes it all the way through the stack or a different
packet is handled.
- taking references to an unconfirmed entry and using it outside the
locked section doesn't work, the list entries are not refcounted and
another CPU might already be waiting to destroy the entry
What the code really wants to do is make sure the references of the hash
table to the selected conntrack entries are released, so they will be
destroyed once all references from skbs and the event cache are dropped.
Since unconfirmed entries haven't even entered the hash yet, simply mark
them as dying and skip confirmation based on that.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
x86-64: survive having no irq mapping for a vector
Occasionally the kernel has bugs that result in no irq being found for a
given cpu vector. If we acknowledge the irq the system has a good chance
of continuing even though we dropped an irq message. If we continue to
simply print a message and not acknowledge the irq the system is likely to
become non-responsive shortly there after.
AK: Fixed compilation for UP kernels
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: "Luigi Genoni" <luigi.genoni@pirelli.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Marcel Holtmann [Wed, 7 Mar 2007 18:22:40 +0000 (13:22 -0500)]
Fix buffer overflow in Omnikey CardMan 4040 driver (CVE-2007-0005)
Based on a patch from Don Howard <dhoward@redhat.com>
When calling write() with a buffer larger than 512 bytes, the
driver's write buffer overflows, allowing to overwrite the EIP and
execute arbitrary code with kernel privileges.
In read(), there exists a similar problem, but coming from the device.
A malicous or buggy device sending more than 512 bytes can overflow
of the driver's read buffer, with the same effects as above.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Harald Welte <laforge@gnumonks.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
On 2/28/07, KOVACS Krisztian <hidden@balabit.hu> wrote:
>
> Hi,
>
> While reading TCP minisock code I've found this suspiciously looking
> code fragment:
>
> - 8< -
> struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req, struct sk_buff *skb)
> {
> struct sock *newsk = inet_csk_clone(sk, req, GFP_ATOMIC);
>
> if (newsk != NULL) {
> const struct inet_request_sock *ireq = inet_rsk(req);
> struct tcp_request_sock *treq = tcp_rsk(req);
> struct inet_connection_sock *newicsk = inet_csk(sk);
> struct tcp_sock *newtp;
> - 8< -
>
> The above code initializes newicsk to inet_csk(sk), isn't that supposed
> to be inet_csk(newsk)? As far as I can tell this might leave
> icsk_ack.last_seg_size zero even if we do have received data.
Good catch!
David, please apply the attached patch.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Zhang, Yanmin [Thu, 15 Feb 2007 07:37:03 +0000 (23:37 -0800)]
ATA: convert GSI to irq on ia64
If an ATA drive uses legacy mode, ata driver will choose 14 and 15 as the
fixed irq number. On ia64 platform, such numbers are GSI and should be
converted to irq vector.
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Miller [Mon, 5 Mar 2007 23:53:45 +0000 (15:53 -0800)]
video/aty/mach64_ct.c: fix bogus delay loop
CT based mach64 cards were reported to hang on sparc64 boxes when
compiled with gcc-4.1.x and later.
Looking at this piece of code, it's no surprise. A critical
delay was implemented as an empty for() loop, and gcc 4.0.x
and previous did not optimize it away, so we did get a delay.
But gcc-4.1.x and later can optimize it away, and we get crashes.
Use a real udelay() to fix this. Fix verified on SunBlade100.
Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 7628b0a8c01a02966d2228bdf741ddedb128e8f8 (drivers/net/tulip/dmfe:
support basic carrier detection) breaks networking on my Davicom DM9009.
ethtool always reports there is no link. tcpdump shows incoming packets,
but TX is disabled. Reverting the above patch fixes the problem.
Cc: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Jeff Garzik <jeff@garzik.org> Cc: Valerie Henson <val_henson@linux.intel.com> Cc: Thomas Bachler <thomas@archlinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Andrew Morton [Thu, 1 Mar 2007 04:13:21 +0000 (20:13 -0800)]
throttle_vm_writeout(): don't loop on GFP_NOFS and GFP_NOIO allocations
throttle_vm_writeout() is designed to wait for the dirty levels to subside.
But if the caller holds IO or FS locks, we might be holding up that writeout.
So change it to take a single nap to give other devices a chance to clean some
memory, then return.
Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Pete Zaitcev <zaitcev@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Sam Ravnborg [Thu, 1 Mar 2007 04:12:31 +0000 (20:12 -0800)]
fix section mismatch warning in lockdep
lockdep_init() is marked __init but used in several places
outside __init code. This causes following warnings:
$ scripts/mod/modpost kernel/lockdep.o
WARNING: kernel/built-in.o - Section mismatch: reference to .init.text:lockdep_init from .text.lockdep_init_map after 'lockdep_init_map' (at offset 0x105)
WARNING: kernel/built-in.o - Section mismatch: reference to .init.text:lockdep_init from .text.lockdep_reset_lock after 'lockdep_reset_lock' (at offset 0x35)
WARNING: kernel/built-in.o - Section mismatch: reference to .init.text:lockdep_init from .text.__lock_acquire after '__lock_acquire' (at offset 0xb2)
The warnings are less obviously due to heavy inlining by gcc - this is not
altered.
Fix the section mismatch warnings by removing the __init marking, which
seems obviously wrong.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In file included from arch/s390/kernel/early.c:13:
include/linux/lockdep.h:300: warning:
"struct task_struct" declared inside parameter list
include/linux/lockdep.h:300:
warning: its scope is only this definition or
declaration, which is probably not what you want
Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Nick Piggin [Sat, 10 Feb 2007 09:46:22 +0000 (01:46 -0800)]
buffer: memorder fix
unlock_buffer(), like unlock_page(), must not clear the lock without
ensuring that the critical section is closed.
Mingming later sent the same patch, saying:
We are running SDET benchmark and saw double free issue for ext3 extended
attributes block, which complains the same xattr block already being freed (in
ext3_xattr_release_block()). The problem could also been triggered by
multiple threads loop untar/rm a kernel tree.
The race is caused by missing a memory barrier at unlock_buffer() before the
lock bit being cleared, resulting in possible concurrent h_refcounter update.
That causes a reference counter leak, then later leads to the double free that
we have seen.
Inside unlock_buffer(), there is a memory barrier is placed *after* the lock
bit is being cleared, however, there is no memory barrier *before* the bit is
cleared. On some arch the h_refcount update instruction and the clear bit
instruction could be reordered, thus leave the critical section re-entered.
The race is like this: For example, if the h_refcount is initialized as 1,
kernel/time/clocksource.c needs struct task_struct on m68k
kernel/time/clocksource.c needs struct task_struct on m68k.
Because it uses spin_unlock_irq(), which, on m68k, uses hardirq_count(), which
uses preempt_count(), which needs to dereference struct task_struct, we
have to include sched.h. Because it would cause a loop inclusion, we
cannot include sched.h in any other of asm-m68k/system.h,
linux/thread_info.h, linux/hardirq.h, which leaves this ugly include in
a C file as the only simple solution.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Ken Chen [Thu, 8 Feb 2007 22:20:27 +0000 (14:20 -0800)]
hugetlb: preserve hugetlb pte dirty state
__unmap_hugepage_range() is buggy that it does not preserve dirty state of
huge_pte when unmapping hugepage range. It causes data corruption in the
event of dop_caches being used by sys admin. For example, an application
creates a hugetlb file, modify pages, then unmap it. While leaving the
hugetlb file alive, comes along sys admin doing a "echo 3 >
/proc/sys/vm/drop_caches".
drop_pagecache_sb() will happily free all pages that aren't marked dirty if
there are no active mapping. Later when application remaps the hugetlb
file back and all data are gone, triggering catastrophic flip over on
application.
Not only that, the internal resv_huge_pages count will also get all messed
up. Fix it up by marking page dirty appropriately.
Signed-off-by: Ken Chen <kenchen@google.com> Cc: "Nish Aravamudan" <nish.aravamudan@gmail.com> Cc: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Acked-by: William Irwin <bill.irwin@oracle.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As macbook/macbook pro's also have to live with a single mouse button the
following patch just enables the Macintosh device drivers menu in Kconfig +
adds the macintosh dir to the obj-* to make macbook* users happy (who use
exactly that since months....
Signed-off-by: Soeren Sonnenburg <kernel@nn7.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is a fix of regression, which triggered by ~2.6.16.
Patch with name ufs-directory-and-page-cache-from-blocks-to-pages.patch: in
additional to conversation from block to page cache mechanism added new
checks of directory integrity, one of them that directory entry do not
across directory chunks.
But some kinds of UFS: OpenStep UFS and Apple UFS (looks like these are the
same filesystems) have different directory chunk size, then common
UFSes(BSD and Solaris UFS).
So this patch adds ability to works with variable size of directory chunks,
and set it for ufstype=openstep to right size.
Magnus Damm [Tue, 6 Feb 2007 00:20:09 +0000 (16:20 -0800)]
kexec: Fix CONFIG_SMP=n compilation V2 (ia64)
Kexec support for 2.6.20 on ia64 does not build properly using a config
made up by CONFIG_SMP=n and CONFIG_HOTPLUG_CPU=n:
CC arch/ia64/kernel/machine_kexec.o
arch/ia64/kernel/machine_kexec.c: In function `machine_shutdown':
arch/ia64/kernel/machine_kexec.c:77: warning: implicit declaration of function `cpu_down'
AS arch/ia64/kernel/relocate_kernel.o
CC arch/ia64/kernel/crash.o
arch/ia64/kernel/crash.c: In function `kdump_cpu_freeze':
arch/ia64/kernel/crash.c:139: warning: implicit declaration of function `ia64_jump_to_sal'
arch/ia64/kernel/crash.c:139: error: `sal_boot_rendez_state' undeclared (first use in this function)
arch/ia64/kernel/crash.c:139: error: (Each undeclared identifier is reported only once
arch/ia64/kernel/crash.c:139: error: for each function it appears in.)
arch/ia64/kernel/crash.c: At top level:
arch/ia64/kernel/crash.c:84: warning: 'kdump_wait_cpu_freeze' defined but not used
make[1]: *** [arch/ia64/kernel/crash.o] Error 1
make: *** [arch/ia64/kernel] Error 2
Signed-off-by: Magnus Damm <magnus@valinux.co.jp> Acked-by: Simon Horman <horms@verge.net.au> Acked-by: Jay Lan <jlan@sgi.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The commit was buggy in multiple ways:
- the conversion to ilog2() was incorrect to begin with
- it tested the wrong #defines, so on all architectures but FRV you'd
never see the bug except for constant arguments.
- the new "get_order()" macro used its arguments multiple times, and
didn't even parenthesize them properly
- despite the comments, it was not true that you could use it for
constant initializers, since not all architectures even use the
generic page.h header file.
All of the problems are individually fixable, but it all boils down to:
better just revert it, and re-do it from scratch.
Cc: David Howells <dhowells@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Andrew Morton <akpm@osdl.org> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Thomas Renninger [Thu, 22 Feb 2007 12:52:40 +0000 (13:52 +0100)]
Backport of psmouse suspend/shutdown cleanups
This patch works back to 2.6.17 (earlier kernels seem to
need up/down operations on mutex/semaphore).
psmouse - properly reset mouse on shutdown/suspend
Some people report that they need psmouse module unloaded
for suspend to ram/disk to work properly. Let's make port
cleanup behave the same way as driver unload.
This fixes "bad state" problem on various HP laptops, such
as nx7400.
Ingo Molnar [Thu, 1 Mar 2007 23:58:51 +0000 (18:58 -0500)]
sched: fix SMT scheduler bug
The SMT scheduler incorrectly skips kernel threads even if they are
runnable (but they are preempted by a higher-prio user-space task which got
SMT-delayed by an even higher-priority task running on a sibling CPU).
Fix this for now by only doing the SMT-nice optimization if the
to-be-delayed task is the only runnable task. (This should cover most of
the real-life cases anyway.)
This bug has been in the SMT scheduler since 2.6.17 or so, but has only
been noticed now by the active check in the dynticks code.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
tty_io: fix race in master pty close/slave pty close path
This patch fixes a possible race that leads to double freeing an idr index.
When the master begin to close, release_dev() is called and then
pty_close() is called:
if (tty->driver->close)
tty->driver->close(tty, filp);
This is done without helding any locks other than BKL. Inside pty_close(),
being a master close, the devpts entry will be removed:
#ifdef CONFIG_UNIX98_PTYS
if (tty->driver == ptm_driver)
devpts_pty_kill(tty->index);
#endif
But devpts_pty_kill() will call get_node() that may sleep while waiting for
&devpts_root->d_inode->i_sem. When this happens and the slave is being
opened, tty_open() just found the driver and index:
This part of the code is already protected under tty_mute. The problem is
that the slave close already got an index. Then init_dev() is called and
blocks waiting for the same &devpts_root->d_inode->i_sem.
When the master close resumes, it removes the devpts entry, and the
relation between idr index and the tty is gone. The master then sleeps
waiting for the tty_mutex on release_dev().
Slave open resumes and found no tty for that index. As result, a NULL tty
is returned and init_dev() doesn't flow to fast_track:
/* check whether we're reopening an existing tty */
if (driver->flags & TTY_DRIVER_DEVPTS_MEM) {
tty = devpts_get_tty(idx);
if (tty && driver->subtype == PTY_TYPE_MASTER)
tty = tty->link;
} else {
tty = driver->ttys[idx];
}
if (tty) goto fast_track;
The result of this, is that a new tty will be created and init_dev() returns
sucessfull. After returning, tty_mutex is dropped and master close may resume.
Master close finds it's the only use and both sides are closing, then releases
the tty and the index. At this point, the idr index is free, but slave still
has it.
Slave open then calls pty_open() and finds that tty->link->count is 0,
because there's no master and returns error. Then tty_open() calls
release_dev() which executes without any warning, as it was a case of last
slave close when the master is already closed (master->count == 0,
slave->count == 1). The tty is then released with the already released idr
index.
This normally would only issue a warning on idr_remove() but in case of a
customer's critical application, it's never too simple:
thread1: opens master, gets index X
thread1: begin closing master
thread2: begin opening slave with index X
thread1: finishes closing master, index X released
thread3: opens master, gets index X, just released
thread2: fails opening slave, releases index X <----
thread4: opens master, gets index X, init_dev() then find an already in use
and healthy tty and fails
If no more indexes are released, ptmx_open() will keep failing, as the
first free index available is X, and it will make init_dev() fail because
you're trying to "reopen a master" which isn't valid.
The patch notices when this race happens and make init_dev() fail
imediately. The init_dev() function is called with tty_mutex held, so it's
safe to continue with tty till the end of function because release_dev()
won't make any further changes without grabbing the tty_mutex.
Without the patch, on some machines it's possible get easily idr warnings
like this one:
idr_remove called for id=15 which is not allocated.
[<c02555b9>] idr_remove+0x139/0x170
[<c02a1b62>] release_mem+0x182/0x230
[<c02a28e7>] release_dev+0x4b7/0x700
[<c02a0ea7>] tty_ldisc_enable+0x27/0x30
[<c02a1e64>] init_dev+0x254/0x580
[<c02a0d64>] check_tty_count+0x14/0xb0
[<c02a4f05>] tty_open+0x1c5/0x340
[<c02a4d40>] tty_open+0x0/0x340
[<c017388f>] chrdev_open+0xaf/0x180
[<c017c2ac>] open_namei+0x8c/0x760
[<c01737e0>] chrdev_open+0x0/0x180
[<c0167bc9>] __dentry_open+0xc9/0x210
[<c0167e2c>] do_filp_open+0x5c/0x70
[<c0167a91>] get_unused_fd+0x61/0xd0
[<c0167e93>] do_sys_open+0x53/0x100
[<c0167f97>] sys_open+0x27/0x30
[<c010303b>] syscall_call+0x7/0xb
using this test application available on:
http://www.ruivo.org/~aris/pty_sodomizer.c
Signed-off-by: Aristeu Sergio Rozanski Filho <aris@ruivo.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Neil Brown [Fri, 9 Mar 2007 18:50:27 +0000 (10:50 -0800)]
export blk_recount_segments
On Monday February 12, marcm@liquid-nexus.net wrote:
> >
> > Thanks for the quick response Neil unfortunately the kernel doesn't build with
> > this patch due to a missing symbol:
> >
> > WARNING: "blk_recount_segments" [drivers/md/raid456.ko] undefined!
> >
> > Is that in another file that needs patching or within raid5.c?
Yes. I keep forgetting about that bit. Sorry.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Reading /proc/net/anycast6 when there is no anycast address
on an interface results in an ever-increasing inet6_dev reference
count, as well as a reference to the netdevice you can't get rid of.
From: David Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Michal Wrobel [Tue, 27 Feb 2007 19:12:45 +0000 (11:12 -0800)]
Don't add anycast reference to device multiple times
[IPV6]: anycast refcnt fix
This patch fixes a bug in Linux IPv6 stack which caused anycast address
to be added to a device prior DAD has been completed. This led to
incorrect reference count which resulted in infinite wait for
unregister_netdevice completion on interface removal.
Signed-off-by: Michal Wrobel <xmxwx@asn.pl> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[XFRM_TUNNEL]: Reload header pointer after pskb_may_pull/pskb_expand_head
Please consider applying, this was found on your latest
net-2.6 tree while playing around with that ip_hdr() + turn
skb->nh/h/mac pointers as offsets on 64 bits idea :-)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Miller [Tue, 27 Feb 2007 19:01:38 +0000 (11:01 -0800)]
Fix interrupt probing on E450 sparc64 systems
[SPARC64]: Fix PCI interrupts on E450 et al.
When the PCI controller OBP node lacks an interrupt-map
and interrupt-map-mask property, we need to form the
INO by hand. The PCI swizzle logic was not doing that
properly.
This was a regression added by the of_device code.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jiri Kosina [Thu, 1 Mar 2007 11:02:52 +0000 (12:02 +0100)]
HID: fix possible double-free on error path in hid parser
HID: fix possible double-free on error path in hid parser
Freeing of device->collection is properly done in hid_free_device() (as
this function is supposed to free all the device resources and could be
called from transport specific code, e.g. usb_hid_configure()).
Remove all kfree() calls preceeding the hid_free_device() call.
Livio Soares [Thu, 22 Feb 2007 05:13:17 +0000 (16:13 +1100)]
POWERPC: Fix performance monitor exception
To the issue: some point during 2.6.20 development, Paul Mackerras
introduced the "lazy IRQ disabling" patch (very cool work, BTW).
In that patch, the performance monitor unit exception was marked as
"maskable", in the sense that if interrupts were soft-disabled, that
exception could be ignored. This broke my PowerPC profiling code.
The symptom that I see is that a varying number of interrupts
(from 0 to $n$, typically closer to 0) get delivered, when, in
reality, it should always be very close to $n$.
The issue stems from the way masking is being done. Masking in
this fashion seems to work well with the decrementer and external
interrupts, because they are raised again until "really" handled.
For the PMU, however, this does not apply (at least on my Xserver
machine with a 970FX processor). If the PMU exception is not handled,
it will _not_ be re-raised (at least on my machine). The documentation
states that the PMXE bit in MMCR0 is set to 0 when the PMU exception
is raised. However, software must re-set the bit to re-enable PMU
exceptions. If the exception is ignored (as currently) not only is
that interrupt lost, but because software does not re-set PMXE, the
PMU registers are "frozen" forever.
[This patch means that performance monitor exceptions are taken and
handled even if irqs are off, as long as some other interrupt hasn't
come along and caused interrupts to be hard-disabled. In this sense
the PMU exception becomes like an NMI. The oprofile code for most
powerpc processors does nothing that is unsafe in an NMI context, but
the Cell oprofile code does a spin_lock_irqsave. However, that turns
out to be OK because Cell doesn't actually use the performance
monitor exception; performance monitor interrupts come in as a
regular interrupt on Cell, so will be disabled when irqs are off.
-- paulus.]
This restores the previous behaviour for these devices by ensuring that when
the voltage is changed, only one write to set the voltage is performed.
It may be that both writes are needed if the voltage is being changed between
two non-zero values or that it's safe to ensure that only one write is done
if the hardware only supports one voltage; I don't know whether either is the
case nor can I test since I have only the one SD reader (1524:0550), and it
supports just the one voltage.
Signed-off-by: Darren Salt <linux@youmustbejoking.demon.co.uk> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jeff Dike [Thu, 22 Feb 2007 16:48:38 +0000 (11:48 -0500)]
UML - Fix 2.6.20 hang
A previous cleanup misused need_poll, which had a fairly broken
interface. It implemented a growable array, changing the used
elements count itself, but leaving it up to the caller to fill in the
actual elements, including the entire array if the array had to be
reallocated. This worked because the previous users were switching
between two such structures, and the elements were copied from the
inactive array to the active array after making sure the active array
had enough room.
maybe_sigio_broken was made to use need_poll, but it was operating on
a single array, so when the buffer was reallocated, the previous
contents were lost.
This patch makes need_poll implement more sane semantics. It merely
assures that the array is of the proper size and that the contents are
preserved. It is up to the caller to adjust the used elements count
and to ensure that the proper elements are resent.
This manifested itself as a hang in 2.6.20 as the uninitialized buffer
convinced UML that one of its own file descriptors didn't support
SIGIO and needed to be watched by poll in a separate thread. The
result was an interrupt flood as control traffic over this descriptor
sparked interrupts, which resulted in more control traffic, ad nauseum.
Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Hugh Dickins [Fri, 23 Feb 2007 21:53:49 +0000 (21:53 +0000)]
fix umask when noACL kernel meets extN tuned for ACLs
Fix insecure default behaviour reported by Tigran Aivazian: if an ext2
or ext3 or ext4 filesystem is tuned to mount with "acl", but mounted by
a kernel built without ACL support, then umask was ignored when creating
inodes - though root or user has umask 022, touch creates files as 0666,
and mkdir creates directories as 0777.
This appears to have worked right until 2.6.11, when a fix to the default
mode on symlinks (always 0777) assumed VFS applies umask: which it does,
unless the mount is marked for ACLs; but ext[234] set MS_POSIXACL in
s_flags according to s_mount_opt set according to def_mount_opts.
We could revert to the 2.6.10 ext[234]_init_acl (adding an S_ISLNK test);
but other filesystems only set MS_POSIXACL when ACLs are configured. We
could fix this at another level; but it seems most robust to avoid setting
the s_mount_opt flag in the first place (at the expense of more ifdefs).
Likewise don't set the XATTR_USER flag when built without XATTR support.
Tejun Heo [Sat, 24 Feb 2007 13:30:36 +0000 (22:30 +0900)]
sata_sil: ignore and clear spurious IRQs while executing commands by polling
sata_sil used to trigger HSM error if IRQ occurs during polling
command. This didn't matter because polling wasn't used in sata_sil.
However, as of 2.6.20, all IDENTIFYs are performed by polling and
device detection sometimes fails due to spurious IRQ. This patch
makes sata_sil ignore and clear spurious IRQ while executing commands
by polling.
This fixes bug#7996 and IMHO should also be included in -stable.
Stefan Seyfried [Sat, 24 Feb 2007 22:06:43 +0000 (23:06 +0100)]
swsusp: Fix possible oops in userland interface
Fix the Oops occuring when SNAPSHOT_PMOPS or SNAPSHOT_S2RAM ioctl is called on
a system without pm_ops defined (eg. a non-ACPI kernel on x86 PC).
Signed-off-by: Stefan Seyfried <seife@suse.de> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Thomas Gleixner [Thu, 22 Feb 2007 00:33:29 +0000 (01:33 +0100)]
Fix posix-cpu-timer breakage caused by stale p->last_ran value
Problem description at:
http://bugzilla.kernel.org/show_bug.cgi?id=8048
Commit b18ec80396834497933d77b81ec0918519f4e2a7
[PATCH] sched: improve migration accuracy
optimized the scheduler time calculations, but broke posix-cpu-timers.
The problem is that the p->last_ran value is not updated after a context
switch. So a subsequent call to current_sched_time() calculates with a
stale p->last_ran value, i.e. accounts the full time, which the task was
scheduled away.
Michael Krufky [Sat, 3 Mar 2007 14:36:15 +0000 (09:36 -0500)]
V4L: cx88-blackbird: allow usage of 376836 and 262144 sized firmware images
This updates the cx88-blackbird driver to be able to use the new cx23416
firmware image released by Hauppauge Computer Works, while retaining
compatibility with the older firmware images.
cx2341x firmware can be downloaded at: http://dl.ivtvdriver.org/ivtv/firmware/
Hans Verkuil [Thu, 15 Feb 2007 06:40:34 +0000 (03:40 -0300)]
V4L: fix cx25840 firmware loading
Due to changes in the i2c handling in 2.6.20 this cx25840 bug surfaced,
causing the firmware load to fail for the ivtv driver. The correct
sequence is to first attach the i2c client, then use the client's
device to load the firmware.
Michael Krufky [Sat, 3 Mar 2007 14:36:09 +0000 (09:36 -0500)]
DVB: digitv: open nxt6000 i2c_gate for TDED4 tuner handling
dvb-pll normally opens the i2c gate before attempting to communicate with
the pll, but the code for this device is not using dvb-pll. This should
be cleaned up in the future, but for now, just open the i2c gate at the
appropriate place in order to fix this driver bug.
Rework the cx23416 firmware loader so that it longer requires the
firmware size to be a multiple of 8KB. Until recently all cx2341x
firmware images were exactly 256KB, but newer firmware is larger than
that and also appears to have arbitrary size. We still must check
against a multiple of 4 bytes (because the cx23416 itself uses a 32
bit word size).
This fix is already in the upstream driver source and has proven
itself there; this is a backport for the 2.6.20.y kernel series.
Mike Isely [Sat, 3 Mar 2007 14:35:54 +0000 (09:35 -0500)]
V4L: pvrusb2: Fix video corruption on stream start
This introduces some extra cx23416 commands when streaming is
started. The addition of these commands fix random sporadic video
corruption that can take place when the video stream is temporarily
disrupted through loss of signal (e.g. changing the channel in the RF
tuner).
This fix is already in the upstream driver source and has proven
itself there; this is a backport for the 2.6.20.y kernel series.