David Holmes found a bug in the -rt tree with respect to
pthread_cond_timedwait. After trying his test program on the latest git
from mainline, I found the bug was there too. The bug he was seeing
that his test program showed, was that if one were to do a "Ctrl-Z" on a
process that was in the pthread_cond_timedwait, and then did a "bg" on
that process, it would return with a "-ETIMEDOUT" but early. That is,
the timer would go off early.
Looking into this, I found the source of the problem. And it is a rather
nasty bug at that.
Here's the relevant code from kernel/futex.c: (not in order in the file)
[...]
smlinkage long sys_futex(u32 __user *uaddr, int op, u32 val,
struct timespec __user *utime, u32 __user *uaddr2,
u32 val3)
{
struct timespec ts;
ktime_t t, *tp = NULL;
u32 val2 = 0;
int cmd = op & FUTEX_CMD_MASK;
if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI)) {
if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
return -EFAULT;
if (!timespec_valid(&ts))
return -EINVAL;
t = timespec_to_ktime(ts);
if (cmd == FUTEX_WAIT)
t = ktime_add(ktime_get(), t);
tp = &t;
}
[...]
return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
}
[...]
long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
u32 __user *uaddr2, u32 val2, u32 val3)
{
int ret;
int cmd = op & FUTEX_CMD_MASK;
struct rw_semaphore *fshared = NULL;
if (!(op & FUTEX_PRIVATE_FLAG))
fshared = ¤t->mm->mmap_sem;
switch (cmd) {
case FUTEX_WAIT:
ret = futex_wait(uaddr, fshared, val, timeout);
So when the futex_wait is interrupt by a signal we break out of the
hrtimer code and set up or return from signal. This code does not return
back to userspace, so we set up a RESTARTBLOCK. The bug here is that we
save the "abs_time" which is a pointer to the stack variable "ktime_t t"
from sys_futex.
This returns and unwinds the stack before we get to call our signal. On
return from the signal we go to futex_wait_restart, where we update all
the parameters for futex_wait and call it. But here we have a problem
where abs_time is no longer valid.
I verified this with print statements, and sure enough, what abs_time
was set to ends up being garbage when we get to futex_wait_restart.
The solution I did to solve this (with input from Linus Torvalds)
was to add unions to the restart_block to allow system calls to
use the restart with specific parameters. This way the futex code now
saves the time in a 64bit value in the restart block instead of storing
it on the stack.
Note: I'm a bit nervious to add "linux/types.h" and use u32 and u64
in thread_info.h, when there's a #ifdef __KERNEL__ just below that.
Not sure what that is there for. If this turns out to be a problem, I've
tested this with using "unsigned int" for u32 and "unsigned long long" for
u64 and it worked just the same. I'm using u32 and u64 just to be
consistent with what the futex code uses.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Relative hrtimers with a large timeout value might end up as negative
timer values, when the current time is added in hrtimer_start().
This in turn is causing the clockevents_set_next() function to set an
huge timeout and sleep for quite a long time when we have a clock
source which is capable of long sleeps like HPET. With PIT this almost
goes unnoticed as the maximum delta is ~27ms. The non-hrt/nohz code
sorts this out in the next timer interrupt, so we never noticed that
problem which has been there since the first day of hrtimers.
This bug became more apparent in 2.6.24 which activates HPET on more
hardware.
[LIB] crc32c: Keep intermediate crc state in cpu order
crypto/crc32.c:chksum_final() is computing the digest as
*(__le32 *)out = ~cpu_to_le32(mctx->crc);
so the low-level crc32c_le routines should just keep
the crc in cpu order, otherwise it is getting swabbed
one too many times on big-endian machines.
Li Zefan [Wed, 28 Nov 2007 08:56:27 +0000 (09:56 +0100)]
nf_nat: fix memset error
This patch fixes an incorrect memset in the NAT code, causing
misbehaviour when unloading and reloading the NAT module.
Applies to stable-2.6.22 and stable-2.6.23.
The size passing to memset is the size of a pointer. Fixes
misbehaviour when unloading and reloading the NAT module.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
tmpfs was misconverted to __GFP_ZERO in 2.6.11. There's an unusual case in
which shmem_getpage receives the page from its caller instead of allocating.
We must cover this case by clear_highpage before SetPageUptodate, as before.
A recent patch added software synchronization during EHCI startup,
so ports aren't switched away from the companion controllers after
resets have started. This patch adds a short delay letting hardware
finish that port switching before any new resets begin ... so both
ends of that hardware race window are closed.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Dave Miller <davem@davemloft.net> Cc: Dely Sy <dely.l.sy@intel.com> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In wait_task_stopped() exit_code already contains the right value for the
si_status member of siginfo, and this is simply set in the non WNOWAIT
case.
If you call waitid() with a stopped or traced process, you'll get the signal
in siginfo.si_status as expected -- however if you call waitid(WNOWAIT) at the
same time, you'll get the signal << 8 | 0x7f
Pass it unchanged to wait_noreap_copyout(); we would only need to shift it
and add 0x7f if we were returning it in the user status field and that
isn't used for any function that permits WNOWAIT.
Signed-off-by: Scott James Remnant <scott@ubuntu.com> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We have seen ramdisk based install systems, where some pages of mapped
libraries and programs were suddendly zeroed under memory pressure. This
should not happen, as the ramdisk avoids freeing its pages by keeping
them dirty all the time.
It turns out that there is a case, where the VM makes a ramdisk page
clean, without telling the ramdisk driver. On memory pressure
shrink_zone runs and it starts to run shrink_active_list. There is a
check for buffer_heads_over_limit, and if true, pagevec_strip is called.
pagevec_strip calls try_to_release_page. If the mapping has no
releasepage callback, try_to_free_buffers is called. try_to_free_buffers
has now a special logic for some file systems to make a dirty page
clean, if all buffers are clean. Thats what happened in our test case.
The simplest solution is to provide a noop-releasepage callback for the
ramdisk driver. This avoids try_to_free_buffers for ramdisk pages.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Nick Piggin <npiggin@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The L1 network chip can DMA to 64-bit addresses, but multiple descriptor
rings share a single register for the high 32 bits of their address, so
only a single, aligned, 4 GB physical address range can be used at a time.
As a result, we need to confine the driver to a 32-bit DMA mask, otherwise
we see occasional data corruption errors in systems containing 4 or more
gigabytes of RAM.
Signed-off-by: Luca Tettamanti <kronos.it@gmail.com> Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net> Acked-by: Chris Snook <csnook@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Recent (i.e. 2005 and later) Sony Vaio laptops have names beginning
with VGN rather than PCG. Update the eeprom driver so that it
recognizes these.
Why this matters: the eeprom driver hides private data from the
EEPROMs it recognizes as Vaio EEPROMs (passwords, serial number...) so
if the driver fails to recognize a Vaio EEPROM as such, the private
data is exposed to the world.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The sysfs interface to DMI data takes care to not make the system
serial number and UUID world-readable, presumably due to privacy
concerns. For consistency, we should not let the eeprom driver
export these same strings to the world on Sony Vaio laptops.
Instead, only make them readable by root, as we already do for BIOS
passwords.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On file systems which don't support sparse files, Ocfs2_map_page_blocks()
was reading blocks on appending writes. This caused write performance to
suffer dramatically. Fix this by detecting an appending write on a nonsparse
fs and skipping the read.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The patch
- Includes the call to capilib_data_b3_req in the spinlock. This routine
in turn calls the offending mq_enqueue routine that triggered the
freeze if not locked. This should also fix other indicators of
incosistent capilib_msgidqueue list, that trigger messages like:
Oct 5 03:05:57 BERL0 kernel: kcapi: msgid 3019 ncci 0x30301 not on queue
that we saw several times a day (usually several in a row).
- Fixes all occurrences of c4_dispatch_tx to be called with active
spinlock, there were some instances where no lock was active. Mostly
these are in very infrequently called routines, so the additional
performance penalty is minimal.
This patch (as999) fixes a problem that sometimes shows up when host
controller driver modules are loaded in the wrong order. If ehci-hcd
happens to initialize an EHCI controller while the companion OHCI or
UHCI controller is in the middle of a port reset, the reset can fail
and the companion may get very confused. The patch adds an
rw-semaphore and uses it to keep EHCI initialization and port resets
mutually exclusive.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: David Brownell <david-b@pacbell.net> Cc: David Miller <davem@davemloft.net> Cc: Dely L Sy <dely.l.sy@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
USB: usbserial - fix potential deadlock between write() and IRQ
usb_serial_generic_write() doesn't disable interrupts when taking port->lock,
and could therefore deadlock with usb_serial_generic_read_bulk_callback()
being called from interrupt, taking the same lock. Fix it.
Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Larry Finger <larry.finger@lwfinger.net> Cc: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
No hardware but this driver is currently totally broken so we can't make
it much worse. Remove all tbe broken invalid termios handling and replace
it with a proper set_termios method.
Frank's comments:
Without this patch the userspace libct (to access the cardreader)
segfaults.
Signed-off-by: Frank Seidel <fseidel@suse.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
With current adapter firmware the driver is working but future firmware
updates may return sense data larger than 96 bytes, causing overflow on
scp->sense_buffer and a kernel crash.
This fix should be backported to earlier kernels.
Signed-off-by: HighPoint Linux Team <linux@highpoint-tech.com> Signed-off-by: James Bottomley <James.Bottomley@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
pci_enable_msi() replaces the INTx irq number in pci_dev->irq with the
new MSI irq number.
The forcedeth driver did not update the copy in netdevice->irq and
parts of the driver used the stale copy.
See bugzilla.kernel.org, bug 9047.
The patch
- updates netdevice->irq
- replaces all accesses to netdevice->irq with pci_dev->irq.
The patch is against 2.6.23.1. IMHO suitable for both 2.6.23 and 2.6.24
The function crypto_alloc_comp returns an errno instead of NULL
to indicate error. So it needs to be tested with IS_ERR.
This is based on a patch by Vicenç Beltran Querol.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[PKT_SCHED] CLS_U32: Fix endianness problem with u32 classifier hash masks.
While trying to implement u32 hashes in my shaping machine I ran into
a possible bug in the u32 hash/bucket computing algorithm
(net/sched/cls_u32.c).
The problem occurs only with hash masks that extend over the octet
boundary, on little endian machines (where htonl() actually does
something).
Let's say that I would like to use 0x3fc0 as the hash mask. This means
8 contiguous "1" bits starting at b6. With such a mask, the expected
(and logical) behavior is to hash any address in, for instance,
192.168.0.0/26 in bucket 0, then any address in 192.168.0.64/26 in
bucket 1, then 192.168.0.128/26 in bucket 2 and so on.
This is exactly what would happen on a big endian machine, but on
little endian machines, what would actually happen with current
implementation is 0x3fc0 being reversed (into 0xc03f0000) by htonl()
in the userspace tool and then applied to 192.168.x.x in the u32
classifier. When shifting right by 16 bits (rank of first "1" bit in
the reversed mask) and applying the divisor mask (0xff for divisor
256), what would actually remain is 0x3f applied on the "168" octet of
the address.
One could say is this can be easily worked around by taking endianness
into account in userspace and supplying an appropriate mask (0xfc03)
that would be turned into contiguous "1" bits when reversed
(0x03fc0000). But the actual problem is the network address (inside
the packet) not being converted to host order, but used as a
host-order value when computing the bucket.
Let's say the network address is written as n31 n30 ... n0, with n0
being the least significant bit. When used directly (without any
conversion) on a little endian machine, it becomes n7 ... n0 n8 ..n15
etc in the machine's registers. Thus bits n7 and n8 would no longer be
adjacent and 192.168.64.0/26 and 192.168.128.0/26 would no longer be
consecutive.
The fix is to apply ntohl() on the hmask before computing fshift,
and in u32_hash_fold() convert the packet data to host order before
shifting down by fshift.
With helpful feedback from Jamal Hadi Salim and Jarek Poplawski.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts
by moving the schedule_timeout() call to a new function that doesn't
propagate the remaining timeout back to the caller. This means on each
retry we start with the full timeout again.
ipc/mqueue.c seems to actually want to wait indefinitely so this
behaviour is retained.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
tecl_reset() is called from deactivate and qdisc is set to noop already,
but subsequent teql_xmit does not know about it and dereference private
data as teql qdisc and thus oopses.
not catch it first :)
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
With your description I could reproduce the bug and actually you were
completely right: the code above is incorrect. Somehow I was able to
misread RFC1122 and mixed the roles :-(:
When a connection is >>closed actively<<, it MUST linger in
TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime).
However, it MAY >>accept<< a new SYN from the remote TCP to
reopen the connection directly from TIME-WAIT state, if it:
[...]
The fix is as follows: if the receiver initiated an active close, then the
sender may reopen the connection - otherwise try to figure out if we hold
a dead connection.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit faf8c714f4508207a9c81cc94dafc76ed6680b44 caused a regression:
parameter names longer than MAX_KBUILD_MODNAME will now be rejected,
although we just need to keep the module name part that short. This patch
restores the old behaviour while still avoiding that memchr is called with
its length parameter larger than the total string length.
Signed-off-by: Jan Kiszka <jan.kiszka@web.de> Cc: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It's possible to provoke unionfs (not yet in mainline, though in mm and
some distros) to hit shmem_writepage's BUG_ON(page_mapped(page)). I expect
it's possible to provoke the 2.6.23 ecryptfs in the same way (but the
2.6.24 ecryptfs no longer calls lower level's ->writepage).
This came to light with the recent find that AOP_WRITEPAGE_ACTIVATE could
leak from tmpfs via write_cache_pages and unionfs to userspace. There's
already a fix (e423003028183df54f039dfda8b58c49e78c89d7 - writeback: don't
propagate AOP_WRITEPAGE_ACTIVATE) in the tree for that, and it's okay so
far as it goes; but insufficient because it doesn't address the underlying
issue, that shmem_writepage expects to be called only by vmscan (relying on
backing_dev_info capabilities to prevent the normal writeback path from
ever approaching it).
That's an increasingly fragile assumption, and ramdisk_writepage (the other
source of AOP_WRITEPAGE_ACTIVATEs) is already careful to check
wbc->for_reclaim before returning it. Make the same check in
shmem_writepage, thereby sidestepping the page_mapped BUG also.
I ran into this problem on a system that was unable to obtain NTP sync
because the clock was running very slow (over 10000ppm slow). ntpd had
declared all of its peers 'reject' with 'peer_dist' reason.
On investigation, the tsc_khz variable was significantly incorrect
causing xtime to run slow. After a reboot tsc_khz was correct so I
did a reboot test to see how often the problem occurred:
Test was done on a 2000 Mhz Xeon system. Of 689 reboots, 8 of them
had unacceptable tsc_khz values (>500ppm):
It appears that on rare occasions, mach_countup() is taking longer to
complete than necessary.
I suspect that this is caused by the CPU taking a periodic SMI
interrupt right at the end of the 30ms calibration loop. This would
cause the loop to delay while the SMI BIOS hander runs. The resulting
TSC value is beyond what it actually should be resulting in a higher
tsc_khz.
The below patch makes native_calculate_cpu_khz() take the best
(shortest duration, lowest khz) run of it's 3 calibration loops. If a
SMI goes off causing a bad result (long duration, higher khz) it will
be discarded.
With the patch applied, 300 boots of the same system produce good
results:
compat_exit_robust_list() computes a pointer to the
futex entry in userspace as follows:
(void __user *)entry + futex_offset
'entry' is a 'struct robust_list __user *', and
'futex_offset' is a 'compat_long_t' (typically a 's32').
Things explode if the 32-bit sign bit is set in futex_offset.
Type promotion sign extends futex_offset to a 64-bit value before
adding it to 'entry'.
This triggered a problem on sparc64 running 32-bit applications which
would lock up a cpu looping forever in the fault handling for the
userspace load in handle_futex_death().
Compat userspace runs with address masking (wherein the cpu zeros out
the top 32-bits of every effective address given to a memory operation
instruction) so the sparc64 fault handler accounts for this by
zero'ing out the top 32-bits of the fault address too.
Since the kernel properly uses the compat_uptr interfaces, kernel side
accesses to compat userspace work too since they will only use
addresses with the top 32-bit clear.
Because of this compat futex layer bug we get into the following loop
when executing the get_user() load near the top of handle_futex_death():
1) load from address '0xfffffffff7f16bd8', FAULT
2) fault handler clears upper 32-bits, processes fault
for address '0xf7f16bd8' which succeeds
3) goto #1
I want to thank Bernd Zeimetz, Josip Rodin, and Fabio Massimo Di Nitto
for their tireless efforts helping me track down this bug.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix the memory leak that may occur when we attempt to reuse a cpu_slab
that was allocated while we reenabled interrupts in order to be able to
grow a slab cache. The per cpu freelist may contain objects and in that
situation we may overwrite the per cpu freelist pointer loosing objects.
This only occurs if we find that the concurrently allocated slab fits
our allocation needs.
If we simply always deactivate the slab then the freelist will be properly
reintegrated and the memory leak will go away.
NULL ptr can be returned from tcp_write_queue_head to cached_skb
and then assigned to skb if packets_out was zero. Without this,
system is vulnerable to a carefully crafted ACKs which obviously
is remotely triggerable.
Besides, there's very little that needs to be done in sacktag
if there weren't any packets outstanding, just skipping the rest
doesn't hurt.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The original meaning of the old test (p->state > TASK_STOPPED) was
"not dead", since it was before TASK_TRACED existed and before the
state/exit_state split. It was a wrong correction in commit 14bf01bb0599c89fc7f426d20353b76e12555308 to make this test for
TASK_TRACED instead. It should have been changed when TASK_TRACED
was introducted and again when exit_state was introduced.
Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Kees Cook <kees@ubuntu.com> Acked-by: Scott James Remnant <scott@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
First off, testing in Fedora has shown it to cause boot failures,
bisected down by Martin Ebourne, and reported by Dave Jobes. So the
commit will likely be reverted in the 2.6.23 stable kernels.
Secondly, in the 2.6.24 model, x86-64 has now grown support for
SPARSEMEM_VMEMMAP, which disables the relevant code anyway, so while the
bug is not visible any more, it's become invisible due to the code just
being irrelevant and no longer enabled on the only architecture that
this ever affected.
backported to 2.6.22 by Chuck Ebbert
Reported-by: Dave Jones <davej@redhat.com> Tested-by: Martin Ebourne <fedora@ebourne.me.uk> Cc: Zou Nan hai <nanhai.zou@intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Process persistent exception store metadata IOs in a separate thread.
A snapshot may become invalid while inside generic_make_request().
A synchronous write is then needed to update the metadata while still
inside that function. Since the introduction of
md-dm-reduce-stack-usage-with-stacked-block-devices.patch this has to
be performed by a separate thread to avoid deadlock.
If memchr argument is longer than strlen(kp->name), there will be some
weird result.
It will casuse duplicate filenames in sysfs for the "nousb". kernel
warning messages are as bellow:
sysfs: duplicate filename 'usbcore' can not be created
WARNING: at fs/sysfs/dir.c:416 sysfs_add_one()
[<c01c4750>] sysfs_add_one+0xa0/0xe0
[<c01c4ab8>] create_dir+0x48/0xb0
[<c01c4b69>] sysfs_create_dir+0x29/0x50
[<c024e0fb>] create_dir+0x1b/0x50
[<c024e3b6>] kobject_add+0x46/0x150
[<c024e2da>] kobject_init+0x3a/0x80
[<c053b880>] kernel_param_sysfs_setup+0x50/0xb0
[<c053b9ce>] param_sysfs_builtin+0xee/0x130
[<c053ba33>] param_sysfs_init+0x23/0x60
[<c024d062>] __next_cpu+0x12/0x20
[<c052aa30>] kernel_init+0x0/0xb0
[<c052aa30>] kernel_init+0x0/0xb0
[<c052a856>] do_initcalls+0x46/0x1e0
[<c01bdb12>] create_proc_entry+0x52/0x90
[<c0158d4c>] register_irq_proc+0x9c/0xc0
[<c01bda94>] proc_mkdir_mode+0x34/0x50
[<c052aa30>] kernel_init+0x0/0xb0
[<c052aa92>] kernel_init+0x62/0xb0
[<c0104f83>] kernel_thread_helper+0x7/0x14
=======================
kobject_add failed for usbcore with -EEXIST, don't try to register things with the same name in the same directory.
[<c024e466>] kobject_add+0xf6/0x150
[<c053b880>] kernel_param_sysfs_setup+0x50/0xb0
[<c053b9ce>] param_sysfs_builtin+0xee/0x130
[<c053ba33>] param_sysfs_init+0x23/0x60
[<c024d062>] __next_cpu+0x12/0x20
[<c052aa30>] kernel_init+0x0/0xb0
[<c052aa30>] kernel_init+0x0/0xb0
[<c052a856>] do_initcalls+0x46/0x1e0
[<c01bdb12>] create_proc_entry+0x52/0x90
[<c0158d4c>] register_irq_proc+0x9c/0xc0
[<c01bda94>] proc_mkdir_mode+0x34/0x50
[<c052aa30>] kernel_init+0x0/0xb0
[<c052aa92>] kernel_init+0x62/0xb0
[<c0104f83>] kernel_thread_helper+0x7/0x14
=======================
Module 'usbcore' failed to be added to sysfs, error number -17
The system will be unstable now.
Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This attempts to address CVE-2006-6058
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-6058
first reported at http://projects.info-pull.com/mokb/MOKB-17-11-2006.html
Essentially a corrupted minix dir inode reporting a very large
i_size will loop for a very long time in minix_readdir, minix_find_entry,
etc, because on EIO they just move on to try the next page. This is
under the BKL, printk-storming as well. This can lock up the machine
for a very long time. Simply ratelimiting the printks gets things back
under control. Make the message a bit more informative while we're here.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: Bodo Eggert <7eggert@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 9ead190b ("IB/uverbs: Don't serialize with ib_uverbs_idr_mutex")
rewrote how userspace objects are looked up in the uverbs module's
idrs, and introduced a severe bug in the process: there is no checking
that an operation is being performed by the right process any more.
Fix this by adding the missing check of uobj->context in __idr_get_uobj().
Apparently everyone is being very careful to only touch their own
objects, because this bug was introduced in June 2006 in 2.6.18, and
has gone undetected until now.
Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 5a43a066b11ac2fe84cf67307f20b83bea390f83: "genirq: Allow fasteoi
handler to retrigger disabled interrupts" was erroneously applied to
handle_level_irq(). This added the irq retrigger / resend functionality
to the level irq handler.
It is possible for the current->curr_chain_key to become inconsistent with the
current index if the chain fails to validate. The end result is that future
lock_acquire() operations may inadvertently fail to find a hit in the cache
resulting in a new node being added to the graph for every acquire.
[ peterz: this might explain some of the lockdep is so _slow_ complaints. ]
[ mingo: this does not impact the correctness of validation, but may slow
down future operations significantly, if the chain gets very long. ]
The bank switching code assumes that the bank selector is set to 0
when the driver is loaded. This might not be the case. This is exactly
the same bug as was fixed in the w83627ehf driver two months ago:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0956895aa6f8dc6a33210967252fd7787652537d
In practice, this bug was causing the sensor thermal types to be
improperly reported for my W83627THF the first time I was loading the
w83627hf driver. From the driver history, I'd say that it has been
broken since September 2005 (when we stopped resetting the chip by
default at driver load.)
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We need to read the fan clock dividers at initialization time,
otherwise the code in store_fan_min() may use uninitialized values.
That's pretty much the same bug and same fix as for the w83627ehf
driver last month.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A stupid bit shifting bug caused the VID value to be always exported
even when the hardware is configured for something different.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Missing parentheses in the definition of FAN_FROM_REG cause a
division by zero for a specific register value.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Hans de Goede <j.w.r.degoede@hhs.nl> Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jeff Garzik [Tue, 17 Jul 2007 04:01:09 +0000 (00:01 -0400)]
netdrvr: natsemi: Fix device removal bug
This episode illustrates how an overused warning can train people to
ignore that warning, which winds up hiding bugs.
The warning
drivers/net/natsemi.c: In function ‘natsemi_remove1’:
drivers/net/natsemi.c:3222: warning: ignoring return value of
‘device_create_file’, declared with attribute warn_unused_result
is oft-ignored, even though at close inspection one notices this occurs
in the /remove/ function, not normally where creation occurs. A quick
s/create/remove/ and we are fixed, with the warning gone.
ieee80211_get_radiotap_len() tries to dereference radiotap length without
taking care that it is completely unaligned and get_unaligned()
is required.
Signed-off-by: Andy Green <andy@warmcat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
domain->header.len is le16 and has just been assigned
cpu_to_le16(arithmetical expression). And all fields of adapter->logmsg
are __le32; not a single 16-bit among them...
That's incremental to the previous one
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Dan Williams <dcbw@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Dan Williams <dcbw@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
In STA mode, the AP will echo our traffic. This includes multicast
traffic.
Receiving these frames confuses some protocols and applications,
notably IPv6 Duplicate Address Detection.
Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Michael Wu <flamingice@sourmilk.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
TCP V4 sequence numbers are 32bits, and RFC 793 assumed a 250 KHz clock.
In order to follow network speed increase, we can use a faster clock, but
we should limit this clock so that the delay between two rollovers is
greater than MSL (TCP Maximum Segment Lifetime : 2 minutes)
Choosing a 64 nsec clock should be OK, since the rollovers occur every
274 seconds.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Based upon a report and initial patch by Peter Lieven.
tcp4_md5sig_key and tcp6_md5sig_key need to start with
the exact same members as tcp_md5sig_key. Because they
are both cast to that type by tcp_v{4,6}_md5_do_lookup().
Unfortunately tcp{4,6}_md5sig_key use a u16 for the key
length instead of a u8, which is what tcp_md5sig_key
uses. This just so happens to work by accident on
little-endian, but on big-endian it doesn't.
Instead of casting, just place tcp_md5sig_key as the first member of
the address-family specific structures, adjust the access sites, and
kill off the ugly casts.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When only GSO skb was partially ACKed, no hints are reset,
therefore fastpath_cnt_hint must be tweaked too or else it can
corrupt fackets_out. The corruption to occur, one must have
non-trivial ACK/SACK sequence, so this bug is not very often
that harmful. There's a fackets_out state reset in TCP because
fackets_out is known to be inaccurate and that fixes the issue
eventually anyway.
In case there was also at least one skb that got fully ACKed,
the fastpath_skb_hint is set to NULL which causes a recount for
fastpath_cnt_hint (the old value won't be accessed anymore),
thus it can safely be decremented without additional checking.
Reported by Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Thanks to Tom Callaway for the excellent bug report and
test case.
sys_ipc() has several problems, most to due with semaphore
call handling:
1) 'err' return should be a 'long'
2) "union semun" is passed in a register on 64-bit compared
to 32-bit which provides it on the stack and therefore
by reference
3) Second and third arguments to SEMCTL are swapped compared
to 32-bit.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It should generate an empty packet for datagram protocols when the
socket is connected, for one.
The check is doubly-wrong because all that a write() can be is a
sendmsg() call with a NULL msg_control and a single entry iovec. No
special semantics should be assigned to it, therefore the zero length
check should be removed entirely.
This matches the behavior of BSD and several other systems.
Alan Cox notes that SuSv3 says the behavior of a zero length write on
non-files is "unspecified", but that's kind of useless since BSD has
defined this behavior for a quarter century and BSD is essentially
what application folks code to.
Based upon a patch from Stephen Hemminger.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit a3d384029aa304f8f3f5355d35f0ae274454f7cd aka
"[AX.25]: Fix unchecked rose_add_loopback_neigh uses"
transformed rose_loopback_neigh var into statically allocated one.
However, on unload it will be kfree's which can't work.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When the ICMPv6 Target address is multicast, Linux processes the
redirect instead of dropping it. The problem is in this code in
ndisc_redirect_rcv():
if (ipv6_addr_equal(dest, target)) {
on_link = 1;
} else if (!(ipv6_addr_type(target) & IPV6_ADDR_LINKLOCAL)) {
ND_PRINTK2(KERN_WARNING
"ICMPv6 Redirect: target address is not
link-local.\n");
return;
}
This second check will succeed if the Target address is, for example,
FF02::1 because it has link-local scope. Instead, it should be checking
if it's a unicast link-local address, as stated in RFC 2461/4861 Section
8.1:
- The ICMP Target Address is either a link-local address (when
redirected to a router) or the same as the ICMP Destination
Address (when redirected to the on-link destination).
I know this doesn't explicitly say unicast link-local address, but it's
implied.
This bug is preventing Linux kernels from achieving IPv6 Logo Phase II
certification because of a recent error that was found in the TAHI test
suite - Neighbor Disovery suite test 206 (v6LC.2.3.6_G) had the
multicast address in the Destination field instead of Target field, so
we were passing the test. This won't be the case anymore.
The patch below fixes this problem, and also fixes ndisc_send_redirect()
to not send an invalid redirect with a multicast address in the Target
field. I re-ran the TAHI Neighbor Discovery section to make sure Linux
passes all 245 tests now.
Signed-off-by: Brian Haley <brian.haley@hp.com> Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
To judge the timing for DAD, netif_carrier_ok() is used. However,
there is a possibility that dev->qdisc stays noop_qdisc even if
netif_carrier_ok() returns true. In that case, DAD NS is not sent out.
We need to defer the IPv6 device initialization until a valid qdisc
is specified.
Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
> The summary is that an evil 80211 frame can crash out a victim's
> machine. It only applies to drivers using the 80211 wireless code, and
> only then to certain drivers (and even then depends on a card's
> firmware not dropping a dubious packet). I must confess I'm not
> keeping track of Linux wireless support, and the different protocol
> stacks etc.
>
> Details are as follows:
>
> ieee80211_rx() does not explicitly check that "skb->len >= hdrlen".
> There are other skb->len checks, but not enough to prevent a subtle
> off-by-two error if the frame has the IEEE80211_STYPE_QOS_DATA flag
> set.
>
> This leads to integer underflow and crash here:
>
> if (frag != 0)
> flen -= hdrlen;
>
> (flen is subsequently used as a memcpy length parameter).
How about this?
Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The ESP scsi driver does not initialize the host controller
instance early enough, so the messages in the log confuse
users.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
device_suspend() calls ACPI suspend functions, which seems to have undesired
side effects on lower idle C-states. It took me some time to realize that
especially the VAIO BIOSes (both Andrews jinxed UP and my elfstruck SMP one)
show this effect. I'm quite sure that other bug reports against suspend/resume
about turning the system into a brick have the same root cause.
After fishing in the dark for quite some time, I realized that removing the ACPI
processor module before suspend (this removes the lower C-state functionality)
made the problem disappear. Interestingly enough the propability of having a
bricked box is influenced by various factors (interrupts, size of the ram image,
...). Even adding a bunch of printks in the wrong places made the problem go
away. The previous periodic tick implementation simply pampered over the
problem, which explains why the dyntick / clockevents changes made this more
prominent.
We avoid complex functionality during the boot process and we have to do the
same during suspend/resume. It is a similar scenario and equaly fragile.
Add suspend / resume functions to the ACPI processor code and disable the lower
idle C-states across suspend/resume. Fall back to the default idle
implementation (halt) instead.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Andrew Morton <akpm@linux-foundation.org> Cc: Len Brown <lenb@kernel.org> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The recent fix for a circular lock dependency unfortunately introduced a
potential memory leak in the event where the call to nlmsvc_lookup_host
fails for some reason.
The Averatec 2370 and some other Turion laptop BIOS seems to program the
ENABLE_C1E MSR inconsistently between cores. This confuses the lapic
use heuristics because when C1E is enabled anywhere it seems to affect
the complete chip.
Use a global flag instead of a per cpu flag to handle this.
If any CPU has C1E enabled disabled lapic use.
Clear parent death signal on SID transitions to prevent unauthorized
signaling between SIDs.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Eric Paris <eparis@parisplace.org> Signed-off-by: James Morris <jmorris@localhost.localdomain> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When using /proc/timer_stats on ppc64 I noticed the events/sec field wasnt
accurate. Sometimes the integer part was incorrect due to rounding (we
werent taking the fractional seconds into consideration).
The fraction part is also wrong, we need to pad the printf statement and
take the bottom three digits of 1000 times the value.
Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to disable all CPUs other than the boot CPU (usually 0) before
attempting to power-off modern SMP machines. This fixes the
hang-on-poweroff issue on my MythTV SMP box, and also on Thomas Gleixner's
new toybox.
Signed-off-by: Mark Lord <mlord@pobox.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The problem is that the garbage collector for the 'host' structures
nlm_gc_hosts(), holds nlm_host_mutex while calling down to
nlmsvc_mark_resources, which, eventually takes the file->f_mutex.
We cannot therefore call nlmsvc_lookup_host() from within
nlmsvc_create_block, since the caller will already hold file->f_mutex, so
the attempt to grab nlm_host_mutex may deadlock.
Fix the problem by calling nlmsvc_lookup_host() outside the file->f_mutex.
This fixes a bug in the way i2c-algo-bit handles I2C_M_RECV_LEN,
used to implement i2c_smbus_read_block_data(). Previously, in the
absence of PEC (rarely used!) it would NAK the "length" byte:
S addr Rd [A] [length] NA
That prevents the subsequent data bytes from being read:
S addr Rd [A] [length] { A [data] }* NA
The primary fix just reorders two code blocks, so the length used
in the "should I NAK now?" check incorporates the data which it
just read from the slave device.
However, that move also highlighted other fault handling glitches.
This fixes those by abstracting the RX path ack/nak logic, so it
can be used in more than one location.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The length check for truncated frames was not correctly handling
the case where VLAN acceleration had already read the tag.
Also, the Yukon EX has some features that use high bit of status
as security tag.
This is the 2.6.22 version of a regression fix that is already
in 2.6.23. Change the watchdog timer form 10 per second all the time,
to 1 per second and only if interface is up.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Domain Validation in the SPI transport class is failing on boxes with
damaged cables (and failing to the extent that the box hangs). The
problem is that the first test it does is a cable integrity test for
wide transfers and if this fails, it turns the wide bit off. The
problem is that the next set of tests it does turns wide back on
again, with the result that it runs through the entirety of DV with a
known bad setting and then hangs the system.
The attached patch fixes the problem by physically nailing the wide
setting to what it deduces it should be for the whole of Domain
Validation.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A crash upon booting that is caused by bcm43xx has been reported [1] and
found to be due to a work queue being reinitialized while work on that
queue is still pending. This fix modifies the shutdown of work queues and
prevents periodic work from being requeued during shutdown. With this patch,
no more crashes on reboot were observed by the original reporter. I do not
get that particular failure on my system; however, when running a large
number of ifdown/ifup sequences, my system would kernel panic with the
'caps lock' light blinking at roughly a 1 Hz rate. In addition, there were
infrequent failures in the firmware that resulted in 'IRQ READY TIMEOUT'
errors. With this patch, no more of the first type of failure occur, and
incidence of the second type is greatly reduced.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Acked-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Subject: [PATCH] [NET]: Do not dereference iov if length is zero
When msg_iovlen is zero we shouldn't try to dereference
msg_iov. Right now the only thing that tries to do so
is skb_copy_and_csum_datagram_iovec. Since the total
length should also be zero if msg_iovlen is zero, it's
sufficient to check the total length there and simply
return if it's zero.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[TCP]: DSACK signals data receival, be conservative
In case a DSACK is received, it's better to lower cwnd as it's
a sign of data receival.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>