Indeed my previous change to alloc_pskb has made it possible
for the TCP header to be misaligned iff the MTU is not a multiple
of 4 (and less than a page). So I suspect the optimised IPsec
MTU calculation is giving you just such an MTU :)
This patch fixes it by changing alloc_pskb to make sure that
the size is at least 32-bit aligned. This does not cause the
problem fixed by the previous patch because max_header is always
32-bit aligned which means that in the SG/NOTSO case this will
be a no-op.
I thought about putting this in the callers but all the current
callers are from TCP. If and when we get a non-TCP caller we
can always create a TCP wrapper for this function and move the
alignment over there.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As it is crypto_remove_spawn may try to unregister an instance which is
yet to be registered. This patch fixes this by checking whether the
instance has been registered before attempting to remove it.
It also removes a bogus cra_destroy check in crypto_register_instance as
1) it's outside the mutex;
2) we have a check in __crypto_register_alg already.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: David Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sysctl_tcp_congestion_control seems to have a bug that prevents it
from actually calling the tcp_set_default_congestion_control
function. This is not so apparent because it does not return an error
and generally the /proc interface is used to configure the default TCP
congestion control algorithm. This is present in 2.6.18 onwards and
probably earlier, though I have not inspected 2.6.15--2.6.17.
sysctl_tcp_congestion_control calls sysctl_string and expects a successful
return code of 0. In such a case it actually sets the congestion control
algorithm with tcp_set_default_congestion_control. Otherwise, it returns the
value returned by sysctl_string. This was correct in 2.6.14, as sysctl_string
returned 0 on success. However, sysctl_string was updated to return 1 on
success around about 2.6.15 and sysctl_tcp_congestion_control was not updated.
Even though sysctl_tcp_congestion_control returns 1, do_sysctl_strategy
converts this return code to '0', so the caller never notices the error.
Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
if you are lucky (unlucky?) enough to have shared interrupts, the
interrupt handler can be called before the tasklet and lock are ready
for use.
Signed-off-by: chas williams <chas@cmf.nrl.navy.mil> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: David Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The #ifdef's in arp_process() were not only a mess, they were also wrong
in the CONFIG_NET_ETHERNET=n and (CONFIG_NETDEV_1000=y or
CONFIG_NETDEV_10000=y) cases.
Since they are not required this patch removes them.
Also removed are some #ifdef's around #include's that caused compile
errors after this change.
Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: David Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It seems that stats of cpu 0 are counted twice, since
for_each_possible_cpu() is looping on all possible cpus, including 0
Before percpu conversion of ip_rt_acct, we should also remove the
assumption that CPU 0 is online (or even possible)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The CONFIG_SUSPEND changes in 2.6.23 caused a regression under certain
configuration conditions (SUSPEND=n, USB_AUTOSUSPEND=y) where all USB
device attributes in sysfs (idVendor, idProduct, ...) silently disappeared,
causing udev breakage and more.
The cause of this is that the /sys/.../power subdirectory is now only
created when CONFIG_PM_SLEEP is set, however, it should be created whenever
CONFIG_PM is set to handle the above situation. The following patch fixes
the regression.
Signed-off-by: Daniel Drake <dsd@gentoo.org> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: stable <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When connection tracking entry (nf_conn) is about to copy itself it can
have some of its extension users (like nat) as being already freed and
thus not required to be copied.
Actually looking at this function I suspect it was copied from
nf_nat_setup_info() and thus bug was introduced.
Report and testing from David <david@unsolicited.net>.
[ Patrick McHardy states:
I now understand whats happening:
- new connection is allocated without helper
- connection is REDIRECTed to localhost
- nf_nat_setup_info adds NAT extension, but doesn't initialize it yet
- nf_conntrack_alter_reply performs a helper lookup based on the
new tuple, finds the SIP helper and allocates a helper extension,
causing reallocation because of too little space
- nf_nat_move_storage is called with the uninitialized nat extension
So your fix is entirely correct, thanks a lot :) ]
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Li Zefan [Wed, 28 Nov 2007 08:56:27 +0000 (09:56 +0100)]
nf_nat: fix memset error
This patch fixes an incorrect memset in the NAT code, causing
misbehaviour when unloading and reloading the NAT module.
Applies to stable-2.6.22 and stable-2.6.23.
The size passing to memset is the size of a pointer. Fixes
misbehaviour when unloading and reloading the NAT module.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The esp_reset_cleanup() function is called with the host lock held and
invokes starget_for_each_device() which wants to take it too. Here is a
fix along the lines of shost_for_each_device()/__shost_for_each_device()
adding a __starget_for_each_device() counterpart which assumes the lock
has already been taken.
Eventually, I think the driver should get modified so that more work is
done as a softirq rather than in the interrupt context, but for now it
fixes a bug that causes the spinlock debugger to fire.
While at it, it fixes a small number of cosmetic problems with
starget_for_each_device() too.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: James Bottomley <James.Bottomley@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
- Delete refereces to HOSTS_C
- Switch to module_init/module_exit instead of detect/release
- Don't pass around the host template and rename it to adpt_template
- Switch from scsi_register/scsi_unregister to scsi_host_alloc,
scsi_add_host, scsi_scan_host and scsi_host_put.
Because it caused (for unknown reasons) Andres' all-data-reads-as-zeroes
problem, reported at
http://groups.google.com/group/fa.linux.kernel/msg/083a9acff0330234
Cc: Matthew Wilcox <matthew@wil.cx> Cc: Mark Salyzyn <mark_salyzyn@adaptec.com> Cc: James Bottomley <James.Bottomley@SteelEye.com> Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Anders Henke <anders.henke@1und1.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The code in fb_ddc_read() is said to be based on the implementation of the
radeon driver:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=fc5891c8a3ba284f13994d7bc1f1bfa8283982de
However, comparing the old radeon driver code with the new fb_ddc code
reveals some differences. Most notably, the I2C bus lines are held at the
end of the function, while the original code was releasing them (as the
comment above correctly says.)
There are a few other differences, which appear to be responsible for read
failures on my system. While tracing low-level I2C code in i2c-algo-bit, I
noticed that the initial attempt to read the EDID always failed. It takes
one retry for the read to succeed. As we are about to remove this
automatic retry property from i2c-algo-bit, reading the EDID would really
fail.
As a summary, the I2C lines quirk which is supposedly needed to read EDID
on some older monitors is currently breaking the (first) read on all other
monitors (and might not even work with older ones - did anyone try since
October 2006?)
After applying the patch below, which makes the code in fb_ddc_read()
really similar to what the radeon driver used to have, the first EDID read
succeeds again.
On top of that, as it appears that this code has been broken for one year
now and nobody seems to have complained, I'm curious if it makes sense to
keep this quirk in place. It makes the code more complex and slower just
for the sake of monitors which I guess nobody uses anymore. Can't we just
get rid of it?
Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Roger Leigh <rleigh@whinlatter.ukfsn.org> Tested-by: Michael Buesch <mb@bu3sch.de> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In wait_task_stopped() exit_code already contains the right value for the
si_status member of siginfo, and this is simply set in the non WNOWAIT
case.
If you call waitid() with a stopped or traced process, you'll get the signal
in siginfo.si_status as expected -- however if you call waitid(WNOWAIT) at the
same time, you'll get the signal << 8 | 0x7f
Pass it unchanged to wait_noreap_copyout(); we would only need to shift it
and add 0x7f if we were returning it in the user status field and that
isn't used for any function that permits WNOWAIT.
Signed-off-by: Scott James Remnant <scott@ubuntu.com> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On some systems the number of resources(IO,MEM) returnedy by PNP device is
greater than the PNP constant, for example motherboard devices. It brings
that some resources can't be reserved and resource confilicts. This will
cause PCI resources are assigned wrongly in some systems, and cause hang.
This is a regression since we deleted ACPI motherboard driver and use PNP
system driver.
[akpm@linux-foundation.org: fix text and coding-style a bit] Signed-off-by: Li Shaohua <shaohua.li@intel.com> Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Thomas Renninger <trenn@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The APM emulation is currently broken as a result of commit 831441862956fffa17b9801db37e6ea1650b0f69
"Freezer: make kernel threads nonfreezable by default"
that removed the PF_NOFREEZE annotations from apm_ioctl() without adding
the appropriate freezer hooks. Fix it and remove the unnecessary variable flags
from apm_ioctl().
Special thanks to Franck Bui-Huu <vagabon.xyz@gmail.com> for pointing out the
problem.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Cc: Franck Bui-Huu <vagabon.xyz@gmail.com> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Holmes found a bug in the -rt tree with respect to
pthread_cond_timedwait. After trying his test program on the latest git
from mainline, I found the bug was there too. The bug he was seeing
that his test program showed, was that if one were to do a "Ctrl-Z" on a
process that was in the pthread_cond_timedwait, and then did a "bg" on
that process, it would return with a "-ETIMEDOUT" but early. That is,
the timer would go off early.
Looking into this, I found the source of the problem. And it is a rather
nasty bug at that.
Here's the relevant code from kernel/futex.c: (not in order in the file)
[...]
smlinkage long sys_futex(u32 __user *uaddr, int op, u32 val,
struct timespec __user *utime, u32 __user *uaddr2,
u32 val3)
{
struct timespec ts;
ktime_t t, *tp = NULL;
u32 val2 = 0;
int cmd = op & FUTEX_CMD_MASK;
if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI)) {
if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
return -EFAULT;
if (!timespec_valid(&ts))
return -EINVAL;
t = timespec_to_ktime(ts);
if (cmd == FUTEX_WAIT)
t = ktime_add(ktime_get(), t);
tp = &t;
}
[...]
return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
}
[...]
long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
u32 __user *uaddr2, u32 val2, u32 val3)
{
int ret;
int cmd = op & FUTEX_CMD_MASK;
struct rw_semaphore *fshared = NULL;
if (!(op & FUTEX_PRIVATE_FLAG))
fshared = ¤t->mm->mmap_sem;
switch (cmd) {
case FUTEX_WAIT:
ret = futex_wait(uaddr, fshared, val, timeout);
So when the futex_wait is interrupt by a signal we break out of the
hrtimer code and set up or return from signal. This code does not return
back to userspace, so we set up a RESTARTBLOCK. The bug here is that we
save the "abs_time" which is a pointer to the stack variable "ktime_t t"
from sys_futex.
This returns and unwinds the stack before we get to call our signal. On
return from the signal we go to futex_wait_restart, where we update all
the parameters for futex_wait and call it. But here we have a problem
where abs_time is no longer valid.
I verified this with print statements, and sure enough, what abs_time
was set to ends up being garbage when we get to futex_wait_restart.
The solution I did to solve this (with input from Linus Torvalds)
was to add unions to the restart_block to allow system calls to
use the restart with specific parameters. This way the futex code now
saves the time in a 64bit value in the restart block instead of storing
it on the stack.
Note: I'm a bit nervious to add "linux/types.h" and use u32 and u64
in thread_info.h, when there's a #ifdef __KERNEL__ just below that.
Not sure what that is there for. If this turns out to be a problem, I've
tested this with using "unsigned int" for u32 and "unsigned long long" for
u64 and it worked just the same. I'm using u32 and u64 just to be
consistent with what the futex code uses.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Resetting an SMP guest will force AP enter real mode (RESET) with
paging enabled in protected mode. While current enter_rmode() can
only handle mode switch from nonpaging mode to real mode which leads
to SMP reboot failure.
Fix by reloading the mmu context on entering real mode.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If we defer updating rip until pio instructions are executed, we have a
problem with reset: a pio reset updates rip, and when the instruction
completes we skip the emulated instruction, pointing rip somewhere completely
unrelated.
Fix by updating rip when we see decode the instruction, not after emulation.
Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Marko Kohtala [Sun, 2 Dec 2007 11:18:43 +0000 (13:18 +0200)]
KVM: Fix hang on uniprocessor
This is not in mainline, as it was fixed differently in that tree.
first_cpu(cpus) returns the only CPU when NR_CPUS is 1 regardless of
the cpus mask. Therefore we avoid a kernel hang in
KVM_SET_MEMORY_REGION ioctl on uniprocessor by not entering the loop at
all.
Signed-off-by: Marko Kohtala <marko.kohtala@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The patch belows changes the access type to register from memory for
instructions that are declared as SrcMem or DstMem, but have a
ModR/M byte with Mod = 3.
It fixes (at least) the lmsw and smsw instructions on an AMD64 CPU,
which are needed for FreeBSD.
Relative hrtimers with a large timeout value might end up as negative
timer values, when the current time is added in hrtimer_start().
This in turn is causing the clockevents_set_next() function to set an
huge timeout and sleep for quite a long time when we have a clock
source which is capable of long sleeps like HPET. With PIT this almost
goes unnoticed as the maximum delta is ~27ms. The non-hrt/nohz code
sorts this out in the next timer interrupt, so we never noticed that
problem which has been there since the first day of hrtimers.
This bug became more apparent in 2.6.24 which activates HPET on more
hardware.
Fix a long boot delay in the forcedeth driver. During initialization, the
timeout for the handshake between mgmt unit and driver can be very long.
The patch reduces the timeout by eliminating a extra loop around the
timeout logic.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com> Cc: Alex Howells <astinus@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds new device ids and features for mcp79 devices into the
forcedeth driver.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
index 92ce2e3..f9ba0ac 100644
tmpfs was misconverted to __GFP_ZERO in 2.6.11. There's an unusual case in
which shmem_getpage receives the page from its caller instead of allocating.
We must cover this case by clear_highpage before SetPageUptodate, as before.
A recent patch added software synchronization during EHCI startup,
so ports aren't switched away from the companion controllers after
resets have started. This patch adds a short delay letting hardware
finish that port switching before any new resets begin ... so both
ends of that hardware race window are closed.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Dave Miller <davem@davemloft.net> Cc: Dely Sy <dely.l.sy@intel.com> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As it is some callers of synchronize_irq rely on memory barriers
to provide synchronisation against the IRQ handlers. For example,
the tg3 driver does
tp->irq_sync = 1;
smp_mb();
synchronize_irq();
and then in the IRQ handler:
if (!tp->irq_sync)
netif_rx_schedule(dev, &tp->napi);
Unfortunately memory barriers only work well when they come in
pairs. Because we don't actually have memory barriers on the
IRQ path, the memory barrier before the synchronize_irq() doesn't
actually protect us.
In particular, synchronize_irq() may return followed by the
result of netif_rx_schedule being made visible.
This patch (mostly written by Linus) fixes this by using spin
locks instead of memory barries on the synchronize_irq() path.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chuck Ebbert <cebbert@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The only qdiscs that check subqueue state before dequeue'ing are PRIO
and RR. The other qdiscs, including the default pfifo_fast qdisc,
will allow traffic bound for subqueue 0 through to hard_start_xmit.
The check for netif_queue_stopped() is done above in pkt_sched.h, so
it is unnecessary for qdisc_restart(). However, if the underlying
driver is multiqueue capable, and only sets queue states on subqueues,
this will allow packets to enter the driver when it's currently unable
to process packets, resulting in expensive requeues and driver
entries. This patch re-adds the check for the subqueue status before
calling hard_start_xmit, so we can try and avoid the driver entry when
the queues are stopped.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
cache_nice_tries and flags entry do not appear in proc fs sched_domain
directory, because ctl_table entry is skipped.
This patch fixes the issue.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We have seen ramdisk based install systems, where some pages of mapped
libraries and programs were suddendly zeroed under memory pressure. This
should not happen, as the ramdisk avoids freeing its pages by keeping
them dirty all the time.
It turns out that there is a case, where the VM makes a ramdisk page
clean, without telling the ramdisk driver. On memory pressure
shrink_zone runs and it starts to run shrink_active_list. There is a
check for buffer_heads_over_limit, and if true, pagevec_strip is called.
pagevec_strip calls try_to_release_page. If the mapping has no
releasepage callback, try_to_free_buffers is called. try_to_free_buffers
has now a special logic for some file systems to make a dirty page
clean, if all buffers are clean. Thats what happened in our test case.
The simplest solution is to provide a noop-releasepage callback for the
ramdisk driver. This avoids try_to_free_buffers for ramdisk pages.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Nick Piggin <npiggin@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This routine is called each time hash should be replaced, nf_conn has
extension list which contains pointers to connection tracking users
(like nat, which is right now the only such user), so when replace takes
place it should copy own extensions. Loop above checks for own
extension, but tries to move higer-layer one, which can lead to above
oops.
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
All 32 bits machines but i386 dont have CONFIG_KTIME_SCALAR. On these
machines, ktime.tv64 is more than 4 times the (correct) result given
by ktime_to_ns()
Again on these machines, using ktime_get_real().tv64 >> 6 give a
32bits rollover every 64 seconds, which is not wanted (less than the
120 s MSL)
Using ktime_to_ns() is the portable way to get nsecs from a ktime, and
have correct code.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
ipw2200 makes extensive use of background scanning when unassociated or
down. Unfortunately, the firmware sends scan completed events many
times per second, which the driver pushes directly up to userspace.
This needlessly wakes up processes listening for wireless events many
times per second. Batch together scan completed events for
non-user-requested scans and send them up to userspace every 4 seconds.
Scan completed events resulting from an SIOCSIWSCAN call are pushed up
without delay.
Signed-off-by: Dan Williams <dcbw@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Cc: Tobias Powalowski <t.powa@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Improve the debuggability of kernel lockups by enhancing the debug
output of the softlockup detector: print the task that causes the lockup
and try to print a more intelligent backtrace.
Note that in the old format we only knew that some system call locked
up, we didnt know _which_. With the new format we know that it's at a
specific place in sys_prctl(). [which was where i created an artificial
kernel lockup to test the new format.]
This is also useful if the lockup happens in user-space - the user-space
EIP (and other registers) will be printed too. (such a lockup would
either suggest that the task was running at SCHED_FIFO:99 and looping
for more than 10 seconds, or that the softlockup detector has a
false-positive.)
The task name is printed too first, just in case we dont manage to print
a useful backtrace.
x86: fix freeze in x86_64 RTC update code in time_64.c
Fix hard freeze on x86_64 when the ntpd service calls
update_persistent_clock()
A repeatable but randomly timed freeze has been happening in Fedora 6
and 7 for the last year, whenever I run the ntpd service on my AMD64x2
HP Pavilion dv9000z laptop. This freeze is due to the use of
spin_lock(&rtc_lock) under the assumption (per a bad comment) that
set_rtc_mmss is called only with interrupts disabled. The call from
ntp.c to update_persistent_clock is made with interrupts enabled.
[ tglx@linutronix.de: ported to 2.6.23.stable ]
Signed-off-by: David P. Reed <dpreed@reed.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix a typo in ntp.c that has caused updating of the persistent (RTC)
clock when synced to NTP to behave erratically.
When debugging a freeze that arises on my AMD64 machines when I
run the ntpd service, I added a number of printk's to monitor the
sync_cmos_clock procedure. I discovered that it was not syncing to
cmos RTC every 11 minutes as documented, but instead would keep trying
every second for hours at a time. The reason turned out to be a typo
in sync_cmos_clock, where it attempts to ensure that
update_persistent_clock is called very close to 500 msec. after a 1
second boundary (required by the PC RTC's spec). That typo referred to
"xtime" in one spot, rather than "now", which is derived from "xtime"
but not equal to it. This makes the test erratic, creating a
"coin-flip" that decides when update_persistent_clock is called - when
it is called, which is rarely, it may be at any time during the one
second period, rather than close to 500 msec, so the value written is
needlessly incorrect, too.
Signed-off-by: David P. Reed <dpreed@reed.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
x86: return correct error code from child_rip in x86_64 entry.S
Right now register edi is just cleared before calling do_exit.
That is wrong because correct return value will be ignored.
Value from rax should be copied to rdi instead of clearing edi.
AK: changed to 32bit move because it's strictly an int
This patch fixes a bug of change_page_attr/change_page_attr_addr on
Intel x86_64 CPUs. After changing page attribute to be executable with
these functions, the page remains un-executable on Intel x86_64 CPU.
Because on Intel x86_64 CPU, only if the "NX" bits of all four level
page tables are cleared, the corresponding page is executable (refer to
section 4.13.2 of Intel 64 and IA-32 Architectures Software Developer's
Manual). So, the bug is fixed through clearing the "NX" bit of PMD when
splitting the huge PMD.
Some gcc versions (I checked at least 4.1.1 from RHEL5 & 4.1.2 from gentoo)
can generate incorrect code with read_crX()/write_crX() functions mix up,
due to cached results of read_crX().
The small app for x8664 below compiled with -O2 demonstrates this
(i686 does the same thing):
One more of these issues (which were considered fixed a few releases
back): other than on x86-64, i386 allows set_fixmap() to replace
already present mappings. Consequently, on PAE, care must be taken to
not update the high half of a pte while the low half is still holding
the old value.
[tglx: arch/x86 adaptation]
Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[LIB] crc32c: Keep intermediate crc state in cpu order
crypto/crc32.c:chksum_final() is computing the digest as
*(__le32 *)out = ~cpu_to_le32(mctx->crc);
so the low-level crc32c_le routines should just keep
the crc in cpu order, otherwise it is getting swabbed
one too many times on big-endian machines.
Currently the Geode AES module fails to encrypt or decrypt if
the coherent bits are not set what is currently the case if the
encryption does not occur inplace. However, the encryption works
on my Geode machine _only_ if the coherent bits are always set.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc> Acked-by: Jordan Crouse <jordan.crouse@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The problem code has been removed in 2.6.24. The below patch disables
SCHED_FEAT_PRECISE_CPU_LOAD which causes the offending code to be skipped
but does not prevent the user from enabling it.
The divide-by-zero is here in kernel/sched.c:
static void update_cpu_load(struct rq *this_rq)
{
u64 fair_delta64, exec_delta64, idle_delta64, sample_interval64, tmp64;
unsigned long total_load = this_rq->ls.load.weight;
unsigned long this_load = total_load;
struct load_stat *ls = &this_rq->ls;
int i, scale;
this_rq->nr_load_updates++;
if (unlikely(!(sysctl_sched_features & SCHED_FEAT_PRECISE_CPU_LOAD)))
goto do_avg;
/* Update delta_fair/delta_exec fields first */
update_curr_load(this_rq);
sata_sis has the same restrictions as other SFF controllers, and so must
use LIBATA_MAX_PRD to denote that SCSI may only fill ATA_MAX_PRD/2
entries, due to our need to handle IOMMU merging.
This is not a new problem in 2.6.23-git17. 2.6.22/2.6.23 is buggy in the
same way.
Reiserfs could accumulate dirty sub-page-size files until umount time.
They cannot be synced to disk by pdflush routines or explicit `sync'
commands. Only `umount' can do the trick.
Marin Mitov points out that delay_tsc() can misbehave if it is preempted and
rescheduled on a different CPU which has a skewed TSC. Fix it by disabling
preemption.
(I assume that the worst-case behaviour here is a stall of 2^32 cycles)
When a DMA device is unregistered, its reference count is decremented twice
for each channel: Once dma_class_dev_release() and once in
dma_chan_cleanup(). This may result in the DMA device driver's remove()
function completing before all channels have been cleaned up, causing lots
of use-after-free fun.
Fix it by incrementing the device's reference count twice for each
channel during registration.
[dan.j.williams@intel.com: kill unnecessary client refcounting] Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
this is a case where we need to redo a security check in fh_verify()
even though the filehandle already has an associated dentry--if the
filehandle was created by fh_compose() in an earlier operation of the
nfsv4 compound, then we may not have done these checks yet.
Without this fix it is possible, for example, to traverse from an export
without the secure ports requirement to one with it in a single
compound, and bypass the secure port check on the new export.
While we're here, fix up some minor style problems and change a printk()
to a dprintk(), to make it harder for random unprivileged users to spam
the logs.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Reviewed-By: NeilBrown <neilb@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The v2/v3 acl code in nfsd is translating any return from fh_verify() to
nfserr_inval. This is particularly unfortunate in the case of an
nfserr_dropit return, which is an internal error meant to indicate to
callers that this request has been deferred and should just be dropped
pending the results of an upcall to mountd.
Thanks to Roland <devzero@web.de> for bug report and data collection.
Cc: Roland <devzero@web.de> Acked-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Reviewed-By: NeilBrown <neilb@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
These blocks were prepared to be written out, but were never handled in
ops_run_biodrain(), so they remain locked forever. The operations flags
are all clear which means handle_stripe() thinks nothing else needs to be
done.
This state suggests that the STRIPE_OP_PREXOR bit was sampled 'set' when it
should not have been. This patch cleans up cases where the code looks at
sh->ops.pending when it should be looking at the consistent stack-based
snapshot of the operations flags.
Report from Joel:
Resync done. Patch fix this bug.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Joel Bertrand <joel.bertrand@systella.fr> Cc: <stable@kernel.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Instruction pointer returned by profile_pc() can be a random value. This
break the assumption than we can safely set struct op_sample.eip field to a
magic value to signal to the per-cpu buffer reader side special event like
task switch ending up in a segfault in get_task_mm() when profile_pc()
return ~0UL. Fixed by sanitizing the sampled eip and reject/log invalid
eip.
Problem reported by Sami Farin, patch tested by him.
Signed-off-by: Philippe Elie <phil.el@wanadoo.fr> Tested-by: Sami Farin <safari-kernel@safari.iki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The sysfs interface to DMI data takes care to not make the system
serial number and UUID world-readable, presumably due to privacy
concerns. For consistency, we should not let the eeprom driver
export these same strings to the world on Sony Vaio laptops.
Instead, only make them readable by root, as we already do for BIOS
passwords.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Recent (i.e. 2005 and later) Sony Vaio laptops have names beginning
with VGN rather than PCG. Update the eeprom driver so that it
recognizes these.
Why this matters: the eeprom driver hides private data from the
EEPROMs it recognizes as Vaio EEPROMs (passwords, serial number...) so
if the driver fails to recognize a Vaio EEPROM as such, the private
data is exposed to the world.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The original meaning of the old test (p->state > TASK_STOPPED) was
"not dead", since it was before TASK_TRACED existed and before the
state/exit_state split. It was a wrong correction in commit 14bf01bb0599c89fc7f426d20353b76e12555308 to make this test for
TASK_TRACED instead. It should have been changed when TASK_TRACED
was introducted and again when exit_state was introduced.
Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Kees Cook <kees@ubuntu.com> Acked-by: Scott James Remnant <scott@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
NULL ptr can be returned from tcp_write_queue_head to cached_skb
and then assigned to skb if packets_out was zero. Without this,
system is vulnerable to a carefully crafted ACKs which obviously
is remotely triggerable.
Besides, there's very little that needs to be done in sacktag
if there weren't any packets outstanding, just skipping the rest
doesn't hurt.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
We cannot zero the user page in nfs_mark_uptodate() any more, since
a) We'd be modifying the page without holding the page lock
b) We can race with other updates of the page, most notably
because of the call to nfs_wb_page() in nfs_writepage_setup().
Instead, we do the zeroing in nfs_update_request() if we see that we're
creating a request that might potentially be marked as up to date.
Thanks to Olivier Paquet for reporting the bug and providing a test-case.
On file systems which don't support sparse files, Ocfs2_map_page_blocks()
was reading blocks on appending writes. This caused write performance to
suffer dramatically. Fix this by detecting an appending write on a nonsparse
fs and skipping the read.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This attempts to address CVE-2006-6058
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-6058
first reported at http://projects.info-pull.com/mokb/MOKB-17-11-2006.html
Essentially a corrupted minix dir inode reporting a very large
i_size will loop for a very long time in minix_readdir, minix_find_entry,
etc, because on EIO they just move on to try the next page. This is
under the BKL, printk-storming as well. This can lock up the machine
for a very long time. Simply ratelimiting the printks gets things back
under control. Make the message a bit more informative while we're here.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: Bodo Eggert <7eggert@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 9b039330808b83acac3597535da26f47ad1862ce removed
acpi_gpe_sleep_prepare(), the only function used at S5 transition
Add call to generic acpi_enable_wake_device().
Tejun Heo [Thu, 25 Oct 2007 06:53:19 +0000 (15:53 +0900)]
libata: backport ATA_FLAG_NO_SRST and ATA_FLAG_ASSUME_ATA, part 2
Differs from mainline, but the functionality is already there.
P5W-DH Deluxe has ICH7R which doesn't have PMP support but SIMG 4726
hardwired to the second port of AHCI controller at PCI device 1f.2.
The 4726 doesn't work as PMP but as a storage processor which can do
hardware RAID on downstream ports.
When no device is attached to the downstream port of the 4726, pseudo
ATA device for configuration appears. Unfortunately, ATA emulation on
the device is very lousy and causes long hang during boot.
This patch implements workaround for the board. If the mainboard is
P5W-DH Deluxe (matched using DMI), only hardreset is used on the
second port of AHCI controller @ 1f.2 and the hardreset doesn't depend
on receiving the first FIS and just proceed to IDENTIFY.
Tejun Heo [Thu, 25 Oct 2007 06:51:57 +0000 (15:51 +0900)]
libata: backport ATA_FLAG_NO_SRST and ATA_FLAG_ASSUME_ATA
Differs from mainline, but the functionality is already there.
Backport ATA_FLAG_NO_SRST and ATA_FLAG_ASSUME_ATA. These are
originally link flags (ATA_LFLAG_*) but link abstraction doesn't exist
on 2.6.23, so make it port flags.
This is for the following workaround for ASUS P5W DH Deluxe.
These new flags don't introduce any behavior change unless set and
nobody sets them yet.
This code relied on the CPU and GPU address for the aperture being the same,
On some r5xx hardware I was playing with I noticed that this isn't always true.
This fixes issues seen on some r400 cards. (bugs.freedesktop.org 9957)
Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
- register_device unconditionally (non-pci dependent) to have also isa
devices in /dev
- unregister devices on module removal
- don't set TTY_DRIVER_DYNAMIC_DEV twice (removed the one dependent on some
macro)
This is the substantial part of the patch and the previous point is for
not checking which devices to unregister and which not (simply register
and unregister all found no matter on which bus they are plugged).
Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Ferenc Wagner <wferi@niif.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
With current adapter firmware the driver is working but future firmware
updates may return sense data larger than 96 bytes, causing overflow on
scp->sense_buffer and a kernel crash.
This fix should be backported to earlier kernels.
Signed-off-by: HighPoint Linux Team <linux@highpoint-tech.com> Signed-off-by: James Bottomley <James.Bottomley@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The patch
- Includes the call to capilib_data_b3_req in the spinlock. This routine
in turn calls the offending mq_enqueue routine that triggered the
freeze if not locked. This should also fix other indicators of
incosistent capilib_msgidqueue list, that trigger messages like:
Oct 5 03:05:57 BERL0 kernel: kcapi: msgid 3019 ncci 0x30301 not on queue
that we saw several times a day (usually several in a row).
- Fixes all occurrences of c4_dispatch_tx to be called with active
spinlock, there were some instances where no lock was active. Mostly
these are in very infrequently called routines, so the additional
performance penalty is minimal.
USB: usbserial - fix potential deadlock between write() and IRQ
usb_serial_generic_write() doesn't disable interrupts when taking port->lock,
and could therefore deadlock with usb_serial_generic_read_bulk_callback()
being called from interrupt, taking the same lock. Fix it.
Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Larry Finger <larry.finger@lwfinger.net> Cc: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as999) fixes a problem that sometimes shows up when host
controller driver modules are loaded in the wrong order. If ehci-hcd
happens to initialize an EHCI controller while the companion OHCI or
UHCI controller is in the middle of a port reset, the reset can fail
and the companion may get very confused. The patch adds an
rw-semaphore and uses it to keep EHCI initialization and port resets
mutually exclusive.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: David Brownell <david-b@pacbell.net> Cc: David Miller <davem@davemloft.net> Cc: Dely L Sy <dely.l.sy@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>