arm's fls() is implemented as a macro, causing it to misbehave when passed
64-bit arguments. Fix.
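A minimal sketch of the direction such a fix takes, assuming the usual generic fls() semantics (this is not the actual arch/arm change): turning fls() into a real function taking a 32-bit integer forces ordinary argument conversion, so a 64-bit value can no longer leak its upper bits into a macro body; callers that really have 64-bit values are expected to use fls64().

/* Hedged sketch only: a plain C fls() over 32 bits. */
static inline int fls_sketch(unsigned int x)
{
	int r = 32;

	if (!x)
		return 0;
	if (!(x & 0xffff0000u)) { x <<= 16; r -= 16; }
	if (!(x & 0xff000000u)) { x <<= 8;  r -= 8;  }
	if (!(x & 0xf0000000u)) { x <<= 4;  r -= 4;  }
	if (!(x & 0xc0000000u)) { x <<= 2;  r -= 2;  }
	if (!(x & 0x80000000u)) { x <<= 1;  r -= 1;  }
	return r;
}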
Cc: Nickolay Vinogradov <nickolay@protei.ru> Tested-by: Krzysztof Halasa <khc@pm.waw.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When a kernel was rebuilt, the previous Module.markers was not cleared.
It caused markers with different format strings to appear as duplicates
when a marker was changed. This problem has been present since
scripts/mod/modpost.c started to generate Module.markers, commit b2e3e658b344c6bcfb8fb694100ab2f2b5b2edb0
It therefore applies to 2.6.25, 2.6.26 and linux-next.
I merely merged the patches from Roland, Wenji and Takashi here.
Credits to
Roland McGrath <roland@redhat.com>
Wenji Huang <wenji.huang@oracle.com>
and
Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
for providing the individual fixes.
Changelog:
- Integrated Takashi's Makefile modification to clear Module.markers upon
make clean.
A couple of distributions (Fedora, Ubuntu) were having weird problems with the
ATI IXP series PATA controllers being reported as simplex. At the heart of
the problem is that both distros ignored the recommendations to load pata_acpi
and ata_generic *AFTER* specific host drivers.
The underlying cause however is that if you D3 and then D0 an ATI IXP it
helpfully throws away some configuration and won't let you rewrite it.
Add checks to ata_generic and pata_acpi to pin ATIIXP devices. Possibly the
real answer here is to quirk them and pin them, but right now we can't do that
before they've been pcim_enable()'d by a driver.
I'm indebted to David Gero for this. His bug report not only reported the
problem but identified the cause correctly and he had tested the right values
to prove what was going on.
[If you backport this for 2.6.24 you will need to pull in the 2.6.25
removal of the bogus WARN_ON() in pcim_enable()]
Signed-off-by: Alan Cox <alan@redhat.com> Tested-by: David Gero <davidg@havidave.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
No straightforward fix exists because we want to
enable these IRQs and set up state atomically before
getting into the IRQ handler the first time.
What happens now is that we mark the VIRQ to not be
automatically enabled by request_irq(). Then we
make explicit enable_irq() calls when we grab the
LDC channel.
This way we don't need to call request_irq() illegally
under the LDC channel lock any more.
Bump LDC version and release date.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is based upon an excellent bug report from Eric Dumazet.
tcp_ack() should clear ->icsk_probes_out even if there are packets
outstanding. Otherwise if we get a sequence of ACKs while we do have
packets outstanding over and over again, we'll never clear the
probes_out value and eventually think the connection is too sick and
we'll reset it.
This appears to be some "optimization" added to tcp_ack() in the 2.4.x
timeframe. In 2.2.x, probes_out is pretty much always cleared by
tcp_ack().
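A hedged sketch of the intent only, using the inet_connection_sock field named above (this is not the literal diff):

/* Any acceptable ACK proves the peer is alive, so the zero-window-probe
 * counter must be reset even while packets are still outstanding;
 * otherwise a persist-mode session can be declared dead and reset. */
static inline void tcp_reset_probes_sketch(struct sock *sk)
{
	inet_csk(sk)->icsk_probes_out = 0;
}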
Here is Eric's original report:
----------------------------------------
Apparently, we can in some situations reset TCP connections in a couple of seconds when some frames are lost.
In order to reproduce the problem, please try the following program on linux-2.6.25.*
Set up some iptables rules to allow two frames per second sent on the loopback interface to tcp destination port 12000
iptables -N SLOWLO
iptables -A SLOWLO -m hashlimit --hashlimit 2 --hashlimit-burst 1 --hashlimit-mode dstip --hashlimit-name slow2 -j ACCEPT
iptables -A SLOWLO -j DROP
iptables -A OUTPUT -o lo -p tcp --dport 12000 -j SLOWLO
Then run the attached program and see the output:
# ./loop
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,1)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,3)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,5)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,7)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,9)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,11)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,201ms,13)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,188ms,15)
write(): Connection timed out
wrote 890 bytes but was interrupted after 9 seconds
ESTAB 0 0 127.0.0.1:12000 127.0.0.1:54455
Exiting read() because no data available (4000 ms timeout).
read 860 bytes
While this tcp session makes progress (sending frames with 50 bytes of payload, every 500ms), the linux tcp stack decides to reset it when tcp_retries2 is reached (default value: 15)
tcpdump :
15:30:28.856695 IP 127.0.0.1.56554 > 127.0.0.1.12000: S 33788768:33788768(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
15:30:28.856711 IP 127.0.0.1.12000 > 127.0.0.1.56554: S 33899253:33899253(0) ack 33788769 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
15:30:29.356947 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 1:61(60) ack 1 win 257
15:30:29.356966 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 61 win 257
15:30:29.866415 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 61:111(50) ack 1 win 257
15:30:29.866427 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 111 win 257
15:30:30.366516 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 111:161(50) ack 1 win 257
15:30:30.366527 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 161 win 257
15:30:30.876196 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 161:211(50) ack 1 win 257
15:30:30.876207 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 211 win 257
15:30:31.376282 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 211:261(50) ack 1 win 257
15:30:31.376290 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 261 win 257
15:30:31.885619 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 261:311(50) ack 1 win 257
15:30:31.885631 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 311 win 257
15:30:32.385705 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 311:361(50) ack 1 win 257
15:30:32.385715 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 361 win 257
15:30:32.895249 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 361:411(50) ack 1 win 257
15:30:32.895266 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 411 win 257
15:30:33.395341 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 411:461(50) ack 1 win 257
15:30:33.395351 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 461 win 257
15:30:33.918085 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 461:511(50) ack 1 win 257
15:30:33.918096 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 511 win 257
15:30:34.418163 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 511:561(50) ack 1 win 257
15:30:34.418172 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 561 win 257
15:30:34.927685 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 561:611(50) ack 1 win 257
15:30:34.927698 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 611 win 257
15:30:35.427757 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 611:661(50) ack 1 win 257
15:30:35.427766 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 661 win 257
15:30:35.937359 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 661:711(50) ack 1 win 257
15:30:35.937376 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 711 win 257
15:30:36.437451 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 711:761(50) ack 1 win 257
15:30:36.437464 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 761 win 257
15:30:36.947022 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 761:811(50) ack 1 win 257
15:30:36.947039 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 811 win 257
15:30:37.447135 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 811:861(50) ack 1 win 257
15:30:37.447203 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 861 win 257
15:30:41.448171 IP 127.0.0.1.12000 > 127.0.0.1.56554: F 1:1(0) ack 861 win 257
15:30:41.448189 IP 127.0.0.1.56554 > 127.0.0.1.12000: R 33789629:33789629(0) win 0
Source of program :
/*
* small producer/consumer program.
* setup a listener on 127.0.0.1:12000
* Forks a child
* child connect to 127.0.0.1, and sends 10 bytes on this tcp socket every 100 ms
* Father accepts connection, and read all data
*/
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <stdio.h>
#include <time.h>
#include <sys/poll.h>
int port = 12000;
char buffer[4096];
int main(int argc, char *argv[])
{
int lfd = socket(AF_INET, SOCK_STREAM, 0);
struct sockaddr_in socket_address;
time_t t0, t1;
int on = 1, sfd, res;
unsigned long total = 0;
socklen_t alen = sizeof(socket_address);
pid_t pid;
Due to the addition of __attribute__((__cold__)) to a few symbols
without adjusting the linker scripts, those symbols currently may end
up outside the [_stext,_etext) range, as they get placed in
.text.unlikely by (at least) gcc 4.3.0. This may not only confuse code
outside of the kernel; symbol_put_addr()'s BUG() could also trigger.
Hence we need to add .text.unlikely (and for future uses of
__attribute__((__hot__)) also .text.hot) to the TEXT_TEXT() macro.
Issue observed by Lukas Lipavsky.
Signed-off-by: Jan Beulich <jbeulich@novell.com> Tested-by: Lukas Lipavsky <llipavsky@suse.cz> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently, using PTRACE_SINGLEBLOCK on an AMD K6-3 (i586) will crash: the
kernel wrongly assumes the DEBUGCTLMSR MSR register exists there.
The assumption is also removed for some other non-K6 CPUs where I am not
sure about it (at worst this costs a small inefficiency if my assumption
is wrong).
Based on info from Roland McGrath, Chuck Ebbert and Mikulas Patocka.
More info at:
https://bugzilla.redhat.com/show_bug.cgi?id=456175
Some iso9660 images contain files with rockridge data that is either
incorrect or incompletely parsed. Prior to commit f2966632a134e865db3c819346a1dc7d96e05309 ("[PATCH] rock: handle directory
overflows") (included with kernel 2.6.13) the kernel ignored the rockridge
data for these files, while still allowing the files to be accessed under
their non-rockridge names. That commit inadvertently changed things so
that files with invalid rockridge data could not be accessed at all. (I
ran across the problem when comparing some old CDs with hard disk copies I
had made long ago under kernel 2.4: a few of the files on the hard disk
copies were no longer visible on the CDs.)
This change reverts to the pre-2.6.13 behavior.
Signed-off-by: Adam Greenblatt <adam.greenblatt@gmail.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When quota structure is going to be dropped and it is dirty, quota code tries
to write it. If the write fails for some reason (e. g. transaction cannot
be started because the journal is aborted), we try writing again and again and
again... Fix the problem by clearing the dirty bit even if the write failed.
(akpm: for 2.6.27, 2.6.26.x and 2.6.25.x)
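A hedged sketch of the shape of the fix in the dquot release path (simplified; helper and lock names follow fs/dquot.c of that era and may not match the actual diff exactly):

	/* Commit the dquot before releasing it */
	ret = dquot->dq_sb->dq_op->write_dquot(dquot);
	if (ret < 0) {
		/* The write failed (e.g. the journal is aborted).  Clear the
		 * dirty bit anyway so we do not loop retrying forever. */
		spin_lock(&dq_list_lock);
		clear_dquot_dirty(dquot);
		spin_unlock(&dq_list_lock);
	}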
Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: dingdinghua <dingdinghua85@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch clamps the cscov setsockopt values to a maximum of 0xFFFF.
Setsockopt values greater than 0xffff can cause an unwanted
wrap-around. Further, IPv6 jumbograms are not supported (RFC 3828,
3.5), so that values greater than 0xffff are not even useful.
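A hedged sketch of the clamping in the UDPLITE_SEND_CSCOV setsockopt path (a fragment in the shape of that code; variable names are assumptions, not the literal diff):

	case UDPLITE_SEND_CSCOV:
		if (!is_udplite)		/* coverage only applies to UDP-Lite */
			return -ENOPROTOOPT;
		if (val != 0 && val < 8)	/* illegal coverage: use default (8) */
			val = 8;
		else if (val > USHORT_MAX)	/* clamp to avoid 16-bit wrap-around */
			val = USHORT_MAX;
		up->pcslen = val;
		break;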
Further changes: fixed a typo in the documentation.
[ Add USHORT_MAX from upstream to linux/kernel.h -DaveM ]
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When generating the ip header for the transformed packet we just copy
the frag_off field of the ip header from the original packet to the ip
header of the new generated packet. If we receive a packet as a chain
of fragments, all but the last of the new generated packets have the
IP_MF flag set. We have to mask the frag_off field to only keep the
IP_DF flag from the original packet. This got lost with git commit 36cf9acf93e8561d9faec24849e57688a81eb9c5 ("[IPSEC]: Separate
inner/outer mode processing on output")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This trivial patch restores correct behavior, and applies to current
Linus tree (should also be applied to stable tree as well.)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We need to unshare the skb first as otherwise pskb_may_pull may
write to a shared skb which could be bad.
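A hedged sketch of the receive-path ordering (simplified fragment):

	/* Get a private copy first; pskb_may_pull() may modify skb data. */
	skb = skb_share_check(skb, GFP_ATOMIC);
	if (!skb)
		goto out;

	if (!pskb_may_pull(skb, sizeof(struct pppoe_hdr)))
		goto drop;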
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The length field in the PPPOE header wasn't checked completely.
This patch causes all packets shorter than the declared length
to be dropped.
It also changes the memcpy_toiovec call to skb_copy_datagram_iovec
so that paged packets (rare for PPPOE) are handled properly.
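A hedged sketch of the two changes (simplified fragments; variable names are assumptions, not the literal diff):

	/* 1) Validate the PPPoE-declared length and trim trailing junk. */
	len = ntohs(ph->length);
	if (skb->len < len)
		goto drop;
	if (pskb_trim_rcsum(skb, len))
		goto drop;

	/* 2) In recvmsg(), copy via skb_copy_datagram_iovec() so paged skbs
	 *    are handled, instead of memcpy_toiovec() on skb->data. */
	error = skb_copy_datagram_iovec(skb, 0, m->msg_iov, skb->len);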
Thanks to Ilja of the Netric Security Team for discovering and
reporting this bug, and Chris Wright for the total_len check.
[ Incorporate warning fix from Stephen Hemminger. -DaveM ]
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes a potential memory corruption in
pppol2tp_recvmsg(). If skb->len is bigger than the caller's buffer
length, memcpy_toiovec() will go into uninitialized data on the kernel
heap, interpret it as an iovec and start modifying memory.
The fix is to change the memcpy_toiovec() call to
skb_copy_datagram_iovec() so that paged packets (rare for PPPOL2TP)
are handled properly. Also check that the caller's buffer is big
enough for the data and set the MSG_TRUNC flag if it is not so.
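A hedged sketch of the copy path described (simplified fragment, names assumed):

	if (len < skb->len)
		msg->msg_flags |= MSG_TRUNC;	/* caller's buffer is too small */

	copied = min_t(size_t, len, skb->len);
	err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);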
Reported-by: Ilja <ilja@netric.org> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This fixes the bridge reference count problem and cleanups ipv6 FIB
timer management. Don't use expires field, because it is not a proper
way to test, instead use timer_pending().
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is a trivial patch against the hdlcdrv module that fixes its CRC
calculation. The finished CRC was overwriting the first two bytes of
each packet rather than being appended to the end.
I've tested this with 2.6.8 and 2.6.10-rc1, but hdlcdrv hasn't changed
much recently so it should work with many other kernel versions.
Signed-off-by: Micah Dowty <micah@navi.cx> Acked-by: Thomas Sailer <t.sailer@alumni.ethz.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Unfortunately this allows the task to migrate after setting up the softirq
and raising it. Since softirqs run a queue that is per-cpu we may raise the
softirq on the wrong CPU and this will keep the queued softirq task from
running.
To solve this issue, this patch disables preemption around the releasing
of the hrtimer lock and raising of the softirq.
Even the newer ENE controllers have bugs in their DMA engine that make
it too dangerous to use. Disable it until someone has figured out under
which conditions it corrupts data.
This has caused problems at least once, and can be found as bug report
10925 in the kernel bugzilla.
The pxa27x DMA controller defaults to 64-bit alignment. This caused
the SCR reads to fail (and, depending on card type, error out) when
card->raw_scr was not aligned on a 8-byte boundary.
For performance reasons all scatter-gather addresses passed to
pxamci_request should be aligned on 8-byte boundaries, but if
this can't be guaranteed, byte-aligned DMA transfers have to be enabled
in the controller to get correct behaviour.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
of_platform_device_create() sets dma_mask in of_device, but we don't
actually set coherent_dma_mask. This may cause weird behavior of the USB
subsystem when using of_device USB host drivers.
The problem here is that if the ioc faults too early in the bring up
sequence (as it usually does for an irq routing problem), ioc_reset gets
called before the scsi host is even allocated. This causes an oops when
it later schedules a renegotiation. Fix this by checking ioc->sh before
trying to renegotiate.
Cc: Eric Moore <Eric.Moore@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I found a very old patch for this that was Acked but did not get applied
https://lists.linux-foundation.org/pipermail/kernel-janitors/2006-September/016362.html
There looks to be a small leak in isdn_writebuf_stub() in isdn_common.c, when
copy_from_user() returns an un-copied data length (length != 0). The below
patch should be a minimally invasive fix.
Signed-off-by: Darren Jenkins <darrenrjenkins@gmail.com> Acked-by: Karsten Keil <kkeil@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch is a bugfix for how defio handles multiple processes manipulating
the same framebuffer.
Thanks to Bernard Blackham for identifying this bug.
It occurs when two applications mmap the same framebuffer and concurrently
write to the same page. Normally, this doesn't occur since only a single
process mmaps the framebuffer. The symptom of the bug is that the mapping
applications will hang. The cause is that defio incorrectly tries to add the
same page twice to the pagelist. The solution I have is to walk the pagelist
and check for a duplicate before adding. Since I needed to walk the pagelist,
I now also keep the pagelist in sorted order.
Signed-off-by: Jaya Kumar <jayakumar.lkml@gmail.com> Cc: Bernard Blackham <bernard@largestprime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I had 8250.nr_uarts=16 in the boot line of a test kernel and I had a weird
mysterious crash in sysfs. After taking an in-depth look I realized that
CONFIG_SERIAL_8250_NR_UARTS was set to 4 and I was walking off the end of
the serial8250_ports array.
Ouch!!!
Don't let this happen to someone else.
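A hedged sketch of the guard (assuming UART_NR is the compiled-in array size derived from CONFIG_SERIAL_8250_NR_UARTS):

	/* Never let the module parameter walk off the end of serial8250_ports[] */
	if (nr_uarts > UART_NR)
		nr_uarts = UART_NR;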
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Cortland Setlow pointed out a bug in ov7670.c where the result from
ov7670_read() was just being checked for !0, rather than <0. This made me
realize that ov7670_read's semantics were rather confusing; it both fills
in 'value' with the result, and returns it. This goes against general
kernel convention; so rather than fixing callers, let's fix the function.
This makes ov7670_read return <0 in the case of an error, and 0 upon
success. Thus, code like:
res = ov7670_read(...);
if (res)
goto error;
...will work properly.
Signed-off-by: Cortland Setlow <csetlow@tower-research.com> Signed-off-by: Andres Salomon <dilinger@debian.org> Acked-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The current definition of wksidarr works fine on little endian arches
(since cpu_to_le32 is a no-op there), but on big-endian arches, it fails
to compile with this error:
error: braced-group within expression allowed only inside a function
The problem is that this static declaration has cpu_to_le32 embedded
within it, and that expands into a function macro. We need to use
__constant_cpu_to_le32() instead.
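A hedged illustration of the difference (array name and values here are placeholders, not the actual wksidarr contents):

/* Fine everywhere: expands to a compile-time constant. */
static const __le32 example_sid[] = {
	__constant_cpu_to_le32(1),
	__constant_cpu_to_le32(5),
};

/* cpu_to_le32(1) in the same initializer would expand to a ({ ... })
 * statement expression on big-endian, which is only legal inside a
 * function, producing the error quoted above. */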
Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The IRQ rate reported back by the RTC is incorrect when HPET is enabled.
Newer hardware that has HPET to emulate the legacy RTC device gets this value
wrong since after it sets the rate, it returns before setting the variable
used to report the IRQ rate back to users of the device -- so the set rate and
the reported rate get out of sync.
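A hedged sketch of the RTC_IRQP_SET path (simplified; it assumes, as the description implies, that hpet_set_periodic_freq() returns nonzero when the HPET emulation handled the request):

	if (hpet_set_periodic_freq(arg)) {
		rtc_freq = arg;	/* previously skipped, so the reported rate went stale */
		return 0;
	}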
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Brownell <david-b@pacbell.net> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
"The register %ecx looks innocent but is very important here. The disassembly:
mov %edx,%ecx
shr $0x2,%ecx
rep stos %eax,%es:(%edi) <-- the fault
So %ecx has been loaded from %edx... which is 0x6b6b6b6b/POISON_FREE.
(0x6b6b6b6b >> 2 == 0x1adadada.)
%ecx is the counter for the memset, from here:
memset(object, 0, c->objsize);
i.e. %ecx was loaded from c->objsize, so "c" must have been freed.
Where did "c" come from? Uh-oh...
c = get_cpu_slab(s, smp_processor_id());
This looks like it has very much to do with CPU hotplug/unplug. Is
there a race between SLUB/hotplug since the CPU slab is used after it
has been freed?"
Good analysis.
Yeah, it's possible that a caller of kmem_cache_alloc() -> slab_alloc()
can be migrated to another CPU right after local_irq_restore() and
before memset(). The initial cpu can become offline in the meantime (or
a migration is a consequence of the CPU going offline), so its
'kmem_cache_cpu' structure gets freed (slab_cpuup_callback).
At some point of time the caller continues on another CPU having an
obsolete pointer...
Kernel Bugzilla #11063 points out that on some architectures (e.g. x86_32)
exec'ing an ELF without a PT_GNU_STACK program header should default to an
executable stack; but this got broken by the unlimited argv feature because
the stack vma is now created before the right personality has been
established, breaking old binaries that use nested function trampolines.
Therefore re-evaluate VM_STACK_FLAGS in setup_arg_pages, where stack
vm_flags used to be set, before the mprotect_fixup. Checking through
our existing VM_flags, none would have changed since insert_vm_struct:
so this seems safer than finding a way through the personality labyrinth.
I would like to inform you of our zd1211 based usb wifi adapter (AirTies
WUS-201), which works with the zd1211rw driver with the following device
id definition.
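A hedged illustration of the kind of usb_device_id entry meant; the vendor/product IDs and chip variant below are placeholders, not the real WUS-201 values:

static struct usb_device_id usb_ids[] = {
	/* ... existing entries ... */
	{ USB_DEVICE(0x1234, 0x5678), .driver_info = DEVICE_ZD1211B },
	{}
};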
Signed-off-by: Daniel Drake <dsd@gentoo.org> Signed-off-by: John W. Linville <linville@tuxdriver.com> Cc: Peter Nixon <listuser@peternixon.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Lost connections were reported by Thomas Bätzler (running a 2.6.25 kernel) on
the netfilter mailing list (see the thread "Weird nat/conntrack Problem
with PASV FTP upload"). He provided tcpdump recordings which helped to
find a long lingering bug in conntrack.
In TCP connection tracking, checking the lower bound of valid ACK could
lead to marking valid packets as INVALID because:
- We have got a "higher or equal" inequality, but the test checked
the "higher" condition only; fixed.
- If the packet contains a SACK option, it could occur that the ACK
value was before the left edge of our (S)ACK "window": if a previous
packet from the other party intersected the right edge of the window
of the receiver, we could move forward the window parameters beyond
accepting a valid ack. Therefore in this patch we check the rightmost
SACK edge instead of the ACK value in the lower bound of valid (S)ACK
test.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
The current logic has a bug: it cannot find a matching pattern if the
match starts at the first character of the target string.
for example:
pattern=abc, string=abcdefg
pattern=a, string=abcdefg
The search algorithm should return 0 for these cases.
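A user-space illustration of the expected contract only; it exercises strstr() rather than the in-kernel search code:

#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Offset of needle in hay, or (size_t)-1 if absent. */
static size_t find_offset(const char *hay, const char *needle)
{
	const char *p = strstr(hay, needle);

	return p ? (size_t)(p - hay) : (size_t)-1;
}

int main(void)
{
	/* A match beginning at the first character must be reported as
	 * offset 0, not treated as "no match". */
	assert(find_offset("abcdefg", "abc") == 0);
	assert(find_offset("abcdefg", "a") == 0);
	return 0;
}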
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the dubious attempt to prefer 'compute' over 'read'. Not only is it
wrong given commit c337869d (md: do not compute parity unless it is on a failed
drive), but it can trigger a BUG_ON in handle_parity_checks5().
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Even though the CAN netlayer only deals with CAN netdevices, the
netlayer interface to the userspace and to the device layer should
perform some sanity checks.
This patch adds several sanity checks that mainly prevent userspace apps
from sending broken content into the system that may be misinterpreted by
some other userspace application.
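A hedged sketch of the kind of transmit-path checks meant (simplified fragment):

	struct can_frame *cf = (struct can_frame *)skb->data;

	/* Reject anything from userspace that is not exactly one can_frame
	 * or that claims more than 8 data bytes. */
	if (skb->len != sizeof(*cf) || cf->can_dlc > 8)
		return -EINVAL;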
Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de> Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de> Acked-by: Andre Naujoks <nautsch@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As reported by Vipul Gandhi, the current serial_match_port() doesn't work
for tty-devices using dynamic major number allocation. Fix it.
It oopses if you suspend a serial port with a _dynamic_ major number. ATM,
I think, drivers/serial/jsm/jsm_driver.c is the only in-tree driver that
does this.
This patch changes the way we determine the maximum number of outstanding
commands for each controller.
Most Smart Array controllers can support up to 1024 commands, the notable
exceptions are the E200 and E200i.
The next generation of controllers which were just added support a mode of
operation called Zero Memory Raid (ZMR). In this mode they only support
64 outstanding commands. In Full Function Raid (FFR) mode they support
1024.
We have been setting the queue depth by arbitrarily assigning some value
for each controller. We needed a better way to set the queue depth to
avoid lots of annoying "fifo full" messages. So we made the driver a
little smarter. We now read the config table and subtract 4 from the
returned value. The -4 is to allow some room for ioctl calls which are
not tracked the same way as io commands are tracked.
Please consider this for inclusion.
Signed-off-by: Mike Miller <mike.miller@hp.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
With the removal of struct file from the xattr code,
reiserfs_file_release() isn't used anymore, so the prealloc isn't
discarded. This causes hangs later down the line.
This patch adds it to reiserfs_delete_inode. In most cases it will be a
no-op due to it already having been called, but will avoid hangs with
xattrs.
The esp driver currently does hand rolled reference counting of its
target. It's much easier to do what it needs to do if it's plugged into
the mid-layer callbacks (target_alloc and target_destroy) which were
designed for this case, so do it this way and get rid of the internal
target reference count.
Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
OOPS reported by Friedrich Oslage <bluebird@porno-bullen.de>
The problem here is that tp->starget is set every time a lun
is allocated for a particular target so we can catch the
sdev_target parent value.
The reset handler uses the NULL'ness of this value to determine
which targets are active.
But esp_slave_destroy() does not NULL out this value when appropriate.
So for every target that doesn't respond, the SCSI bus scan causes
a stale pointer to be left here, with ensuing crashes like you're
seeing.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-tip testing found that Johannes Berg's "softirq: remove irqs_disabled
warning from local_bh_enable" enhancement to lockdep triggers a new
warning on an old testbox that uses 3c59x vortex and netlogging:
----->
calling vortex_init+0x0/0xb0
PCI: Found IRQ 10 for device 0000:00:0b.0
PCI: Sharing IRQ 10 with 0000:00:0a.0
PCI: Sharing IRQ 10 with 0000:00:0b.1
3c59x: Donald Becker and others.
0000:00:0b.0: 3Com PCI 3c556 Laptop Tornado at e0800400.
PCI: Enabling bus mastering for device 0000:00:0b.0
initcall vortex_init+0x0/0xb0 returned 0 after 47 msecs
..
calling init_netconsole+0x0/0x1b0
netconsole: local port 4444
netconsole: local IP 10.0.1.9
netconsole: interface eth0
netconsole: remote port 4444
netconsole: remote IP 10.0.1.16
netconsole: remote ethernet address 00:19:xx:xx:xx:xx
netconsole: device eth0 not up yet, forcing it
eth0: setting half-duplex.
eth0: setting full-duplex.
------------[ cut here ]------------
WARNING: at kernel/softirq.c:137 local_bh_enable_ip+0xd1/0xe0()
Pid: 1, comm: swapper Not tainted 2.6.26-rc6-tip #2091
[<c0125ecf>] warn_on_slowpath+0x4f/0x70
[<c0126834>] ? release_console_sem+0x1b4/0x1d0
[<c0126d00>] ? vprintk+0x2a0/0x450
[<c012fde5>] ? __mod_timer+0xa5/0xc0
[<c046f7fd>] ? mdio_sync+0x3d/0x50
[<c0160ef6>] ? marker_probe_cb+0x46/0xa0
[<c0126ed7>] ? printk+0x27/0x50
[<c046f4c3>] ? vortex_set_duplex+0x43/0xc0
[<c046f521>] ? vortex_set_duplex+0xa1/0xc0
[<c0471b92>] ? vortex_timer+0xe2/0x3e0
[<c012b361>] local_bh_enable_ip+0xd1/0xe0
[<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
[<c0471b92>] vortex_timer+0xe2/0x3e0
[<c014743b>] ? trace_hardirqs_on+0xb/0x10
[<c0147358>] ? trace_hardirqs_on_caller+0x88/0x160
[<c012f8b2>] run_timer_softirq+0x162/0x1c0
[<c0471ab0>] ? vortex_timer+0x0/0x3e0
[<c012b361>] local_bh_enable_ip+0xd1/0xe0
[<c08d9f9f>] _spin_unlock_bh+0x2f/0x40
[<c0471b92>] vortex_timer+0xe2/0x3e0
[<c014743b>] ? trace_hardirqs_on+0xb/0x10
[<c0147358>] ? trace_hardirqs_on_caller+0x88/0x160
[<c012f8b2>] run_timer_softirq+0x162/0x1c0
[<c0471ab0>] ? vortex_timer+0x0/0x3e0
[<c0471ab0>] ? vortex_timer+0x0/0x3e0
[<c012b60a>] __do_softirq+0x9a/0x160
[<c012b570>] ? __do_softirq+0x0/0x160
[<c0106775>] call_on_stack+0x15/0x30
[<c012b4f5>] ? irq_exit+0x55/0x60
[<c0106e85>] ? do_IRQ+0x85/0xd0
[<c0147391>] ? trace_hardirqs_on_caller+0xc1/0x160
[<c0104888>] ? common_interrupt+0x28/0x30
[<c08d8ac8>] ? mutex_unlock+0x8/0x10
[<c08d8180>] ? _cond_resched+0x10/0x30
[<c07a3be7>] ? netpoll_setup+0x117/0x390
[<c0cbfcfe>] ? init_netconsole+0x14e/0x1b0
[<c013d539>] ? ktime_get+0x19/0x40
[<c0c9bab2>] ? kernel_init+0x1b2/0x2c0
[<c0cbfbb0>] ? init_netconsole+0x0/0x1b0
[<c0396aa4>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c0103f12>] ? restore_nocheck_notrace+0x0/0xe
[<c0c9b900>] ? kernel_init+0x0/0x2c0
[<c0c9b900>] ? kernel_init+0x0/0x2c0
[<c0104aa7>] ? kernel_thread_helper+0x7/0x10
=======================
---[ end trace 37f9c502aff112e0 ]---
console [netcon0] enabled
netconsole: network logging started
initcall init_netconsole+0x0/0x1b0 returned 0 after 2914 msecs
looking at the driver I think the bug is real and the fix actually
is trivial.
vp->lock is also taken in hardware IRQ context, so we _have_ to always
use irqsafe locking. As we run in a timer with IRQs disabled,
we can simply use spin_lock.
This fixes a possible NULL pointer dereference in an error path of the
DMA allocation error checking code. This is also necessary for a future
DMA API change that is on its way into the mainline kernel that adds
an additional dev parameter to dma_mapping_error().
Signed-off-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Originally based on an Ubuntu patch that got it wrong, the dmidecode
output of the corresponding laptops shows LENOVO as the manufacturer.
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/133636
Cc: Klaus S. Madsen <ubuntu@hjernemadsen.org> Cc: Chuck Short <zulcss@ubuntu.com> Cc: Jean Delvare <khali@linux-fr.org> Cc: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: maximilian attems <max@stro.at> Cc: Mark M. Hoffman <mhoffman@lightlink.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As has been discussed several times on LKML, IRQF_SHARED | IRQF_DISABLED
doesn't work reliably, i.e. a shared interrupt handler CAN'T be certain to
be called with interrupts disabled. Most USB HCD handlers use IRQF_DISABLED
and therefore havoc can break out if they share their interrupt with a
handler that doesn't use it.
On my test machine the yenta_socket interrupt handler (no IRQF_DISABLED)
was registered before ehci_hcd and one uhci_hcd instance. Therefore all
usb_hcd_irq() invocations for ehci_hcd and for one uhci_hcd instance
happened with interrupts enabled. That led to random lockups as USB core
HCD functions that acquire the same spinlock could be called twice
from interrupt handlers.
This patch updates usb_hcd_irq() to always disable/restore interrupts.
usb_add_hcd() will silently remove any IRQF_DISABLED requested from HCD code.
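A hedged sketch of the resulting handler shape (simplified from the description above, not the literal diff):

irqreturn_t usb_hcd_irq(int irq, void *__hcd)
{
	struct usb_hcd *hcd = __hcd;
	unsigned long flags;
	irqreturn_t rc;

	/* IRQF_DISABLED cannot be trusted for shared interrupts, so enforce
	 * "interrupts off" here instead of relying on it. */
	local_irq_save(flags);

	if (unlikely(hcd->state == HC_STATE_HALT ||
		     !test_bit(HCD_FLAG_HW_ACCESSIBLE, &hcd->flags)))
		rc = IRQ_NONE;
	else if (hcd->driver->irq(hcd) == IRQ_NONE)
		rc = IRQ_NONE;
	else
		rc = IRQ_HANDLED;

	local_irq_restore(flags);
	return rc;
}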
Signed-off-by: Stefan Becker <stefan.becker@nokia.com> Acked-by: David Brownell <david-b@pacbell.net> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes a problem with OHCI where canceling bulk or
interrupt URBs may lose track of the right data toggle. This
seems to be a longstanding bug, possibly dating back to the
Linux 2.4 kernel, which stayed hidden because
(a) about half the time the data toggle bit was correct;
(b) canceling such URBs is unusual; and
(c) the few drivers which cancel these URBs either
[1] do it only as part of shutting down, or
[2] have fault recovery logic, which recovers.
For those transfer types, the toggle is normally written back
into the ED when each TD is retired. But canceling bypasses
the mechanism used to retire TDs ... so on average, half the
time the toggle bit will be invalid after cancelation.
The fix is simple: the toggle state of any canceled TDs is
propagated back to the ED in the finish_unlinks function.
(Issue found by leonidv11@gmail.com ...)
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Leonid <leonidv11@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes a regression in the EHCI driver's TIMER_IO_WATCHDOG
behavior. The patch "USB: EHCI: add separate IAA watchdog timer" changed
how that timer is handled, so that short timeouts on the remaining
timer (unfortunately, overloaded) would never be used.
This takes a more direct approach, reorganizing the code slightly to
be explicit about only the I/O watchdog role now being overridable.
It also replaces a now-obsolete comment describing older timer behavior.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Leonid <leonidv11@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If the SM501 OHCI driver and another platform OHCI driver are both
selected, we end up defining PLATFORM_DRIVER twice. This patch
separates the SM501 onto its own define, SM501_OHCI_DRIVER,
so that it can be selected without overwriting the original
definition.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Acked-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
fsync_buffers_list() and sync_dirty_buffer() both issue async writes and
then immediately wait on them. Conceptually, that makes them sync writes
and we should treat them as such so that the IO schedulers can handle
them appropriately.
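A hedged sketch of the idea in sync_dirty_buffer() (simplified fragment):

	lock_buffer(bh);
	/* We are about to wait on this buffer ourselves, so tell the block
	 * layer the write is synchronous rather than submitting plain WRITE. */
	ret = submit_bh(WRITE_SYNC, bh);
	wait_on_buffer(bh);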
This patch fixes a write starvation issue that Lin Ming reported, where
xx is stuck for more than 2 minutes because of a large number of
synchronous IO in the system:
Lin Ming confirms that this patch fixes the issue. I've run tests with
it for the past week and no ill effects have been observed, so I'm
proposing it for inclusion into 2.6.26.
If, while assembling an array, we find a device which is not fully
in-sync with the array, it is important to set the "fullsync" flag.
This is an exact analog to the setting of this flag in hot_add_disk
methods.
Currently, only v1.x metadata supports having devices in an array
which are not fully in-sync (it keeps track of how in-sync they are).
The 'fullsync' flag only makes a difference when a write-intent bitmap
is being used. In this case it tells recovery to ignore the bitmap
and recover all blocks.
This fix is already in place for raid1, but not raid5/6 or raid10.
So without this fix, a raid1 or raid4/5/6 array with version 1.x
metadata and a write-intent bitmap, that is stopped in the middle
of a recovery, will appear to complete the recovery instantly
after it is reassembled, but the recovery will not be correct.
If you might have an array like that, issuing
echo repair > /sys/block/mdXX/md/sync_action
will make sure recovery completes properly.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We shouldn't acknowledge that a stripe has been expanded (When
reshaping a raid5 by adding a device) until the moved data has
actually been written out. However we are currently
acknowledging (by calling md_done_sync) when the POST_XOR
is complete and before the write.
So track in s.locked whether there are pending writes, and don't
call md_done_sync yet if there are.
Note: we also set R5_LOCKED on devices which we are about to
read from. This probably isn't technically necessary, but is
usually done when writing a block, and justifies the use of
s.locked here.
This bug can lead to a crash if an array is stopped while a reshape
is in progress.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
md_probe can fail (e.g. alloc_disk could fail) without
returning an error (as it always returns NULL).
So when we call mddev_find immediately afterwards, we need
to check that md_probe actually succeeded. This means checking
that mddev->gendisk is non-NULL.
Cc: Dave Jones <davej@redhat.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
AS scheduler alternates between issuing read and write batches. It does
the batch switch only after all requests from the previous batch are
completed.
When switching to a write batch, if there is an on-going read request,
it waits for its completion and indicates its intention of switching by
setting ad->changed_batch and the new direction but does not update the
batch_expire_time for the new write batch which it does in the case of
no previous pending requests.
On completion of the read request, it sees that we were waiting for the
switch and schedules work for kblockd right away and resets the
ad->changed_batch flag.
Now when kblockd enters dispatch_request where it is expected to pick
up a write request, it in turn ends the write batch because the
batch_expire_timer was not updated and shows the expire timestamp for
the previous batch.
This results in the write starvation for all the cases where there is
the intention for switching to a write batch, but there is a previous
in-flight read request and the batch gets reverted to a read_batch
right away.
This also holds true in the reverse case (switching from a write batch
to a read batch with an in-flight write request).
I've checked that this bug exists on 2.6.11, 2.6.18, 2.6.24 and
linux-2.6-block git HEAD. I've tested the fix on x86 platforms with
SCSI drives where the driver asks for the next request while a current
request is in-flight.
This patch is based off linux-2.6-block git HEAD.
Bug reproduction:
A simple scenario which reproduces this bug is:
- dd if=/dev/hda3 of=/dev/null &
- lilo
The lilo takes forever to complete.
This can also be reproduced fairly easily with the earlier dd and another
test program doing msync().
The example test program below should print out a message after every
iteration, but it simply hangs forever. With this bugfix it makes forward
progress.
====
Example test program using msync() (thanks to suleiman AT google DOT
com)
Johannes Berg [Thu, 3 Jul 2008 01:36:31 +0000 (20:36 -0500)]
mac80211: detect driver tx bugs
When a driver rejects a frame in its ->tx() callback, it must also
stop queues, otherwise mac80211 can go into a loop here. Detect this
situation and abort the loop after five retries, warning about the
driver bug.
Thanks to Larry Finger <Larry.Finger@lwfinger.net> for doing the -stable
port.
Cc: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Buesch [Thu, 3 Jul 2008 00:04:33 +0000 (02:04 +0200)]
b43: Fix possible MMIO access while device is down
This fixes a possible MMIO access while the device is still down
from a suspend cycle. MMIO accesses with the device powered down
may cause crashes on certain devices.
Michael Buesch [Wed, 2 Jul 2008 23:04:29 +0000 (01:04 +0200)]
b43: Do not return TX_BUSY from op_tx
Never return TX_BUSY from op_tx. It doesn't make sense to return
TX_BUSY if we cannot transmit the packet.
Drop the packet and return TX_OK.
This will fix the resume hang.
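A hedged sketch of the resulting policy (simplified; not the full transmit path):

static int b43_op_tx(struct ieee80211_hw *hw, struct sk_buff *skb)
{
	if (unlikely(skb->len < 2 + 2 + 6)) {
		/* Too small to be a valid 802.11 header.  Drop it and report
		 * success; returning TX_BUSY would only make mac80211 retry. */
		dev_kfree_skb_any(skb);
		return NETDEV_TX_OK;
	}
	/* ... hand the frame to the DMA/PIO transmit path ... */
	return NETDEV_TX_OK;
}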
In function _cpu_up, the panic happens when calling
__raw_notifier_call_chain the second time. The kernel doesn't panic when
calling it the first time. Simply blaming nr_cpu_ids for this is not
right.
By checking the source code, I found that function do_boot_cpu is the culprit.
Consider below call chain:
_cpu_up=>__cpu_up=>smp_ops.cpu_up=>native_cpu_up=>do_boot_cpu.
So do_boot_cpu is called in the end. In do_boot_cpu, if
boot_error==true, cpu_clear(cpu, cpu_possible_map) is executed. So later
on, when _cpu_up calls __raw_notifier_call_chain at the second time to
report CPU_UP_CANCELED, because this cpu is already cleared from
cpu_possible_map, get_cpu_sysdev returns NULL.
Many resources are related to cpu_possible_map, so it's better not to
change it.
Below patch against 2.6.26-rc7 fixes it by removing the bit clearing in
cpu_possible_map.
#include <sys/types.h>
#include <sys/ptrace.h>
#include <unistd.h>
#include <signal.h>

int child(void);
int parent(pid_t pid);

struct user_fxsr_struct {
unsigned short cwd;
unsigned short swd;
unsigned short twd;
unsigned short fop;
long fip;
long fcs;
long foo;
long fos;
long mxcsr;
long reserved;
long st_space[32]; /* 8*16 bytes for each FP-reg = 128 bytes */
long xmm_space[32]; /* 8*16 bytes for each XMM-reg = 128 bytes */
long padding[56];
};
int main(void)
{
pid_t pid;
pid = fork();
switch(pid){
case -1:/* error */
break;
case 0:/* child */
child();
break;
default:
parent(pid);
break;
}
return 0;
}
int child(void)
{
ptrace(PTRACE_TRACEME);
kill(getpid(), SIGSTOP);
sleep(10);
return 0;
}
int parent(pid_t pid)
{
int ret;
struct user_fxsr_struct fpxregs;
the CPU hotplug problems (crashes under high-volume unplug+replug
tests) seem to be related to migrate_dead_tasks().
Firstly I added traces to see all tasks being migrated with
migrate_live_tasks() and migrate_dead_tasks(). On my setup the problem
pops up (the one with "se == NULL" in the loop of
pick_next_task_fair()) shortly after the traces indicate that something has
been migrated with migrate_dead_tasks(). btw., I can reproduce it
much faster now with just a plain cpu down/up loop.
[disclaimer] Well, unless I'm really missing something important in
this late hour [/disclaimer] pick_next_task() is not something
appropriate for migrate_dead_tasks() :-)
the following change seems to eliminate the problem on my setup
(although, I kept it running only for a few minutes to get a few
messages indicating migrate_dead_tasks() does move tasks and the
system is still ok)
On 9xx chips, bus mastering needs to be enabled at resume time for much of the
chip to function. With this patch, vblank interrupts will work as expected
on resume, along with other chip functions. Fixes kernel bugzilla #10844.
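A hedged sketch of the resume-path addition (simplified):

static int i915_resume(struct drm_device *dev)
{
	/* On 9xx, much of the chip (including vblank interrupt delivery)
	 * stays dead until bus mastering is re-enabled after resume. */
	pci_set_master(dev->pdev);

	/* ... restore saved register state, re-enable interrupts ... */
	return 0;
}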
Signed-off-by: Jie Luo <clotho67@gmail.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Current memfree FW has a bug which in some cases, assumes that ICM
pages passed to it are cleared. This patch uses __GFP_ZERO to
allocate all ICM pages passed to the FW. Once firmware with a fix is
released, we can make the workaround conditional on firmware version.
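A hedged sketch of the one-liner's effect on the ICM page allocation (simplified fragment):

	/* Work around the firmware bug by handing the FW pages that are
	 * already cleared. */
	page = alloc_pages(gfp_mask | __GFP_ZERO, order);
	if (!page)
		return -ENOMEM;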
This fixes the bug reported by Arthur Kepner <akepner@sgi.com> here:
http://lists.openfabrics.org/pipermail/general/2008-May/050026.html
[ Rewritten to be a one-liner using __GFP_ZERO instead of vmap()ing
ICM memory and memset()ing it to 0. - Roland ]
Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch addresses a very sporadic pi-futex related failure in
highly threaded java apps on large SMP systems.
David Holmes reported that the pi_state consistency check in
lookup_pi_state triggered with his test application. This means that
the kernel internal pi_state and the user space futex variable are out
of sync. First we assumed that this is a user space data corruption,
but deeper investigation revealed that the problem happened because the
pi-futex code is not handling a fault in the futex_lock_pi path when
the user space variable needs to be fixed up.
The fault happens when a fork mapped the anon memory which contains
the futex readonly for COW or the page got swapped out exactly between
the unlock of the futex and the return of either the new futex owner
or the task which was the expected owner but failed to acquire the
kernel internal rtmutex. The current futex_lock_pi() code drops out
with an inconsistent state in case it faults and returns -EFAULT to user
space. User space has no way to fix up that state.
When we wrote this code we thought that we could not drop the hash
bucket lock at this point to handle the fault.
After analysing the code again it turned out to be wrong because there
are only two tasks involved which might modify the pi_state and the
user space variable:
- the task which acquired the rtmutex
- the pending owner of the pi_state which did not get the rtmutex
Both tasks drop into the fixup_pi_state() function before returning to
user space. The first task which acquired the hash bucket lock faults
in the fixup of the user space variable, drops the spinlock and calls
futex_handle_fault() to fault in the page. Now the second task could
acquire the hash bucket lock and tries to fixup the user space
variable as well. It either faults as well or it succeeds because the
first task already faulted the page in.
One caveat is to avoid a double fixup. After returning from the fault
handling we reacquire the hash bucket lock and check whether the
pi_state owner has been modified already.
Reported-by: David Holmes <david.holmes@sun.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Holmes <david.holmes@sun.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alan Cox [Fri, 27 Jun 2008 14:21:55 +0000 (15:21 +0100)]
TTY: fix for tty operations bugs
This is fixed with the recent tty operations rewrite in mainline in a
different way, this is a selective backport of the relevant portions to
the -stable tree.
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
"This broke vmware 6.0.4.
Jun 22 14:53:03.845: vmx| NOT_IMPLEMENTED
/build/mts/release/bora-93057/bora/vmx/main/vmmonPosix.c:774"
and the reason seems to be that there's an old bug in how we handle
FOLL_ANON on VM_SHARED areas in get_user_pages(), but since it only
triggered if the whole page table was missing, nobody had apparently hit
it before.
The recent changes to 'follow_page()' made the FOLL_ANON logic trigger
not just for whole missing page tables, but for individual pages as
well, and exposed this problem.
This fixes it by making the test for when FOLL_ANON is used more
careful, and also makes the code easier to read and understand by moving
the logic to a separate inline function.
data->max_duty_at_overheat is not updated in adt7473_update_device,
so it might be used before it is initialized (if the user reads from
sysfs file max_duty_at_crit before writing to it.)
Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jean Delvare [Mon, 23 Jun 2008 08:14:26 +0000 (10:14 +0200)]
hwmon: (lm85) Fix function RANGE_TO_REG()
Function RANGE_TO_REG() is broken. For a requested range of 2000 (2
degrees C), it will return an index value of 15, i.e. 80.0 degrees C,
instead of the expected index value of 0. All other values are handled
properly, just 2000 isn't.
The bug was introduced back in November 2004 by this patch:
http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=1c28d80f1992240373099d863e4996cdd5d646d0
In Linus' kernel I decided to rewrite the whole function in a way
which was more obviously correct. But for -stable let's just do the
minimal fix.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The inline assembly in drivers/watchdog/hpwdt.c was incredibly broken,
and included all the function prologue and epilogue stuff, even though
it was itself then inside a C function where the compiler would add its
own prologue and epilogue on top of it all.
This then just _happened_ to work if you had exactly the right compiler
version and exactly the right compiler flags, so that gcc just happened
to not create any prologue at all (the gcc-generated epilogue wouldn't
matter, since it would never be reached).
But the more proper way to fix it is to simply not do this. Move the
inline asm to the top level, with no surrounding function at all (the
better alternative would be to remove the prologue and make it actually
use proper description of the arguments to the inline asm, but that's a
bigger change than the one I'm willing to make right now).
When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can
potentially have the same number of physical address bits as the
64-bit host ("Enhanced Legacy PAE Paging"). This means, in theory,
we could have up to 52 bits of physical address in a pte.
The 32-bit kernel uses a 32-bit unsigned long to represent a pfn.
This means that it can only represent physical addresses up to 32+12=44
bits wide. Rather than widening pfns everywhere, just set 2^44 as the
Linux x86_32-PAE architectural limit for physical address size.
This is a bugfix for two cases:
1. running a 32-bit PAE kernel on a machine with
more than 64GB RAM.
2. running a 32-bit PAE Xen guest on a host machine with
more than 64GB RAM
In both cases, a pte could need to have more than 36 bits of physical,
and masking it to 36-bits will cause fairly severe havoc.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jan Beulich <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
KAMEZAWA Hiroyuki and Oleg Nesterov point out that since the commit 557ed1fa2620dc119adb86b34c614e152a629a80 ("remove ZERO_PAGE") removed
the ZERO_PAGE from the VM mappings, any users of get_user_pages() will
generally now populate the VM with real empty pages needlessly.
We used to get the ZERO_PAGE when we did the "handle_mm_fault()", but
since fault handling no longer uses ZERO_PAGE for new anonymous pages,
we now need to handle that special case in follow_page() instead.
In particular, the removal of ZERO_PAGE effectively removed the core
file writing optimization where we would skip writing pages that had not
been populated at all, and increased memory pressure a lot by allocating
all those useless newly zeroed pages.
This reinstates the optimization by making the unmapped PTE case the
same as for a non-existent page table, which already did this correctly.
While at it, this also fixes the XIP case for follow_page(), where the
caller could not differentiate between the case of a page that simply
could not be used (because it had no "struct page" associated with it)
and a page that just wasn't mapped.
We do that by simply returning an error pointer for pages that could not
be turned into a "struct page *". The error is arbitrarily picked to be
EFAULT, since that was what get_user_pages() already used for the
equivalent IO-mapped page case.
[ Also removed an impossible test for pte_offset_map_lock() failing:
that's not how that function works ]
The atl1 driver tries to determine the MAC address thusly:
- If an EEPROM exists, read the MAC address from EEPROM and
validate it.
- If an EEPROM doesn't exist, try to read a MAC address from
SPI flash.
- If that fails, try to read a MAC address directly from the
MAC Station Address register.
- If that fails, assign a random MAC address provided by the
kernel.
We now have a report of a system fitted with an EEPROM containing all
zeros where we expect the MAC address to be, and we currently handle
this as an error condition. Turns out, on this system the BIOS writes
a valid MAC address to the NIC's MAC Station Address register, but we
never try to read it because we return an error when we find the all-
zeros address in EEPROM.
This patch relaxes the error check and continues looking for a MAC
address even if it finds an illegal one in EEPROM.
http://ubuntuforums.org/showthread.php?t=562617
[jacliburn@bellsouth.net: backport to 2.6.25.7]
Signed-off-by: Radu Cristescu <advantis@gmx.net> Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It depends on the CPU. For AMD CPUs that support MWAIT this is wrong.
Family 0x10 and 0x11 CPUs will enter C1 on HLT. Powersavings then
depend on a clock divisor and current Pstate of the core.
If all cores of a processor are in halt state (C1) the processor can
enter the C1E (C1 enhanced) state. If mwait is used this will never
happen.
Thus HLT saves more power than MWAIT here.
It might be best to switch off the mwait flag for these AMD CPU
families like it was introduced with commit f039b754714a422959027cb18bb33760eb8153f0 (x86: Don't use MWAIT on AMD
Family 10)
Re-add the AMD families 10H/11H check and disable the mwait usage for
those.
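A hedged sketch of the kind of check re-added (the helper name and exact placement are illustrative):

static int mwait_usable_sketch(const struct cpuinfo_x86 *c)
{
	/* Families 10h/11h only reach C1E via HLT, so prefer HLT there. */
	if (c->x86_vendor == X86_VENDOR_AMD &&
	    (c->x86 == 0x10 || c->x86 == 0x11))
		return 0;

	return cpu_has(c, X86_FEATURE_MWAIT);
}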
| powertop shows between 200-400 wakeups/second with the description
| "<kernel IPI>: Rescheduling interrupts" when all processors have load (e.g.
| I need to run two busy-loops on my 2-CPU system for this to show up).
|
| The bisect resulted in this commit:
|
| commit 0c07ee38c9d4eb081758f5ad14bbffa7197e1aec
| Date: Wed Jan 30 13:33:16 2008 +0100
|
| x86: use the correct cpuid method to detect MWAIT support for C states
remove the functional effects of this patch and make mwait unconditional.
A future patch will turn off mwait on specific CPUs where that causes
power to be wasted.
Properly free h323_buffer when helper registration fails.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The H.245 helper is not registered/unregistered, but assigned to
connections manually from the Q.931 helper. This means on unload
existing expectations and connections using the helper are not
cleaned up, leading to the following oops on module unload:
One way to fix this would be to split helper cleanup from the unregistration
function and invoke it for the H.245 helper, but since ctnetlink needs to be
able to find the helper for synchronization purposes, a better fix is to
register it normally and make sure it's not assigned to connections during
helper lookup. The missing l3num initialization is enough for this, this
patch changes it to use AF_UNSPEC to make it more explicit though.
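A hedged sketch of the shape of the change (field path abbreviated; not the literal diff):

	/* Register the H.245 helper like the others so module unload cleans
	 * up its expectations, but leave l3num at AF_UNSPEC so the normal
	 * helper lookup never auto-assigns it to connections. */
	nf_conntrack_helper_h245.tuple.src.l3num = AF_UNSPEC;
	ret = nf_conntrack_helper_register(&nf_conntrack_helper_h245);
	if (ret < 0)
		goto err;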
Reported-by: liannan <liannan@twsz.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>