Mmapped files were able to trigger a lock ordering bug. Private
maps do not need to take the glock so early on. Shared maps do
unfortunately, however we can get around that by adding a flag
into the flags for the struct gfs2_file. This only works because
we are taking an exclusive lock at this point, so we know that
nobody else can be racing with us.
Fixes Red Hat bugzilla: #201196
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This was a nasty bug which resulted in corruption of hash tables
in the directory code with larger directories. We forgot to
increment a pointer in the read/write routines internal to the
directory code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] 3734/1: Fix the unused variable warning in __iounmap()
[ARM] 3737/1: Export ARM copy/clear_user_page symbols
[ARM] 3736/1: xscale: don't mis-report 80219 as an iop32x
[ARM] 3733/2: S3C24XX: Remove old IDE registers in Anubis
[ARM] 3732/1: S3C24XX: tidy syntax in osiris and anubis machines
[ARM] Fix SMP booting
[ARM] 3731/1: Allow IRQ definitions of IQ80331 and IQ80332 to co-exist
[ARM] 3730/1: ep93xx: enable usb ohci driver in the defconfig
[ARM] Fix cats build
It was broken before. But having it is important as possible hardware
bug workaround.
And previously there was no way to force swiotlb if there is another IOMMU.
Side effect is that iommu=force won't force swiotlb anymore even if there
isn't another IOMMU.
Calgary hits a NULL pointer dereference when booting in a multi-chassis
NUMA system. See Redhat bugzilla number 198498, found by Konrad
Rzeszutek (konradr@redhat.com).
There are many issues that had to be resolved to fix this problem.
Firstly when I originally wrote the code to handle NUMA systems, I
had a large misunderstanding that was not corrected until now. That was
that I thought the "number of nodes online" referred to number of
physical systems connected. So that if NUMA was disabled, there
would only be 1 node and it would only show that node's PCI bus.
In reality if NUMA is disabled, the system displays all of the
connected chassis as one node but is only ignorant of the delays
in accessing main memory. Therefore, references to num_online_nodes()
and MAX_NUMNODES are incorrect and need to be set to the maximum
number of nodes that can be accessed (which are 8). I created a
variable, MAX_NUM_CHASSIS, and set it to 8 to fix this.
Secondly, when walking the PCI in detect_calgary, the code only
checked the first "slot" when looking to see if a device is present.
This will work for most cases, but unfortunately it isn't always the
case. In the NUMA MXE drawers, there are USB devices present on the
3rd slot (with slot 1 being empty). So, to work around this, all
slots (up to 8) are scanned to see if there are any devices present.
Lastly, the bus is being enumerated on large systems in a different
way the we originally thought. This throws the ugly logic we had
out the window. To more elegantly handle this, I reorganized the
kva array to be sparse (which removed the need to have any bus number
to kva slot logic in tce.c) and created a secondary space array to
contain the bus number to phb mapping.
With these changes Calgary boots on an x460 with 4 nodes with and
without NUMA enabled.
George G. Davis [Sat, 29 Jul 2006 07:29:27 +0000 (08:29 +0100)]
[ARM] 3737/1: Export ARM copy/clear_user_page symbols
Patch from George G. Davis
As reported by various folks on the ARM Linux kernel mailing list,
the video-buf.ko driver has undefined references on all ARM machines
which use it as observed during `make modules`:
Similar warnings exist for all ARM machines which use this driver.
So this change adds the missing EXPORT_SYMBOLs to allow using this
driver as a module.
Signed-off-by: George G. Davis <gdavis@mvista.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[ARM] 3736/1: xscale: don't mis-report 80219 as an iop32x
Patch from Lennert Buytenhek
The IOP 80219 xscale CPU is a stripped down version of the IOP32x.
But the fact that the 80219 and IOP32x are very similar doesn't mean
that they need to share a cpu table entry. It's also somewhat confusing
for the end user to see the 80219 reported as an IOP32x, so this patch
splits the IOP32x cpu table entry to make a separate entry for the
80219.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[PATCH] myri10ge - Always do a dummy RDMA after loading the firmware
Always do a dummy RDMA after loading the firmware to work around
buggy PCIe chipsets which do not implement resending properly.
This is so cheap as to be almost free, and should never have been
conditional on the tx boundary != 4096.
Signed-off-by: Brice Goglin <brice@myri.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fix robust PI-futexes to be properly unlocked on unexpected exit.
For this to work the kernel has to know whether a futex is a PI or a
non-PI one, because the semantics are different. Since the space in
relevant glibc data structures is extremely scarce, the best solution is
to encode the 'PI' information in bit 0 of the robust list pointer.
Existing (non-PI) glibc robust futexes have this bit always zero, so the
ABI is kept. New glibc with PI-robust-futexes will set this bit.
Further fixes from Thomas Gleixner <tglx@linutronix.de>
Fix pi_state->list handling bugs: list handling mishap, locking error.
Plus add more debug checks and fix a few style issues i noticed while
debugging this.
The dwarf2 unwinder currently often gets stuck because a lot
of assembly code doesn't have proper dwarf2 annotiation yet.
This currently often happens with __down. Should fix this by
adding proper dwarf2 annotation to all inline assembly. However
until that's done we need a quick fix for 2.6.18 to avoid
incomplete backtraces.
So when this happens dump the rest of the stack with the old unwinder
instead of silently not dumping it. There was already a optional
"both" mode that dumped both, but that was too ugly.
I also clarified the headers for the different backtraces a bit.
Also add a clear error message for missing dwarf2
annotation that people can work on.
And I removed a dead variable left over from Ingo's changes.
bibo mao [Fri, 28 Jul 2006 12:44:48 +0000 (14:44 +0200)]
[PATCH] x86_64: Enlarge debug stack for nested kprobes
In x86_64 platform, INT1 and INT3 trap stack is IST stack called DEBUG_STACK,
when INT1/INT3 trap happens, system will switch to DEBUG_STACK by hardware.
Current DEBUG_STACK size is 4K, when int1/int3 trap happens, kernel will
minus current DEBUG_STACK IST value by 4k. But if int3/int1 trap is nested,
it will destroy other vector's IST stack. This patch modifies this, it sets
DEBUG_STACK size as 8K and allows two level of nested int1/int3 trap.
Kprobe DEBUG_STACK may be nested, because kprobe handler may be probed
by other kprobes.
Thanks jbeulich for pointing out error in the first patch.
[AK: nested kprobes are pretty dubious. Hopefully one nest
will be enough. This will cost 8K per CPU (4K more than before)]
[PATCH] x86_64: Don't clobber r8-r11 in int 0x80 handler
When int 0x80 is called from long mode r8-r11 would leak out of the
kernel (or rather they would be filled with some values from
the kernel stack). I don't think it's a security issue because
the values come from the fixed stack frame which should be near
always user registers from a previous interrupt.
Still better fix it.
Longer term the register save macros need to be cleaned up
to avoid such mistakes in the future.
Original analysis from Richard Brunner, fix by me.
[PATCH] i386/x86-64: Add user_mode checks to profile_pc for oprofile
Fixes a obscure user space triggerable crash during oprofiling.
Oprofile calls profile_pc from NMIs even when user_mode(regs) is not true and
the program counter is inside the kernel lock section. This opens
a race - when a user program jumps to a kernel lock address and
a NMI happens before the illegal page fault exception is raised
and the program has a unmapped esp or ebp then the kernel could
oops. NMIs have a higher priority than exceptions so that could
happen.
Add user_mode checks to i386/x86-64 profile_pc to prevent that.
* git://oss.sgi.com:8090/nathans/xfs-rc-2.6:
[XFS] Ensure bulkstat from an invalid inode number gets caught always with
[XFS] Fix a barrier related forced shutdown on mounts with quota enabled.
[XFS] Fix remount vs no/barrier options by ensuring we clear unwanted
[XFS] All xfs_disk_dquot_t values are (as the name says) disk endian.
[PATCH] ide: option to disable cache flushes for buggy drives
Some drives claim they support cache flushing, but get seriously
confused if you try. Add this option to be able to boot with
barriers enabled by default.
David S. Miller [Thu, 27 Jul 2006 23:49:21 +0000 (16:49 -0700)]
[SPARC64]: Fix quad-float multiply emulation.
Something is wrong with the 3-multiply (vs. 4-multiply) optimized
version of _FP_MUL_MEAT_2_*(), so just use the slower version
which actually computes correct values.
Noticed by Rene Rebe
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Williams [Fri, 14 Jul 2006 15:41:47 +0000 (11:41 -0400)]
[PATCH] orinoco: fix setting transmit key only
When determining whether there's a key to set or not, orinoco should be
looking at the key length, not the key data. Otherwise confusion reigns
when trying to set TX key only, passing in zero-length key, but non-NULL
pointer. Key length takes precedence over non-NULL key data.
Signed-off-by: Dan Williams <dcbw@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Daniel Drake [Tue, 11 Jul 2006 22:16:34 +0000 (23:16 +0100)]
[PATCH] softmac: do shared key auth in workqueue
Johann Uhrmann reported a bcm43xx crash and Michael Buesch tracked
it down to a problem with the new shared key auth code (recursive
calls into the driver)
This patch (effectively Michael's patch with a couple of small
modifications) solves the problem by sending the authentication
challenge response frame from a workqueue entry.
I also removed a lone \n from the bcm43xx messages relating to
authentication mode - this small change was previously discussed but
not patched in.
Signed-off-by: Daniel Drake <dsd@gentoo.org> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Robert Schulze [Wed, 5 Jul 2006 20:52:43 +0000 (22:52 +0200)]
[PATCH] airo: should select crypto_aes
The driver airo (for Cisco Wlan-Cards) complains about "failed to load
transform for AES", when it is loaded and CRYPTO_AES is not selected
in Kconfig.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This means that we don't need to create a special inode just to contain
a struct address_space in order to read a single disk block. Instead
we read the disk block directly. Its slightly faster, and uses slightly
less memory, but the real reason for doing this is that it removes a
special case from the glock code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
[S390] permanent subchannel busy conditions may cause I/O stall
In special conditions where a subchannel rejects the HALT I/O-
instruction with a busy indication (cc 2), I/O may stall.
I/O request termination logic retries HALT I/O indefinitely
because it expects HALT I/O to alter the subchannel status which
is not true when cc 2 is returned.
In case of a busy indication, try CLEAR I/O instruction immediately.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
David Teigland [Wed, 26 Jul 2006 19:31:15 +0000 (15:31 -0400)]
[DLM] fix i_private
> I think you must have an old version of the base kernel as well?
> i_private no longer exists in struct inode, so you'll have to use
> something else,
I have that patch in my stack but didn't send it; for some reason I
thought it was already changed in your git tree.
Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
David Teigland [Wed, 26 Jul 2006 13:29:06 +0000 (08:29 -0500)]
[DLM] fix broken patches
On Wed, Jul 26, 2006 at 10:47:14AM +0100, Steven Whitehouse wrote:
> Hi,
>
> I've applied all the patches you sent, but they don't build:
Argh, sorry about that... when I fixed these a long time ago they somehow
never got included in the quilt patches. I mistakenly assumed the quilt
patches matched the source I had in front of me.
Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The remaining routines in page.c were all only used in one other
file, so they are now moved into the files where they are referenced
and made static. Thus page.[ch] are no longer required.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
a) Moving it into bmap.c
b) Making it static
c) Calling it directly from gfs2_unstuff_dinode
d) Updating all callers of gfs2_unstuff_dinode due to one less
required argument.
It doesn't change the behaviour at all.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
[PATCH] fix compile regression for a few scsi drivers
This fixes three drivers to compile again after my patch that removes
the data_cmnd member from struct scsi_cmnd.
The fas216 change is trivial, it should have been using ->cmnd all the
time.
NCR53C9 (which seem to be mostly duplicate driver with esp.c!) is doing
something odd, it should only have looked at ->cmnd before not the saved
copy that is kept for the error handlers sake. Note that it really
should deal with the sync setting themselves but use the generic domain
validation code that get this right - but that's for later let's push
this simple compile fix for now.
And sorry for the late fix for this, I have been busy with OLS and
associated activities last week.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SCSI] esp: Fix build.
[SPARC]: Fix SA_STATIC_ALLOC value.
[SPARC64]: Explicitly print return PC when the kernel fault PC is bogus.
Arjan van de Ven [Wed, 26 Jul 2006 13:40:07 +0000 (15:40 +0200)]
[PATCH] Reorganize the cpufreq cpu hotplug locking to not be totally bizare
The patch below moves the cpu hotplugging higher up in the cpufreq
layering; this is needed to avoid recursive taking of the cpu hotplug
lock and to otherwise detangle the mess.
The new rules are:
1. you must do lock_cpu_hotplug() around the following functions:
__cpufreq_driver_target
__cpufreq_governor (for CPUFREQ_GOV_LIMITS operation only)
__cpufreq_set_policy
2. governer methods (.governer) must NOT take the lock_cpu_hotplug()
lock in any way; they are called with the lock taken already
3. if your governer spawns a thread that does things, like calling
__cpufreq_driver_target, your thread must honor rule #1.
4. the policy lock and other cpufreq internal locks nest within
the lock_cpu_hotplug() lock.
I'm not entirely happy about how the __cpufreq_governor rule ended up
(conditional locking rule depending on the argument) but basically all
callers pass this as a constant so it's not too horrible.
The patch also removes the cpufreq_governor() function since during the
locking audit it turned out to be entirely unused (so no need to fix it)
The patch works on my testbox, but it could use more testing
(otoh... it can't be much worse than the current code)
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Teigland [Tue, 25 Jul 2006 18:59:48 +0000 (13:59 -0500)]
[DLM] fix loop in grant_after_purge
The loop in grant_after_purge is intended to find all rsb's in each hash
bucket that have the LOCKS_PURGED flag set. The loop was quitting the
current bucket after finding just one rsb instead of going until there are
no more.
Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
David Teigland [Tue, 25 Jul 2006 18:53:33 +0000 (13:53 -0500)]
[DLM] set purged flag on rsbs
If a node becomes the new master of an rsb during recovery, the
LOCKS_PURGED flag needs to be set on it so that any waiting/converting
locks will try to be granted.
Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The recvmsg() for raw socket seems to return random u16 value
from the kernel stack memory since port field is not initialized.
But I'm not sure this patch is correct.
Does raw socket return any information stored in port field?
[ BSD defines RAW IP recvmsg to return a sin_port value of zero.
This is described in Steven's TCP/IP Illustrated Volume 2 on
page 1055, which is discussing the BSD rip_input() implementation. ]
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
IP multicast route code was reusing an skb which causes use after free
and double free.
From: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Note, it is real skb_clone(), not alloc_skb(). Equeued skb contains
the whole half-prepared netlink message plus room for the rest.
It could be also skb_copy(), if we want to be puristic about mangling
cloned data, but original copy is really not going to be used.
Acked-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
As per comments received, alter the GFS2 direct I/O path so that
it uses the standard read functions "out of the box". Needs a
small change to one of the VFS functions. This reduces the size
of the code quite a lot and also removes the need for one new export.
Some more work remains to be done, but this is the bones of the
thing.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
[PATCH] cciss: fix stall with softirq handling and CFQ
We need to postpone the queue startup until after the softirq
handler has actually finished some requests, otherwise we could
be racing with cciss_softirq_done() and not actually restart
the queue handling.
Patrick McHardy [Tue, 25 Jul 2006 05:54:55 +0000 (22:54 -0700)]
[NETFILTER]: bridge netfilter: add deferred output hooks to feature-removal-schedule
Add bridge netfilter deferred output hooks to feature-removal-schedule
and disable them by default. Until their removal they will be
activated by the physdev match when needed.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Oester [Tue, 25 Jul 2006 05:54:14 +0000 (22:54 -0700)]
[NETFILTER]: xt_pkttype: fix mismatches on locally generated packets
Locally generated broadcast and multicast packets have pkttype set to
PACKET_LOOPBACK instead of PACKET_BROADCAST or PACKET_MULTICAST. This
causes the pkttype match to fail to match packets of either type.
The below patch remedies this by using the daddr as a hint as to
broadcast|multicast. While not pretty, this seems like the only way
to solve the problem short of just noting this as a limitation of the
match.
This resolves netfilter bugzilla #484
Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Tue, 25 Jul 2006 05:52:47 +0000 (22:52 -0700)]
[NETFILTER]: nf_queue: handle NF_STOP and unknown verdicts in nf_reinject
In case of an unknown verdict or NF_STOP the packet leaks. Unknown verdicts
can happen when userspace is buggy. Reinject the packet in case of NF_STOP,
drop on unknown verdicts.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 25 Jul 2006 05:47:14 +0000 (22:47 -0700)]
[SCSI] esp: Fix build.
The data_cmd[] member got deleted, so do not use it any more. Scsi
commands do not have their ->cmd[] overwritten temporary to probe for
status after an error before retrying.
Signed-off-by: David S. Miller <davem@davemloft.net>
skbuff.h has an #ifndef CONFIG_HAVE_ARCH_DEV_ALLOC_SKB to allow
architectures to reimplement __dev_alloc_skb. It's not set on any
architecture and now that we have an architecture-overrideable
NET_SKB_PAD there is not point at all to have one either.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Rompf [Mon, 24 Jul 2006 20:52:13 +0000 (13:52 -0700)]
[VLAN]: Fix link state propagation
When the queue of the underlying device is stopped at initialization time
or the device is marked "not present", the state will be propagated to the
vlan device and never change. Based on an analysis by Patrick McHardy.
Signed-off-by: Stefan Rompf <stefan@loplof.de> ACKed-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>