David Chinner [Tue, 11 Apr 2006 05:11:20 +0000 (15:11 +1000)]
[XFS] Fix an inode use-after-free durin an unpin. When reclaiming inodes
that have been unlinked, we may need to execute transactions during
reclaim. By the time the transaction has hit the disk, the linux inode and
xfs vnode may already have been freed so we can't reference them safely.
Use the known xfs inode state to determine if it is safe to reference the
vnode and linux inode during the unpin operation.
David Chinner [Tue, 11 Apr 2006 05:11:12 +0000 (15:11 +1000)]
[XFS] Fix inode reclaim scalability regression. When a filesystem has
millions of inodes cached and has sparse cluster population, removing
inodes from the cluster hash consumes excessive amounts of CPU time.
Reduce the CPU cost by making removal O(1) via use of a double linked list
for the hash chains.
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] 3473/1: Use numbers 0-15 for the VFP double registers
[ARM] 3472/1: Use the D variants of FLDMIA/FSTMIA on ARMv6
[ARM] 3471/1: FTOSI functions should return 0 for NaN
[ARM] 3470/1: Clear the HWCAP bits for the disabled kernel features
[ARM] 3469/1: S3C24XX: clkout missing hclk selector
[ARM] 3468/1: S3C2410: SMDK common include fix
[ARM] 3461/1: ARM: OMAP: Fix clk_get() when using id and name
[ARM] 3460/1: ARM: OMAP: Remove unnecessary nop_release()
[ARM] 3459/1: ixp23xx: fix debug serial macros for big-endian operation
[ARM] Allow decompressor to be built with -ffunction-sections
[ARM] Fix SA110/SA1100 cache flushing
[ARM] ebsa110: Fix incorrect serial port address
[ARM] Fix ebsa110 debug macros
[ARM] Move FLUSH_BASE macros to asm/arch/memory.h
[ARM] Remove unnecessary extra parens in include/asm-arm/memory.h
[ARM] arm's arch_local_page_offset() fix against 2.6.17-rc1
Merge branch 'upstream-linus' of git://oss.oracle.com/home/sourcebo/git/ocfs2
* 'upstream-linus' of git://oss.oracle.com/home/sourcebo/git/ocfs2:
[PATCH] CONFIGFS_FS must depend on SYSFS
[PATCH] Bogus NULL pointer check in fs/configfs/dir.c
ocfs2: Better I/O error handling in heartbeat
ocfs2: test and set teardown flag early in user_dlm_destroy_lock()
ocfs2: Handle the DLM_CANCELGRANT case in user_unlock_ast()
ocfs2: catch an invalid ast case in dlmfs
ocfs2: remove an overly aggressive BUG() in dlmfs
ocfs2: multi node truncate fix
Oleg Nesterov spotted two interesting bugs with the current de_thread
code. The simplest is a long standing double decrement of
__get_cpu_var(process_counts) in __unhash_process. Caused by
two processes exiting when only one was created.
The other is that since we no longer detach from the thread_group list
it is possible for do_each_thread when run under the tasklist_lock to
see the same task_struct twice. Once on the task list as a
thread_group_leader, and once on the thread list of another
thread.
The double appearance in do_each_thread can cause a double increment
of mm_core_waiters in zap_threads resulting in problems later on in
coredump_wait.
To remedy those two problems this patch takes the simple approach
of changing the old thread group leader into a child thread.
The only routine in release_task that cares is __unhash_process,
and it can be trivially seen that we handle cleaning up a
thread group leader properly.
Since de_thread doesn't change the pid of the exiting leader process
and instead shares it with the new leader process. I change
thread_group_leader to recognize group leadership based on the
group_leader field and not based on pids. This should also be
slightly cheaper then the existing thread_group_leader macro.
I performed a quick audit and I couldn't see any user of
thread_group_leader that cared about the difference.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[ARM] 3473/1: Use numbers 0-15 for the VFP double registers
Patch from Catalin Marinas
This patch changes the double registers numbering to 0-15 from even 0-30,
in preparation for future VFP extensions. It also fixes the VFP_REG_ZERO
bug (value 16 actually represents the 8th double register with the original
numbering).
The original mcrr/mrrc on CP10 were generating FMRRS/FMSRR instead of
FMRRD/FMDRR. The patch changes to CP11 for the correct instructions.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[ARM] 3470/1: Clear the HWCAP bits for the disabled kernel features
Patch from Catalin Marinas
Glibc interprets the HWCAP bits and decides on what features to use.
However, even if the features are present in the hardware, they are not
always supported by the kernel and hence the corresponding bits have to be
cleared from the elf_hwcap variable.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[PATCH] move ->eh_strategy_handler to the transport class
Overriding the whole EH code is a per-transport, not per-host thing.
Move ->eh_strategy_handler to the transport class, same as
->eh_timed_out.
Downside is that scsi_host_alloc can't check for the total lack of EH
anymore, but the transition period from old EH where we needed it is
long gone already.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Nick Piggin [Mon, 10 Apr 2006 01:21:48 +0000 (11:21 +1000)]
[PATCH] Fix buddy list race that could lead to page lru list corruptions
Rohit found an obscure bug causing buddy list corruption.
page_is_buddy is using a non-atomic test (PagePrivate && page_count == 0)
to determine whether or not a free page's buddy is itself free and in the
buddy lists.
Each of the conjuncts may be true at different times due to unrelated
conditions, so the non-atomic page_is_buddy test may find each conjunct to
be true even if they were not both true at the same time (ie. the page was
not on the buddy lists).
Signed-off-by: Martin Bligh <mbligh@google.com> Signed-off-by: Rohit Seth <rohitseth@google.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David S. Miller [Thu, 6 Apr 2006 23:54:33 +0000 (16:54 -0700)]
[SPARC64]: smp_call_function() fixups...
1) Take doc-book function comment from i386 implementation.
2) cacheline align call_lock, taken from powerpc
3) Need memory barrier after setting call_data
4) Remove timeout
Signed-off-by: David S. Miller <davem@davemloft.net>
> However, show_address() does not output anything unless
> dev->reg_state == NETREG_REGISTERED - and this state is set by
> netdev_run_todo() only after netdev_register_sysfs() returns, so in
> the meantime (while netdev_register_sysfs() is busy adding the
> "statistics" attribute group) some process may see an empty "address"
> attribute.
I've tried the attached patch, suggested by Sergey Vlasov on
hotplug-devel@, and as far as i can test it works just fine.
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 7 Apr 2006 04:46:34 +0000 (21:46 -0700)]
[TG3]: Speed up SRAM access (2nd version)
Speed up SRAM read and write functions if possible by using MMIO
instead of config. cycles. With this change, the post reset signature
done at the end of D3 power change must now be moved before the D3
power change.
IBM reported a problem on powerpc blades during ethtool self test that
was caused by the memory test taking excessively long. Config. cycles
are very slow on powerpc and the memory test can take more than 10
seconds to complete using config. cycles.
David Miller informed me that an earlier version of the patch caused
problems on sparc64 systems with built-in tg3 chips. This version
fixes the problem by excluding all SUN built-in tg3 chips from doing
MMIO SRAM access.
TG3_FLAG_EEPROM_WRITE_PROT is also set unconditionally when
TG3_FLG2_SUN_570X is set. This should be sane as all SUN chips are
built-in and do not require Vaux switching.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 6 Apr 2006 21:19:24 +0000 (14:19 -0700)]
[NETFILTER]: Convert conntrack/ipt_REJECT to new checksumming functions
Besides removing lots of duplicate code, all converted users benefit
from improved HW checksum error handling. Tested with and without HW
checksums in almost all combinations.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 6 Apr 2006 21:18:43 +0000 (14:18 -0700)]
[NETFILTER]: Add address family specific checksum helpers
Add checksum operation which takes care of verifying the checksum and
dealing with HW checksum errors and avoids multiple checksum
operations by setting ip_summed to CHECKSUM_UNNECESSARY after
successful verification.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
default_rrq_ttl is used when no TTL is included in the RRQ.
Signed-off-by: Jing Min Zhao <zhaojingmin@users.sourceforge.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
[NETFILTER]: H.323 helper: make get_h245_addr() static
Signed-off-by: Jing Min Zhao <zhaojingmin@users.sourceforge.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
[NETFILTER]: H.323 helper: change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL
Signed-off-by: Jing Min Zhao <zhaojingmin@users.sourceforge.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
[NETFILTER]: H.323 helper: move some function prototypes to ip_conntrack_h323.h
Move prototypes of NAT callbacks to ip_conntrack_h323.h. Because the
use of typedefs as arguments, some header files need to be moved as
well.
Signed-off-by: Jing Min Zhao <zhaojingmin@users.sourceforge.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Samuel Ortiz [Thu, 6 Apr 2006 05:39:14 +0000 (22:39 -0700)]
[IRDA]: Support for Sigmatel STIR421x chip
This patch enables support for the Sigmatel's STIR421x IrDA chip.
Once patched with Sigmatel's firmware, this chip "almost" follows the
USB-IrDA spec. Thus this patch is against irda-usb.[ch].
The code has been tested by Nick Fedchik on an STIR4210 chipset based
dongle.
Signed-off-by: Samuel Ortiz <samuel.ortiz@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch integrates the smcinit code into the smsc-ircc driver.
Some laptops have their smsc-ircc chip not properly configured by the
BIOS and needs some preconfiguration. Currently, this can be done from
userspace with smcinit, a utility that comes with the irda-utils
package. It messes with ioports and PCI settings, from userspace. Now
with this patch, if we happen to be on one of the known to be faulty
laptops, we preconfigure the chip from the driver.
Patch from Linus Walleij <triad@df.lth.se> Signed-off-by: Samuel Ortiz <samuel.ortiz@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Sesterhenn [Thu, 6 Apr 2006 05:28:14 +0000 (22:28 -0700)]
[BLUETOOTH] sco: Possible double free.
this fixes coverity bug id #1068.
hci_send_sco() frees skb if (skb->len > hdev->sco_mtu).
Since it returns a negative error value only in this case, we
can directly return here.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Tue, 4 Apr 2006 20:50:45 +0000 (13:50 -0700)]
[INET]: Move no-tunnel ICMP error to tunnel4/tunnel6
This patch moves the sending of ICMP messages when there are no IPv4/IPv6
tunnels present to tunnel4/tunnel6 respectively. Please note that for now
if xfrm4_tunnel/xfrm6_tunnel is loaded then no ICMP messages will ever be
sent. This is similar to how we handle AH/ESP/IPCOMP.
This move fixes the bug where we always send an ICMP message when there is
no ip6_tunnel device present for a given packet even if it is later handled
by IPsec. It also causes ICMP messages to be sent when no IPIP tunnel is
present.
I've decided to use the "port unreachable" ICMP message over the current
value of "address unreachable" (and "protocol unreachable" by GRE) because
it is not ambiguous unlike the other ones which can be triggered by other
conditions. There seems to be no standard specifying what value must be
used so this change should be OK. In fact we should change GRE to use
this value as well.
Incidentally, this patch also fixes a fairly serious bug in xfrm6_tunnel
where we don't check whether the embedded IPv6 header is present before
dereferencing it for the inside source address.
This patch is inspired by a previous patch by Hugo Santos <hsantos@av.it.pt>.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Tue, 4 Apr 2006 20:42:35 +0000 (13:42 -0700)]
[NETFILTER]: Fix fragmentation issues with bridge netfilter
The conntrack code doesn't do re-fragmentation of defragmented packets
anymore but relies on fragmentation in the IP layer. Purely bridged
packets don't pass through the IP layer, so the bridge netfilter code
needs to take care of fragmentation itself.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Olsson [Tue, 4 Apr 2006 19:53:35 +0000 (12:53 -0700)]
[FIB_TRIE]: Fix leaf freeing.
Seems like leaf (end-nodes) has been freed by __tnode_free_rcu and not
by __leaf_free_rcu. This fixes the problem. Only tnode_free is now
used which checks for appropriate node type. free_leaf can be removed.
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Lindgren [Sun, 9 Apr 2006 21:21:05 +0000 (22:21 +0100)]
[ARM] 3461/1: ARM: OMAP: Fix clk_get() when using id and name
Patch from Tony Lindgren
Recent change to use both id and name when available was
not necessarily returning the right clock as it also searched
for clock name afterwards. This caused MMC to break on H2 and
H3 boards.
Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[ARM] 3459/1: ixp23xx: fix debug serial macros for big-endian operation
Patch from Lennert Buytenhek
The debug-8250 macros do byte accesses, which means that if we're in
big-endian mode, we need to logically OR the UART address with 3, as
the LSB byte lane (where UART data and status is transferred) has the
highest byte address in the word when we are in big-endian mode.
It's unclear why this problem didn't surface earlier.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Dave Jones [Mon, 3 Apr 2006 06:34:19 +0000 (23:34 -0700)]
[SELINUX] Fix build after ipsec decap state changes.
security/selinux/xfrm.c: In function 'selinux_socket_getpeer_dgram':
security/selinux/xfrm.c:284: error: 'struct sec_path' has no member named 'x'
security/selinux/xfrm.c: In function 'selinux_xfrm_sock_rcv_skb':
security/selinux/xfrm.c:317: error: 'struct sec_path' has no member named 'x'
Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move request_standard_resources() back to before PCI probing
This effectively undoes the PCI resource allocation changes done in
commit b408cbc704352eccee301e1103b23203ba1c3a0e, but leaves the cleanups
of that commit in place.
We're going back to marking the resources reported by e820 busy _before_
doing PCI probing, so that any PCI resource that clashes with the BIOS-
reported memory map will be reloacted to a non-clashing area.
The reason? Larry Finger reports that his laptop has the cardbus
controller set up by the BIOS so that it conflicts with the e820 memory
map, and needs to be relocated. See
http://bugzilla.kernel.org/show_bug.cgi?id=6337
for more details.
We'll have to work out how to handle the fbcon problem that caused that
commit in the first place in some other way.
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Antonino A. Daplas <adaplas@pol.net> Cc: <bjk@luxsci.net> Tested-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
John Blackwood [Fri, 7 Apr 2006 17:50:25 +0000 (19:50 +0200)]
[PATCH] x86_64: Plug GS leak in arch_prctl()
In linux-2.6.16, we have noticed a problem where the gs base value
returned from an arch_prtcl(ARCH_GET_GS, ...) call will be incorrect if:
- the current/calling task has NOT set its own gs base yet to a
non-zero value,
- some other task that ran on the same processor previously set their
own gs base to a non-zero value.
In this situation, the ARCH_GET_GS code will read and return the
MSR_KERNEL_GS_BASE msr register.
However, since the __switch_to() code does NOT load/zero the
MSR_KERNEL_GS_BASE register when the task that is switched IN has a zero
next->gs value, the caller of arch_prctl(ARCH_GET_GS, ...) will get back
the value of some previous tasks's gs base value instead of 0.
Change the arch_prctl() ARCH_GET_GS code to only read and return
the MSR_KERNEL_GS_BASE msr register if the 'gs' register of the calling
task is non-zero.
Side note: Since in addition to using arch_prctl(ARCH_SET_GS, ...),
a task can also setup a gs base value by using modify_ldt() and write
an index value into 'gs' from user space, the patch below reads
'gs' instead of using thread.gs, since in the modify_ldt() case,
the thread.gs value will be 0, and incorrect value would be returned
(the task->thread.gs value).
When the user has not set its own gs base value and the 'gs'
register is zero, then the MSR_KERNEL_GS_BASE register will not be
read and a value of zero will be returned by reading and returning
'task->thread.gs'.
The first patch shown below is an attempt at implementing this
approach.
Jordan Hargrave [Fri, 7 Apr 2006 17:50:18 +0000 (19:50 +0200)]
[PATCH] x86_64: Fix drift with HPET timer enabled
If the HPET timer is enabled, the clock can drift by ~3 seconds a day.
This is due to the HPET timer not being initialized with the correct
setting (still using PIT count).
If HZ changes, this drift can become even more pronounced.
HPET patch initializes tick_nsec with correct tick_nsec settings for
HPET timer.
Vojtech comments:
"It's not entirely correct (it assumes the HPET ticks totally
exactly), but it's significantly better than assuming the PIT error
there."
The exception is reported in the SYSRET, not the next instruction.
This leads to the kernel exception handler running on the user stack
with the wrong GS because the kernel didn't expect exceptions
on this instruction.
This version of the patch has the teething problems that plagued an earlier
version fixed.
This is CVE-2006-0744
Thanks to Ernie Petrides and Asit B. Mallick for analysis and initial
patches.
AMD systems have a modern APIC that supports 8 bit IDs, but
don't have a XAPIC version number. Add a new "modern_apic"
subfunction that handles this correctly and use it (nearly)
everywhere where XAPIC is tested for.
I removed one wart: the code specified that external APICs
would use an 8bit APIC ID. But I checked a real 82093 data sheet
and it says clearly that they only use 4bit. So I removed
this special case since it would a bit awkward to implement now.
I removed the valid APIC tests in mptable parsing completely. On any modern
system they only check against the full field width (8bit) anyways
and are no-ops. This also fixes them doing the wrong thing
on >8 core Opterons.
[PATCH] x86-64/i386: Don't process APICs/IO-APICs in ACPI when APIC is disabled.
When nolapic was passed or the local APIC was disabled
for another reason ACPI would still parse the IO-APICs
until these were explicitely disabled with noapic.
Usually this resulted in a non booting configuration unless
"nolapic noapic" was used.
I also disabled the local APIC parsing in this case, although
that's only cosmetic (suppresses a few printks)
[PATCH] x86_64: Don't sanity check Type 1 PCI bus access on newer systems
Horus systems don't have anything on bus 0 which makes
the Type 1 sanity checks fail. Use the DMI BIOS year to
check for newer systems and always assume Type 1 works on them.
I used 2001 as an pretty arbitary cutoff year.
[PATCH] i386/x86-64: Check that MCFG points to an e820 reserved area
This patch introduces a user for the e820_all_mapped function:
There have been several machines that don't have a working MMCONFIG,
often because of a buggy MCFG table in the ACPI bios. This patch adds a
simple sanity check that detects a whole bunch of these cases, and when
it detects it, linux now boots rather than crash-and-burns.
The accuracy of this detection can in principle be improved if there was
a "is this entire range in e820 with THIS attribute", but no such
function exist and the complexity needed for this is not really worth
it; this simple check already catches most cases anyway.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce a e820_all_mapped() function which checks if the entire range
<start,end> is mapped with type.
This is done by moving the local start variable to the end of each
known-good region; if at the end of the function the start address is
still before end, there must be a part that's not of the correct type;
otherwise it's a good region.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] x86_64: Reserve SRAT hotadd memory on x86-64
From: Keith Mannthey, Andi Kleen
Implement memory hotadd without sparsemem. The memory in the SRAT
hotadd area is just preserved instead and can be activated later.
There are a few restrictions:
- Only one continuous hotadd area allowed per node
The main problem is dealing with the many buggy SRAT tables
that are out there. The strategy here is to reject anything
suspicious.
Originally from Keith Mannthey, with several hacks and changes by AK
and also contributions from Andrew Morton
[ TBD: Problems pointed out by KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>:
1) Goto's rebuild_zonelist patch will not work if CONFIG_MEMORY_HOTPLUG=n.
Rebuilding zonelist is necessary when the system has just memory <
4G at boot, and hot add memory > 4G. because x86_64 has DMA32,
ZONE_NORAML is not included into zonelist at boot time if system
doesn't have memory >4G at boot.
[AK: should just force the higher zones at boot time when SRAT tells us]
2) zone and node's spanned_pages and present_pages are not incremented.
They should be.
For example, our server (ia64/Fujitsu PrimeQuest) can equip memory
from 4G to 1T(maybe 2T in future), and SRAT will *always* say we have
possible 1T +memory. (Microsoft requires "write all possible memory
in SRAT") When we reserve memmap for possible 1T memory, Linux will
not work well in +minimum 4G configuraion ;)
[PATCH] x86_64: Support memory hotadd without sparsemem
Memory hotadd doesn't need SPARSEMEM, but can be handled by just preallocating
mem_maps. This only needs some untangling of ifdefs to enable the necessary
code even without SPARSEMEM.