Linus Torvalds [Sun, 9 Dec 2007 18:14:36 +0000 (10:14 -0800)]
Avoid double memclear() in SLOB/SLUB
Both slob and slub react to __GFP_ZERO by clearing the allocation, which
means that passing the GFP_ZERO bit down to the page allocator is just
wasteful and pointless.
Acked-by: Matt Mackall <mpm@selenic.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Fri, 7 Dec 2007 03:46:23 +0000 (12:46 +0900)]
libata: kill spurious NCQ completion detection
Spurious NCQ completion detection implemented in ahci was incorrect.
On AHCI receving and processing FISes and raising interrupts are not
interlocked and spurious interrupts are expected.
For example, if an interrupt occurs while interrupt handler is running
and the running interrupt handler handles the event the new IRQ
indicated, after IRQ handler finishes, it will be executed again
because IRQ pending bit is set by the new interrupt but there won't be
anything to process.
Please read the following message for more information.
http://article.gmane.org/gmane.linux.ide/26012
This patch...
* Removes all spurious IRQ whining from ahci. Spurious NCQ completion
detection was completely wrong. Spurious D2H Register FIS taught us
that some early drives send spurious D2H Register FIS with I bit set
while NCQ commands are in progress but none of recent drives does
that and even the ones which show such behavior can do NCQ fine.
* Kills all NCQ blacklist entries which were added because of spurious
NCQ completions. I tracked down each commit and verified all
removed ones are actually added because of spurious completions.
WD740ADFD-00NLR1 wasn't deleted but moved upward because the drive
not only had spurious NCQ completions but also is slow on sequential
data transfers if NCQ is enabled.
Maxtor 7V300F0 was added by 0e3dbc01d53940fe10e5a5cfec15ede3e929c918
from Alan Cox. I can only find evidences that the drive only had
troubles with spuruious completions by searching the mailing list.
This entry needs to be verified and removed if it doesn't have other
NCQ related problems.
Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Thu, 6 Dec 2007 06:09:43 +0000 (15:09 +0900)]
ahci: don't attach if ICH6 is in combined mode
ICH6 R/Ms share PCI ID between piix and ahci modes and we've been
allowing ahci to attach regardless of how BIOS configured it.
However, enabling AHCI mode when the controller is in combined mode
can result in unexpected behavior. Don't attach if the controller is
in combined mode.
Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Bill Nottingham <notting@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
S2io: Check for register initialization completion before accesing device registers
- Making sure register initialisation is complete before proceeding further.
The driver must wait until initialization is complete before attempting to
access any other device registers.
ibm_newemac: Call dev_set_drvdata() before tah_reset()
The patch moves dev_set_drvdata(&ofdev->dev, dev) up before tah_reset(ofdev)
is called to avoid a NULL pointer dereference, since tah_reset uses drvdata.
Signed-off-by: Valentine Barshak <vbarshak@ru.mvista.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Hugh Blemings [Wed, 5 Dec 2007 00:14:30 +0000 (11:14 +1100)]
ibm_newemac: Skip EMACs that are marked unused by the firmware
Depending on how the 44x processors are wired, some EMAC cells
might not be useable (and not connected to a PHY). However, some
device-trees may choose to still expose them (since their registers
are present in the MMIO space) but with an "unused" property in them.
Signed-off-by: Hugh Blemings <hugh@blemings.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
ibm_newemac: Cleanup/fix support for STACR register variants
There are a few variants of the STACR register that affect more than
just the "AXON" version of EMAC. Replace the current test of various
chip models with tests for generic properties in the device-tree.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Stefan Roese <sr@denx.de> Signed-off-by: Jeff Garzik <jeff@garzik.org>
ibm_newemac: Cleanup/Fix RGMII MDIO support detection
More than just "AXON" version of EMAC RGMII supports MDIO, so replace
the current test with a generic property in the device-tree that
indicates such support.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Stefan Roese <sr@denx.de> Signed-off-by: Jeff Garzik <jeff@garzik.org>
ibm_newemac: Workaround reset timeout when no link
With some PHYs, when the link goes away, the EMAC reset fails due
to the loss of the RX clock I believe.
The old EMAC driver worked around that using some internal chip-specific
clock force bits that are different on various 44x implementations.
This is an attempt at doing it differently, by avoiding the reset when
there is no link, but forcing loopback mode instead. It seems to work
on my Taishan 440GX based board so far.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Stefan Roese <sr@denx.de> Signed-off-by: Jeff Garzik <jeff@garzik.org>
When using ZMII for MDIO only (such as 440GX with RGMII for data and ZMII for
MDIO), the ZMII code would fail to properly refcount, thus triggering a
BUG_ON().
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Stefan Roese <sr@denx.de> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Stefan Roese [Wed, 5 Dec 2007 00:14:25 +0000 (11:14 +1100)]
ibm_newemac: Add BCM5248 and Marvell 88E1111 PHY support
This patch adds BCM5248 and Marvell 88E1111 PHY support to NEW EMAC driver.
These PHY chips are used on PowerPC 440EPx boards.
The PHY code is based on the previous work by Stefan Roese <sr@denx.de>
Signed-off-by: Stefan Roese <sr@denx.de> Signed-off-by: Valentine Barshak <vbarshak@ru.mvista.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Eliezer Tamir [Wed, 5 Dec 2007 14:12:39 +0000 (16:12 +0200)]
make bnx2x select ZLIB_INFLATE
The bnx2x module depends on the zlib_inflate functions. The
build will fail if ZLIB_INFLATE has not been selected manually
or by building another module that automatically selects it.
Modify BNX2X config option to 'select ZLIB_INFLATE' like BNX2
and others. This seems to fix it.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Eliezer Tamir <eliezert@broadcom.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Jay Vosburgh [Fri, 7 Dec 2007 07:40:35 +0000 (23:40 -0800)]
bonding: Fix race at module unload
Fixes a race condition in module unload. Without this change,
workqueue events may fire while bonding data structures are partially
freed but before bond_close() is invoked by unregister_netdevice().
Update version to 3.2.3.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Jay Vosburgh [Fri, 7 Dec 2007 07:40:34 +0000 (23:40 -0800)]
bonding: Add new layer2+3 hash for xor/802.3ad modes
Add new hash for balance-xor and 802.3ad modes. Originally
submitted by "Glenn Griffin" <ggriffin.kernel@gmail.com>; modified by
Jay Vosburgh to move setting of hash policy out of line, tweak the
documentation update and add version update to 3.2.2.
Glenn's original comment follows:
Included is a patch for a new xmit_hash_policy for the bonding driver
that selects slaves based on MAC and IP information. This is a middle
ground between what currently exists in the layer2 only policy and the
layer3+4 policy. This policy strives to be fully 802.3ad compliant by
transmitting every packet of any particular flow over the same link.
As documented the layer3+4 policy is not fully compliant for extreme
cases such as ip fragmentation, so this policy is a nice compromise
for environments that require full compliance but desire more than the
layer2 only policy.
Signed-off-by: "Glenn Griffin" <ggriffin.kernel@gmail.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Wagner Ferenc [Fri, 7 Dec 2007 07:40:32 +0000 (23:40 -0800)]
bonding: Allow setting and querying xmit policy regardless of mode
From: Wagner Ferenc <wferi@niif.hu>
For consistency with the behaviour of the arp_ip_target option,
let /sys/class/net/bond0/bonding/xmit_hash_policy accept and report
current policy even if the bonding mode in effect does not use it.
Signed-off-by: Ferenc Wagner <wferi@niif.hu> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Wagner Ferenc [Fri, 7 Dec 2007 07:40:29 +0000 (23:40 -0800)]
bonding: Return nothing for not applicable values
From: Wagner Ferenc <wferi@niif.hu>
The previous code returned '\n' (that is, a single empty line)
from most files, with one exception (xmit_hash_policy), where
it returned 'NA\n'. This patch consolidates each file to return
nothing at all if not applicable, not even a '\n'.
I find this behaviour more usual, more useful, more efficient
and shorter to code from both sides.
Signed-off-by: Ferenc Wagner <wferi@niif.hu> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Wagner Ferenc [Fri, 7 Dec 2007 07:40:28 +0000 (23:40 -0800)]
bonding: Remove trailing NULs from sysfs interface.
From: Wagner Ferenc <wferi@niif.hu>
Also remove trailing spaces from multivalued files.
This fixes output like for example:
$ od -c /sys/class/net/bond0/bonding/slaves 0000000 e t h - l e f t e t h - r i g 0000020 h t \n \0 0000025
It mostly entails deleting '+1'-s after sprintf() calls: the return value
of sprintf is the number of characters printed, without the closing NUL,
ie. exactly what the sysfs interface requires. The three multivalue
cases are different, because they also have to swallow back a trailing
space.
Signed-off-by: Ferenc Wagner <wferi@niif.hu> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
ACPI: move timer broadcast before busmaster disable
clockevents: warn once when program_event() is called with negative expiry
hrtimers: avoid overflow for large relative timeouts
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
sched: enable early use of sched_clock()
lockdep: make cli/sti annotation warnings clearer
Linus Torvalds [Fri, 7 Dec 2007 19:00:31 +0000 (11:00 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6:
[AVR32] Fix wrong pt_regs in critical exception handler
[AVR32] Fix copy_to_user_page() breakage
[AVR32] Follow the rules when dealing with the OCD system
[AVR32] Clean up OCD register usage
[AVR32] Implement irqflags trace and lockdep support
[AVR32] Implement stacktrace support
[AVR32] Kconfig: Use def_bool instead of bool + default
[AVR32] Fix invalid status register bit definitions in asm/ptrace.h
[AVR32] Add TIF_RESTORE_SIGMASK to the work masks
Linus Torvalds [Fri, 7 Dec 2007 18:59:48 +0000 (10:59 -0800)]
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[AF_RXRPC]: Add a missing goto
[VLAN]: Lost rtnl_unlock() in vlan_ioctl()
[SCTP]: Fix the bind_addr info during migration.
[SCTP]: Add bind hash locking to the migrate code
[IPV4]: Remove prototype of ip_rt_advice
[IPv4]: Reply net unreachable ICMP message
[IPv6] SNMP: Increment OutNoRoutes when connecting to unreachable network
[BRIDGE]: Section fix.
[NIU]: Fix link LED handling.
Thomas Gleixner [Fri, 7 Dec 2007 18:16:17 +0000 (19:16 +0100)]
ACPI: move timer broadcast before busmaster disable
The timer broadcast code might access HPET, which should not be
accessed after the busmaster disable.
In acpi_idle_enter_simple() this change also prevents, that we modify
the busmaster state without going actually idle. This might leave the
ACPI bm state in a stale state, when we leave the function early in
the need_resched() check.
Thomas Gleixner [Fri, 7 Dec 2007 18:16:17 +0000 (19:16 +0100)]
clockevents: warn once when program_event() is called with negative expiry
The hrtimer problem with large relative timeouts resulting in a
negative expiry time went unnoticed as there is no check in the
clockevents_program_event() code. Put a check there with a WARN_ONCE
to avoid such problems in the future.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Fri, 7 Dec 2007 18:16:17 +0000 (19:16 +0100)]
hrtimers: avoid overflow for large relative timeouts
Relative hrtimers with a large timeout value might end up as negative
timer values, when the current time is added in hrtimer_start().
This in turn is causing the clockevents_set_next() function to set an
huge timeout and sleep for quite a long time when we have a clock
source which is capable of long sleeps like HPET. With PIT this almost
goes unnoticed as the maximum delta is ~27ms. The non-hrt/nohz code
sorts this out in the next timer interrupt, so we never noticed that
problem which has been there since the first day of hrtimers.
This bug became more apparent in 2.6.24 which activates HPET on more
hardware.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Fri, 7 Dec 2007 18:02:47 +0000 (19:02 +0100)]
sched: enable early use of sched_clock()
some platforms have sched_clock() implementations that cannot be called
very early during wakeup. If it's called it might hang or crash in hard
to debug ways. So only call update_rq_clock() [which calls sched_clock()]
if sched_init() has already been called. (rq->idle is NULL before the
scheduler is initialized.)
[AVR32] Fix wrong pt_regs in critical exception handler
It's not like it really matters at this point since the system is
dying anyway, but handle_critical pushes too few registers on the
stack so the register dump, which makes the register dump look a bit
strange. This patch fixes it.
The current implementation of copy_to_user_page() gives "vaddr" to the
cache instruction when trying to sync the icache with the dcache. If
vaddr does not exist in the TLB, the CPU will silently abort the
operation, which may result in the caches staying out of sync.
To fix this, pass the "dst" parameter to flush_icache_range() instead
-- we know this is valid because we just wrote to it.
[AVR32] Follow the rules when dealing with the OCD system
The current debug trap handling code does a number of things that are
illegal according to the AVR32 Architecture manual. Most importantly,
it may try to schedule from Debug Mode, thus clearing the D bit, which
can lead to "undefined behaviour".
It seems like this works in most cases, but several people have
observed somewhat unstable behaviour when debugging programs,
including soft lockups. So there's definitely something which is not
right with the existing code.
The new code will never schedule from Debug mode, it will always exit
Debug mode with a "retd" instruction, and if something not running in
Debug mode needs to do something debug-related (like doing a single
step), it will enter debug mode through a "breakpoint" instruction.
The monitor code will then return directly to user space, bypassing
its own saved registers if necessary (since we don't actually care
about the trapped context, only the one that came before.)
This adds three instructions to the common exception handling code,
including one branch. It does not touch super-hot paths like the TLB
miss handler.
Generate a new set of OCD register definitions in asm/ocd.h and rename
__mfdr() and __mtdr() to ocd_read() and ocd_write() respectively.
The bitfield definitions are a lot more complete now, and they are
entirely based on bit numbers, not masks. This is because OCD
registers are frequently accessed from assembly code, where bit
numbers are a lot more useful (can be fed directly to sbr, bfins,
etc.)
Bitfields that consist of more than one bit have two definitions:
_START, which indicates the number of the first bit, and _SIZE, which
indicates the number of bits. These directly correspond to the
parameters taken by the bfextu, bfexts and bfins instructions.
We really need to check TIF_RESTORE_SIGMASK before returning to
userspace. The existing code does not necessarily do this.
Define the work masks as a bitwise OR of the respective flags instead
of a hardcoded hex value to make it easier to spot errors like this in
the future.
Vlad Yasevich [Fri, 7 Dec 2007 06:50:54 +0000 (22:50 -0800)]
[SCTP]: Fix the bind_addr info during migration.
During accept/migrate the code attempts to copy the addresses from
the parent endpoint to the new endpoint. However, if the parent
was bound to a wildcard address, then we end up pointlessly copying
all of the current addresses on the system.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Fri, 7 Dec 2007 06:50:27 +0000 (22:50 -0800)]
[SCTP]: Add bind hash locking to the migrate code
SCTP accept code tries to add a newliy created socket
to a bind bucket without holding a lock. On a really
busy system, that can causes slab corruptions.
Add a lock around this code.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mitsuru Chinen [Fri, 7 Dec 2007 09:07:24 +0000 (01:07 -0800)]
[IPv4]: Reply net unreachable ICMP message
IPv4 stack doesn't reply any ICMP destination unreachable message
with net unreachable code when IP detagrams are being discarded
because of no route could be found in the forwarding path.
Incidentally, IPv6 stack replies such ICMPv6 message in the similar
situation.
Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Purdie [Sat, 10 Nov 2007 13:29:04 +0000 (13:29 +0000)]
leds: Fix led trigger locking bugs
Convert part of the led trigger core from rw spinlocks to rw
semaphores. We're calling functions which can sleep from invalid
contexts otherwise. Fixes bug #9264.
Mitsuru Chinen [Thu, 6 Dec 2007 06:31:47 +0000 (22:31 -0800)]
[IPv6] SNMP: Increment OutNoRoutes when connecting to unreachable network
IPv6 stack doesn't increment OutNoRoutes counter when IP datagrams
is being discarded because no route could be found to transmit them
to their destination. IPv6 stack should increment the counter.
Incidentally, IPv4 stack increments that counter in such situation.
Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mirko Lindner [Thu, 6 Dec 2007 05:10:02 +0000 (21:10 -0800)]
[NIU]: Fix link LED handling.
The LED in the current driver will not be controlled correctly. During
a link change the carrier of the link is not available and the LED
will never turn on.
Signed-off-by: David S. Miller <davem@davemloft.net>
Grant Likely [Thu, 6 Dec 2007 19:16:44 +0000 (06:16 +1100)]
[POWERPC] virtex bug fix: Use canonical value for AC97 interrupt xparams
The ml300 and ml403 xparameters.h files use different macros for the
AC97 interrupt pin assignments. This normalizes them to a canonical
value similar to what EDK generates for most other devices. This is
needed to get ml300 support to compile in arch/ppc.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6:
[PARISC] lba_pci: pci_claim_resources disabled expansion roms
[PARISC] print more than one character at a time for pdc console
[PARISC] Update parisc-linux MAINTAINERS entries
[PARISC] timer interrupt should not be IRQ_DISABLED
Revert "[PARISC] import necessary bits of libgcc.a"
Linus Torvalds [Thu, 6 Dec 2007 17:43:26 +0000 (09:43 -0800)]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] Oprofile: Fix computation of number of counters.
[MIPS] Alchemy: fix IRQ bases
[MIPS] Alchemy: replace ffs() with __ffs()
[MIPS] BCM1480: Fix interrupt routing, take 2.
radeonfb was HPMC-ing my C8000 by trying to map its expansion rom from
IO_VIEW, instead of PA_VIEW. Fix seems to be to ensure that its disabled
ROM is properly inserted into the resource tree.
FIXME: this will result in a whinging printk for cards which share expansion
ROMS, such as a quad tulip. Thankfully, it isn't harmful.
Ralf Baechle [Thu, 6 Dec 2007 16:53:19 +0000 (16:53 +0000)]
Fix oprofile configuration breakage
The cleanup 09cadedbdc01f1a4bea1f427d4fb4642eaa19da9 broke the oprofile
configuration for MIPS by allowing oprofile support to be built for
kernel models where oprofile doesn't have a chance in hell to work.
Just a dependecy list on a number of architectures is - surprise - broken
and should as per past discussions probably in most considered to be
broken in most cases. So I introduce a dependency for the oprofile
configuration on ARCH_SUPPORTS_OPROFILE.
Kyle McMartin [Thu, 6 Dec 2007 17:32:15 +0000 (09:32 -0800)]
[PARISC] print more than one character at a time for pdc console
There's really no reason not to print more than one character at a
time to the PDC console... Booting is measurably speedier, and now I don't
have to watch individual characters get drawn.
Kyle McMartin [Wed, 28 Nov 2007 07:17:53 +0000 (02:17 -0500)]
[PARISC] timer interrupt should not be IRQ_DISABLED
The timer interrupt had accidentally been marked IRQ_DISABLED since
IRQ_PER_CPU had been OR-ed in, instead of set. This had been working
by accident for quite a while.
Kyle McMartin [Wed, 28 Nov 2007 07:07:35 +0000 (02:07 -0500)]
Revert "[PARISC] import necessary bits of libgcc.a"
This reverts commit efb80e7e097d0888e59fbbe4ded2ac5a256f556d, it turned
out to cause sporadic problems with the timer interrupt on 32-bit kernels.
Needs more investigation.
Sergei Shtylyov [Wed, 5 Dec 2007 16:08:24 +0000 (19:08 +0300)]
[MIPS] Alchemy: replace ffs() with __ffs()
Fix havoc wrought by commit 56f621c7f6f735311eed3f36858b402013023c18 --
au_ffs() and ffs() are equivalent, that patch should have just replaced one
with another. Now replace ffs() with __ffs() which returns an unbiased bit
number.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Paul Mackerras [Thu, 6 Dec 2007 05:53:35 +0000 (16:53 +1100)]
[POWERPC] Update defconfigs
This updates all the defconfigs in arch/powerpc/configs except iseries
and ps3, which were updated by the preceding commits.
This mostly takes the defaults, except that I turned on tickless idle
and high-resolution timers for everything, and turned off instrumentation
support and "Fair group CPU scheduler" for the smaller/embedded platforms.
Tony Breeds [Tue, 4 Dec 2007 05:51:44 +0000 (16:51 +1100)]
[POWERPC] Fix hardware IRQ time accounting problem.
The commit fa13a5a1f25f671d084d8884be96fc48d9b68275 (sched: restore
deterministic CPU accounting on powerpc), unconditionally calls
update_process_tick() in system context. In the deterministic
accounting case this is the correct thing to do. However, in the
non-deterministic accounting case we need to not do this, since doing
this results in the time accounted as hardware irq time being
artificially elevated.
Also this collapses 2 consecutive '#ifdef CONFIG_VIRT_CPU_ACCOUNTING'
checks in time.h into one for neatness.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Linus Torvalds [Wed, 5 Dec 2007 17:26:52 +0000 (09:26 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
VM/Security: add security hook to do_brk
Security: round mmap hint address above mmap_min_addr
security: protect from stack expantion into low vm addresses
Security: allow capable check to permit mmap or low vm space
SELinux: detect dead booleans
SELinux: do not clear f_op when removing entries
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
[LRO]: fix lro_gen_skb() alignment
[TCP]: NAGLE_PUSH seems to be a wrong way around
[TCP]: Move prior_in_flight collect to more robust place
[TCP] FRTO: Use of existing funcs make code more obvious & robust
[IRDA]: Move ircomm_tty_line_info() under #ifdef CONFIG_PROC_FS
[ROSE]: Trivial compilation CONFIG_INET=n case
[IPVS]: Fix sched registration race when checking for name collision.
[IPVS]: Don't leak sysctl tables if the scheduler registration fails.
Al Viro [Wed, 5 Dec 2007 08:46:47 +0000 (08:46 +0000)]
remove nonsense force-casts from ocfs2
endianness annotations in networking code had been in place for quite a
while; in particular, sin_port and s_addr are annotated as big-endian.
Code in ocfs2 had __force casts added apparently to shut the sparse
warnings up; of course, these days they only serve to *produce* warnings
for no reason whatsoever...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Paris [Wed, 5 Dec 2007 07:45:31 +0000 (23:45 -0800)]
VM/Security: add security hook to do_brk
Given a specifically crafted binary do_brk() can be used to get low pages
available in userspace virtual memory and can thus be used to circumvent
the mmap_min_addr low memory protection. Add security checks in do_brk().
Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Alan Cox <alan@redhat.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Wed, 5 Dec 2007 07:45:28 +0000 (23:45 -0800)]
proc: fix proc_dir_entry refcounting
Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
Switch to usual scheme:
* PDE is created with refcount 1
* every de_get does +1
* every de_put() and remove_proc_entry() do -1
* once refcount reaches 0, PDE is freed.
This elegantly fixes at least two following races (both observed) without
introducing new locks, without abusing old locks, without spreading
lock_kernel():
1) PDE leak
remove_proc_entry de_put
----------------- ------
[refcnt = 1]
if (atomic_read(&de->count) == 0)
if (atomic_dec_and_test(&de->count))
if (de->deleted)
/* also not taken! */
free_proc_entry(de);
else
de->deleted = 1;
[refcount=0, deleted=1]
Also, remove broken usage of ->deleted from reiserfs: if sget() succeeds,
module is already pinned and remove_proc_entry() can't happen => nobody
can mark PDE deleted.
Dummy proc root in netns code is not marked with refcount 1. AFAICS, we
never get it, it's just for proper /proc/net removal. I double checked
CLONE_NETNS continues to work.
Patch survives many hours of modprobe/rmmod/cat loops without new bugs
which can be attributed to refcounting.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Wed, 5 Dec 2007 07:45:27 +0000 (23:45 -0800)]
jbd: Fix assertion failure in fs/jbd/checkpoint.c
Before we start committing a transaction, we call
__journal_clean_checkpoint_list() to cleanup transaction's written-back
buffers.
If this call happens to remove all of them (and there were already some
buffers), __journal_remove_checkpoint() will decide to free the transaction
because it isn't (yet) a committing transaction and soon we fail some
assertion - the transaction really isn't ready to be freed :).
We change the check in __journal_remove_checkpoint() to free only a
transaction in T_FINISHED state. The locking there is subtle though (as
everywhere in JBD ;(). We use j_list_lock to protect the check and a
subsequent call to __journal_drop_transaction() and do the same in the end
of journal_commit_transaction() which is the only place where a transaction
can get to T_FINISHED state.
Probably I'm too paranoid here and such locking is not really necessary -
checkpoint lists are processed only from log_do_checkpoint() where a
transaction must be already committed to be processed or from
__journal_clean_checkpoint_list() where kjournald itself calls it and thus
transaction cannot change state either. Better be safe if something
changes in future...
Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>