* add ide_set_dma() helper and make ide_hwif_t.ide_dma_check return
-1 when DMA needs to be disabled (== need to call ->ide_dma_off_quietly)
0 when DMA needs to be enabled (== need to call ->ide_dma_on)
1 when DMA setting shouldn't be changed
* fix IDE code to use ide_set_dma() instead if using ->ide_dma_check directly
ide: use PIO/MMIO operations directly where possible (v2)
This results in smaller/faster/simpler code and allows future optimizations.
Also remove no longer needed ide[_mm]_{inl,outl}() and ide_hwif_t.{INL,OUTL}.
* add ide_use_fast_pio() helper for use by host drivers
* add DMA capability and hwif->autodma checks to ide_use_dma()
- au1xxx-ide/it8213/it821x drivers didn't check for (id->capability & 1)
[ for the IT8211/2 in SMART mode this check shouldn't be made but since
in it821x_fixups() we set DMA bit explicitly:
if(strstr(id->model, "Integrated Technology Express")) {
/* In raid mode the ident block is slightly buggy
We need to set the bits so that the IDE layer knows
LBA28. LBA48 and DMA ar valid */
id->capability |= 3; /* LBA28, DMA */
we are better off using generic helper if we can ]
- ide-cris driver didn't set ->autodma
[ before the patch hwif->autodma was only checked in the chipset specific
hwif->ide_dma_check implementations, for ide-cris it is cris_dma_check()
function so there no behavior change here ]
v2:
* updated patch description (thanks to Alan Cox for the feedback)
In cmd64x, siimage and scc_pata drivers:
* don't set drive->init_speed as it should be already
set by successful execution of ide_set_xfer_rate()
* use hwif->speedproc functions directly
Above changes allows removal of EXPORT_SYMBOL_GPL(ide_set_xfer_rate).
This field is no longer used by the core IDE code so fix ide-{disk,floppy}
drivers to keep openers count in the driver specific objects and remove
it from ide-{cd,scsi,tape} drivers (it was write-only).
* remove bogus comment for sis5513_config_xfer_rate()
* there is no need to call config_drive_art_rwp() because
it is called by config_art_rwp_pio()
* remove needless wrapper
* remove stale "TODO" comment
(IDE core should provide generic tuning code)
* disable DMA masks if no_piix_dma is set and remove now
not needed no_piix_dma_check from piix_config_drive_for_dma()
* there is no need to read register 0x55 in init_hwif_piix()
* move cable detection code to piix_cable_detect()
* remove unreachable 82371MX code from init_hwif_piix()
* remove redundant svwks_ide_dma_end() [ __ide_dma_end() is used by default ]
* remove init_dma_svwks() so the default ide_setup_dma() function is used
[ init_setup_csb6() takes care of not initializing disabled channels ]
* BUG() on unknown DMA mode in cs5530_config_dma()
* there is no need to call hwif->ide_dma_host_{off,on}() in
cs5530_config_dma() because hwif->ide_dma_host_{off,on}()
is called by hwif->ide_dma_off_{quietly,on}()
Kou Ishizaki [Sat, 17 Feb 2007 01:40:22 +0000 (02:40 +0100)]
drivers/ide: PATA driver for Celleb
This is the patch (based on 2.6.19-rc4) for PATA controller of
Toshiba Cell reference set(Celleb). The reference set consists
of Cell, 512MB memory, Super Companion Chip(SCC) and some
peripherals such as HDD, GbE, etc. You can see brief explanation
and picture of Cell reference set at following URLs.
We use a drivers/ide driver because its design is more suitable for
SCC IDE controller than libata driver. Since SCC supports only 32bit
read/write, we must override many callbacks of ata_port_operations
by modifying generic helpers. Each time the libata common code is
updated, we must update those modified helpers. It is very hard for us.
But we will try to implement the libata driver as needed.
Signed-off-by: Kou Ishizaki <kou.ishizaki at toshiba.co.jp> Signed-off-by: Akira Iguchi <akira2.iguchi at toshiba.co.jp> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Sergei Shtylyov [Sat, 17 Feb 2007 01:40:22 +0000 (02:40 +0100)]
sl82c105: DMA support fixes
Fix a number of issues with the DMA support code:
- driver claims support for all SW/MW DMA modes while supporting only MWDMA2;
- ide_dma_check() method tries to enable DMA on the "known good" drives which
don't support MWDMA2;
- ide_dma_on() method upon failure to set drive to MWDMA2 re-tunes already
tuned PIO mode and calls ide_dma_off() method instead of returning error;
- ide_dma_off() method sets drive->current_speed while it doesn't actually
change (only the PIO timings are re-loaded into the chip's registers);
- init_hwif() method forcibly sets/resets both "drive DMA capable" bits while
this is properly handled by ide_dma_{on,off}() methods being called later...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Sergei Shtylyov [Sat, 17 Feb 2007 01:40:22 +0000 (02:40 +0100)]
pdc202xx_old: fix PIO mode setup
Fix the driver's tuneproc() method to always set the PIO mode requested and not
pick the best possible one, rename it to pdc202xx_tune_drive(), and change the
calls to it accordingly; remove the preceding comment which has nothing to do
with the code.
Sergei Shtylyov wrote:
> The tuneproc() method should take arg 255 for auto-selecting the best PIO
> mode, not 5 as it did here + this driver's method always auto-selected instead
> of setting the mode it's been told to -- issue typical to drivers/ide/...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Albert Lee [Sat, 17 Feb 2007 01:40:22 +0000 (02:40 +0100)]
ide: remove clearing bmdma status from cdrom_decode_status() (rev #4)
patch 2/2:
Remove clearing bmdma status from cdrom_decode_status() since ATA devices
might need it as well.
(http://lkml.org/lkml/2006/12/4/201 and http://lkml.org/lkml/2006/11/15/94)
Signed-off-by: Albert Lee <albertcc@tw.ibm.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: "Adam W. Hawks" <awhawks@us.ibm.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Albert Lee [Sat, 17 Feb 2007 01:40:21 +0000 (02:40 +0100)]
ide: clear bmdma status in ide_intr() for ICHx controllers (revised #4)
patch 1/2 (revised):
- Fix drive->waiting_for_dma to work with CDB-intr devices.
- Do the dma status clearing in ide_intr() and add a new
hwif->ide_dma_clear_irq for Intel ICHx controllers.
Revised per Alan, Sergei and Bart's advice.
Patch against 2.6.20-rc6. Tested ok on my ICH4 and pdc20275 adapters.
Please review/apply, thanks.
Signed-off-by: Albert Lee <albertcc@tw.ibm.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: "Adam W. Hawks" <awhawks@us.ibm.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Alan Cox [Sat, 17 Feb 2007 01:40:20 +0000 (02:40 +0100)]
ide-floppy: Fix unformatted media crash
A ZIP or similar with unformatted media will cause crashes when attempts
are made to read/write it in some cases. This is because bs_factor is
zero and we divide by it causing an oops.
As the size of a non-accessible/non-existant media is really a bit of a
zen question it doesn't matter if non-existant media is 512 bytes per
sector or zero. Setting it to 1 causes us to generate 512 bytes/sector
accesses and error properly.
Based on a fix found lurking in an ancient bugzilla entry since about 2004 (ugghhh)
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* atiixp: if DMA can't be used atiixp_config_drive_for_dma() should return 0,
atiixp_dma_check() will tune the correct PIO mode anyway
* jmicron: if DMA can't be used config_chipset_for_dma() should return 0,
micron_config_drive_for_dma() will tune the correct PIO mode anyway
config_jmicron_chipset_for_pio(drive, !speed) doesn't program
device transfer mode for speed != 0 (only wastes some CPU cycles
on ide_get_best_pio_mode() call) so remove it
* triflex: if DMA can't be used triflex_config_drive_for_dma() should return 0,
triflex_config_drive_xfer_rate() will tune correct PIO mode anyway
Above changes also fix (theoretical) issue when ->speedproc fails to set
device transfer mode (i.e. when ide_config_drive_speed() fails to program it)
but one of DMA transfer modes is already enabled on the device by the BIOS.
In such scenario ide_dma_enable() will incorrectly return true statement
and ->ide_dma_check will try to enable DMA on the device.
Linus Torvalds [Fri, 16 Feb 2007 16:19:44 +0000 (08:19 -0800)]
Merge branch 'for-linus' of git://www.atmel.no/~hskinnemoen/linux/kernel/avr32
* 'for-linus' of git://www.atmel.no/~hskinnemoen/linux/kernel/avr32:
[AVR32] Use per-controller spi_board_info structures
[AVR32] Warn, don't BUG if clk_disable is called too many times
[AVR32] Make sure all genclocks have a parent
[AVR32] Remove unnecessary sys_nfsservctl conditional
[AVR32] Wire up the SysV IPC calls properly
[AVR32] Define ioremap_nocache, ioport_map and ioport_unmap
[AVR32] Fix prototypes for __raw_writesb and friends
Michael Halcrow [Fri, 16 Feb 2007 09:28:40 +0000 (01:28 -0800)]
[PATCH] eCryptfs: Reduce stack usage in ecryptfs_generate_key_packet_set()
eCryptfs is gobbling a lot of stack in ecryptfs_generate_key_packet_set()
because it allocates a temporary memory-hungry ecryptfs_key_record struct.
This patch introduces a new kmem_cache for that struct and converts
ecryptfs_generate_key_packet_set() to use it.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Fri, 16 Feb 2007 09:28:38 +0000 (01:28 -0800)]
[PATCH] knfsd: stop NFSD writes from being broken into lots of little writes to filesystem
When NFSD receives a write request, the data is typically in a number of
1448 byte segments and writev is used to collect them together.
Unfortunately, generic_file_buffered_write passes these to the filesystem
one at a time, so an e.g. 32K over-write becomes a series of partial-page
writes to each page, causing the filesystem to have to pre-read those pages
- wasted effort.
generic_file_buffered_write handles one segment of the vector at a time as
it has to pre-fault in each segment to avoid deadlocks. When writing from
kernel-space (and nfsd does) this is not an issue, so
generic_file_buffered_write does not need to break and iovec from nfsd into
little pieces.
This patch avoids the splitting when get_fs is KERNEL_DS as it is
from NFSd.
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Norman Weathers <norman.r.weathers@conocophillips.com> Cc: Vladimir V. Saveliev <vs@namesys.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
J. Bruce Fields [Fri, 16 Feb 2007 09:28:37 +0000 (01:28 -0800)]
[PATCH] knfsd: nfsd4: fix handling of directories without default ACLs
When setting an ACL that lacks inheritable ACEs on a directory, we should set
a default ACL of zero length, not a default ACL with all bits denied.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We're inserting deny's between some ACEs in order to enforce posix draft acl
semantics which prevent permissions from accumulating across entries in an
acl.
That's fine, but we're doing that by inserting a deny after *every* allow,
which is overkill. We shouldn't be adding them in places where they actually
make no difference.
Also replaced some helper functions for creating acl entries; I prefer just
assigning directly to the struct fields--it takes a few more lines, but the
field names provide some documentation that I think makes the result easier
understand.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Return just the effective permissions, and forget about the mask. It isn't
worth the complexity.
WARNING: This breaks backwards compatibility with overly-picky nfsv4->posix
acl translation, as may has been included in some patched versions of libacl.
To our knowledge no such version was every distributed by anyone outside citi.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
J. Bruce Fields [Fri, 16 Feb 2007 09:28:34 +0000 (01:28 -0800)]
[PATCH] knfsd: nfsd4: fix error return on unsupported acl
We should be returning ATTRNOTSUPP, not NOTSUPP, when acls are unsupported.
Also fix a comment.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
J. Bruce Fields [Fri, 16 Feb 2007 09:28:30 +0000 (01:28 -0800)]
[PATCH] knfsd: nfsd4: fix memory leak on kmalloc failure in savemem
The wrong pointer is being kfree'd in savemem() when defer_free returns with
an error.
Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
J. Bruce Fields [Fri, 16 Feb 2007 09:28:30 +0000 (01:28 -0800)]
[PATCH] knfsd: nfsd4: represent nfsv4 acl with array instead of linked list
Simplify the memory management and code a bit by representing acls with an
array instead of a linked list.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The code that splits an incoming nfsv4 ACL into inheritable and effective
parts can be combined with the the code that translates each to a posix acl,
resulting in simpler code that requires one less pass through the ACL.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
J. Bruce Fields [Fri, 16 Feb 2007 09:28:28 +0000 (01:28 -0800)]
[PATCH] knfsd: nfsd4: relax checking of ACL inheritance bits
The rfc allows us to be more permissive about the ACL inheritance bits we
accept:
"If the server supports a single "inherit ACE" flag that applies to
both files and directories, the server may reject the request
(i.e., requiring the client to set both the file and directory
inheritance flags). The server may also accept the request and
silently turn on the ACE4_DIRECTORY_INHERIT_ACE flag."
Let's take the latter option--the ACL is a complex attribute that could be
rejected for a wide variety of reasons, and the protocol gives us little
ability to explain the reason for the rejection, so erroring out is a
user-unfriendly last resort.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
J. Bruce Fields [Fri, 16 Feb 2007 09:28:27 +0000 (01:28 -0800)]
[PATCH] knfsd: nfsd4: fix non-terminated string
The server name is expected to be a null-terminated string, so we can't pass
in the raw client identifier.
What's more, the client identifier is just a binary, not necessarily
printable, blob. Let's just use the ip address instead. The server name
appears to exist just to help debugging by making some printk's more
informative.
Note that the string is copies into the rpc client structure, so the pointer
to the local variable does not outlive the function call.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Fri, 16 Feb 2007 09:28:24 +0000 (01:28 -0800)]
[PATCH] genirq: do not mask interrupts by default
Never mask interrupts immediately upon request. Disabling interrupts in
high-performance codepaths is rare, and on the other hand this change could
recover lost edges (or even other types of lost interrupts) by conservatively
only masking interrupts after they happen. (NOTE: with this change the
highlevel irq-disable code still soft-disables this IRQ line - and if such an
interrupt happens then the IRQ flow handler keeps the IRQ masked.)
Mark i8529A controllers as 'never loses an edge'.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul E. McKenney [Fri, 16 Feb 2007 09:28:22 +0000 (01:28 -0800)]
[PATCH] posix timers: RCU optimization for clock_gettime()
Use RCU to avoid the need to acquire tasklist_lock in the single-threaded
case of clock_gettime(). It still acquires tasklist_lock when for a
(potentially multithreaded) process. This change allows realtime
applications to frequently monitor CPU consumption of individual tasks, as
requested (and now deployed) by some off-list users.
This has been in Ingo Molnar's -rt patchset since late 2005 with no
problems reported, and tests successfully on 2.6.20-rc6, so I believe that
it is long-since ready for mainline adoption.
[paulmck@linux.vnet.ibm.com: fix exit()/posix_cpu_clock_get() race spotted by Oleg] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
john stultz [Fri, 16 Feb 2007 09:28:19 +0000 (01:28 -0800)]
[PATCH] time: x86_64: split x86_64/kernel/time.c up
In preparation for the x86_64 generic time conversion, this patch splits out
TSC and HPET related code from arch/x86_64/kernel/time.c into respective
hpet.c and tsc.c files.
[akpm@osdl.org: fix printk timestamps]
[akpm@osdl.org: cleanup] Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@muc.de> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
john stultz [Fri, 16 Feb 2007 09:28:18 +0000 (01:28 -0800)]
[PATCH] time: x86_64: hpet_address cleanup
In preparation for supporting generic timekeeping, this patch cleans up
x86-64's use of vxtime.hpet_address, changing it to just hpet_address as is
also used in i386. This is necessary since the vxtime structure will be going
away.
Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@muc.de> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Fri, 16 Feb 2007 09:28:13 +0000 (01:28 -0800)]
[PATCH] Add debugging feature /proc/timer_stat
Add /proc/timer_stats support: debugging feature to profile timer expiration.
Both the starting site, process/PID and the expiration function is captured.
This allows the quick identification of timer event sources in a system.
[ cleanups and hrtimers support from Thomas Gleixner <tglx@linutronix.de> ]
[bunk@stusta.de: nr_entries can become static] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:28:11 +0000 (01:28 -0800)]
[PATCH] hrtimers: add high resolution timer support
Implement high resolution timers on top of the hrtimers infrastructure and the
clockevents / tick-management framework. This provides accurate timers for
all hrtimer subsystem users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:28:09 +0000 (01:28 -0800)]
[PATCH] i386 prepare nmi watchdog for dynticks
The NMI watchdog implementation assumes that the local APIC timer interrupt is
happening. This assumption is not longer true when high resolution timers and
dynamic ticks come into play, as they may switch off the local APIC timer
completely. Take the PIT/HPET interrupts into account too, to avoid false
positives.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Rohit Seth <rohitseth@google.com> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:28:06 +0000 (01:28 -0800)]
[PATCH] i386 rework local apic timer calibration
The local apic timer calibration has two problem cases:
1. The calibration is based on readout of the PIT/HPET timer to detect the
wrap of the periodic tick. It happens that a box gets stuck in the
calibration loop due to a PIT with a broken readout function.
2. CoreDuo boxen show a sporadic PIT runs too slow defect, which results
in a wrong lapic calibration. The PIT goes back to normal operation once
the lapic timer is switched to periodic mode.
Both are existing and unfixed problems in the current upstream kernel and
prevent certain laptops and other systems from booting Linux.
Rework the code to address both problems:
- Make the calibration interrupt driven. This removes the wait_timer_tick
magic hackery from lapic.c and time_hpet.c. The clockevents framework
allows easy substitution of the global tick event handler for the
calibration. This is more accurate than monitoring jiffies. At this point
of the boot process, nothing disturbes the interrupt delivery, so the
results are very accurate.
- Verify the calibration against the PM timer, when available by using the
early access function. When the measured calibration period is outside of
an one percent window, then the lapic timer calibration is adjusted to the
pm timer result.
- Verify the calibration by running the lapic timer with the calibration
handler. Disable lapic timer in case of deviation.
This also removes the "synchronization" of the local apic timer to the global
tick. This synchronization never worked, as there is no way to synchronize
PIT(HPET) and local APIC timer. The synchronization by waiting for the tick
just alignes the local APIC timer for the first events, but later the events
drift away due to the different clocks. Removing the "sync" is just
randomizing the asynchronous behaviour at setup time.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Zachary Amsden <zach@vmware.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Rohit Seth <rohitseth@google.com> Cc: Andi Kleen <ak@suse.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:28:04 +0000 (01:28 -0800)]
[PATCH] clockevents: i386 drivers
Add clockevent drivers for i386: lapic (local) and PIT/HPET (global). Update
the timer IRQ to call into the PIT/HPET driver's event handler and the
lapic-timer IRQ to call into the lapic clockevent driver. The assignement of
timer functionality is delegated to the core framework code and replaces the
compile and runtime evalution in do_timer_interrupt_hook()
Use the clockevents broadcast support and implement the lapic_broadcast
function for ACPI.
No changes to existing functionality.
[ kdump fix from Vivek Goyal <vgoyal@in.ibm.com> ]
[ fixes based on review feedback from Arjan van de Ven <arjan@infradead.org> ]
Cleanups-from: Adrian Bunk <bunk@stusta.de>
Build-fixes-from: Andrew Morton <akpm@osdl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add functions to provide dynamic ticks and high resolution timers. The code
which keeps track of jiffies and handles the long idle periods is shared
between tick based and high resolution timer based dynticks. The dyntick
functionality can be disabled on the kernel commandline. Provide also the
infrastructure to support high resolution timers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:28:02 +0000 (01:28 -0800)]
[PATCH] tick-management: broadcast functionality
With Ingo Molnar <mingo@elte.hu>
Add broadcast functionality, so per cpu clock event devices can be registered
as dummy devices or switched from/to broadcast on demand. The broadcast
function distributes the events via the broadcast function of the clock event
device. This is primarily designed to replace the switch apic timer to / from
IPI in power states, where the apic stops.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:28:01 +0000 (01:28 -0800)]
[PATCH] tick-management: core functionality
With Ingo Molnar <mingo@elte.hu>
The tick-management code is the first user of the clockevents layer. It takes
clock event devices from the clock events core and uses them to provide the
periodic tick.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:28:00 +0000 (01:28 -0800)]
[PATCH] clockevents: add core functionality
Architectures register their clock event devices, in the clock events core.
Users of the clockevents core can get clock event devices for their use. The
clockevents core code provides notification mechanisms for various clock
related management events.
This allows to control the clock event devices without the architectures
having to worry about the details of function assignment. This is also a
preliminary for high resolution timers and dynamic ticks to allow the core
code to control the clock functionality without intrusive changes to the
architecture code.
[Fixes-by: Ingo Molnar <mingo@elte.hu>] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:58 +0000 (01:27 -0800)]
[PATCH] i386, apic: clean up the APIC code
The apic code is quite unstructured and missing a lot of comments.
- Restructure the code into helper functions, timer, setup/shutdown,
interrupt and power management blocks.
- Fixup comments.
- Namespace fixups
- Inline helpers for version and is_integrated
- Combine the ack_bad_irq functions
No functional changes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Zachary Amsden <zach@vmware.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Rohit Seth <rohitseth@google.com> Cc: Andi Kleen <ak@suse.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:57 +0000 (01:27 -0800)]
[PATCH] Allow early access to the power management timer
Allow early access to the power management timer by exposing the verified read
function and providing a helper function which checks the pmtmr_ioport
variable and returns either the pm timer readout or 0 in case the pm timer is
not available.
Create a new header file and replace also the ifdef'ed extern definition in
arch/i386/kernel/acpi/boot.c
This is a preperatory patch for the rework of the local apic timer
calibration.
No functional changes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:54 +0000 (01:27 -0800)]
[PATCH] ACPI: fix missing include for UP
apic.h does not get included on UP compiles. That way the
APICTIMER_STOPS_ON_C3 is not there and UP boxen have no support for timer
broadcasting. This was never noticed, because the lapic timer is only used
for profiling on UP.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Len Brown <len.brown@intel.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:51 +0000 (01:27 -0800)]
[PATCH] hrtimers; add state tracking
Reintroduce ktimers feature "optimized away" by the ktimers review process:
multiple hrtimer states to enable the running of hrtimers without holding the
cpu-base-lock.
(The "optimized" rbtree hack carried only 2 states worth of information and we
need 4 for high resolution timers and dynamic ticks.)
No functional changes.
Build-fixes-from: Andrew Morton <akpm@osdl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:50 +0000 (01:27 -0800)]
[PATCH] hrtimers: cleanup locking
Improve kernel/hrtimers.c locking: use a per-CPU base with a lock to control
locking of all clocks belonging to a CPU. This simplifies code that needs to
lock all clocks at once. This makes life easier for high-res timers and
dyntick.
No functional changes.
[ optimization change from Andrew Morton <akpm@osdl.org> ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:49 +0000 (01:27 -0800)]
[PATCH] hrtimers: namespace and enum cleanup
- hrtimers did not use the hrtimer_restart enum and relied on the implict
int representation. Fix the prototypes and the functions using the enums.
- Use seperate name spaces for the enumerations
- Convert hrtimer_restart macro to inline function
- Add comments
No functional changes.
[akpm@osdl.org: fix input driver] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:47 +0000 (01:27 -0800)]
[PATCH] Extend next_timer_interrupt() to use a reference jiffie
For CONFIG_NO_HZ we need to calculate the next timer wheel event based on a
given jiffie value. Extend the existing code to allow the extra 'now'
argument. Provide a compability function for the existing implementations to
call the function with now == jiffies. (This also solves the racyness of the
original code vs. jiffies changing during the iteration.)
No functional changes to existing users of this infrastructure.
[ remove WARN_ON() that triggered on s390, by Carsten Otte <cotte@de.ibm.com> ]
[ made new helper static, Adrian Bunk <bunk@stusta.de> ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:46 +0000 (01:27 -0800)]
[PATCH] Fix cascade lookup of next_timer_interrupt
When searching for the next pending timer in the timer wheel we need to take
the cascade into account. The current code has several problems:
1. it looks into the previous cascade
2. it ignores a pending cascade
3. it ignores multiple cascades
Change the cascade lookup, so it calculates the array index from the point of
the next cascade and always look at the cascade buckets, when the cascade is
pending, i.e. gets executed in the next timer softirq. When multiple
cascades are pending, then lookup the next buckets too.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcelo Tosatti [Fri, 16 Feb 2007 09:27:44 +0000 (01:27 -0800)]
[PATCH] Mark TSC on GeodeLX reliable
The Geode can safely use the TSC for highres, since:
1) Does not support frequency scaling,
2) The TSC _does_ count when the CPU is halted. Furthermore, the Geode
supports a mode called "suspension on halt", where Suspend mode (which
interacts with the power management states) is entered. TSC counting
during suspend mode is controlled by bit 8 of the Bus Controller
Configuration Register #0 (thanks Tom!).
3) no SMP :)
Check if "RTSC counts during suspension" and remove the requirement for
verification, so the clocksource code can safely select it as an timesource
for the highres timers subsystem.
Signed-off-by: Marcelo Tosatti <marcelo@kvack.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The TSC needs to be verified against another clocksource. Instead of using
hardwired assumptions of available hardware, provide a generic verification
mechanism. The verification uses the best available clocksource and handles
the usability for high resolution timers / dynticks of the clocksource which
needs to be verified.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:42 +0000 (01:27 -0800)]
[PATCH] clocksource: Remove the update callback
The clocksource code allows direct updates of the rating of a given
clocksource now. Change TSC unstable tracking to use this interface and
remove the update callback.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:37 +0000 (01:27 -0800)]
[PATCH] clocksource: fixup is_continous changes on ARM
Fixup the is_contionous replacement by a flag field.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Russell King <rmk@arm.linux.org.uk> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Fri, 16 Feb 2007 09:27:36 +0000 (01:27 -0800)]
[PATCH] clocksource: replace is_continuous by a flag field
Using a flag filed allows to encode more than one information into a variable.
Preparatory patch for the generic clocksource verification.
[mingo@elte.hu: convert vmitime.c to the new clocksource flag] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Fri, 16 Feb 2007 09:27:34 +0000 (01:27 -0800)]
[PATCH] x86: rewrite SMP TSC sync code
make the TSC synchronization code more robust, and unify it between x86_64 and
i386.
The biggest change is the removal of the 'fix up TSCs' code on x86_64 and
i386, in some rare cases it was /causing/ time-warps on SMP systems.
The new code only checks for TSC asynchronity - and if it can prove a
time-warp (if it can observe the TSC going backwards when going from one CPU
to another within a critical section), then the TSC clock-source is turned
off.
The TSC synchronization-checking code also got moved into a separate file.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Fri, 16 Feb 2007 09:27:29 +0000 (01:27 -0800)]
[PATCH] Fix timeout overflow with jiffies
Prevent timeout overflow if timer ticks are behind jiffies (due to high
softirq load or due to dyntick), by limiting the valid timeout range to
MAX_LONG/2.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>