Originally from Nick Piggin, just adapted to the newer branch.
You can't check PageLRU without holding zone->lru_lock. The page
release code can get away with it only because the page refcount is 0 at
that point. Also, you can't reliably remove pages from the LRU unless
the refcount is 0. Ever.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Jens Axboe <axboe@suse.de>
[PATCH] splice: page stealing needs to wait_on_page_writeback()
Thanks to Andrew for the good explanation of why this is so. akpm writes:
If a page is under writeback and we remove it from pagecache, it's still
going to get written to disk. But the VFS no longer knows about that page,
nor that this page is about to modify disk blocks.
So there might be scenarios in which those
blocks-which-are-about-to-be-written-to get reused for something else.
When writeback completes, it'll scribble on those blocks.
This won't happen in ext2/ext3-style filesystems in normal mode because the
page has buffers and try_to_release_page() will fail.
But ext2 in nobh mode doesn't attach buffers at all - it just sticks the
page in a BIO, finds some new blocks, points the BIO at those blocks and
lets it rip.
While that write IO's in flight, someone could truncate the file. Truncate
won't block on the writeout because the page isn't in pagecache any more.
So truncate will the free the blocks from the file under the page's feet.
Then something else can reallocate those blocks. Then write data to them.
Now, the original write completes, corrupting the filesystem.
[PATCH] splice: improve writeback and clean up page stealing
By cleaning up the writeback logic (killing write_one_page() and the manual
set_page_dirty()), we can get rid of ->stolen inside the pipe_buffer and
just keep it local in pipe_to_file().
This also adds dirty page balancing logic and O_SYNC handling.
* git://oss.sgi.com:8090/oss/git/xfs-2.6:
[XFS] Provide XFS support for the splice syscall.
[XFS] Reenable write barriers by default.
[XFS] Make project quota enforcement return an error code consistent with
[XFS] Implement the silent parameter to fill_super, previously ignored.
[XFS] Cleanup comment to remove reference to obsoleted function
* master.kernel.org:/pub/scm/linux/kernel/git/perex/alsa: (28 commits)
[ALSA] Kconfig SND_SEQUENCER_OSS help text fix
[ALSA] Add Aux input switch control for Aureon Universe
[ALSA] pcxhr - Fix the crash with REV01 board
[ALSA] sound/pci/hda: use create_singlethread_workqueue()
[ALSA] hda-intel - Add support of ATI SB600
[ALSA] cs4281 - Fix the check of timeout in probe
[ALSA] cs4281 - Fix the check of right channel
[ALSA] Test volume resolution of usb audio at initialization
[ALSA] maestro3.c: fix BUG, optimization
[ALSA] HDA/Realtek: multiple input mux definitions and pin mode additions
[ALSA] AdLib FM card driver
[ALSA] Fix / clean up PCM-OSS setup hooks
[ALSA] Clean up PCM codes (take 2)
[ALSA] Tiny clean up of PCM codes
[ALSA] ISA drivers bailing on first !enable[i]
[ALSA] Remove obsolete kfree_nocheck call
[ALSA] Remove obsolete kfree_nocheck call
[ALSA] Add snd-als300 driver for Avance Logic ALS300/ALS300+ soundcards
[ALSA] Add snd-riptide driver for Conexant Riptide chip
[ALSA] hda-codec - Fix noisy output wtih AD1986A 3stack model
...
[PATCH] revert incorrect mutex conversion in hdaps driver
This reverts the mutex conversion that was recently done to the hdaps
driver; this coversion was buggy because the hdaps driver started using
this semaphore in IRQ context, which mutexes do not allow. Easiest
solution for now is to just revert the patch (the patch was part of a
bigger GIT commit, 9a61bf6300533d3b64d7ff29adfec00e596de67d but this
only reverts this one file)
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (48 commits)
Documentation: fix minor kernel-doc warnings
BUG_ON() Conversion in drivers/net/
BUG_ON() Conversion in drivers/s390/net/lcs.c
BUG_ON() Conversion in mm/slab.c
BUG_ON() Conversion in mm/highmem.c
BUG_ON() Conversion in kernel/signal.c
BUG_ON() Conversion in kernel/signal.c
BUG_ON() Conversion in kernel/ptrace.c
BUG_ON() Conversion in ipc/shm.c
BUG_ON() Conversion in fs/freevxfs/
BUG_ON() Conversion in fs/udf/
BUG_ON() Conversion in fs/sysv/
BUG_ON() Conversion in fs/inode.c
BUG_ON() Conversion in fs/fcntl.c
BUG_ON() Conversion in fs/dquot.c
BUG_ON() Conversion in md/raid10.c
BUG_ON() Conversion in md/raid6main.c
BUG_ON() Conversion in md/raid5.c
Fix minor documentation typo
BFP->BPF in Documentation/networking/tuntap.txt
...
Stefan Richter [Sat, 1 Apr 2006 19:11:41 +0000 (21:11 +0200)]
[PATCH] sbp2: fix spinlock recursion
sbp2util_mark_command_completed takes a lock which was already taken by
sbp2scsi_complete_all_commands. This is a regression in Linux 2.6.15.
Reported by Kristian Harms at
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=187394
[ More complete commentary, as response to questions by Andrew: ]
> This changes the call environment for all implementations of
> ->Current_done(). Are they all safe to call under this lock?
Short answer: Yes, trust me. ;-) Long answer:
The done() callbacks are passed on to sbp2 from the SCSI stack along
with each SCSI command via the queuecommand hook. The done() callback
is safe to call in atomic context. So does
Documentation/scsi/scsi_mid_low_api.txt say, and many if not all SCSI
low-level handlers rely on this fact. So whatever this callback does,
it is "self-contained" and it won't conflict with sbp2's internal ORB
list handling. In particular, it won't race with the
sbp2_command_orb_lock.
Moreover, sbp2 already calls the done() handler with
sbp2_command_orb_lock taken in sbp2scsi_complete_all_commands(). I
admit this is ultimately no proof of correctness, especially since this
portion of code introduced the spinlock recursion in the first place and
we didn't realize it since this code's submission before 2.6.15 until
now. (I have learned a lesson from this.)
I stress-tested my patch on x86 uniprocessor with a preemptible SMP
kernel (alas I have no SMP machine yet) and made sure that all code
paths which involve the sbp2_command_orb_lock were gone through multiple
times.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
IB/ipath: kbuild infrastructure
IB/ipath: infiniband verbs support
IB/ipath: misc infiniband code, part 2
IB/ipath: misc infiniband code, part 1
IB/ipath: infiniband RC protocol support
IB/ipath: infiniband UC and UD protocol support
IB/ipath: infiniband header files
IB/ipath: layering interfaces used by higher-level driver code
IB/ipath: support for userspace apps using core driver
IB/ipath: sysfs and ipathfs support for core driver
IB/ipath: misc driver support code
IB/ipath: chip initialisation code, and diag support
IB/ipath: support for PCI Express devices
IB/ipath: support for HyperTransport devices
IB/ipath: core driver header files
IB/ipath: core device driver
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC]: Wire up sys_sync_file_range() into syscall tables.
[SPARC]: Wire up sys_splice() into the syscall tables.
[SPARC64]: Update defconfig.
[SPARC64]: Align address in huge_pte_alloc().
[SPARC64]: Document the instruction checks we do in do_sparc64_fault().
[SPARC64]: Make tsb_sync() mm comparison more precise.
It doesn't make the splice itself necessarily nonblocking (because the
actual file descriptors that are spliced from/to may block unless they
have the O_NONBLOCK flag set), but it makes the splice pipe operations
nonblocking.
Pavel Pisa [Sun, 2 Apr 2006 18:27:07 +0000 (19:27 +0100)]
[ARM] 3457/1: i.MX: SD/MMC support for i.MX/MX1
Patch from Pavel Pisa
This patch adds support of i.MX/MX1 SD/MMC controller.
It has been significantly redesigned from the original Sascha Hauer's
version to support scatter-gather DMA, to conform to latest Pierre Ossman's
and Russell King's MMC-SD Linux 2.6.x infrastructure.
The handling of all events has been moved to the softirq context
and is designed with no busy-looping in mind. Unfortunately
some controller bugs has to be overcome by limited looping
about 2-20 usec but these are observed only for initial card
recognition phase.
There are still some missing/missed IRQs problems under heavy load.
Help of somebody with access to the full SDHC design information
is probably necessary.
Regenerated against 2.6.16-git-060402 to solve clash with other patches.
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Wim Van Sebroeck [Sun, 12 Feb 2006 16:44:57 +0000 (17:44 +0100)]
[WATCHDOG] pcwd.c general clean-up after patches
removal of includes (since we don't use kmalloc and
TASK_INTERRUPTABLE anymore).
Addition of missing commands.
Printk that lets the user know when the module was
unloaded.
Tony Lindgren [Sun, 2 Apr 2006 16:46:30 +0000 (17:46 +0100)]
[ARM] 3433/1: ARM: OMAP: 8/8 Update board files
Patch from Tony Lindgren
This patch syncs OMAP board support with linux-omap tree.
The highlights of the patch are:
- Add support for Nokia 770 by Juha Yrjola
- Add support for Samsung Apollon by Kyungmin Park
- Add support for Amstrad E3 videophone by Jonathan McDowell
- Remove board-netstar.c board support as requested by Ladislav Michl
- Do platform_device registration in board files by Komal Shah et al.
Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tony Lindgren [Sun, 2 Apr 2006 16:46:25 +0000 (17:46 +0100)]
[ARM] 3430/1: ARM: OMAP: 5/8 Update PM
Patch from Tony Lindgren
Update OMAP PM code from linux-omap tree:
- Move PM code from plat-omap to mach-omap1 and mach-omap2
by Tony Lindgren
- Add minimal PM support for omap24xx by Tony Lindgren and
Richard Woodruff
- Misc updates to omap1 PM code by Tuukka Tikkanen et al
- Updates to the SRAM code needed for PM and FB by Imre Deak
Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tony Lindgren [Sun, 2 Apr 2006 16:46:23 +0000 (17:46 +0100)]
[ARM] 3429/1: ARM: OMAP: 4/8 Update GPIO
Patch from Tony Lindgren
Update OMAP GPIO code from linux-omap tree:
- Fix omap16xx edge control by Juha Yrjola
- Support for additional omap16xx trigger modes by Dirk Behme
- Fix edge detection by Tony Lindgren et al.
- Better support for omap15xx and omap310 by Andrej Zaborowski
- Fix omap15xx interrupt bug by Petukhov Nikolay
Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Update OMAP pin multiplexing code from linux-omap tree.
This patch adds new pin configurations by various OMAP
developers, and suport for omap730 by Brian Swetland.
Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tony Lindgren [Sun, 2 Apr 2006 16:46:21 +0000 (17:46 +0100)]
[ARM] 3427/1: ARM: OMAP: 2/8 Update timers
Patch from Tony Lindgren
Update OMAP timers from linux-omap tree. The highlights of the
patch are:
- Move timer32k code from mach-omap1 to plat-omap and make it
work also on omap24xx by Tony Lindgren
- Add support for dmtimer idle check for PM by Tuukka Tikkanen
Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Update OMAP clock framework from linux-omap tree.
The highlights of the patch are:
- Add support for omap730 clocks by Andrzej Zaborowski
- Fix compile warnings by Dirk Behme
- Add support for using dev id by Tony Lindgren and Komal Shah
- Move memory timings and PRCM into separate files by Tony Lindgren
Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Andrew Victor [Sun, 2 Apr 2006 16:15:51 +0000 (17:15 +0100)]
[ARM] 3396/2: AT91RM9200 Platform devices update
Patch from Andrew Victor
This patch updates the platform device resources for the Ethernet and
MMC peripherals. It also adds platform device information for the NAND
(SmartMedia), I2C and the RTC.
(This version of the patch can be applied before Patch 3392/1)
Signed-off-by: Andrew Victor <andrew@sanpeople.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pavel Pisa [Sun, 2 Apr 2006 15:58:37 +0000 (16:58 +0100)]
[ARM] 3444/1: i.MX: Scatter-gather DMA emulation for i.MX/MX1
Patch from Pavel Pisa
This patch contains simplified set of changes to add scatter-gather
emulation capability into MX1 DMA support. The result should
be still usable for next combination of DMA transfers
Statter-Gather/linear/2D/FIFO to linear/2D/FIFO and
linear/2D/FIFO to Statter-Gather/2D/FIFO
The patch corrects channel priority allocation to be compatible
with MX1 hardware implementation.
Previous code has not been adapted from its PXA original.
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
A lot of EH codes are about to be added to libata. Separate out
libata-eh.c. ata_scsi_timed_out(), ata_scsi_error(),
ata_qc_timeout(), ata_eng_timeout(), ata_eh_qc_complete() and
ata_eh_qc_retry() are moved. No code is changed by this patch.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
[PATCH] libata: dec scmd->retries for qcs with zero err_mask
qcs might get retried because of unrelated failure. e.g. NCQ command
failure causes the whole command set to be aborted. Decrement
scmd->retries for such retrials to avoid unnecessarily failing
commands. Note that scmd->retries will be incremented the first time.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
[PATCH] libata: don't read TF directly from sense generation functions
TF register might not be directly accessible depending on errors.
e.g. TF of failed NCQ command is in log page 10h. Make reading TF
responsibility of error handlers. For the current EH, simply push TF
reading into qc completion functions as they are practically part of
EH. New EH will fill qc->tf with status registers before complting
qcs.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
[PATCH] libata: always generate sense if qc->err_mask is non-zero
Current sense generation code does not generate sense error if status
register value doesn't indicate error condition. However, LLDD's may
indicate errors which 't show up in status register. Completing such
qc's without generating sense results in successful completion of
failed commands.
Invoke ata_to_sense_error() regardless of status register if
qc->err_mask is not zero such that ata_to_sense_error() generates
default sense error.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
[PATCH] libata: pass qc around intead of ap during PIO
The current code passes pointer to ap around and repeatedly performs
ata_qc_from_tag() to access the ongoing qc. This is unnatural and
makes EH synchronization cumbersome. Make PIO codes deal with qc
instead of ap.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add a new qc flag ATA_QCFLAG_IO. This flag gets set for normal IO
commands originating from SCSI midlayer. This information will be
used by EH to determine transfer speed reconfiguration.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
[PATCH] libata: clear only affected flags during ata_dev_configure()
ata_dev_configure() should not clear dynamic device flags determined
elsewhere. Lower eight bits are reserved for feature flags, define
ATA_DFLAG_CFG_MASK and clear only those bits before configuring
device. Without this patch, ATA_DFLAG_PIO gets turned off during
revalidation making PIO mode unuseable.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* Reorder ATA_DFLAG_* such that feature flags determined by
ata_dev_configure() are on lower bits. Reserve lower eight bits
for this purpose and allocate dynamic flags from bit 8.
* Reorder ATA_FLAG_* such that feature flags determined during driver
initiailization are on bits 0:15, dynamic flags on 16:23 and LLDD
specific flags on 24:31.
* Kill trailing white space and lower-case an one line comment for
consistency.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
[PATCH] libata: consider disabled devices in ata_dev_xfermask()
ata_bus_probe() now marks failed devices properly and leaves
meaningful transfer mode masks. This patch makes ata_dev_xfermask()
consider disable devices when determining PIO mode to avoid violating
device selection timing.
While at it, move port-wide resttriction out of device iteration loop
and try to make the function look a bit prettier.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Improve ata_bus_probe() such that configuration failures are handled
better. Each device is given ATA_PROBE_MAX_TRIES chances, but any
non-transient error (revalidation failure with -ENODEV, configuration
failure with -EINVAL...) disables the device directly. Any IO error
results in SATA PHY speed down and ata_set_mode() failure lowers
transfer mode. The last try always puts a device into PIO-0.
After each failure, the whole port is reset to make sure that the
controller and all the devices are in a known and stable state. The
reset also applies SATA SPD configuration if necessary.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Implement ata_down_xfermask_limit(). This function manipulates
@dev->pio/mwdma/udma_mask such that the next lower transfer mode is
selected. This will be used to improve ata_bus_probe() failure
handling and later by EH.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
[PATCH] libata: use SATA speed down in ata_drive_probe_reset()
Make ata_drive_probe_reset() use SATA SPD configuration. Hardreset
will be force if speed renegotiation is necessary. Also, if a
hardreset fails, PHY speed is stepped down and hardreset is retried
until the lowest speed is reached.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
[PATCH] libata: implement ap->sata_spd_limit and helpers
ap->sata_spd_limit contrains SATA PHY speed of the port. It is
initialized to the configured value prior to probing thus preserving
BIOS configured value. hardreset is responsible for applying SPD
limit and sata_std_hardreset() is updated to do that. SATA SPD limit
will be used to enhance failure handling during probing and later by
EH.
This patch also normalizes some comments around affected code.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
For the time being we cannot use ata_dev_present() as it was renamed
to ata_dev_enabled() but we still need presence test. Implement
negation of the test. Conveniently, the negated result is needed in
more places. This is suggested by Jeff Garzik.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>