Chris Mason [Mon, 7 Sep 2009 22:22:14 +0000 (00:22 +0200)]
ext3: Add locking to ext3_do_update_inode
I've been struggling with this off and on while I've been testing the
data=guarded work. The symptom is corrupted orphan lists and inodes
with the wrong i_size stored on disk. I was convinced the
data=guarded code was just missing a call to ext3_mark_inode_dirty, but
tracing showed the i_disksize I was sending to ext3_mark_inode_dirty
wasn't actually making it to the drive.
ext3_mark_inode_dirty can be called without locks held (atime updates
and a few others), so the data=guarded code uses locks while updating
the in-memory inode, and then calls ext3_mark_inode_dirty
without any locks held.
But, ext3_mark_inode_dirty has no internal locking to make sure that
only one CPU is updating the buffer head at a time. Generally this
works out ok because everyone that changes the inode then calls
ext3_mark_inode_dirty themselves. Even though it races, eventually
someone updates the buffer heads and things move on.
But there is still a risk of the wrong values getting in, and the
data=guarded code seems to hit the race very often.
Since everyone that changes the inode also logs it, it should be
possible to fix this with some memory barriers. I'll leave that as an
exercise to the reader and lock the buffer head instead.
It is probably a good idea to have a different patch series for lockless
bit flipping on the ext3 i_state field. ext3_do_update_inode clears
EXT3_STATE_NEW with a plain &= and without any locks held.
Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
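The approach described above (serialize the copy from the in-memory inode into the buffer instead of relying on barriers) can be sketched in userspace. This is only an illustration under stated assumptions: sketch_inode, sketch_buffer and the sketch_* functions are hypothetical names, and a C11 atomic_flag spin stands in for the kernel's buffer lock. It is not the ext3 code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the in-memory inode fields being logged. */
struct sketch_inode {
	uint32_t i_disksize;
	uint32_t i_links;
};

/* Hypothetical stand-in for the buffer head holding the raw inode. */
struct sketch_buffer {
	atomic_flag locked;  /* assumption: simple spin flag instead of a buffer lock */
	uint32_t raw_size;   /* "on-disk" copy of i_disksize */
	uint32_t raw_links;  /* "on-disk" copy of i_links */
};

static void sketch_buffer_init(struct sketch_buffer *bh)
{
	atomic_flag_clear(&bh->locked);
	bh->raw_size = 0;
	bh->raw_links = 0;
}

static void sketch_lock_buffer(struct sketch_buffer *bh)
{
	/* Spin until this caller is the one that set the flag. */
	while (atomic_flag_test_and_set_explicit(&bh->locked, memory_order_acquire))
		;
}

static void sketch_unlock_buffer(struct sketch_buffer *bh)
{
	atomic_flag_clear_explicit(&bh->locked, memory_order_release);
}

/* Only one CPU at a time copies inode fields into the buffer. */
static void sketch_update_inode(const struct sketch_inode *inode,
				struct sketch_buffer *bh)
{
	sketch_lock_buffer(bh);
	bh->raw_size = inode->i_disksize;
	bh->raw_links = inode->i_links;
	sketch_unlock_buffer(bh);
}
```

With the lock held across the whole copy, two racing updaters can no longer interleave their field stores, so the buffer always reflects one complete snapshot of the inode.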
Jan Kara [Tue, 11 Aug 2009 17:06:10 +0000 (19:06 +0200)]
ext3: Fix possible deadlock between ext3_truncate() and ext3_get_blocks()
During truncate we are sometimes forced to start a new transaction as the
amount of blocks to be journaled is both quite large and hard to predict. So
far we restarted a transaction while holding truncate_mutex and that violates
lock ordering because truncate_mutex ranks below transaction start (and it
can lead to a real deadlock with ext3_get_blocks() allocating new blocks
from ext3_writepage()).
Luckily, the problem is easy to fix: We just drop the truncate_mutex before
restarting the transaction and acquire it afterwards. We are safe to do this as
by the time ext3_truncate() is called, all the page cache for the truncated
part of the file is dropped and so writepage() cannot come and allocate new
blocks in the part of the file we are truncating. The remaining writers are
stopped by us holding i_mutex.
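The ordering rule can be sketched as a small simulation. All names here (sketch_inode, sketch_journal_restart, sketch_truncate_restart) are hypothetical, and the "mutex" is just a flag so that the lock-ordering rule can be checked with assertions; the real code uses a kernel mutex and the journal API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state relevant to the ordering rule. */
struct sketch_inode {
	bool truncate_mutex_held;
	int restarts;
};

static void sketch_mutex_lock(struct sketch_inode *ino)
{
	assert(!ino->truncate_mutex_held);
	ino->truncate_mutex_held = true;
}

static void sketch_mutex_unlock(struct sketch_inode *ino)
{
	assert(ino->truncate_mutex_held);
	ino->truncate_mutex_held = false;
}

/* Transaction start ranks above truncate_mutex: never call with it held. */
static void sketch_journal_restart(struct sketch_inode *ino)
{
	assert(!ino->truncate_mutex_held);
	ino->restarts++;
}

/* Drop the mutex, restart the transaction, then take the mutex again. */
static void sketch_truncate_restart(struct sketch_inode *ino)
{
	sketch_mutex_unlock(ino);
	sketch_journal_restart(ino);
	sketch_mutex_lock(ino);
}
```

The caller enters and leaves sketch_truncate_restart() with the mutex held, so the rest of the truncate path does not need to change.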
Jan Kara [Tue, 11 Aug 2009 15:27:21 +0000 (17:27 +0200)]
jbd: Annotate transaction start also for journal_restart()
lockdep annotation for a transaction start has been at the end of
journal_start(). But a transaction is also started from journal_restart(). Move
the lockdep annotation to start_this_handle() which covers both cases.
Jan Kara [Mon, 3 Aug 2009 17:21:00 +0000 (19:21 +0200)]
jbd: Journal block numbers can ever be only 32-bit use unsigned int for them
It does not make sense to store journal block numbers as unsigned long
since they can only be 32-bit (because of an on-disk format limitation). So
change in-memory structures and variables to use unsigned int instead.
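As an illustration of why unsigned int is wide enough, here is a minimal sketch that decodes a 32-bit big-endian block number from raw bytes. The helper name sketch_read_be32_blocknr is hypothetical, and the sketch assumes a 32-bit unsigned int, as on Linux targets.

```c
/*
 * Hypothetical helper: decode a 32-bit big-endian block number from
 * four raw bytes. The result always fits in an unsigned int, so a
 * wider in-memory type adds nothing.
 */
static unsigned int sketch_read_be32_blocknr(const unsigned char *p)
{
	return ((unsigned int)p[0] << 24) |
	       ((unsigned int)p[1] << 16) |
	       ((unsigned int)p[2] << 8) |
	       (unsigned int)p[3];
}
```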
Jan Kara [Mon, 3 Aug 2009 17:00:57 +0000 (19:00 +0200)]
ext3: Update MAINTAINERS for ext3 and JBD
Stephen agreed that he's unlikely to find time for working on ext3/JBD in the
near future and has not been working on it for some time already. So remove him.
Added myself to JBD since after Andrew I'm probably the second most sensible
contact ;).
CC: Stephen Tweedie <sct@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
Andreas Dilger [Thu, 30 Jul 2009 18:09:46 +0000 (20:09 +0200)]
JBD: round commit timer up to avoid uncommitted transaction
Fix jiffies rounding in jbd commit timer setup code. Rounding down could cause
the timer to be fired before the corresponding transaction has expired. That
transaction can remain uncommitted forever if no new transaction is created and
no explicit sync/umount happens.
Signed-off-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Jan Kara <jack@suse.cz>
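The difference between the two rounding directions can be shown with plain arithmetic. This is a simplified sketch: SKETCH_HZ and the sketch_round_* helpers are hypothetical, and any per-CPU skew the kernel's own jiffies rounding applies is left out.

```c
/* Hypothetical tick rate: one "second" is 1000 ticks in this sketch. */
#define SKETCH_HZ 1000UL

/* Round an expiry time down to a whole second: may fire too early. */
static unsigned long sketch_round_down(unsigned long expires)
{
	return expires - (expires % SKETCH_HZ);
}

/* Round an expiry time up to a whole second: never fires too early. */
static unsigned long sketch_round_up(unsigned long expires)
{
	unsigned long rem = expires % SKETCH_HZ;

	return rem ? expires + (SKETCH_HZ - rem) : expires;
}
```

For a transaction expiring at tick 1500, rounding down arms the timer for tick 1000, before the transaction has expired, while rounding up arms it for tick 2000.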
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev
debugfs: Modify default debugfs directory for debugging pktcdvd.
debugfs: Modified default dir of debugfs for debugging UHCI.
debugfs: Change debugfs directory of IWMC3200
debugfs: Change debuhgfs directory of trace-events-sample.h
debugfs: Fix mount directory of debugfs by default in events.txt
hpilo: add poll f_op
hpilo: add interrupt handler
hpilo: staging for interrupt handling
driver core: platform_device_add_data(): use kmemdup()
Driver core: Add support for compatibility classes
uio: add generic driver for PCI 2.3 devices
driver-core: move dma-coherent.c from kernel to driver/base
mem_class: fix bug
mem_class: use minor as index instead of searching the array
driver model: constify attribute groups
UIO: remove 'default n' from Kconfig
Driver core: Add accessor for device platform data
Driver core: move dev_get/set_drvdata to drivers/base/dd.c
Driver core: add new device to bus's list before probing
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/pcmcia-2.6:
pcmcia: document return value of pcmcia_loop_config
pcmcia: dtl1_cs: fix pcmcia_loop_config logic
pcmcia: drop non-existant includes
pcmcia: disable prefetch/burst for OZ6933
pcmcia: fix incorrect argument order to list_add_tail()
pcmcia: drivers/pcmcia/pcmcia_resource.c: Remove unnecessary semicolons
pcmcia: Use phys_addr_t for physical addresses
pcmcia: drivers/pcmcia: Make static
Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (75 commits)
PCI hotplug: clean up acpi_run_hpp()
PCI hotplug: acpiphp: use generic pci_configure_slot()
PCI hotplug: shpchp: use generic pci_configure_slot()
PCI hotplug: pciehp: use generic pci_configure_slot()
PCI hotplug: add pci_configure_slot()
PCI hotplug: clean up acpi_get_hp_params_from_firmware() interface
PCI hotplug: acpiphp: don't cache hotplug_params in acpiphp_bridge
PCI hotplug: acpiphp: remove superfluous _HPP/_HPX evaluation
PCI: Clear saved_state after the state has been restored
PCI PM: Return error codes from pci_pm_resume()
PCI: use dev_printk in quirk messages
PCI / PCIe portdrv: Fix pcie_portdrv_slot_reset()
PCI Hotplug: convert acpi_pci_detect_ejectable() to take an acpi_handle
PCI Hotplug: acpiphp: find bridges the easy way
PCI: pcie portdrv: remove unused variable
PCI / ACPI PM: Propagate wake-up enable for devices w/o ACPI support
ACPI PM: Replace wakeup.prepared with reference counter
PCI PM: Introduce device flag wakeup_prepared
PCI / ACPI PM: Rework some debug messages
PCI PM: Simplify PCI wake-up code
...
Fixed up conflict in arch/powerpc/kernel/pci_64.c due to OF device tree
scanning having been moved and merged for the 32- and 64-bit cases. The
'needs_freset' initialization added in 6e19314cc ("PCI/powerpc: support
PCIe fundamental reset") is now in arch/powerpc/kernel/pci_of_scan.c.
Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-block
* 'writeback' of git://git.kernel.dk/linux-2.6-block:
writeback: fix possible bdi writeback refcounting problem
writeback: Fix bdi use after free in wb_work_complete()
writeback: improve scalability of bdi writeback work queues
writeback: remove smp_mb(), it's not needed with list_add_tail_rcu()
writeback: use schedule_timeout_interruptible()
writeback: add comments to bdi_work structure
writeback: splice dirty inode entries to default bdi on bdi_destroy()
writeback: separate starting of sync vs opportunistic writeback
writeback: inline allocation failure handling in bdi_alloc_queue_work()
writeback: use RCU to protect bdi_list
writeback: only use bdi_writeback_all() for WB_SYNC_NONE writeout
fs: Assign bdi in super_block
writeback: make wb_writeback() take an argument structure
writeback: merely wakeup flusher thread if work allocation fails for WB_SYNC_NONE
writeback: get rid of wbc->for_writepages
fs: remove bdev->bd_inode_backing_dev_info
Nick Piggin [Tue, 15 Sep 2009 19:37:55 +0000 (21:37 +0200)]
writeback: fix possible bdi writeback refcounting problem
wb_clear_pending AFAIKS should not be called after the item has been
put on the list, except by the worker threads. It could lead to the
situation where the refcount is decremented below 0 and cause lots of
problems.
Presumably the !wb_has_dirty_io case is not a common one, so it can
be discovered when the thread wakes up to check?
Also add a comment in bdi_work_clear.
Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Nick Piggin [Tue, 15 Sep 2009 19:34:51 +0000 (21:34 +0200)]
writeback: Fix bdi use after free in wb_work_complete()
By the time bdi_work_on_stack gets evaluated again in bdi_work_free, it
can already have been deallocated and used for something else in the
!on stack case, giving a false positive in this test and causing
corruption.
Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Nick Piggin [Tue, 15 Sep 2009 19:34:12 +0000 (21:34 +0200)]
writeback: improve scalability of bdi writeback work queues
If you're going to do an atomic RMW on each list entry, there's not much
point in all the RCU complexities of the list walking. This is only going
to help the multi-thread case I guess, but it doesn't hurt to do now.
Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
writeback: splice dirty inode entries to default bdi on bdi_destroy()
We cannot safely ensure that the inodes are all gone at this point
in time, and we must not destroy this bdi with inodes hanging off it.
So just splice our entries to the default bdi since that one will
always persist.
writeback: separate starting of sync vs opportunistic writeback
bdi_start_writeback() is currently split into two paths, one for
WB_SYNC_NONE and one for WB_SYNC_ALL. Add bdi_sync_writeback()
for WB_SYNC_ALL writeback and let bdi_start_writeback() handle
only WB_SYNC_NONE.
Push down the writeback_control allocation and only accept the
parameters that make sense for each function. This cleans up
the API considerably.
Now that bdi_writeback_all() no longer handles integrity writeback,
it doesn't have to block anymore. This means that we can switch
bdi_list reader side protection to RCU.
We do this automatically in get_sb_bdev() from the set_bdev_super()
callback. Filesystems that have their own private backing_dev_info
must assign that in ->fill_super().
Note that ->s_bdi assignment is required for proper writeback!
Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
writeback: make wb_writeback() take an argument structure
We need to be able to pass in range_cyclic as well, so instead
of growing yet another argument, split the arguments into a
struct wb_writeback_args structure that we can use internally.
Also makes it easier to just copy all members to an on-stack
struct, since we can't access work after clearing the pending
bit.
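The pattern described above, bundling the arguments into one structure and copying it to the stack before the pending bit is cleared, can be sketched as follows. The struct layout and all sketch_* names are assumptions made for illustration, not the kernel's definitions.

```c
#include <stdbool.h>

enum sketch_sync_mode { SKETCH_SYNC_NONE, SKETCH_SYNC_ALL };

/* Hypothetical argument bundle: new knobs become new members. */
struct sketch_wb_args {
	long nr_pages;
	enum sketch_sync_mode sync_mode;
	bool range_cyclic;
};

/* Hypothetical queued work item owned by whoever submitted it. */
struct sketch_work {
	struct sketch_wb_args args;
	bool pending;
};

/* Stand-in for the writeback routine: pretends every page was written. */
static long sketch_wb_writeback(const struct sketch_wb_args *args)
{
	return args->nr_pages;
}

static long sketch_process_work(struct sketch_work *work)
{
	/* Copy all members first: work must not be touched after the clear. */
	struct sketch_wb_args args = work->args;

	work->pending = false;  /* the submitter may reuse or free work now */
	return sketch_wb_writeback(&args);
}
```

A single struct assignment copies every member, so adding a field later does not require touching the copy site.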
writeback: merely wakeup flusher thread if work allocation fails for WB_SYNC_NONE
Since it's an opportunistic writeback and not a data integrity action,
don't punt to blocking writeback. Just wakeup the thread and it will
flush old data.
Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
slub: Fix build error in kmem_cache_open() with !CONFIG_SLUB_DEBUG
This build bug:
mm/slub.c: In function 'kmem_cache_open':
mm/slub.c:2476: error: 'disable_higher_order_debug' undeclared (first use in this function)
mm/slub.c:2476: error: (Each undeclared identifier is reported only once
mm/slub.c:2476: error: for each function it appears in.)
Triggers because there's no !CONFIG_SLUB_DEBUG definition for
disable_higher_order_debug.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
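One common way to handle this kind of breakage is to give the symbol a constant fallback when the config option is off. The sketch below shows that pattern in isolation; sketch_wants_debug() is a hypothetical caller, and this is not necessarily the exact change that was merged.

```c
/* CONFIG_SLUB_DEBUG is deliberately left undefined in this sketch. */
#ifdef CONFIG_SLUB_DEBUG
static int disable_higher_order_debug;
#else
/* Constant fallback so callers compile without any #ifdef of their own. */
#define disable_higher_order_debug 0
#endif

/* Hypothetical caller: keep debugging unless it is disabled for order > 0. */
static int sketch_wants_debug(int order)
{
	if (disable_higher_order_debug && order > 0)
		return 0;
	return 1;
}
```

Because the fallback is the constant 0, the compiler can drop the test entirely in the non-debug build.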
Intel has officially abandoned this project and does not want to
maintain it or have it included in the main kernel tree. No one
should use the code, as it's not needed anymore.
There is already an in-kernel driver for this hardware (since 2.6.30),
at76c50x-usb, and it supports all of the same devices. So this driver
can now be deleted.
Acked-by: Kalle Valo <kalle.valo@iki.fi> Cc: linux-wireless <linux-wireless@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Axel K [Thu, 3 Sep 2009 19:24:19 +0000 (21:24 +0200)]
Staging: rt3090: remove possible conflict with rt2860
Both drivers (rt2860 and rt3090) register themselves as "rt2860" on
loading the module.
In the very rare case of somebody having two cards in one machine, one
using the rt3090 driver and the other the rt2860 driver, loading both
modules would be impossible: the second one will not be loaded because the
kernel will report that the driver is already registered.
This was also present with rt2870/rt3070 (with both drivers registering
as "rt2870"), but the code has been merged to one driver recently.
The following patch fixes this potential problem until the rt2860/rt3090
code is merged into a single driver.
Axel K [Thu, 3 Sep 2009 19:13:56 +0000 (21:13 +0200)]
Staging: rt2860/rt2870/rt3070/rt3090: fix compiler warning on x86_64
When compiling rt2860/rt2870/rt3070 or rt3090 on x86_64, the following warning
is displayed:
drivers/staging/rt3090/rt_linux.c: In function 'duplicate_pkt':
drivers/staging/rt3090/rt_linux.c:531: warning: passing argument 1 of 'memmove' makes pointer from integer without a cast
include2/asm/string_64.h:58: note: expected 'void *' but argument is of type 'sk_buff_data_t'
drivers/staging/rt3090/rt_linux.c:533: warning: passing argument 1 of 'memmove' makes pointer from integer without a cast
include2/asm/string_64.h:58: note: expected 'void *' but argument is of type 'sk_buff_data_t'
The following patch fixes this warning.
Credits go to Helmut Schaa <hschaa@suse.de> for his kind advice/help on this
patch.
Signed-off-by: Axel Koellhofer <rain_maker@root-forum.org> Cc: Helmut Schaa <hschaa@suse.de> Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Axel K [Thu, 3 Sep 2009 18:53:36 +0000 (20:53 +0200)]
Staging: rt2860: add new device ids
This patch adds new device IDs to ralink rt2860 driver in linux staging. The
device IDs were retrieved from the latest vendor release (version 2.1.2.0).
Axel K [Thu, 3 Sep 2009 18:47:11 +0000 (20:47 +0200)]
Staging: rt3090: add device id 1462:891a
This patch adds a new device ID (1462:891a) to ralink rt3090 driver in linux
staging. The device ID was retrieved from the latest vendor release (version
2.2.0.0).
Signed-off-by: Kevin A. Granade <kevin.granade@gmail.com> Cc: Belisko Marek <marek.belisko@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Julia Lawall [Mon, 31 Aug 2009 19:34:25 +0000 (21:34 +0200)]
Staging: rtl8192e: Drop unnecessary NULL test
The result of container_of should not be NULL. In particular, in this case
the argument to the enclosing function has passed through INIT_WORK, which
dereferences it, implying that its container cannot be NULL.
A simplified version of the semantic patch that makes this change is as
follows:
(http://www.emn.fr/x-info/coccinelle/)
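The semantic patch itself is not reproduced in this log. As a separate illustration of why the NULL test is unnecessary, here is a minimal userspace sketch of the container_of idea; sketch_container_of, sketch_work_item and sketch_priv are hypothetical names, and the macro is a simplified form without the kernel's type checking.

```c
#include <stddef.h>

/* Simplified form: step back from a member to its enclosing struct. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct sketch_work_item {
	int pending;
};

struct sketch_priv {
	int id;
	struct sketch_work_item work;  /* embedded, not pointed to */
};

/*
 * Hypothetical work handler: the result is the member address minus a
 * constant offset, so for a valid member pointer it cannot be NULL and
 * testing it tells the caller nothing.
 */
static int sketch_handler(struct sketch_work_item *work)
{
	struct sketch_priv *priv =
		sketch_container_of(work, struct sketch_priv, work);

	return priv->id;
}
```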
H.J. Thomassen [Tue, 25 Aug 2009 22:39:04 +0000 (15:39 -0700)]
Staging: add cowloop driver
Cowloop is a "copy-on-write" pseudo block driver. It can
be stacked on top of a "real" block driver, and catches
all write operations on their way from the file systems
layer above to the real driver below, effectively shielding
the lower driver from those write accesses. The requests are
then diverted to an ordinary file, located somewhere else
(configurable). Later read requests are checked to see whether
they can be serviced by the "real" block driver below, or
must be pulled in from the diverted location. More information
is on the project's website http://www.ATComputing.nl/cowloop/
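The read and write paths described above can be sketched with an in-memory model. Everything here is a hypothetical simplification: arrays stand in for the real device and the diverted file, and a per-block flag records which blocks have been written. It is not the cowloop implementation.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_NBLOCKS 4
#define SKETCH_BLKSIZE 8

/* Hypothetical model: arrays stand in for the device and the cow file. */
struct sketch_cow {
	char base[SKETCH_NBLOCKS][SKETCH_BLKSIZE];     /* "real" device */
	char cowfile[SKETCH_NBLOCKS][SKETCH_BLKSIZE];  /* diverted writes */
	bool diverted[SKETCH_NBLOCKS];                 /* block was written */
};

/* Writes never reach the base device; they land in the cow file. */
static void sketch_cow_write(struct sketch_cow *c, int blk, const char *data)
{
	memcpy(c->cowfile[blk], data, SKETCH_BLKSIZE);
	c->diverted[blk] = true;
}

/* Reads come from the cow file only for blocks that were written. */
static const char *sketch_cow_read(const struct sketch_cow *c, int blk)
{
	return c->diverted[blk] ? c->cowfile[blk] : c->base[blk];
}
```

After a write to block 2, reads of block 2 come from the diverted copy while the base device still holds the original data.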
On TI DA850/OMAP-L138 EVM, HD44780 (24x2) LCD panel is being
used[1], but it is interfaced through the SoC specific LCD
interface and not through parallel port. A parallel port
driver has been developed which interfaces to the panel driver
through the SoC specific LCD interface.
Basically, neither the serial nor the parallel interface supported
by the panel driver suits the specific interface the SoC is
supporting, so a new interface type has been introduced.
Ideally the panel driver should be de-coupled from parallel
and serial port related items but this patch is something
that can be merged in the meantime.
[1]Specification of the character LCD interface on TI DA850/OMAP-L138:
http://www.ti.com/litv/pdf/sprufm0a.
Alan Cox [Thu, 27 Aug 2009 10:02:25 +0000 (11:02 +0100)]
Staging: et131x: put the jagcore routines in with their users
We have two trivial IRQ routines, a single statement and a real function -
relocate them. While we are at it kill the trivial to sort out soft reset
and slv bits in the same areas of code.
Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>