git.karo-electronics.de Git - linux-beck.git/log

Chris Mason [Thu, 17 Jul 2008 16:54:48 +0000 (12:54 -0400)]

Btrfs: Force caching of metadata block groups on mount to avoid deadlock

This is a temporary change to avoid deadlocks until the extent tree locking
is fixed up.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason [Thu, 17 Jul 2008 16:54:43 +0000 (12:54 -0400)]

btrfs_next_leaf: do readahead when skip_locking is turned on

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason [Thu, 17 Jul 2008 16:54:40 +0000 (12:54 -0400)]

Add a per-inode lock around btrfs_drop_extents

btrfs_drop_extents is always called with a range lock held on the inode.
But, it may operate on extents outside that range as it drops and splits
them.

This patch adds a per-inode mutex that is held while calling
btrfs_drop_extents and while inserting new extents into the tree. It
prevents races between two processes working against adjacent ranges in the tree.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason [Thu, 17 Jul 2008 16:54:15 +0000 (12:54 -0400)]

Btrfs: Don't pin pages in ram until the entire ordered extent is on disk.

Checksum items are not inserted until the entire ordered extent is on disk,
but individual pages might be clean and available for reclaim long before
the whole extent is on disk.

In order to allow those pages to be freed, we need to be able to search
the list of ordered extents to find the checksum that is going to be inserted
in the tree. This way if the page needs to be read back in before
the checksums are in the btree, we'll be able to verify the checksum on
the page.

This commit adds the ability to search the pending ordered extents for
a given offset in the file, and changes btrfs_releasepage to allow
ordered pages to be freed.
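
As a rough illustration of the lookup described above, here is a minimal user-space Python sketch, not the kernel code. The class name `PendingOrderedExtents`, the method names, the sorted-list layout, and the 4096-byte block size are all assumptions made for the example; the commit itself only says that pending ordered extents can be searched for a given file offset.

```python
import bisect


class PendingOrderedExtents:
    """Sorted list of pending extents, standing in for the per-inode tree."""

    def __init__(self):
        self._starts = []    # extent start offsets, kept sorted
        self._extents = []   # (start, length, checksums), same order as _starts

    def add(self, start, length, checksums):
        """Track an extent whose checksums are not yet in the btree."""
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._extents.insert(i, (start, length, checksums))

    def find_checksum(self, offset, block_size=4096):
        """Return the pending checksum covering `offset`, or None if none does."""
        i = bisect.bisect_right(self._starts, offset) - 1
        if i < 0:
            return None
        start, length, checksums = self._extents[i]
        if offset >= start + length:
            return None
        return checksums[(offset - start) // block_size]
```

With a lookup like this, a page that was freed early and then read back can still be verified against the checksum that is waiting to be inserted.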

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason [Thu, 17 Jul 2008 16:54:14 +0000 (12:54 -0400)]

btrfs_start_transaction: wait for commits in progress to finish

btrfs_commit_transaction has to loop waiting for any writers in the
transaction to finish before it can proceed. btrfs_start_transaction
should be polite and not join a transaction that is in the process
of being finished off.

There are a few places that can't wait, basically the ones doing IO that
might be needed to finish the transaction. For them, btrfs_join_transaction
is added.
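
The difference between the two entry points can be sketched in user-space Python as follows. This is only a model under stated assumptions: the `Transaction` class, its `committing` flag, the `timeout` parameter, and the use of a condition variable are hypothetical and do not mirror the kernel data structures.

```python
import threading


class Transaction:
    def __init__(self):
        self._cond = threading.Condition()
        self.committing = False   # True while a commit is in progress
        self.writers = 0          # callers currently inside the transaction

    def start(self, timeout=None):
        """Polite entry: wait for any commit in progress before joining."""
        with self._cond:
            if not self._cond.wait_for(lambda: not self.committing, timeout):
                return False      # gave up waiting (only possible with a timeout)
            self.writers += 1
            return True

    def join(self):
        """Entry for callers that cannot wait, such as IO the commit depends on."""
        with self._cond:
            self.writers += 1

    def end(self):
        with self._cond:
            self.writers -= 1
            self._cond.notify_all()

    def begin_commit(self):
        with self._cond:
            self.committing = True

    def finish_commit(self):
        """Wait for the remaining writers, then let new starters in."""
        with self._cond:
            self._cond.wait_for(lambda: self.writers == 0)
            self.committing = False
            self._cond.notify_all()
```

In this model a caller of `start` is held back while `committing` is set, whereas `join` always succeeds, which is what lets the IO needed by the commit make progress.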

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason [Thu, 17 Jul 2008 16:54:05 +0000 (12:54 -0400)]

Btrfs: Update on disk i_size only after pending ordered extents are done

This changes the ordered data code to update i_size after the extent
is on disk. An on disk i_size is maintained in the in-memory btrfs inode
structures, and this is updated as extents finish.
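
One way to picture the bookkeeping is the Python sketch below. It is not the kernel implementation: the class `InodeSizes`, its method names, and in particular the rule that the on-disk size never covers an extent that is still pending are assumptions chosen for the illustration; the commit message only states that the on-disk i_size is updated as extents finish.

```python
class InodeSizes:
    def __init__(self):
        self.i_size = 0        # size visible to the running system
        self.disk_i_size = 0   # size considered safe to record on disk
        self.pending = set()   # (start, end) of extents not yet on disk

    def write(self, start, end):
        """Record a new pending extent; the in-memory size grows immediately."""
        self.pending.add((start, end))
        self.i_size = max(self.i_size, end)

    def extent_finished(self, start, end):
        """An extent reached disk; advance disk_i_size as far as is safe."""
        self.pending.remove((start, end))
        # Assumed rule: stop at the lowest extent that is still pending.
        limit = min((s for s, _ in self.pending), default=self.i_size)
        self.disk_i_size = max(self.disk_i_size, min(limit, self.i_size))
```

If the second of two extents finishes first, `disk_i_size` stays put until the first one completes, and then jumps to cover both.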

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason [Thu, 17 Jul 2008 16:53:51 +0000 (12:53 -0400)]

Btrfs: Use async helpers to deal with pages that have been improperly dirtied

Higher layers sometimes call set_page_dirty without asking the filesystem
to help. This causes many problems for the data=ordered and cow code.
This commit detects pages that haven't been properly set up for IO and
kicks off an async helper to deal with them.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason [Thu, 17 Jul 2008 16:53:50 +0000 (12:53 -0400)]

Btrfs: New data=ordered implementation

The old data=ordered code would force commit to wait until
all the data extents from the transaction were fully on disk.  This
introduced large latencies into the commit and stalled new writers
in the transaction for a long time.

The new code changes the way data allocations and extents work:

* When delayed allocation is filled, data extents are reserved, and
  the extent bit EXTENT_ORDERED is set on the entire range of the extent.
  A struct btrfs_ordered_extent is allocated and inserted into a per-inode
  rbtree to track the pending extents.

* As each page is written EXTENT_ORDERED is cleared on the bytes corresponding
  to that page.

* When all of the bytes corresponding to a single struct btrfs_ordered_extent
  are written, the previously reserved extent is inserted into the FS
  btree and into the extent allocation trees.  The checksums for the file
  data are also updated.
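
The three steps above can be sketched in user-space Python as shown here. This is a simplified model, not the kernel code: the names `OrderedExtent`, `InodeModel`, `fill_delalloc`, and `page_written`, the dictionary standing in for the rbtree, the `committed` list standing in for the btree insertion, and the 4096-byte page size are all assumptions made for the example. It also assumes each page is reported as written exactly once.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size for the sketch


@dataclass
class OrderedExtent:
    file_offset: int   # start of the extent in the file
    length: int        # total bytes covered by the extent
    bytes_left: int    # bytes that have not reached disk yet


class InodeModel:
    def __init__(self):
        self.pending = {}     # file_offset -> OrderedExtent (stand-in for the rbtree)
        self.committed = []   # extents considered inserted into the FS btree

    def fill_delalloc(self, file_offset, length):
        """Step 1: reserve an extent and start tracking it as ordered."""
        self.pending[file_offset] = OrderedExtent(file_offset, length, length)

    def page_written(self, page_offset):
        """Steps 2 and 3: account for one page; finish the extent when all are done."""
        ext = next(
            (e for e in self.pending.values()
             if e.file_offset <= page_offset < e.file_offset + e.length),
            None,
        )
        if ext is None:
            raise KeyError(f"no ordered extent covers offset {page_offset}")
        ext.bytes_left -= PAGE_SIZE
        if ext.bytes_left == 0:
            del self.pending[ext.file_offset]
            self.committed.append((ext.file_offset, ext.length))
```

The point of the design is visible in `page_written`: metadata for an extent is only recorded once its last page is on disk, so a commit no longer has to wait for all data writeback in the transaction.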

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason [Tue, 8 Jul 2008 18:32:12 +0000 (14:32 -0400)]

Btrfs: Drop some verbose printks

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason [Tue, 8 Jul 2008 18:19:17 +0000 (14:19 -0400)]

Btrfs: Add locking around volume management (device add/remove/balance)

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason [Thu, 26 Jun 2008 14:34:20 +0000 (10:34 -0400)]

Btrfs: Fix deadlock while searching for dead roots on mount

btrfs_find_dead_roots called btrfs_read_fs_root_no_radix, which
means we end up calling btrfs_search_slot with a path already held.

The fix is to remember the key inside btrfs_find_dead_roots and drop
the path.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason [Wed, 25 Jun 2008 20:14:04 +0000 (16:14 -0400)]

Btrfs: Reduce contention on the root node

This calls unlock_up sooner in btrfs_search_slot in order to decrease the
amount of work done with the higher level tree locks held.

Also, it changes btrfs_tree_lock to spin for a bit against the page lock
before scheduling. This makes a big difference in context switch rate under
highly contended workloads.

Longer term, a better locking structure is needed than the page lock.
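
The spin-then-sleep idea can be illustrated with a small Python sketch. It is an analogy only, assuming a plain `threading.Lock` in place of the page lock; the helper name `lock_with_spin` and the default of 512 attempts are hypothetical and are not taken from the patch.

```python
import threading


def lock_with_spin(lock, spins=512):
    """Try the lock a bounded number of times before falling back to a blocking wait.

    Returns True if the lock was taken while spinning, False if the caller
    had to block (the analogue of scheduling away).
    """
    for _ in range(spins):
        if lock.acquire(blocking=False):
            return True
    lock.acquire()
    return False
```

When the holder releases the lock quickly, the waiter picks it up during the spin phase and avoids a context switch; only longer waits pay the cost of sleeping.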

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason [Wed, 25 Jun 2008 20:01:31 +0000 (16:01 -0400)]

Btrfs: Online btree defragmentation fixes

The btree defragger wasn't making forward progress because the new key wasn't
being saved by the btrfs_search_forward function.

This also disables the automatic btree defrag because it wasn't scaling well
to huge filesystems. The auto-defrag needs to be done differently.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason [Wed, 25 Jun 2008 20:01:31 +0000 (16:01 -0400)]

Btrfs: Add a per-inode csum mutex to avoid races creating csum items

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason [Wed, 25 Jun 2008 20:01:31 +0000 (16:01 -0400)]

Btrfs: Change find_extent_buffer to use TestSetPageLocked

This makes it possible for callers to check for extent_buffers in cache
without deadlocking against any btree locks held.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason [Wed, 25 Jun 2008 20:01:31 +0000 (16:01 -0400)]

Btrfs: Add btree locking to the tree defragmentation code

The online btree defragger is simplified and rewritten to use
standard btree searches instead of a walk up / down mechanism.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason [Wed, 25 Jun 2008 20:01:31 +0000 (16:01 -0400)]

Btrfs: Replace the transaction work queue with kthreads

This creates one kthread for commits and one kthread for
deleting old snapshots. All the work queues are removed.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Chris Mason [Wed, 25 Jun 2008 20:01:31 +0000 (16:01 -0400)]

Add btrfs_end_transaction_throttle to force writers to wait for pending commits

The existing throttle mechanism was often not sufficient to prevent
new writers from coming in and making a given transaction run forever.
This adds an explicit wait at the end of most operations so they will
allow the current transaction to close.

There is no wait inside file_write, inode updates, or cow filling, all of
which have different deadlock possibilities.

This is a temporary measure until better asynchronous commit support is
added. This code leads to stalls as it waits for data=ordered
writeback, and it really needs to be fixed.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
