Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: make sure reserve_metadata_bytes doesn't leak out strange errors
Btrfs: use the commit_root for reading free_space_inode crcs
Btrfs: reduce extent_state lock contention for metadata
Btrfs: remove lockdep magic from btrfs_next_leaf
Btrfs: make a lockdep class for each root
Btrfs: switch the btrfs tree locks to reader/writer
Btrfs: fix deadlock when throttling transactions
Btrfs: stop using highmem for extent_buffers
Btrfs: fix BUG_ON() caused by ENOSPC when relocating space
Btrfs: tag pages for writeback in sync
Btrfs: fix enospc problems with delalloc
Btrfs: don't flush delalloc arbitrarily
Btrfs: use find_or_create_page instead of grab_cache_page
Btrfs: use a worker thread to do caching
Btrfs: fix how we merge extent states and deal with cached states
Btrfs: use the normal checksumming infrastructure for free space cache
Btrfs: serialize flushers in reserve_metadata_bytes
Btrfs: do transaction space reservation before joining the transaction
Btrfs: try to only do one btrfs_search_slot in do_setxattr
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: optimize the negative xattr caching
xfs: prevent against ioend livelocks in xfs_file_fsync
xfs: flag all buffers as metadata
xfs: encapsulate a block of debug code
iscsi-target: Fix CONFIG_SMP=n and CONFIG_MODULES=n build failure
This patch fixes the following CONFIG_SMP=n and CONFIG_MODULES=n build
failure, because iscsit_thread_get_cpumask() is defined as a macro in
iscsi_target.c, but needed by iscsi_target_login.c
drivers/built-in.o: In function `iscsi_post_login_handler':
iscsi_target_login.c:(.text+0x13a315): undefined reference to `iscsit_thread_get_cpumask'
iscsi_target_login.c:(.text+0x13a4b4): undefined reference to `iscsit_thread_get_cpumask'
make: *** [.tmp_vmlinux1] Error 1
Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
iscsi-target: Fix uninitialized usage of cmd->pad_bytes
This patch fixes an uninitialized usage of cmd->pad_bytes inside of
iscsit_handle_text_cmd() introduced during a v4.1 change to use cmd
members instead of local pad_bytes variables.
Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Dan Carpenter [Wed, 27 Jul 2011 09:58:17 +0000 (12:58 +0300)]
iscsi-target: Fix NULL dereference on allocation failure
This patch fixes a bug in iscsi_target_init_negotiation() where
the "goto out" path dereferences "login" which is NULL upon a
memory allocation failure.
Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Merge branch 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
* 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (44 commits)
NFSv4: Don't use the delegation->inode in nfs_mark_return_delegation()
nfs: don't use d_move in nfs_async_rename_done
RDMA: Increasing RPCRDMA_MAX_DATA_SEGS
SUNRPC: Replace xprt->resend and xprt->sending with a priority queue
SUNRPC: Allow caller of rpc_sleep_on() to select priority levels
SUNRPC: Support dynamic slot allocation for TCP connections
SUNRPC: Clean up the slot table allocation
SUNRPC: Initalise the struct xprt upon allocation
SUNRPC: Ensure that we grab the XPRT_LOCK before calling xprt_alloc_slot
pnfs: simplify pnfs files module autoloading
nfs: document nfsv4 sillyrename issues
NFS: Convert nfs4_set_ds_client to EXPORT_SYMBOL_GPL
SUNRPC: Convert the backchannel exports to EXPORT_SYMBOL_GPL
SUNRPC: sunrpc should not explicitly depend on NFS config options
NFS: Clean up - simplify the switch to read/write-through-MDS
NFS: Move the pnfs write code into pnfs.c
NFS: Move the pnfs read code into pnfs.c
NFS: Allow the nfs_pageio_descriptor to signal that a re-coalesce is needed
NFS: Use the nfs_pageio_descriptor->pg_bsize in the read/write request
NFS: Cache rpc_ops in struct nfs_pageio_descriptor
...
Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: Convert to DIV_ROUND_UP_SECTOR_T usage for sectors / dev_max_sectors
kernel.h: Add DIV_ROUND_UP_ULL and DIV_ROUND_UP_SECTOR_T macro usage
iscsi-target: Add iSCSI fabric support for target v4.1
iscsi: Add Serial Number Arithmetic LT and GT into iscsi_proto.h
iscsi: Use struct scsi_lun in iscsi structs instead of u8[8]
iscsi: Resolve iscsi_proto.h naming conflicts with drivers/target/iscsi
Chris Mason [Wed, 27 Jul 2011 19:57:44 +0000 (15:57 -0400)]
Btrfs: make sure reserve_metadata_bytes doesn't leak out strange errors
The btrfs transaction code will return any errors that come from
reserve_metadata_bytes. We need to make sure we don't return funny
things like 1 or EAGAIN.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Now that the data writing is part of fsync proper, we can split
the waiting part out and do it later on. This reduces the
number of waits that we do during fsync on average.
There is also no need to take the i_mutex unless we are flushing
metadata to disk, so we can move that to within the metadata
flushing code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Since there is now only a single caller to gfs2_dir_read_data()
and it has a number of constant arguments, we can factor
those out. Also some tests relating to the inode size were
being done twice.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
signals: sys_ssetmask/sys_rt_sigsuspend should use set_current_blocked()
sys_ssetmask(), sys_rt_sigsuspend() and compat_sys_rt_sigsuspend()
change ->blocked directly. This is not correct, see the changelog in e6fa16ab "signal: sigprocmask() should do retarget_shared_pending()"
Change them to use set_current_blocked().
Another change is that now we are doing ->saved_sigmask = ->blocked
lockless, it doesn't make any sense to do this under ->siglock.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Wed, 27 Jul 2011 18:47:03 +0000 (21:47 +0300)]
proc: make struct proc_dir_entry::name a terminal array rather than a pointer
Since __proc_create() appends the name it is given to the end of the PDE
structure that it allocates, there isn't a need to store a name pointer.
Instead we can just replace the name pointer with a terminal char array of
_unspecified_ length. The compiler will simply append the string to statically
defined variables of PDE type overlapping any hole at the end of the structure
and, unlike specifying an explicitly _zero_ length array, won't give a warning
if you try to statically initialise it with a string of more than zero length.
Also, whilst we're at it:
(1) Move namelen to end just prior to name and reduce it to a single byte
(name shouldn't be longer than NAME_MAX).
(2) Move pde_unload_lock two places further on so that if it's four bytes in
size on a 64-bit machine, it won't cause an unused hole in the PDE struct.
Chris Mason [Tue, 26 Jul 2011 19:35:09 +0000 (15:35 -0400)]
Btrfs: use the commit_root for reading free_space_inode crcs
Now that we are using regular file crcs for the free space cache,
we can deadlock if we try to read the free_space_inode while we are
updating the crc tree.
This commit fixes things by using the commit_root to read the crcs. This is
safe because we the free space cache file would already be loaded if
that block group had been changed in the current transaction.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Mon, 25 Jul 2011 10:50:50 +0000 (06:50 -0400)]
Btrfs: reduce extent_state lock contention for metadata
For metadata buffers that don't straddle pages (all of them), btrfs
can safely use the page uptodate bits and extent_buffer uptodate bit
instead of needing to use the extent_state tree.
This greatly reduces contention on the state tree lock.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 26 Jul 2011 20:11:19 +0000 (16:11 -0400)]
Btrfs: make a lockdep class for each root
This patch was originally from Tejun Heo. lockdep complains about the btrfs
locking because we sometimes take btree locks from two different trees at the
same time. The current classes are based only on level in the btree, which
isn't enough information for lockdep to figure out if the lock is safe.
This patch makes a class for each type of tree, and lumps all the FS trees that
actually have files and directories into the same class.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Sat, 16 Jul 2011 19:23:14 +0000 (15:23 -0400)]
Btrfs: switch the btrfs tree locks to reader/writer
The btrfs metadata btree is the source of significant
lock contention, especially in the root node. This
commit changes our locking to use a reader/writer
lock.
The lock is built on top of rw spinlocks, and it
extends the lock tracking to remember if we have a
read lock or a write lock when we go to blocking. Atomics
count the number of blocking readers or writers at any
given time.
It removes all of the adaptive spinning from the old code
and uses only the spinning/blocking hints inside of btrfs
to decide when it should continue spinning.
In read heavy workloads this is dramatically faster. In write
heavy workloads we're still faster because of less contention
on the root node lock.
We suffer slightly in dbench because we schedule more often
during write locks, but all other benchmarks so far are improved.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Josef Bacik [Sun, 24 Jul 2011 19:45:34 +0000 (15:45 -0400)]
Btrfs: fix deadlock when throttling transactions
Hit this nice little deadlock. What happens is this
__btrfs_end_transaction with throttle set, --use_count so it equals 0
btrfs_commit_transaction
<somebody else actually manages to start the commit>
btrfs_end_transaction --use_count so now its -1 <== BAD
we just return and wait on the transaction
This is bad because we just return after our use_count is -1 and don't let go
of our num_writer count on the transaction, so the guy committing the
transaction just sits there forever. Fix this by inc'ing our use_count if we're
going to call commit_transaction so that if we call btrfs_end_transaction it's
valid. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
The reason is:
Task1 Space balance task
do_chunk_alloc()
__finish_chunk_alloc()
update device info
in the chunk tree
alloc system metadata block
relocate system metadata block group
set system metadata block group
readonly, This block group is the
only one that can allocate space. So
there is no free space that can be
allocated now.
find no space and don't try
to alloc new chunk, and then
return ENOSPC
BUG_ON() in __finish_chunk_alloc()
was triggered.
Fix this bug by allocating a new system metadata chunk before relocating the
old one if we find there is no free space which can be allocated after setting
the old block group to be read-only.
Josef Bacik [Fri, 15 Jul 2011 21:26:38 +0000 (21:26 +0000)]
Btrfs: tag pages for writeback in sync
Everybody else does this, we need to do it too. If we're syncing, we need to
tag the pages we're going to write for writeback so we don't end up writing the
same stuff over and over again if somebody is constantly redirtying our file.
This will keep us from having latencies with heavy sync workloads. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Josef Bacik [Fri, 15 Jul 2011 15:16:44 +0000 (15:16 +0000)]
Btrfs: fix enospc problems with delalloc
So I had this brilliant idea to use atomic counters for outstanding and reserved
extents, but this turned out to be a bad idea. Consider this where we have 1
outstanding extent and 1 reserved extent
Reserver Releaser
atomic_dec(outstanding) now 0
atomic_read(outstanding)+1 get 1
atomic_read(reserved) get 1
don't actually reserve anything because
they are the same
atomic_cmpxchg(reserved, 1, 0)
atomic_inc(outstanding)
atomic_add(0, reserved)
free reserved space for 1 extent
Then the reserver now has no actual space reserved for it, and when it goes to
finish the ordered IO it won't have enough space to do it's allocation and you
get those lovely warnings.
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Josef Bacik [Fri, 15 Jul 2011 16:01:03 +0000 (16:01 +0000)]
Btrfs: don't flush delalloc arbitrarily
Kill the check to see if we have 512mb of reserved space in delalloc and
shrink_delalloc if we do. This causes unexpected latencies and we have other
logic to see if we need to throttle. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Josef Bacik [Mon, 11 Jul 2011 14:47:06 +0000 (10:47 -0400)]
Btrfs: use find_or_create_page instead of grab_cache_page
grab_cache_page will use mapping_gfp_mask(), which for all inodes is set to
GFP_HIGHUSER_MOVABLE. So instead use find_or_create_page in all cases where we
need GFP_NOFS so we don't deadlock. Thanks,
Josef Bacik [Thu, 30 Jun 2011 18:42:28 +0000 (14:42 -0400)]
Btrfs: use a worker thread to do caching
A user reported a deadlock when copying a bunch of files. This is because they
were low on memory and kthreadd got hung up trying to migrate pages for an
allocation when starting the caching kthread. The page was locked by the person
starting the caching kthread. To fix this we just need to use the async thread
stuff so that the threads are already created and we don't have to worry about
deadlocks. Thanks,
Reported-by: Roman Mamedov <rm@romanrm.ru> Signed-off-by: Josef Bacik <josef@redhat.com>
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (53 commits)
Input: synaptics - fix reporting of min coordinates
Input: tegra-kbc - enable key autorepeat
Input: kxtj9 - fix locking typo in kxtj9_set_poll()
Input: kxtj9 - fix bug in probe()
Input: intel-mid-touch - remove pointless checking for variable 'found'
Input: hp_sdc - staticize hp_sdc_kicker()
Input: pmic8xxx-keypad - fix a leak of the IRQ during init failure
Input: cy8ctmg110_ts - set reset_pin and irq_pin from platform data
Input: cy8ctmg110_ts - constify i2c_device_id table
Input: cy8ctmg110_ts - fix checking return value of i2c_master_send
Input: lifebook - make dmi callback functions return 1
Input: atkbd - make dmi callback functions return 1
Input: gpio_keys - switch to using SIMPLE_DEV_PM_OPS
Input: gpio_keys - add support for device-tree platform data
Input: aiptek - remove double define
Input: synaptics - set minimum coordinates as reported by firmware
Input: synaptics - process button bits in AGM packets
Input: synaptics - rename set_slot to be more descriptive
Input: synaptics - fuzz position for touchpad with reduced filtering
Input: synaptics - set resolution for MT_POSITION_X/Y axes
...
Merge branch 'next' of git://git.monstr.eu/linux-2.6-microblaze
* 'next' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Do not show error message for 32 interrupt lines
Revert "microblaze: PCI fix typo fault in of_node pointer moving into pci_bus"
microblaze: PCI fix typo fault in of_node pointer moving into pci_bus
microblaze: Add support for early console on mdm
microblaze: Simplify early console binding from DT
microblaze: Get early printk console earlier
microblaze: Standardise cpuinfo output for cache policy
microblaze: Unprivileged stream instruction awareness
microblaze: trivial: Fix typo fault
microblaze: exec: Remove redundant set_fs(USER_DS)
microblaze: Remove duplicated prototype of start_thread()
microblaze: Fix unaligned value saving to the stack for system with MMU
microblaze/irqs: Do not trace arch_local_{*,irq_*} functions
Improve the documentation for the slave and cyclic DMA engine support
reformatting it for easier reading, adding further APIs, splitting it
into five steps, and including references to the documentation in
dmaengine.h.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[Fixed the index title to reflect new changes] Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Julia Lawall [Mon, 11 Jul 2011 21:08:25 +0000 (14:08 -0700)]
[SCSI] ipr: reorder error handling code to include iounmap
The out_msi_disable label should be before cleanup_nomem to additionally
benefit from the call to iounmap. Subsequent gotos are adjusted to go to
out_msi_disable instead of cleanup_nomem, which now follows it. This is
safe because pci_disable_msi does nothing if pci_enable_msi was not called.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@r@
expression e1,e2;
statement S;
@@
e1 = pci_ioremap_bar(...);
... when != e1 = e2
when != iounmap(e1)
when any
(
if (<+...e1...+>) S
|
if(...) { ... return 0; }
|
if (...) { ... when != iounmap(e1)
when != if (...) { ... iounmap(e1) ... }
* return ...;
} else S
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Dan Rosenberg [Mon, 11 Jul 2011 21:08:23 +0000 (14:08 -0700)]
[SCSI] pmcraid: reject negative request size
There's a code path in pmcraid that can be reached via device ioctl that
causes all sorts of ugliness, including heap corruption or triggering the
OOM killer due to consecutive allocation of large numbers of pages.
First, the user can call pmcraid_chr_ioctl(), with a type
PMCRAID_PASSTHROUGH_IOCTL. This calls through to
pmcraid_ioctl_passthrough(). Next, a pmcraid_passthrough_ioctl_buffer
is copied in, and the request_size variable is set to
buffer->ioarcb.data_transfer_length, which is an arbitrary 32-bit
signed value provided by the user. If a negative value is provided
here, bad things can happen. For example,
pmcraid_build_passthrough_ioadls() is called with this request_size,
which immediately calls pmcraid_alloc_sglist() with a negative size.
The resulting math on allocating a scatter list can result in an
overflow in the kzalloc() call (if num_elem is 0, the sglist will be
smaller than expected), or if num_elem is unexpectedly large the
subsequent loop will call alloc_pages() repeatedly, a high number of
pages will be allocated and the OOM killer might be invoked.
It looks like preventing this value from being negative in
pmcraid_ioctl_passthrough() would be sufficient.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Herbert Xu [Wed, 27 Jul 2011 13:16:28 +0000 (06:16 -0700)]
gro: Only reset frag0 when skb can be pulled
Currently skb_gro_header_slow unconditionally resets frag0 and
frag0_len. However, when we can't pull on the skb this leaves
the GRO fields in an inconsistent state.
This patch fixes this by only resetting those fields after the
pskb_may_pull test.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
[SCSI] libsas: remove expander from dev list on error
If expander discovery fails (sas_discover_expander()), remove the
expander from the port device list (sas_ex_discover_expander()),
before freeing it. Else the list is corrupted and, e.g., when we
attempt to send SMP commands to other devices, the kernel oopses.
Signed-off-by: Luben Tuikov <ltuikov@yahoo.com> Cc: stable@kernel.org Reviewed-by: Jack Wang <jack_wang@usish.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
[SCSI] bnx2fc: Enable REC & CONF support for the session
Based on PRLI response, identify if the target is FCP-2 (seq level error
recovery) capable, and appropriately set the corresponding CONF, REC flags when
offloading the session.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
[SCSI] bnx2fc: Introduce interface structure for each vlan interface
Currently, bnx2fc has a hba structure that can work with only a single vlan
interface. When there is a change in vlan id, it does not have the capability
to switch to different vlan interface. To solve this problem, a new structure
called 'interface' has been introduced, and each hba can now have multiple
interfaces, one per vlan id.
Most of the patch is a moving the interface specific fields from hba to the
interface structure, and appropriately modifying the dereferences. A list of
interfaces (if_list) is maintained along with adapter list. During a create
call, the interface structure is allocated and added to if_list and deleted &
freed on a destroy call. Link events are propagated to all interfaces
belonging to the hba.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
[SCSI] hpsa: retry commands completing with status of UNSOLICITED_ABORT
In a shared SAS setup, target devices may be reset by one of
several hosts, and outstanding commands on that device will be
completed to corresponding hosts with status of UNSOLICITED_ABORT.
Such commands should be retried instead of being treated as i/o
errors. Also fixed a nearby spelling error.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
James Smart [Fri, 22 Jul 2011 22:38:16 +0000 (18:38 -0400)]
[SCSI] lpfc 8.3.25: Change driver version to 8.3.25
Change driver version to 8.3.25
Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>