Jan Kara [Fri, 12 Apr 2013 04:03:42 +0000 (00:03 -0400)]
jbd2: reduce journal_head size
Remove unused t_cow_tid field (ext4 copy-on-write support doesn't seem
to be happening) and change b_modified and b_jlist to bitfields thus
saving 8 bytes in the structure.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Jan Kara [Fri, 12 Apr 2013 04:03:19 +0000 (00:03 -0400)]
ext4: clear buffer_uninit flag when submitting IO
Currently noone cleared buffer_uninit flag. This results in writeback
needlessly marking io_end as needing extent conversion scanning extent
tree for extents to convert. So clear the buffer_uninit flag once the
buffer is submitted for IO and the flag is transformed into
EXT4_IO_END_UNWRITTEN flag.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Jan Kara [Fri, 12 Apr 2013 03:56:53 +0000 (23:56 -0400)]
ext4: use io_end for multiple bios
Change writeback path to create just one io_end structure for the
extent to which we submit IO and share it among bios writing that
extent. This prevents needless splitting and joining of unwritten
extents when they cannot be submitted as a single bio.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Jan Kara [Fri, 12 Apr 2013 03:48:32 +0000 (23:48 -0400)]
ext4: make ext4_bio_write_page() use BH_Async_Write flags
So far ext4_bio_write_page() attached all the pages to ext4_io_end
structure. This makes that structure pretty heavy (1 KB for pointers
+ 16 bytes per page attached to the bio). Also later we would like to
share ext4_io_end structure among several bios in case IO to a single
extent needs to be split among several bios and pointing to pages from
ext4_io_end makes this complex.
We remove page pointers from ext4_io_end and use pointers from bio
itself instead. This isn't as easy when blocksize < pagesize because
then we can have several bios in flight for a single page and we have
to be careful when to call end_page_writeback(). However this is a
known problem already solved by block_write_full_page() /
end_buffer_async_write() so we mimic its behavior here. We mark
buffers going to disk with BH_Async_Write flag and in
ext4_bio_end_io() we check whether there are any buffers with
BH_Async_Write flag left. If there are not, we can call
end_page_writeback().
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
- grab_cache_page_write_begin() may not wait on page's writeback since
(1d1d1a767206). But it is still reasonable to wait on page's writeback
here in order to be on the safe side.
- Fix miss typo: pass 'length' instead of 'end' to __block_write_begin()
https://bugzilla.kernel.org/show_bug.cgi?id=56241
ext4: do not convert to indirect with bigalloc enabled
With bigalloc feature enabled we do not support indirect addressing at all
so we have to prevent extent addressing to indirect addressing
conversion in this case. The problem has been introduced with the commit
"ext4: support simple conversion of extent-mapped inodes to use i_blocks"
Currently in ENOSPC condition when writing into unwritten space, or
punching a hole, we might need to split the extent and grow extent tree.
However since we can not allocate any new metadata blocks we'll have to
zero out unwritten part of extent or punched out part of extent, or in
the worst case return ENOSPC even though use actually does not allocate
any space.
Also in delalloc path we do reserve metadata and data blocks for the
time we're going to write out, however metadata block reservation is
very tricky especially since we expect that logical connectivity implies
physical connectivity, however that might not be the case and hence we
might end up allocating more metadata blocks than previously reserved.
So in future, metadata reservation checks should be removed since we can
not assure that we do not under reserve.
And this is where reserved space comes into the picture. When mounting
the file system we slice off a little bit of the file system space (2%
or 4096 clusters, whichever is smaller) which can be then used for the
cases mentioned above to prevent costly zeroout, or unexpected ENOSPC.
The number of reserved clusters can be set via sysfs, however it can
never be bigger than number of free clusters in the file system.
Note that this patch fixes the failure of xfstest 274 as expected.
Improve mb_free_blocks speed by clearing entire range at once instead
of iterating over each bit. Freeing block-by-block also makes buddy
bitmap subtree flip twice making most of the work a no-op. Very few
bits in buddy bitmap require change, e.g. freeing entire group is a 1
bit flip only. As a result, releasing blocks of 60G file now takes
5ms instead of 2.7s. This is especially good for non-preemptive
kernels as there is no rescheduling during release.
Eric Whitney [Tue, 9 Apr 2013 13:27:31 +0000 (09:27 -0400)]
ext4: fix free space estimate in ext4_nonda_switch()
Values stored in s_freeclusters_counter and s_dirtyclusters_counter
are both in cluster units. Remove the cluster to block conversion
applied to s_freeclusters_counter causing an inflated estimate of
free space because s_dirtyclusters_counter is not similarly
converted. Rename free_blocks and dirty_blocks to better reflect
the units these variables contain to avoid future confusion. This
fix corrects ENOSPC failures for xfstests 127 and 231 on bigalloc
file systems.
Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jan Kara [Tue, 9 Apr 2013 13:21:41 +0000 (09:21 -0400)]
ext4: fix deadlock with quota feature
We didn't mark hidden quota files with S_NOQUOTA flag and thus quota was
accounted even for quota files. Thus we could recurse back to quota code
when adding new blocks to quota file which can easily deadlock. Mark
hidden quota files properly.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
ext4: implementation of a new ioctl called EXT4_IOC_SWAP_BOOT
Add a new ioctl, EXT4_IOC_SWAP_BOOT which swaps i_blocks and
associated attributes (like i_blocks, i_size, i_flags, ...) from the
specified inode with inode EXT4_BOOT_LOADER_INO (#5). This is
typically used to store a boot loader in a secure part of the
filesystem, where it can't be changed by a normal user by accident.
The data blocks of the previous boot loader will be associated with
the given inode.
This usercode program is a simple example of the usage:
int main(int argc, char *argv[])
{
int fd;
int err;
Currently when inserting extent in ext4_ext_insert_extent() we would
only try to to see if we can append new extent to the found extent. If
we can not, then we proceed with adding new extent into the extent tree,
but then possibly merging it back again.
We can avoid this situation by trying to append and prepend new extent
to the existing ones. However since the new extent can be on either
sides of the existing extent, we have to pick the right extent to try to
append/prepend to.
This patch adds the conditions to pick the right extent to
append/prepend to and adds the actual prepending condition as well. This
will also eliminate the need to use "reserved" block for possibly
growing extent tree.
ext4: Transfer initialized block to right neighbor if possible
Currently when converting extent to initialized we attempt to transfer
initialized block to the left neighbour if possible when certain
criteria are met. However we do not attempt to do the same for the
right neighbor.
This commit adds the possibility to transfer initialized block to the
right neighbour if:
1. We're not converting the whole extent
2. Both extents are stored in the same extent tree node
3. Right neighbor is initialized
4. Right neighbor is logically abutting the current one
5. Right neighbor is physically abutting the current one
6. Right neighbor would not overflow the length limit
This is basically the same logic as with transferring to the left. This
will gain us some performance benefits since it is faster than inserting
extent and then merging it.
It would also prevent some situation in delalloc patch when we might run
out of metadata reservation. This is due to the fact that we would
attempt to split the extent first (possibly allocating new metadata
block) even though we did not counted for that because it can (and will)
be merged again. This commit fix that scenario, because we no longer
need to split the extent in such case.
Currently on many places in ext4 we're using
ext4_get_group_no_and_offset() even though we're only interested in
knowing the block group of the particular block, not the offset within
the block group so we can use more efficient way to compute block
group.
This patch introduces ext4_get_group_number() which computes block
group for a given block much more efficiently. Use this function
instead of ext4_get_group_no_and_offset() everywhere where we're only
interested in knowing the block group.
ext4: make ext4_block_in_group() much more efficient
Currently in when getting the block group number for a particular
block in ext4_block_in_group() we're using
ext4_get_group_no_and_offset() which uses do_div() to get the block
group and the remainer which is offset within the group.
We don't need all of that in ext4_block_in_group() as we only need to
figure out the group number.
This commit changes ext4_block_in_group() to calculate group number
directly. This shows as a big improvement with regards to cpu
utilization. Measuring fallocate -l 15T on fresh file system with perf
showed that 23% of cpu time was spend in the
ext4_get_group_no_and_offset(). With this change it completely
disappears from the list only bumping the occurrence of
ext4_init_block_bitmap() which is the biggest user of
ext4_block_in_group() by 4%. As the result of this change on my system
the fallocate call was approx. 10% faster.
However since there is '-g' option in mkfs which allow us setting
different groups size (mostly for developers) I've introduced new per
file system flag whether we have a standard block group size or
not. The flag is used to determine whether we can use the bit shift
optimization or not.
It is incorrect to use list_for_each_entry_safe() for journal callback
traversial because ->next may be removed by other task:
->ext4_mb_free_metadata()
->ext4_mb_free_metadata()
->ext4_journal_callback_del()
This patch fix the issue as follows:
- ext4_journal_commit_callback() make list truly traversial safe
simply by always starting from list_head
- fix race between two ext4_journal_callback_del() and
ext4_journal_callback_try_del()
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.com
In order to demonstrace this issue one should mount ext4 with mount -o
discard option on SSD disk. This makes callback longer and race
window becomes wider.
In order to fix this we should mark transaction as finished only after
callbacks have completed
ext4: support simple conversion of extent-mapped inodes to use i_blocks
In order to make it simpler to test the code which support
i_blocks/indirect-mapped inodes, support the conversion of inodes
which are less than 12 blocks and which are contained in no more than
a single extent.
The primary intended use of this code is to converting freshly created
zero-length files and empty directories.
Note that the version of chattr in e2fsprogs 1.42.7 and earlier has a
check that prevents the clearing of the extent flag. A simple patch
which allows "chattr -e <file>" to work will be checked into the
e2fsprogs git repository.
ext4/jbd2: don't wait (forever) for stale tid caused by wraparound
In the case where an inode has a very stale transaction id (tid) in
i_datasync_tid or i_sync_tid, it's possible that after a very large
(2**31) number of transactions, that the tid number space might wrap,
causing tid_geq()'s calculations to fail.
Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
attempted to fix this problem, but it only avoided kjournald spinning
forever by fixing the logic in jbd2_log_start_commit().
Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
that might call jbd2_log_start_commit() with a stale tid, those
functions will subsequently call jbd2_log_wait_commit() with the same
stale tid, and then wait for a very long time. To fix this, we
replace the calls to jbd2_log_start_commit() and
jbd2_log_wait_commit() with a call to a new function,
jbd2_complete_transaction(), which will correctly handle stale tid's.
As a bonus, jbd2_complete_transaction() will avoid locking
j_state_lock for writing unless a commit needs to be started. This
should have a small (but probably not measurable) improvement for
ext4's scalability.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: Ben Hutchings <ben@decadent.org.uk> Reported-by: George Barnett <gbarnett@atlassian.com> Cc: stable@vger.kernel.org
Move common code in ext4_ind_punch_hole() and ext4_ext_punch_hole()
into ext4_punch_hole(). This saves over 150 lines of code.
This also fixes a potential bug when the punch_hole() code is racing
against indirect-to-extents or extents-to-indirect migation. We are
currently using i_mutex to protect against changes to the inode flag;
specifically, the append-only, immutable, and extents inode flags. So
we need to take i_mutex before deciding whether to use the
extents-specific or indirect-specific punch_hole code.
Also, there was a missing call to ext4_inode_block_unlocked_dio() in
the indirect punch codepath. This was added in commit 02d262dffcf4c
to block DIO readers racing against the punch operation in the
codepath for extent-mapped inodes, but it was missing for
indirect-block mapped inodes. One of the advantages of refactoring
the code is that it makes such oversights much less likely.
ext4: fold ext4_alloc_blocks() in ext4_alloc_branch()
The older code was far more complicated than it needed to be because
of how we spliced in the ext4's new multiblock allocator into ext3's
indirect block code. By folding ext4_alloc_blocks() into
ext4_alloc_branch(), we make the code far more understable, shave off
over 130 lines of code and half a kilobyte of compiled object code.
Zheng Liu [Wed, 3 Apr 2013 16:41:17 +0000 (12:41 -0400)]
ext4: fold ext4_generic_write_end() into ext4_write_end()
After collapsing the handling of data ordered and data writeback
codepath, ext4_generic_write_end() has only one caller,
ext4_write_end(). So we fold it into ext4_write_end().
ext4: collapse handling of data=ordered and data=writeback codepaths
The only difference between how we handle data=ordered and
data=writeback is a single call to ext4_jbd2_file_inode(). Eliminate
code duplication by factoring out redundant the code paths.
Zheng Liu [Wed, 3 Apr 2013 16:27:18 +0000 (12:27 -0400)]
ext4: fix big-endian bugs which could cause fs corruptions
When an extent was zeroed out, we forgot to do convert from cpu to le16.
It could make us hit a BUG_ON when we try to write dirty pages out. So
fix it.
[ Also fix a bug found by Dmitry Monakhov where we were missing
le32_to_cpu() calls in the new indirect punch hole code.
There are a number of other big endian warnings found by static code
analyzers, but we'll wait for the next merge window to fix them all
up. These fixes are designed to be Obviously Correct by code
inspection, and easy to demonstrate that it won't make any
difference (and hence, won't introduce any bugs) on little endian
architectures such as x86. --tytso ]
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: CAI Qian <caiqian@redhat.com> Reported-by: Christian Kujau <lists@nerdbynature.de> Cc: Dmitry Monakhov <dmonakhov@openvz.org>
Linus Torvalds [Sun, 31 Mar 2013 18:41:47 +0000 (11:41 -0700)]
Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma
Pull slave-dmaengine fixes from Vinod Koul:
"Two fixes for slave-dmaengine.
The first one is for making slave_id value correct for dw_dmac and
the other one fixes the endieness in DT parsing"
* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
dw_dmac: adjust slave_id accordingly to request line base
dmaengine: dw_dma: fix endianess for DT xlate function
Linus Torvalds [Sun, 31 Mar 2013 18:40:33 +0000 (11:40 -0700)]
Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"For a some fixes for Kernel 3.9:
- subsystem build fix when VIDEO_DEV=y, VIDEO_V4L2=m and I2C=m
- compilation fix for arm multiarch preventing IR_RX51 to be selected
- regression fix at bttv crop logic
- s5p-mfc/m5mols/exynos: a few fixes for cameras on exynos hardware"
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] [REGRESSION] bt8xx: Fix too large height in cropcap
[media] fix compilation with both V4L2 and I2C as 'm'
[media] m5mols: Fix bug in stream on handler
[media] s5p-fimc: Do not attempt to disable not enabled media pipeline
[media] s5p-mfc: Fix encoder control 15 issue
[media] s5p-mfc: Fix frame skip bug
[media] s5p-fimc: send valid m2m ctx to fimc_m2m_job_finish
[media] exynos-gsc: send valid m2m ctx to gsc_m2m_job_finish
[media] fimc-lite: Fix the variable type to avoid possible crash
[media] fimc-lite: Initialize 'step' field in fimc_lite_ctrl structure
[media] ir: IR_RX51 only works on OMAP2
Linus Torvalds [Sun, 31 Mar 2013 18:38:59 +0000 (11:38 -0700)]
Merge tag 'for-linus-20130331' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Alright, this time from 10K up in the air.
Collection of fixes that have been queued up since the merge window
opened, hence postponed until later in the cycle. The pull request
contains:
- A bunch of fixes for the xen blk front/back driver.
- A round of fixes for the new IBM RamSan driver, fixing various
nasty issues.
- Fixes for multiple drives from Wei Yongjun, bad handling of return
values and wrong pointer math.
- A fix for loop properly killing partitions when being detached."
* tag 'for-linus-20130331' of git://git.kernel.dk/linux-block: (25 commits)
mg_disk: fix error return code in mg_probe()
rsxx: remove unused variable
rsxx: enable error return of rsxx_eeh_save_issued_dmas()
block: removes dynamic allocation on stack
Block: blk-flush: Fixed indent code style
cciss: fix invalid use of sizeof in cciss_find_cfgtables()
loop: cleanup partitions when detaching loop device
loop: fix error return code in loop_add()
mtip32xx: fix error return code in mtip_pci_probe()
xen-blkfront: remove frame list from blk_shadow
xen-blkfront: pre-allocate pages for requests
xen-blkback: don't store dev_bus_addr
xen-blkfront: switch from llist to list
xen-blkback: fix foreach_grant_safe to handle empty lists
xen-blkfront: replace kmalloc and then memcpy with kmemdup
xen-blkback: fix dispatch_rw_block_io() error path
rsxx: fix missing unlock on error return in rsxx_eeh_remap_dmas()
Adding in EEH support to the IBM FlashSystem 70/80 device driver
block: IBM RamSan 70/80 error message bug fix.
block: IBM RamSan 70/80 branding changes.
...
Commit 6aa9707099c4 ("lockdep: check that no locks held at freeze time")
causes problems with NFS root filesystems. The failures were noticed on
OMAP2 and 3 boards during kernel init:
[ BUG: swapper/0/1 still has locks held! ] 3.9.0-rc3-00344-ga937536 #1 Not tainted
-------------------------------------
1 lock held by swapper/0/1:
#0: (&type->s_umount_key#13/1){+.+.+.}, at: [<c011e84c>] sget+0x248/0x574
Pull SCSI target fixes from Nicholas Bellinger:
"This includes the bug-fix for a >= v3.8-rc1 regression specific to
iscsi-target persistent reservation conflict handling (CC'ed to
stable), and a tcm_vhost patch to drop VIRTIO_RING_F_EVENT_IDX usage
so that in-flight qemu vhost-scsi-pci device code can detect the
proper vhost feature bits.
Also, there are two more tcm_vhost patches still being discussed by
MST and Asias for v3.9 that will be required for the in-flight qemu
vhost-scsi-pci device patch to function properly, and that should
(hopefully) be the last target fixes for this round."
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: Fix RESERVATION_CONFLICT status regression for iscsi-target special case
tcm_vhost: Avoid VIRTIO_RING_F_EVENT_IDX feature bit
Andy Shevchenko [Wed, 20 Feb 2013 11:52:17 +0000 (13:52 +0200)]
dw_dmac: adjust slave_id accordingly to request line base
On some hardware configurations we have got the request line with the offset.
The patch introduces convert_slave_id() helper for that cases. The request line
base is came from the driver data provided by the platform_device_id table.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Arnd Bergmann [Sun, 3 Mar 2013 20:51:28 +0000 (20:51 +0000)]
dmaengine: dw_dma: fix endianess for DT xlate function
As reported by Wu Fengguang's build robot tracking sparse warnings, the
dma_spec arguments in the dw_dma_xlate are already byte swapped on
little-endian platforms and must not get swapped again. This code is
currently not used anywhere, but will be used in Linux 3.10 when the
ARM SPEAr platform starts using the generic DMA DT binding.
The Adam Belay's e-mail address in MAINTAINERS under PNP SUPPORT
is not valid any more and I started to maintain that code in the
meantime as a matter of fact, so list myself as a maintainer of it
along with Bjorn and remove the Adam's entry from it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Elder [Wed, 27 Mar 2013 14:16:30 +0000 (09:16 -0500)]
rbd: don't zero-fill non-image object requests
A result of ENOENT from a read request for an object that's part of
an rbd image indicates that there is a hole in that portion of the
image. Similarly, a short read for such an object indicates that
the remainder of the read should be interpreted a full read with
zeros filling out the end of the request.
This behavior is not correct for objects that are not backing rbd
image data. Currently rbd_img_obj_request_callback() assumes it
should be done for all objects.
Change rbd_img_obj_request_callback() so it only does this zeroing
for image objects. Encapsulate that special handling in its own
function. Add an assertion that the image object request is a bio
request, since we assume that (and we currently don't support any
other types).
This resolves a problem identified here:
http://tracker.ceph.com/issues/4559
Linus Torvalds [Fri, 29 Mar 2013 18:13:25 +0000 (11:13 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"We've had a busy two weeks of bug fixing. The biggest patches in here
are some long standing early-enospc problems (Josef) and a very old
race where compression and mmap combine forces to lose writes (me).
I'm fairly sure the mmap bug goes all the way back to the introduction
of the compression code, which is proof that fsx doesn't trigger every
possible mmap corner after all.
I'm sure you'll notice one of these is from this morning, it's a small
and isolated use-after-free fix in our scrub error reporting. I
double checked it here."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: don't drop path when printing out tree errors in scrub
Btrfs: fix wrong return value of btrfs_lookup_csum()
Btrfs: fix wrong reservation of csums
Btrfs: fix double free in the btrfs_qgroup_account_ref()
Btrfs: limit the global reserve to 512mb
Btrfs: hold the ordered operations mutex when waiting on ordered extents
Btrfs: fix space accounting for unlink and rename
Btrfs: fix space leak when we fail to reserve metadata space
Btrfs: fix EIO from btrfs send in is_extent_unchanged for punched holes
Btrfs: fix race between mmap writes and compression
Btrfs: fix memory leak in btrfs_create_tree()
Btrfs: fix locking on ROOT_REPLACE operations in tree mod log
Btrfs: fix missing qgroup reservation before fallocating
Btrfs: handle a bogus chunk tree nicely
Btrfs: update to use fs_state bit
Len Brown [Fri, 29 Mar 2013 18:02:30 +0000 (11:02 -0700)]
ia64 idle: delete stale (*idle)() function pointer
Commit 3e7fc708eb41 ("ia64 idle: delete pm_idle") in 3.9-rc1 didn't
finish the job, leaving an un-initialized reference to (*idle)().
[ Haven't seen a crash from this - but seems like we are just being
lucky that "idle" is zero so it does get initialized before we jump to
randomland - Len ]
Reported-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 29 Mar 2013 18:00:43 +0000 (11:00 -0700)]
Merge branch 'for-curr' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull arc architecture fixes from Vineet Gupta:
"This includes fix for a serious bug in DMA mapping API, make
allyesconfig wreckage, removal of bogus email-list placeholder in
MAINTAINERS, a typo in ptrace helper code and last remaining changes
for syscall ABI v3 which we are finally starting to transition-to
internally.
The request is late than I intended to - but I was held up with
debugging a timer link list corruption, for which a proposed fix to
generic timer code was sent out to lkml/tglx earlier today."
* 'for-curr' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: Fix the typo in event identifier flags used by ptrace
arc: fix dma_address assignment during dma_map_sg()
ARC: Remove SET_PERSONALITY (tracks cross-arch change)
ARC: ABIv3: fork/vfork wrappers not needed in "no-legacy-syscall" ABI
ARC: ABIv3: Print the correct ABI ver
ARC: make allyesconfig build breakages
ARC: MAINTAINERS update for ARC
Josef Bacik [Fri, 29 Mar 2013 14:09:34 +0000 (08:09 -0600)]
Btrfs: don't drop path when printing out tree errors in scrub
A user reported a panic where we were panicing somewhere in
tree_backref_for_extent from scrub_print_warning. He only captured the trace
but looking at scrub_print_warning we drop the path right before we mess with
the extent buffer to print out a bunch of stuff, which isn't right. So fix this
by dropping the path after we use the eb if we need to. Thanks,
Cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
target: Fix RESERVATION_CONFLICT status regression for iscsi-target special case
This patch fixes a regression introduced in v3.8-rc1 code where a failed
target_check_reservation() check in target_setup_cmd_from_cdb() was causing
an incorrect SAM_STAT_GOOD status to be returned during a WRITE operation
performed by an unregistered / unreserved iscsi initiator port.
This regression is only effecting iscsi-target due to a special case check
for TCM_RESERVATION_CONFLICT within iscsi_target_erl1.c:iscsit_execute_cmd(),
and was still correctly disallowing WRITE commands from backend submission
for unregistered / unreserved initiator ports, while returning the incorrect
SAM_STAT_GOOD status due to the missing SAM_STAT_RESERVATION_CONFLICT
assignment.
Go ahead and re-add the missing SAM_STAT_RESERVATION_CONFLICT assignment
during a target_check_reservation() failure, so that iscsi-target code
sends the correct SCSI status.
All other fabrics using target_submit_cmd_*() with a RESERVATION_CONFLICT
call to transport_generic_request_failure() are not effected by this bug.
Reported-by: Jeff Leung <jleung@curriegrad2004.ca> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
tcm_vhost: Avoid VIRTIO_RING_F_EVENT_IDX feature bit
This patch adds a VHOST_SCSI_FEATURES mask minus VIRTIO_RING_F_EVENT_IDX
so that vhost-scsi-pci userspace will strip this feature bit once
GET_FEATURES reports it as being unsupported on the host.
This is to avoid a bug where ->handle_kicks() are missed when EVENT_IDX
is enabled by default in userspace code.
(mst: Rename to VHOST_SCSI_FEATURES + add comment)
Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Asias He <asias@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Revert "mm: introduce VM_POPULATE flag to better deal with racy userspace programs"
This reverts commit 186930500985 ("mm: introduce VM_POPULATE flag to
better deal with racy userspace programs").
VM_POPULATE only has any effect when userspace plays racy games with
vmas by trying to unmap and remap memory regions that mmap or mlock are
operating on.
Also, the only effect of VM_POPULATE when userspace plays such games is
that it avoids populating new memory regions that get remapped into the
address range that was being operated on by the original mmap or mlock
calls.
Let's remove VM_POPULATE as there isn't any strong argument to mandate a
new vm_flag.
Linus Torvalds [Thu, 28 Mar 2013 22:54:25 +0000 (15:54 -0700)]
Merge tag 'usb-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg Kroah-Hartman:
"Here are some USB fixes to resolve issues reported recently, as well
as a new device id for the ftdi_sio driver."
* tag 'usb-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: ftdi_sio: Add support for Mitsubishi FX-USB-AW/-BD
usb: Fix compile error by selecting USB_OTG_UTILS
USB: serial: fix hang when opening port
USB: EHCI: fix bug in iTD/siTD DMA pool allocation
xhci: Don't warn on empty ring for suspended devices.
usb: xhci: Fix TRB transfer length macro used for Event TRB.
usb/acpi: binding xhci root hub usb port with ACPI
usb: add find_raw_port_number callback to struct hc_driver()
usb: xhci: fix build warning
Linus Torvalds [Thu, 28 Mar 2013 22:53:33 +0000 (15:53 -0700)]
Merge tag 'tty-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull TTY/serial fixes from Greg Kroah-Hartman:
"Here are some tty/serial driver fixes for 3.9.
The big thing here is the fix for the huge mess we caused renaming the
8250 driver accidentally in the 3.7 kernel release, without realizing
that there were users of the module options that suddenly broke. This
is now resolved, and, to top the injury off, we have a backwards-
compatible option for those users who got used to the new name since
3.7. Ugh, sorry about that.
Other than that, some other minor fixes for issues that have been
reported by users."
* tag 'tty-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Xilinx: ARM: UART: clear pending irqs before enabling irqs
TTY: 8250, deprecated 8250_core.* options
TTY: 8250, revert module name change
serial: 8250_pci: Add WCH CH352 quirk to avoid Xscale detection
tty: atmel_serial_probe(): index of atmel_ports[] fix
Linus Torvalds [Thu, 28 Mar 2013 22:52:14 +0000 (15:52 -0700)]
Merge tag 'driver-core-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull sysfs fixes from Greg Kroah-Hartman:
"Here are two fixes for sysfs that resolve issues that have been found
by the Trinity fuzz tool, causing oopses in sysfs. They both have
been in linux-next for a while to ensure that they do not cause any
other problems."
* tag 'driver-core-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
sysfs: handle failure path correctly for readdir()
sysfs: fix race between readdir and lseek
Linus Torvalds [Thu, 28 Mar 2013 22:51:33 +0000 (15:51 -0700)]
Merge tag 'char-misc-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg Kroah-Hartman:
"Here are some small char/misc driver fixes that resolve issues
recently reported against the 3.9-rc kernels. All have been in
linux-next for a while."
* tag 'char-misc-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
VMCI: Fix process-to-process DRGAMs.
mei: ME hardware reset needs to be synchronized
mei: add mei_stop function to stop mei device
extcon: max77693: Initialize register of MUIC device to bring up it without platform data
extcon: max77693: Fix bug of wrong pointer when platform data is not used
extcon: max8997: Check the pointer of platform data to protect null pointer error
Linus Torvalds [Thu, 28 Mar 2013 20:47:31 +0000 (13:47 -0700)]
Merge tag 'pm+acpi-3.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael J Wysocki:
- Fix for a recent cpufreq regression related to acpi-cpufreq and
suspend/resume from Viresh Kumar.
- cpufreq stats reference counting fix from Viresh Kumar.
- intel_pstate driver fixes from Dirk Brandewie and Konrad Rzeszutek
Wilk.
- New ACPI suspend blacklist entry for Sony Vaio VGN-FW21M from Fabio
Valentini.
- ACPI Platform Error Interface (APEI) fix from Chen Gong.
- PCI root bridge hotplug locking fix from Yinghai Lu.
* tag 'pm+acpi-3.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PCI / ACPI: hold acpi_scan_lock during root bus hotplug
ACPI / APEI: fix error status check condition for CPER
ACPI / PM: fix suspend and resume on Sony Vaio VGN-FW21M
cpufreq: acpi-cpufreq: Don't set policy->related_cpus from .init()
cpufreq: stats: do cpufreq_cpu_put() corresponding to cpufreq_cpu_get()
intel-pstate: Use #defines instead of hard-coded values.
cpufreq / intel_pstate: Fix calculation of current frequency
cpufreq / intel_pstate: Add function to check that all MSRs are valid
Pull crypto fixes from Herbert Xu:
"This removes IPsec ESN support from the talitos/caam drivers since
they were implemented incorrectly, causing interoperability problems
if ESN is used with them."
Linus Torvalds [Thu, 28 Mar 2013 20:44:40 +0000 (13:44 -0700)]
Merge tag 'fbdev-fixes-3.9-rc4' of git://gitorious.org/linux-omap-dss2/linux
Pull fbdev fixes from Tomi Valkeinen:
"Since Florian is still away/inactive, I volunteered to collect fbdev
fixes for 3.9 and changes for 3.10. I didn't receive any other fbdev
fixes than OMAP yet, but I didn't want to delay this further as
there's a compilation fix for OMAP1. So there could be still some
fbdev fixes on the way a bit later.
This contains:
- Fix OMAP1 compilation
- OMAP display fixes"
* tag 'fbdev-fixes-3.9-rc4' of git://gitorious.org/linux-omap-dss2/linux:
omapdss: features: fix supported outputs for OMAP4
OMAPDSS: tpo-td043 panel: fix data passing between SPI/DSS parts
omapfb: fix broken build on OMAP1
Linus Torvalds [Thu, 28 Mar 2013 20:43:46 +0000 (13:43 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull userns fixes from Eric W Biederman:
"The bulk of the changes are fixing the worst consequences of the user
namespace design oversight in not considering what happens when one
namespace starts off as a clone of another namespace, as happens with
the mount namespace.
The rest of the changes are just plain bug fixes.
Many thanks to Andy Lutomirski for pointing out many of these issues."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
userns: Restrict when proc and sysfs can be mounted
ipc: Restrict mounting the mqueue filesystem
vfs: Carefully propogate mounts across user namespaces
vfs: Add a mount flag to lock read only bind mounts
userns: Don't allow creation if the user is chrooted
yama: Better permission check for ptraceme
pid: Handle the exit of a multi-threaded init.
scm: Require CAP_SYS_ADMIN over the current pidns to spoof pids.
usb: ftdi_sio: Add support for Mitsubishi FX-USB-AW/-BD
It enhances the driver for FTDI-based USB serial adapters
to recognize Mitsubishi Electric Corp. USB/RS422 Converters
as FT232BM chips and support them.
https://search.meau.com/?q=FX-USB-AW
Signed-off-by: Konstantin Holoborodko <klh.kernel@gmail.com> Tested-by: Konstantin Holoborodko <klh.kernel@gmail.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Miao Xie [Thu, 28 Mar 2013 08:12:15 +0000 (08:12 +0000)]
Btrfs: fix wrong return value of btrfs_lookup_csum()
If we don't find the expected csum item, but find a csum item which is
adjacent to the specified extent, we should return -EFBIG, or we should
return -ENOENT. But btrfs_lookup_csum() return -EFBIG even the csum item
is not adjacent to the specified extent. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Miao Xie [Thu, 28 Mar 2013 08:08:20 +0000 (08:08 +0000)]
Btrfs: fix wrong reservation of csums
We reserve the space for csums only when we write data into a file, in
the other cases, such as tree log, log replay, we don't do reservation,
so we can use the reservation of the transaction handle just for the former.
And for the latter, we should use the tree's own reservation. But the
function - btrfs_csum_file_blocks() didn't differentiate between these
two types of the cases, fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Wang Shilong [Mon, 25 Mar 2013 11:08:23 +0000 (11:08 +0000)]
Btrfs: fix double free in the btrfs_qgroup_account_ref()
The function btrfs_find_all_roots is responsible to allocate
memory for 'roots' and free it if errors happen,so the caller should not
free it again since the work has been done.
Besides,'tmp' is allocated after the function btrfs_find_all_roots,
so we can return directly if btrfs_find_all_roots() fails.
Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Josef Bacik [Tue, 26 Mar 2013 19:31:45 +0000 (15:31 -0400)]
Btrfs: limit the global reserve to 512mb
A user reported a problem where he was getting early ENOSPC with hundreds of
gigs of free data space and 6 gigs of free metadata space. This is because the
global block reserve was taking up the entire free metadata space. This is
ridiculous, we have infrastructure in place to throttle if we start using too
much of the global reserve, so instead of letting it get this huge just limit it
to 512mb so that users can still get work done. This allowed the user to
complete his rsync without issues. Thanks
Cc: stable@vger.kernel.org Reported-and-tested-by: Stefan Priebe <s.priebe@profihost.ag> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Josef Bacik [Tue, 26 Mar 2013 19:29:11 +0000 (15:29 -0400)]
Btrfs: hold the ordered operations mutex when waiting on ordered extents
We need to hold the ordered_operations mutex while waiting on ordered extents
since we splice and run the ordered extents list. We need to make sure anybody
else who wants to wait on ordered extents does actually wait for them to be
completed. This will keep us from bailing out of flushing in case somebody is
already waiting on ordered extents to complete. Thanks,
Josef Bacik [Tue, 26 Mar 2013 19:26:55 +0000 (15:26 -0400)]
Btrfs: fix space accounting for unlink and rename
We are way over-reserving for unlink and rename. Rename is just some random
huge number and unlink accounts for tree log operations that don't actually
happen during unlink, not to mention the tree log doesn't take from the trans
block rsv anyway so it's completely useless. Thanks,
Josef Bacik [Mon, 25 Mar 2013 20:03:35 +0000 (16:03 -0400)]
Btrfs: fix space leak when we fail to reserve metadata space
Dave reported a warning when running xfstest 275. We have been leaking delalloc
metadata space when our reservations fail. This is because we were improperly
calculating how much space to free for our checksum reservations. The problem
is we would sometimes free up space that had already been freed in another
thread and we would end up with negative usage for the delalloc space. This
patch fixes the problem by calculating how much space the other threads would
have already freed, and then calculate how much space we need to free had we not
done the reservation at all, and then freeing any excess space. This makes
xfstests 275 no longer have leaked space. Thanks
Cc: stable@vger.kernel.org Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Jan Schmidt [Thu, 21 Mar 2013 14:30:23 +0000 (14:30 +0000)]
Btrfs: fix EIO from btrfs send in is_extent_unchanged for punched holes
When you take a snapshot, punch a hole where there has been data, then take
another snapshot and try to send an incremental stream, btrfs send would
give you EIO. That is because is_extent_unchanged had no support for holes
being punched. With this patch, instead of returning EIO we just return
0 (== the extent is not unchanged) and we're good.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Cc: Alexander Block <ablock84@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* acpi-fixes:
PCI / ACPI: hold acpi_scan_lock during root bus hotplug
ACPI / APEI: fix error status check condition for CPER
ACPI / PM: fix suspend and resume on Sony Vaio VGN-FW21M
* pm-fixes:
cpufreq: acpi-cpufreq: Don't set policy->related_cpus from .init()
cpufreq: stats: do cpufreq_cpu_put() corresponding to cpufreq_cpu_get()
intel-pstate: Use #defines instead of hard-coded values.
cpufreq / intel_pstate: Fix calculation of current frequency
cpufreq / intel_pstate: Add function to check that all MSRs are valid
Linus Torvalds [Wed, 27 Mar 2013 22:50:24 +0000 (15:50 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sfr/next-fixes
Pull powerpc build fixes from Stephen Rothwell:
"Just a couple of build fixes for powerpc all{mod,yes}config.
Submitted by me since BenH is on vacation."
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sfr/next-fixes:
powerpc: define the conditions where the ePAPR idle hcall can be supported
powerpc: make additional room in exception vector area
Linus Torvalds [Wed, 27 Mar 2013 19:56:25 +0000 (12:56 -0700)]
Merge tag 'stable/for-linus-3.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull Xen bug-fixes from Konrad Rzeszutek Wilk:
"This is mostly just the last stragglers of the regression bugs that
this merge window had. There are also two bug-fixes: one that adds an
extra layer of security, and a regression fix for a change that was
added in v3.7 (the v1 was faulty, the v2 works).
- Regression fixes for C-and-P states not being parsed properly.
- Fix possible security issue with guests triggering DoS via
non-assigned MSI-Xs.
- Fix regression (introduced in v3.7) with raising an event (v2).
- Fix hastily introduced band-aid during c0 for the CR3 blowup."
* tag 'stable/for-linus-3.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/events: avoid race with raising an event in unmask_evtchn()
xen/mmu: Move the setting of pvops.write_cr3 to later phase in bootup.
xen/acpi-stub: Disable it b/c the acpi_processor_add is no longer called.
xen-pciback: notify hypervisor about devices intended to be assigned to guests
xen/acpi-processor: Don't dereference struct acpi_processor on all CPUs.
Linus Torvalds [Wed, 27 Mar 2013 18:18:43 +0000 (11:18 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:
- fix for potential 3.9 regression in handling of buttons for touchpads
following HID mt specification; potential because reportedly there is
no retail product on the market that would be using this feature, but
nevertheless we'd better follow the spec. Fix by Benjamin Tissoires.
- support for two quirky devices added by Josh Boyer.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: multitouch: fix touchpad buttons
HID: usbhid: fix build problem
HID: usbhid: quirk for MSI GX680R led panel
HID: usbhid: quirk for Realtek Multi-card reader
Linus Torvalds [Wed, 27 Mar 2013 16:25:11 +0000 (09:25 -0700)]
Merge tag 'iommu-fixes-v3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"Here are some fixes which have collected since Linux v3.9-rc1.
The most important one fixes a long-standing regressen which make
re-hotplugged devices unusable when AMD IOMMU is used.
The other patches fix build issues (build regression on OMAP and a
section mismatch). One patch just removes a duplicate header include."
* tag 'iommu-fixes-v3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Make sure dma_ops are set for hotplug devices
x86, io_apic: remove duplicated include from irq_remapping.c
iommu: OMAP: build only on OMAP2+
amd_iommu_init: remove __init from amd_iommu_erratum_746_workaround
Al Viro [Wed, 27 Mar 2013 15:20:30 +0000 (15:20 +0000)]
vfs/splice: Fix missed checks in new __kernel_write() helper
Commit 06ae43f34bcc ("Don't bother with redoing rw_verify_area() from
default_file_splice_from()") lost the checks to test existence of the
write/aio_write methods. My apologies ;-/
Eventually, we want that in fs/splice.c side of things (no point
repeating it for every buffer, after all), but for now this is the
obvious minimal fix.
Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Vrabel [Mon, 25 Mar 2013 14:11:19 +0000 (14:11 +0000)]
xen/events: avoid race with raising an event in unmask_evtchn()
In unmask_evtchn(), when the mask bit is cleared after testing for
pending and the event becomes pending between the test and clear, then
the upcall will not become pending and the event may be lost or
delayed.
Avoid this by always clearing the mask bit before checking for
pending. If a hypercall is needed, remask the event as
EVTCHNOP_unmask will only retrigger pending events if they were
masked.
This fixes a regression introduced in 3.7 by b5e579232d635b79a3da052964cb357ccda8d9ea (xen/events: fix
unmask_evtchn for PV on HVM guests) which reordered the clear mask and
check pending operations.
Changes in v2:
- set mask before hypercall.
Cc: stable@vger.kernel.org Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
xen/mmu: Move the setting of pvops.write_cr3 to later phase in bootup.
We move the setting of write_cr3 from the early bootup variant
(see git commit 0cc9129d75ef8993702d97ab0e49542c15ac6ab9
"x86-64, xen, mmu: Provide an early version of write_cr3.")
to a more appropiate location.
This new location sets all of the other non-early variants
of pvops calls - and most importantly is before the
alternative_asm mechanism kicks in.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
xen/acpi-stub: Disable it b/c the acpi_processor_add is no longer called.
With the Xen ACPI stub code (CONFIG_XEN_STUB=y) enabled, the power
C and P states are no longer uploaded to the hypervisor.
The reason is that the Xen CPU hotplug code: xen-acpi-cpuhotplug.c
and the xen-acpi-stub.c register themselves as the "processor" type object.
That means the generic processor (processor_driver.c) stops
working and it does not call (acpi_processor_add) which populates the
per_cpu(processors, pr->id) = pr;
structure. The 'pr' is gathered from the acpi_processor_get_info function
which does the job of finding the C-states and figuring out PBLK address.
The 'processors->pr' is then later used by xen-acpi-processor.c (the one that
uploads C and P states to the hypervisor). Since it is NULL, we end
skip the gathering of _PSD, _PSS, _PCT, etc and never upload the power
management data.
The end result is that enabling the CONFIG_XEN_STUB in the build means that
xen-acpi-processor is not working anymore.
This temporary patch fixes it by marking the XEN_STUB driver as
BROKEN until this can be properly fixed.
CC: jinsong.liu@intel.com Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Roland Stigge [Tue, 26 Mar 2013 17:36:01 +0000 (18:36 +0100)]
usb: Fix compile error by selecting USB_OTG_UTILS
The current lpc32xx_defconfig breaks like this, caused by recent phy
restructuring:
LD init/built-in.o
drivers/built-in.o: In function `usb_hcd_nxp_probe':
drivers/usb/host/ohci-nxp.c:224: undefined reference to `isp1301_get_client'
drivers/built-in.o: In function `lpc32xx_udc_probe':
drivers/usb/gadget/lpc32xx_udc.c:3104: undefined reference to
`isp1301_get_client' distcc[27867] ERROR: compile (null) on localhost failed
make: *** [vmlinux] Error 1
userns: Restrict when proc and sysfs can be mounted
Only allow unprivileged mounts of proc and sysfs if they are already
mounted when the user namespace is created.
proc and sysfs are interesting because they have content that is
per namespace, and so fresh mounts are needed when new namespaces
are created while at the same time proc and sysfs have content that
is shared between every instance.
Respect the policy of who may see the shared content of proc and sysfs
by only allowing new mounts if there was an existing mount at the time
the user namespace was created.
In practice there are only two interesting cases: proc and sysfs are
mounted at their usual places, proc and sysfs are not mounted at all
(some form of mount namespace jail).
Only allow mounting the mqueue filesystem if the caller has CAP_SYS_ADMIN
rights over the ipc namespace. The principle here is if you create
or have capabilities over it you can mount it, otherwise you get to live
with what other people have mounted.
This information is not particularly sensitive and mqueue essentially
only reports which posix messages queues exist. Still when creating a
restricted environment for an application to live any extra
information may be of use to someone with sufficient creativity. The
historical if imperfect way this information has been restricted has
been not to allow mounts and restricting this to ipc namespace
creators maintains the spirit of the historical restriction.
vfs: Carefully propogate mounts across user namespaces
As a matter of policy MNT_READONLY should not be changable if the
original mounter had more privileges than creator of the mount
namespace.
Add the flag CL_UNPRIVILEGED to note when we are copying a mount from
a mount namespace that requires more privileges to a mount namespace
that requires fewer privileges.
When the CL_UNPRIVILEGED flag is set cause clone_mnt to set MNT_NO_REMOUNT
if any of the mnt flags that should never be changed are set.
This protects both mount propagation and the initial creation of a less
privileged mount namespace.
Cc: stable@vger.kernel.org Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
vfs: Add a mount flag to lock read only bind mounts
When a read-only bind mount is copied from mount namespace in a higher
privileged user namespace to a mount namespace in a lesser privileged
user namespace, it should not be possible to remove the the read-only
restriction.
Add a MNT_LOCK_READONLY mount flag to indicate that a mount must
remain read-only.
userns: Don't allow creation if the user is chrooted
Guarantee that the policy of which files may be access that is
established by setting the root directory will not be violated
by user namespaces by verifying that the root directory points
to the root of the mount namespace at the time of user namespace
creation.
Changing the root is a privileged operation, and as a matter of policy
it serves to limit unprivileged processes to files below the current
root directory.
For reasons of simplicity and comprehensibility the privilege to
change the root directory is gated solely on the CAP_SYS_CHROOT
capability in the user namespace. Therefore when creating a user
namespace we must ensure that the policy of which files may be access
can not be violated by changing the root directory.
Anyone who runs a processes in a chroot and would like to use user
namespace can setup the same view of filesystems with a mount
namespace instead. With this result that this is not a practical
limitation for using user namespaces.
Cc: stable@vger.kernel.org Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Commit "HID: multitouch: use the callback "report" instead..." breaks the
buttons of touchpads following the HID multitouch specification.
The buttons were emmitted through hid-input, but as now the events
are generated only in hid-multitouch, the buttons are not emmitted anymore.
The input_event() call is far much simpler than the hid-input one as
many of the different tests do not apply to multitouch touchpads.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Joerg Roedel [Tue, 26 Mar 2013 21:48:23 +0000 (22:48 +0100)]
iommu/amd: Make sure dma_ops are set for hotplug devices
There is a bug introduced with commit 27c2127 that causes
devices which are hot unplugged and then hot-replugged to
not have per-device dma_ops set. This causes these devices
to not function correctly. Fixed with this patch.
Cc: stable@vger.kernel.org Reported-by: Andreas Degert <andreas.degert@googlemail.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
Linus Torvalds [Wed, 27 Mar 2013 00:42:55 +0000 (17:42 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"stable fodder; assorted deadlock fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vt: synchronize_rcu() under spinlock is not nice...
Nest rename_lock inside vfsmount_lock
Don't bother with redoing rw_verify_area() from default_file_splice_from()
Al Viro [Wed, 27 Mar 2013 00:30:17 +0000 (20:30 -0400)]
vt: synchronize_rcu() under spinlock is not nice...
vcs_poll_data_free() calls unregister_vt_notifier(), which calls
atomic_notifier_chain_unregister(), which calls synchronize_rcu().
Do it *after* we'd dropped ->f_lock.
Cc: stable@vger.kernel.org (all kernels since 2.6.37) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Chen Gong [Tue, 19 Mar 2013 06:48:07 +0000 (06:48 +0000)]
ACPI / APEI: fix error status check condition for CPER
In Table 18-289, ACPI5.0 SPEC, the error data length in CPER
Generic Error Data Entry can be 0, which means this generic
error data entry can have only one header. So fix the check
conditon for it.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Reviewed-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Al Viro [Tue, 26 Mar 2013 22:25:57 +0000 (18:25 -0400)]
Nest rename_lock inside vfsmount_lock
... lest we get livelocks between path_is_under() and d_path() and friends.
The thing is, wrt fairness lglocks are more similar to rwsems than to rwlocks;
it is possible to have thread B spin on attempt to take lock shared while thread
A is already holding it shared, if B is on lower-numbered CPU than A and there's
a thread C spinning on attempt to take the same lock exclusive.
As the result, we need consistent ordering between vfsmount_lock (lglock) and
rename_lock (seq_lock), even though everything that takes both is going to take
vfsmount_lock only shared.
Spotted-by: Brad Spengler <spender@grsecurity.net> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
1) Always increment IPV4 ID field in encapsulated GSO packets, even
when DF is set. Regression fix from Pravin B Shelar.
2) Fix per-net subsystem initialization in netfilter conntrack,
otherwise we may access dynamically allocated memory before it is
actually allocated. From Gao Feng.
3) Fix DMA buffer lengths in iwl3945 driver, from Stanislaw Gruszka.
4) Fix race between submission of sync vs async commands in mwifiex
driver, from Amitkumar Karwar.
5) Add missing cancel of command timer in mwifiex driver, from Bing
Zhao.
6) Missing SKB free in rtlwifi USB driver, from Jussi Kivilinna.
7) Thermal layer tries to use a genetlink multicast string that is
longer than the 16 character limit. Fix it and add a BUG check to
prevent this kind of thing from happening in the future.
From Masatake YAMATO.
8) Fix many bugs in the handling of the teardown of L2TP connections,
UDP encapsulation instances, and sockets. From Tom Parkin.
9) Missing socket release in IRDA, from Kees Cook.
10) Fix fec driver modular build, from Fabio Estevam.
11) Erroneous use of kfree() instead of free_netdev() in lantiq_etop,
from Wei Yongjun.
12) Fix bugs in handling of queue numbers and steering rules in mlx4
driver, from Moshe Lazer, Hadar Hen Zion, and Or Gerlitz.
13) Some FOO_DIAG_MAX constants were defined off by one, fix from Andrey
Vagin.
14) TCP segmentation deferral is unintentionally done too strongly,
breaking ACK clocking. Fix from Eric Dumazet.
15) net_enable_timestamp() can legitimately be invoked from software
interrupts, and in a way that is safe, so remove the WARN_ON().
Also from Eric Dumazet.
16) Fix use after free in VLANs, from Cong Wang.
17) Fix TCP slow start retransmit storms after SACK reneging, from
Yuchung Cheng.
18) Unix socket release should mark a socket dead before NULL'ing out
sock->sk, otherwise we can race. Fix from Paul Moore.
19) IPV6 addrconf code can try to free static memory, from Hong Zhiguo.
20) Fix register mis-programming, NULL pointer derefs, and wrong PHC
clock frequency in IGB driver. From Lior LevyAlex Williamson, Jiri
Benc, and Jeff Kirsher.
21) skb->ip_summed logic in pch_gbe driver is reversed, breaking packet
forwarding. Fix from Veaceslav Falico.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
ipv4: Fix ip-header identification for gso packets.
bonding: remove already created master sysfs link on failure
af_unix: dont send SCM_CREDENTIAL when dest socket is NULL
pch_gbe: fix ip_summed checksum reporting on rx
igb: fix PHC stopping on max freq
igb: make sensor info static
igb: SR-IOV init reordering
igb: Fix null pointer dereference
igb: fix i350 anti spoofing config
ixgbevf: don't release the soft entries
ipv6: fix bad free of addrconf_init_net
unix: fix a race condition in unix_release()
tcp: undo spurious timeout after SACK reneging
bnx2x: fix assignment of signed expression to unsigned variable
bridge: fix crash when set mac address of br interface
8021q: fix a potential use-after-free
net: remove a WARN_ON() in net_enable_timestamp()
tcp: preserve ACK clocking in TSO
net: fix *_DIAG_MAX constants
net/mlx4_core: Disallow releasing VF QPs which have steering rules
...