Eric Sandeen [Wed, 28 Aug 2013 23:05:07 +0000 (19:05 -0400)]
ext4: allow specifying external journal by pathname mount option
It's always been a hassle that if an external journal's
device number changes, the filesystem won't mount.
And since boot-time enumeration can change, device number
changes aren't unusual.
The current mechanism to update the journal location is by
passing in a mount option w/ a new devnum, but that's a hassle;
it's a manual approach, fixing things after the fact.
Adding a mount option, "-o journal_path=/dev/$DEVICE" would
help, since then we can do i.e.
# mount -o journal_path=/dev/disk/by-label/$JOURNAL_LABEL ...
and it'll mount even if the devnum has changed, as shown here:
# mount /dev/sdb1 /mnt/test
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
But best of all we can just always mount by journal-path, and
it'll always work:
# mount -o journal_path=/dev/disk/by-label/mylabel-journal /dev/sdb1 /mnt/test
#
So the journal_path option can be specified in fstab, and as long as
the disk is available somewhere, and findable by label (or by UUID),
we can mount.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Darrick J. Wong [Wed, 28 Aug 2013 22:46:56 +0000 (18:46 -0400)]
ext4: mark group corrupt on group descriptor checksum
If the group descriptor fails validation, mark the whole blockgroup
corrupt so that the inode/block allocators skip this group. The
previous approach takes the risk of writing to a damaged group
descriptor; hopefully it was never the case that the [ib]bitmap fields
pointed to another valid block and got dirtied, since the memset would
fill the page with 1s.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Darrick J. Wong [Wed, 28 Aug 2013 22:32:58 +0000 (18:32 -0400)]
ext4: mark block group as corrupt on inode bitmap error
If we detect either a discrepancy between the inode bitmap and the
inode counts or the inode bitmap fails to pass validation checks, mark
the block group corrupt and refuse to allocate or deallocate inodes
from the group.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Darrick J. Wong [Wed, 28 Aug 2013 21:35:51 +0000 (17:35 -0400)]
ext4: mark block group as corrupt on block bitmap error
When we notice a block-bitmap corruption (because of device failure or
something else), we should mark this group as corrupt and prevent
further block allocations/deallocations from it. Currently, we end up
generating one error message for every block in the bitmap. This
potentially could make the system unstable as noticed in some
bugs. With this patch, the error will be printed only the first time
and mark the entire block group as corrupted. This prevents future
access allocations/deallocations from it.
Also tested by corrupting the block
bitmap and forcefully introducing the mb_free_blocks error:
(1) create a largefile (2Gb)
$ dd if=/dev/zero of=largefile oflag=direct bs=10485760 count=200
(2) umount filesystem. use dumpe2fs to see which block-bitmaps
are in use by largefile and note their block numbers
(3) use dd to zero-out the used block bitmaps
$ dd if=/dev/zero of=/dev/hdc4 bs=4096 seek=14 count=8 oflag=direct
(4) mount the FS and delete the largefile.
(5) recreate the largefile. verify that the new largefile does not
get any blocks from the groups marked as bad.
Without the patch, we will see mb_free_blocks error for each bit in
each zero'ed out bitmap at (4). With the patch, we only see the error
once per blockgroup:
[ 309.706803] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 15: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[ 309.720824] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 14: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[ 309.732858] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
[ 309.748321] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 13: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[ 309.760331] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
[ 309.769695] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 12: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[ 309.781721] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
[ 309.798166] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 11: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[ 309.810184] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
[ 309.819532] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 10: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[darrick.wong@oracle.com]
Further modifications (by Darrick) to make more obvious that this corruption
bit applies to blocks only. Set the corruption flag if the block group bitmap
verification fails.
Original-author: Aditya Kali <adityakali@google.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Darrick J. Wong [Wed, 28 Aug 2013 19:59:51 +0000 (15:59 -0400)]
ext4: fix type declaration of ext4_validate_block_bitmap
The block_group parameter to ext4_validate_block_bitmap is both used
as a ext4_group_t inside the function and the same type is passed in
by all callers. We might as well use the typedef consistently instead
of open-coding the 'unsigned int'.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Darrick J. Wong [Wed, 28 Aug 2013 19:35:27 +0000 (15:35 -0400)]
ext4: error out if verifying the block bitmap fails
The block bitmap verification code assumes that calling ext4_error()
either panics the system or makes the fs readonly. However, this is
not always true: when 'errors=continue' is specified, an error is
printed but we don't return any indication of error to the caller,
which is (probably) the block allocator, which pretends that the crud
we read in off the disk is a usable bitmap. Yuck.
A block bitmap that fails the check should at least return no bitmap
to the caller. The block allocator should be told to go look in a
different group, but that's a separate issue.
The easiest way to reproduce this is to modify bg_block_bitmap (on a
^flex_bg fs) to point to a block outside the block group; or you can
create a metadata_csum filesystem and zero out the block bitmaps.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Darrick J. Wong [Wed, 28 Aug 2013 18:59:58 +0000 (14:59 -0400)]
jbd2: Fix endian mixing problems in the checksumming code
In the jbd2 checksumming code, explicitly declare separate variables with
endianness information so that we don't get confused and screw things up again.
Also fixes sparse warnings.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Zheng Liu [Wed, 28 Aug 2013 18:47:06 +0000 (14:47 -0400)]
ext4: isolate ext4_extents.h file
After applied the commit (4a092d73), we have reduced the number of
source files that need to #include ext4_extents.h. But we can do
better.
This commit defines ext4_zeroout_es() in extents.c and move
EXT_MAX_BLOCKS into ext4.h in order not to include ext4_extents.h in
indirect.c and ioctl.c. Meanwhile we just need to include this file in
extent_status.c when ES_AGGRESSIVE_TEST is defined. Otherwise, this
commit removes a duplicated declaration in trace/events/ext4.h.
After applied this patch, we just need to include ext4_extents.h file
in {super,migrate,move_extents,extents}.c, and it is easy for us to
define a new extent disk layout.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jan Kara [Sat, 17 Aug 2013 14:09:31 +0000 (10:09 -0400)]
ext4: fix lost truncate due to race with writeback
The following race can lead to a loss of i_disksize update from truncate
thus resulting in a wrong inode size if the inode size isn't updated
again before inode is reclaimed:
ext4_setattr() mpage_map_and_submit_extent()
EXT4_I(inode)->i_disksize = attr->ia_size;
... ...
disksize = ((loff_t)mpd->first_page) << PAGE_CACHE_SHIFT
/* False because i_size isn't
* updated yet */
if (disksize > i_size_read(inode))
/* True, because i_disksize is
* already truncated */
if (disksize > EXT4_I(inode)->i_disksize)
/* Overwrite i_disksize
* update from truncate */
ext4_update_i_disksize()
i_size_write(inode, attr->ia_size);
For other places updating i_disksize such race cannot happen because
i_mutex prevents these races. Writeback is the only place where we do
not hold i_mutex and we cannot grab it there because of lock ordering.
We fix the race by doing both i_disksize and i_size update in truncate
atomically under i_data_sem and in mpage_map_and_submit_extent() we move
the check against i_size under i_data_sem as well.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
Jan Kara [Sat, 17 Aug 2013 14:07:17 +0000 (10:07 -0400)]
ext4: simplify truncation code in ext4_setattr()
Merge conditions in ext4_setattr() handling inode size changes, also
move ext4_begin_ordered_truncate() call somewhat earlier because it
simplifies error recovery in case of failure. Also add error handling in
case i_disksize update fails.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
Jan Kara [Sat, 17 Aug 2013 14:02:33 +0000 (10:02 -0400)]
ext4: fix ext4_writepages() in presence of truncate
Inode size can arbitrarily change while writeback is in progress. When
ext4_writepages() has prepared a long extent for mapping and truncate
then reduces i_size, mpage_map_and_submit_buffers() will always map just
one buffer in a page instead of all of them due to lblk < blocks check.
So we end up not using all blocks we've allocated (thus leaking them)
and also delalloc accounting goes wrong manifesting as a warning like:
ext4_da_release_space:1333: ext4_da_release_space: ino 12, to_free 1
with only 0 reserved data blocks
Note that the problem can happen only when blocksize < pagesize because
otherwise we have only a single buffer in the page.
Fix the problem by removing the size check from the mapping loop. We
have an extent allocated so we have to use it all before checking for
i_size. We also rename add_page_bufs_to_extent() to
mpage_process_page_bufs() and make that function submit the page for IO
if all buffers (upto EOF) in it are mapped.
Reported-by: Dave Jones <davej@redhat.com> Reported-by: Zheng Liu <gnehzuil.liu@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
Jan Kara [Sat, 17 Aug 2013 13:57:56 +0000 (09:57 -0400)]
ext4: move test whether extent to map can be extended to one place
Currently the logic whether the current buffer can be added to an extent
of buffers to map is split between mpage_add_bh_to_extent() and
add_page_bufs_to_extent(). Move the whole logic to
mpage_add_bh_to_extent() which makes things a bit more straightforward
and make following i_size fixes easier.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
Jan Kara [Sat, 17 Aug 2013 13:36:54 +0000 (09:36 -0400)]
ext4: fix warning in ext4_da_update_reserve_space()
reaim workfile.dbase test easily triggers warning in
ext4_da_update_reserve_space():
EXT4-fs warning (device ram0): ext4_da_update_reserve_space:365:
ino 12, allocated 1 with only 0 reserved metadata blocks (releasing 1
blocks with reserved 9 data blocks)
The problem is that (one of) tests creates file and then randomly writes
to it with O_SYNC. That results in writing back pages of the file in
random order so we create extents for written blocks say 0, 2, 4, 6, 8
- this last allocation also allocates new block for extents. Then we
writeout block 1 so we have extents 0-2, 4, 6, 8 and we release
indirect extent block because extents fit in the inode again. Then we
writeout block 10 and we need to allocate indirect extent block again
which triggers the warning because we don't have the reservation
anymore.
Fix the problem by giving back freed metadata blocks resulting from
extent merging into inode's reservation pool.
Jan Kara [Sat, 17 Aug 2013 13:32:32 +0000 (09:32 -0400)]
quota: provide interface for readding allocated space into reserved space
ext4 needs to convert allocated (metadata) blocks back into blocks
reserved for delayed allocation. Add functions into quota code for
supporting such operation.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Sat, 17 Aug 2013 02:06:55 +0000 (22:06 -0400)]
ext4: avoid reusing recently deleted inodes in no journal mode
In no journal mode, if an inode has recently been deleted, we
shouldn't reuse it right away. Otherwise it's possible, after an
unclean shutdown, to hit a situation where a recently deleted inode
gets reused for some other purpose before the inode table block has
been written to disk. However, if the directory entry has been
updated, then the directory entry will be pointing at the old inode
contents.
E2fsck will make sure the file system is consistent after the
unclean shutdown. However, if the recently deleted inode is a
character mode device, or an inode with the immutable bit set, even
after the file system has been fixed up by e2fsck, it can be
possible for a *.pyc file to be pointing at a character mode
device, and when python tries to open the *.pyc file, Hilarity
Ensues. We could change all of userspace to be very suspicious
about stat'ing files before opening them, and clearing the
immutable flag if necessary --- or we can just avoid reusing an
inode number if it has been recently deleted.
Theodore Ts'o [Sat, 17 Aug 2013 02:06:53 +0000 (22:06 -0400)]
ext4: allocate delayed allocation blocks before rename
When ext4_rename() overwrites an already existing file, call
ext4_alloc_da_blocks() before starting the journal handle which
actually does the rename, instead of doing this afterwards. This
improves the likelihood that the contents will survive a crash if an
application replaces a file using the sequence:
1) write replacement contents to foo.new
2) <omit fsync of foo.new>
3) rename foo.new to foo
It is still not a guarantee, since ext4_alloc_da_blocks() is *not*
doing a file integrity sync; this means if foo.new is a very large
file, it may not be completely flushed out to disk.
However, for files smaller than a megabyte or so, any dirty pages
should be flushed out before we do the rename operation, and so at the
next journal commit, the CACHE FLUSH command will make sure al of
these pages are safely on the disk platter.
Theodore Ts'o [Sat, 17 Aug 2013 02:05:14 +0000 (22:05 -0400)]
ext4: add support for extent pre-caching
Add a new fiemap flag which forces the all of the extents in an inode
to be cached in the extent_status tree. This is critically important
when using AIO to a preallocated file, since if we need to read in
blocks from the extent tree, the io_submit(2) system call becomes
synchronous, and the AIO is no longer "A", which is bad.
In addition, for most files which have an external leaf tree block,
the cost of caching the information in the extent status tree will be
less than caching the entire 4k block in the buffer cache. So it is
generally a win to keep the extent information cached.
Theodore Ts'o [Sat, 17 Aug 2013 01:23:41 +0000 (21:23 -0400)]
ext4: cache all of an extent tree's leaf block upon reading
When we read in an extent tree leaf block from disk, arrange to have
all of its entries cached. In nearly all cases the in-memory
representation will be more compact than the on-disk representation in
the buffer cache, and it allows us to get the information without
having to traverse the extent tree for successive extents.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Theodore Ts'o [Sat, 17 Aug 2013 01:22:41 +0000 (21:22 -0400)]
ext4: use unsigned int for es_status values
Don't use an unsigned long long for the es_status flags; this requires
that we pass 64-bit values around which is painful on 32-bit systems.
Instead pass the extent status flags around using the low 4 bits of an
unsigned int, and shift them into place when we are reading or writing
es_pblk.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Theodore Ts'o [Sat, 17 Aug 2013 01:20:41 +0000 (21:20 -0400)]
ext4: refactor code to read the extent tree block
Refactor out the code needed to read the extent tree block into a
single read_extent_tree_block() function. In addition to simplifying
the code, it also makes sure that we call the ext4_ext_load_extent
tracepoint whenever we need to read an extent tree block from disk.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Jan Kara [Sat, 17 Aug 2013 01:19:41 +0000 (21:19 -0400)]
jbd2: Fix oops in jbd2_journal_file_inode()
Commit 0713ed0cde76438d05849f1537d3aab46e099475 added
jbd2_journal_file_inode() call into ext4_block_zero_page_range().
However that function gets called from truncate path and thus inode
needn't have jinode attached - that happens in ext4_file_open() but
the file needn't be ever open since mount. Calling
jbd2_journal_file_inode() without jinode attached results in the oops.
We fix the problem by attaching jinode to inode also in ext4_truncate()
and ext4_punch_hole() when we are going to zero out partial blocks.
Reported-by: majianpeng <majianpeng@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jan Kara [Mon, 12 Aug 2013 13:53:28 +0000 (09:53 -0400)]
jbd2: Fix use after free after error in jbd2_journal_dirty_metadata()
When jbd2_journal_dirty_metadata() returns error,
__ext4_handle_dirty_metadata() stops the handle. However callers of this
function do not count with that fact and still happily used now freed
handle. This use after free can result in various issues but very likely
we oops soon.
The motivation of adding __ext4_journal_stop() into
__ext4_handle_dirty_metadata() in commit 9ea7a0df seems to be only to
improve error reporting. So replace __ext4_journal_stop() with
ext4_journal_abort_handle() which was there before that commit and add
WARN_ON_ONCE() to dump stack to provide useful information.
Reported-by: Sage Weil <sage@inktank.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org # 3.2+
Theodore Ts'o [Mon, 12 Aug 2013 13:29:30 +0000 (09:29 -0400)]
ext4: flush the extent status cache during EXT4_IOC_SWAP_BOOT
Previously we weren't swapping only some of the extent_status LRU
fields during the processing of the EXT4_IOC_SWAP_BOOT ioctl. The
much safer thing to do is to just completely flush the extent status
tree when doing the swap.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Zheng Liu <gnehzuil.liu@gmail.com> Cc: stable@vger.kernel.org
Piotr Sarna [Fri, 9 Aug 2013 03:02:24 +0000 (23:02 -0400)]
ext4: fix mount/remount error messages for incompatible mount options
Commit 5688978 ("ext4: improve handling of conflicting mount options")
introduced incorrect messages shown while choosing wrong mount options.
First of all, both cases of incorrect mount options,
"data=journal,delalloc" and "data=journal,dioread_nolock" result in
the same error message.
Secondly, the problem above isn't solved for remount option: the
mismatched parameter is simply ignored. Moreover, ext4_msg states
that remount with options "data=journal,delalloc" succeeded, which is
not true.
To fix it up, I added a simple check after parse_options() call to
ensure that data=journal and delalloc/dioread_nolock parameters are
not present at the same time.
Signed-off-by: Piotr Sarna <p.sarna@partner.samsung.com> Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
Theodore Ts'o [Fri, 9 Aug 2013 03:01:24 +0000 (23:01 -0400)]
ext4: allow the mount options nodelalloc and data=journal
Commit 26092bf ("ext4: use a table-driven handler for mount options")
wrongly disallows the specifying the mount options nodelalloc and
data=journal simultaneously. This is incorrect; it should have only
disallowed the combination of delalloc and data=journal
simultaneously.
Reported-by: Piotr Sarna <p.sarna@partner.samsung.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
Zheng Liu [Mon, 29 Jul 2013 16:51:42 +0000 (12:51 -0400)]
ext4: add WARN_ON to check the length of allocated blocks
In commit 921f266b: ext4: add self-testing infrastructure to do a
sanity check, some sanity checks were added in map_blocks to make sure
'retval == map->m_len'.
Enable these checks by default and report any assertion failures using
ext4_warning() and WARN_ON() since they can help us to figure out some
bugs that are otherwise hard to hit.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
ext4: make sure group number is bumped after a inode allocation race
When we try to allocate an inode, and there is a race between two
CPU's trying to grab the same inode, _and_ this inode is the last free
inode in the block group, make sure the group number is bumped before
we continue searching the rest of the block groups. Otherwise, we end
up searching the current block group twice, and we end up skipping
searching the last block group. So in the unlikely situation where
almost all of the inodes are allocated, it's possible that we will
return ENOSPC even though there might be free inodes in that last
block group.
Merge tag 'acpi-video-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI video support fixes from Rafael Wysocki:
"I'm sending a separate pull request for this as it may be somewhat
controversial. The breakage addressed here is not really new and the
fixes may not satisfy all users of the affected systems, but we've had
so much back and forth dance in this area over the last several weeks
that I think it's time to actually make some progress.
The source of the problem is that about a year ago we started to tell
BIOSes that we're compatible with Windows 8, which we really need to
do, because some systems shipping with Windows 8 are tested with it
and nothing else, so if we tell their BIOSes that we aren't compatible
with Windows 8, we expose our users to untested BIOS/AML code paths.
However, as it turns out, some Windows 8-specific AML code paths are
not tested either, because Windows 8 actually doesn't use the ACPI
methods containing them, so if we declare Windows 8 compatibility and
attempt to use those ACPI methods, things break. That occurs mostly
in the backlight support area where in particular the _BCM and _BQC
methods are plain unusable on some systems if the OS declares Windows
8 compatibility.
[ The additional twist is that they actually become usable if the OS
says it is not compatible with Windows 8, but that may cause
problems to show up elsewhere ]
Investigation carried out by Matthew Garrett indicates that what
Windows 8 does about backlight is to leave backlight control up to
individual graphics drivers. At least there's evidence that it does
that if the Intel graphics driver is used, so we've decided to follow
Windows 8 in that respect and allow i915 to control backlight (Daniel
likes that part).
The first commit from Aaron Lu makes ACPICA export the variable from
which we can infer whether or not the BIOS believes that we are
compatible with Windows 8.
The second commit from Matthew Garrett prepares the ACPI video driver
by making it initialize the ACPI backlight even if it is not going to
be used afterward (that is needed for backlight control to work on
Thinkpads).
The third commit implements the actual workaround making i915 take
over backlight control if the firmware thinks it's dealing with
Windows 8 and is based on the work of multiple developers, including
Matthew Garrett, Chun-Yi Lee, Seth Forshee, and Aaron Lu.
The final commit from Aaron Lu makes us follow Windows 8 by informing
the firmware through the _DOS method that it should not carry out
automatic brightness changes, so that brightness can be controlled by
GUI.
Hopefully, this approach will allow us to avoid using blacklists of
systems that should not declare Windows 8 compatibility just to avoid
backlight control problems in the future.
- Change from Aaron Lu makes ACPICA export a variable which can be
used by driver code to determine whether or not the BIOS believes
that we are compatible with Windows 8.
- Change from Matthew Garrett makes the ACPI video driver initialize
the ACPI backlight even if it is not going to be used afterward
(that is needed for backlight control to work on Thinkpads).
- Fix from Rafael J Wysocki implements Windows 8 backlight support
workaround making i915 take over bakclight control if the firmware
thinks it's dealing with Windows 8. Based on the work of multiple
developers including Matthew Garrett, Chun-Yi Lee, Seth Forshee,
and Aaron Lu.
- Fix from Aaron Lu makes the kernel follow Windows 8 by informing
the firmware through the _DOS method that it should not carry out
automatic brightness changes, so that brightness can be controlled
by GUI"
* tag 'acpi-video-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / video: no automatic brightness changes by win8-compatible firmware
ACPI / video / i915: No ACPI backlight if firmware expects Windows 8
ACPI / video: Always call acpi_video_init_brightness() on init
ACPICA: expose OSI version
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext[34] tmpfile bugfix from Ted Ts'o:
"Fix regression caused by commit af51a2ac36d1f which added ->tmpfile()
support (along with a similar fix for ext3)"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext3: fix a BUG when opening a file with O_TMPFILE flag
ext4: fix a BUG when opening a file with O_TMPFILE flag
Zheng Liu [Sun, 21 Jul 2013 02:03:20 +0000 (22:03 -0400)]
ext3: fix a BUG when opening a file with O_TMPFILE flag
When we try to open a file with O_TMPFILE flag, we will trigger a bug.
The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and
this check always fails because we set ->i_nlink = 1 in
inode_init_always(). We can use the following program to trigger it:
Here we couldn't call clear_nlink() directly because in d_tmpfile() we
will call inode_dec_link_count() to decrease ->i_nlink. So this commit
tries to call d_tmpfile() before ext4_orphan_add() to fix this problem.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk>
Zheng Liu [Sun, 21 Jul 2013 01:58:38 +0000 (21:58 -0400)]
ext4: fix a BUG when opening a file with O_TMPFILE flag
When we try to open a file with O_TMPFILE flag, we will trigger a bug.
The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and
this check always fails because we set ->i_nlink = 1 in
inode_init_always(). We can use the following program to trigger it:
Here we couldn't call clear_nlink() directly because in d_tmpfile() we
will call inode_dec_link_count() to decrease ->i_nlink. So this commit
tries to call d_tmpfile() before ext4_orphan_add() to fix this problem.
Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Tested-by: Darrick J. Wong <darrick.wong@oracle.com> Tested-by: Dave Jones <davej@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Merge tag 'staging-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging tree fixes from Greg KH:
"Here are a few iio driver fixes for 3.11-rc2. They are still spread
across drivers/iio and drivers/staging/iio so they are coming in
through this tree.
I've also removed the drivers/staging/csr/ driver as the developers
who originally sent it to me have moved on to other companies, and CSR
still will not send us the specs for the device, making the driver
pretty much obsolete and impossible to fix up. Deleting it now
prevents people from sending in lots of tiny codingsyle fixes that
will never go anywhere.
It also helps to offset the large lustre filesystem merge that
happened in 3.11-rc1 in the overall 3.11.0 diffstat. :)"
* tag 'staging-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: csr: remove driver
iio: lps331ap: Fix wrong in_pressure_scale output value
iio staging: fix lis3l02dq, read error handling
staging:iio:ad7291: add missing .driver_module to struct iio_info
iio: ti_am335x_adc: add missing .driver_module to struct iio_info
iio: mxs-lradc: Remove useless check in read_raw
iio: mxs-lradc: Fix misuse of iio->trig
iio: inkern: fix iio_convert_raw_to_processed_unlocked
iio: Fix iio_channel_has_info
iio:trigger: device_unregister->device_del to avoid double free
iio: dac: ad7303: fix error return code in ad7303_probe()
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"The sget() one is a long-standing bug and will need to go into -stable
(in fact, it had been originally caught in RHEL6), the other two are
3.11-only"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: constify dentry parameter in d_count()
livelock avoidance in sget()
allow O_TMPFILE to work with O_WRONLY
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 bugfixes from Ted Ts'o:
"Fixes for 3.11-rc2, sent at 5pm, in the professoinal style. :-)"
I'm not sure I like this new level of "professionalism".
9-5, people, 9-5.
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: call ext4_es_lru_add() after handling cache miss
ext4: yield during large unlinks
ext4: make the extent_status code more robust against ENOMEM failures
ext4: simplify calculation of blocks to free on error
ext4: fix error handling in ext4_ext_truncate()
Merge tag 'nfs-for-3.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Fix a regression against NFSv4 FreeBSD servers when creating a new
file
- Fix another regression in rpc_client_register()
* tag 'nfs-for-3.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Fix a regression against the FreeBSD server
SUNRPC: Fix another issue with rpc_client_register()
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next
Pull btrfs fixes from Josef Bacik:
"I'm playing the role of Chris Mason this week while he's on vacation.
There are a few critical fixes for btrfs here, all regressions and
have been tested well"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next:
Btrfs: fix wrong write offset when replacing a device
Btrfs: re-add root to dead root list if we stop dropping it
Btrfs: fix lock leak when resuming snapshot deletion
Btrfs: update drop progress before stopping snapshot dropping
Al Viro [Fri, 19 Jul 2013 23:13:55 +0000 (03:13 +0400)]
livelock avoidance in sget()
Eric Sandeen has found a nasty livelock in sget() - take a mount(2) about
to fail. The superblock is on ->fs_supers, ->s_umount is held exclusive,
->s_active is 1. Along comes two more processes, trying to mount the same
thing; sget() in each is picking that superblock, bumping ->s_count and
trying to grab ->s_umount. ->s_active is 3 now. Original mount(2)
finally gets to deactivate_locked_super() on failure; ->s_active is 2,
superblock is still ->fs_supers because shutdown will *not* happen until
->s_active hits 0. ->s_umount is dropped and now we have two processes
chasing each other:
s_active = 2, A acquired ->s_umount, B blocked
A sees that the damn thing is stillborn, does deactivate_locked_super()
s_active = 1, A drops ->s_umount, B gets it
A restarts the search and finds the same superblock. And bumps it ->s_active.
s_active = 2, B holds ->s_umount, A blocked on trying to get it
... and we are in the earlier situation with A and B switched places.
The root cause, of course, is that ->s_active should not grow until we'd
got MS_BORN. Then failing ->mount() will have deactivate_locked_super()
shut the damn thing down. Fortunately, it's easy to do - the key point
is that grab_super() is called only for superblocks currently on ->fs_supers,
so it can bump ->s_count and grab ->s_umount first, then check MS_BORN and
bump ->s_active; we must never increment ->s_count for superblocks past
->kill_sb(), but grab_super() is never called for those.
The bug is pretty old; we would've caught it by now, if not for accidental
exclusion between sget() for block filesystems; the things like cgroup or
e.g. mtd-based filesystems don't have anything of that sort, so they get
bitten. The right way to deal with that is obviously to fix sget()...
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML fixes from Richard Weinberger:
"Special thanks goes to Toralf Föster for continuously testing UML and
reporting issues!"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: remove dead code
um: siginfo cleanup
uml: Fix which_tmpdir failure when /dev/shm is a symlink, and in other edge cases
um: Fix wait_stub_done() error handling
um: Mark stub pages mapping with VM_PFNMAP
um: Fix return value of strnlen_user()
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
"MIPS fixes for 3.11. Half of then is for Netlogic the remainder
touches things across arch/mips.
Nothing really dramatic and by rc1 standards MIPS will be in fairly
good shape with this applied. Tested by building all MIPS defconfigs
of which with this pull request four platforms won't build. And yes,
it boots also on my favorite test systems"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: kvm: Kconfig: Drop HAVE_KVM dependency from VIRTUALIZATION
MIPS: Octeon: Fix DT pruning bug with pip ports
MIPS: KVM: Mark KVM_GUEST (T&E KVM) as BROKEN_ON_SMP
MIPS: tlbex: fix broken build in v3.11-rc1
MIPS: Netlogic: Add XLP PIC irqdomain
MIPS: Netlogic: Fix USB block's coherent DMA mask
MIPS: tlbex: Fix typo in r3000 tlb store handler
MIPS: BMIPS: Fix thinko to release slave TP from reset
MIPS: Delete dead invocation of exception_exit().
Merge tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64
Pull arm64 fixes from Catalin Marinas:
- Post -rc1 update to the common reboot infrastructure.
- Fixes (user cache maintenance fault handling, !COMPAT compilation,
CPU online and interrupt hanlding).
* tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
arm64: use common reboot infrastructure
arm64: mm: don't treat user cache maintenance faults as writes
arm64: add '#ifdef CONFIG_COMPAT' for aarch32_break_handler()
arm64: Only enable local interrupts after the CPU is marked online
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"An update for the BFP jit to the latest and greatest, two patches to
get kdump working again, the random-abort ptrace extention for
transactional execution, the z90crypt module alias for ap and a tiny
cleanup"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/zcrypt: Alias for new zcrypt device driver base module
s390/kdump: Allow copy_oldmem_page() copy to virtual memory
s390/kdump: Disable mmap for s390
s390/bpf,jit: add pkt_type support
s390/bpf,jit: address randomize and write protect jit code
s390/bpf,jit: use generic jit dumper
s390/bpf,jit: call module_free() from any context
s390/qdio: remove unused variable
s390/ptrace: PTRACE_TE_ABORT_RAND
Stefan Behrens [Thu, 4 Jul 2013 14:14:23 +0000 (16:14 +0200)]
Btrfs: fix wrong write offset when replacing a device
Miao Xie reported the following issue:
The filesystem was corrupted after we did a device replace.
Steps to reproduce:
# mkfs.btrfs -f -m single -d raid10 <device0>..<device3>
# mount <device0> <mnt>
# btrfs replace start -rfB 1 <device4> <mnt>
# umount <mnt>
# btrfsck <device4>
The reason for the issue is that we changed the write offset by mistake,
introduced by commit 625f1c8dc.
We read the data from the source device at first, and then write the
data into the corresponding place of the new device. In order to
implement the "-r" option, the source location is remapped using
btrfs_map_block(). The read takes place on the mapped location, and
the write needs to take place on the unmapped location. Currently
the write is using the mapped location, and this commit changes it
back by undoing the change to the write address that the aforementioned
commit added by mistake.
Reported-by: Miao Xie <miaox@cn.fujitsu.com> Cc: <stable@vger.kernel.org> # 3.10+ Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Josef Bacik [Wed, 17 Jul 2013 23:30:20 +0000 (19:30 -0400)]
Btrfs: re-add root to dead root list if we stop dropping it
If we stop dropping a root for whatever reason we need to add it back to the
dead root list so that we will re-start the dropping next transaction commit.
The other case this happens is if we recover a drop because we will add a root
without adding it to the fs radix tree, so we can leak it's root and commit root
extent buffer, adding this to the dead root list makes this cleanup happen.
Thanks,
Cc: stable@vger.kernel.org Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Josef Bacik [Mon, 15 Jul 2013 16:41:42 +0000 (12:41 -0400)]
Btrfs: fix lock leak when resuming snapshot deletion
We aren't setting path->locks[level] when we resume a snapshot deletion which
means we won't unlock the buffer when we free the path. This causes deadlocks
if we happen to re-allocate the block before we've evicted the extent buffer
from cache. Thanks,
Cc: stable@vger.kernel.org Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Josef Bacik [Mon, 15 Jul 2013 15:57:06 +0000 (11:57 -0400)]
Btrfs: update drop progress before stopping snapshot dropping
Alex pointed out a problem and fix that exists in the drop one snapshot at a
time patch. If we decide we need to exit for whatever reason (umount for
example) we will just exit the snapshot dropping without updating the drop
progress. So the next time we go to resume we will BUG_ON() because we can't
find the extent we left off at because we never updated it. This patch fixes
the problem.
Cc: stable@vger.kernel.org Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fix from Paolo Bonzini:
"This single patch fixes a regression caused by one of the
optimizations introduced in 3.11, which is generally visible only on
AMD processors"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: MMU: avoid fast page fault fixing mmio page fault
Merge tag 'pm+acpi-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"These are fixes collected over the last week, most importnatly two
cpufreq reverts fixing regressions introduced in 3.10, an autoseelp
fix preventing systems using it from crashing during shutdown and two
ACPI scan fixes related to hotplug.
Specifics:
- Two cpufreq commits from the 3.10 cycle introduced regressions.
The first of them was buggy (it did way much more than it needed to
do) and the second one attempted to fix an issue introduced by the
first one. Fixes from Srivatsa S Bhat revert both.
- If autosleep triggers during system shutdown and the shutdown
callbacks of some device drivers have been called already, it may
crash the system. Fix from Liu Shuo prevents that from happening
by making try_to_suspend() check system_state.
- The ACPI memory hotplug driver doesn't clear its driver_data on
errors which may cause a NULL poiter dereference to happen later.
Fix from Toshi Kani.
- The ACPI namespace scanning code should not try to attach scan
handlers to device objects that have them already, which may
confuse things quite a bit, and it should rescan the whole
namespace branch starting at the given node after receiving a bus
check notify event even if the device at that particular node has
been discovered already. Fixes from Rafael J Wysocki.
- New ACPI video blacklist entry for a system whose initial backlight
setting from the BIOS doesn't make sense. From Lan Tianyu.
- Garbage string output avoindance for ACPI PNP from Liu Shuo.
- Two Kconfig fixes for issues introduced recently in the s3c24xx
cpufreq driver (when moving the driver to drivers/cpufreq) from
Paul Bolle.
- Trivial comment fix in pm_wakeup.h from Chanwoo Choi"
* tag 'pm+acpi-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / video: ignore BIOS initial backlight value for Fujitsu E753
PNP / ACPI: avoid garbage in resource name
cpufreq: Revert commit 2f7021a8 to fix CPU hotplug regression
cpufreq: s3c24xx: fix "depends on ARM_S3C24XX" in Kconfig
cpufreq: s3c24xx: rename CONFIG_CPU_FREQ_S3C24XX_DEBUGFS
PM / Sleep: Fix comment typo in pm_wakeup.h
PM / Sleep: avoid 'autosleep' in shutdown progress
cpufreq: Revert commit a66b2e to fix suspend/resume regression
ACPI / memhotplug: Fix a stale pointer in error path
ACPI / scan: Always call acpi_bus_scan() for bus check notifications
ACPI / scan: Do not try to attach scan handlers to devices having them
Marc Zyngier [Thu, 11 Jul 2013 11:13:00 +0000 (12:13 +0100)]
arm64: use common reboot infrastructure
Commit 7b6d864b48d9 (reboot: arm: change reboot_mode to use enum
reboot_mode) changed the way reboot is handled on arm, which has a
direct impact on arm64 as we share the reset driver on the VE platform.
The obvious fix is to move arm64 to use the same infrastructure.
Will Deacon [Fri, 19 Jul 2013 14:37:12 +0000 (15:37 +0100)]
arm64: mm: don't treat user cache maintenance faults as writes
On arm64, cache maintenance faults appear as data aborts with the CM
bit set in the ESR. The WnR bit, usually used to distinguish between
faulting loads and stores, always reads as 1 and (slightly confusingly)
the instructions are treated as reads by the architecture.
This patch fixes our fault handling code to treat cache maintenance
faults in the same way as loads.
Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Chen Gang [Mon, 24 Jun 2013 09:27:49 +0000 (10:27 +0100)]
arm64: add '#ifdef CONFIG_COMPAT' for aarch32_break_handler()
If 'COMPAT' not defined, aarch32_break_handler() cannot pass compiling,
and it can work independent with 'COMPAT', so remove dummy definition.
The related error:
arch/arm64/kernel/debug-monitors.c:249:5: error: redefinition of ‘aarch32_break_handler’
In file included from arch/arm64/kernel/debug-monitors.c:29:0:
/root/linux-next/arch/arm64/include/asm/debug-monitors.h:89:12: note: previous definition of ‘aarch32_break_handler’ was here
Signed-off-by: Chen Gang <gang.chen@asianux.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
arm64: Only enable local interrupts after the CPU is marked online
There is a slight chance that (timer) interrupts are triggered before a
secondary CPU has been marked online with implications on softirq thread
affinity.
Markos Chandras [Tue, 11 Jun 2013 09:02:33 +0000 (09:02 +0000)]
MIPS: kvm: Kconfig: Drop HAVE_KVM dependency from VIRTUALIZATION
Virtualization does not always need KVM capabilities so drop the
dependency. The KVM symbol already depends on HAVE_KVM.
Fixes the following problem on a randconfig:
warning: (REMOTEPROC && RPMSG) selects VIRTUALIZATION which has unmet direct
dependencies (HAVE_KVM)
warning: (REMOTEPROC && RPMSG) selects VIRTUALIZATION which has unmet
direct dependencies (HAVE_KVM)
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com> Acked-by: Steven J. Hill <Steven.Hill@imgtec.com> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5443/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Currently we use both struct siginfo and siginfo_t.
Let's use struct siginfo internally to avoid ongoing
compiler warning. We are allowed to do so because
struct siginfo and siginfo_t are equivalent.
Signed-off-by: Richard Weinberger <richard@nod.at>
During the pruning of the device tree octeon_fdt_pip_iface() is called
for each PIP interface and every port up to the port count is removed
from the device tree. However, the count was set to the return value of
cvmx_helper_interface_enumerate() which doesn't actually return the
count but just returns zero on success. This effectively removed *all*
ports from the tree.
Use cvmx_helper_ports_on_interface() instead to fix this. This
successfully restores the 3 ports of my ERLite-3 and fixes the "kernel
assigns random MAC addresses" issue.
Signed-off-by: Faidon Liambotis <paravoid@debian.org> Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi> Acked-by: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5587/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
uml: Fix which_tmpdir failure when /dev/shm is a symlink, and in other edge cases
which_tmpdir did the wrong thing if /dev/shm was a symlink (e.g., to /run/shm),
if there were multiple mounts on top of each other, if the mount(s) were
obscured by a later mount, or if /dev/shm was a prefix of another mount point.
This fixes these cases. Applies to 3.9.6.
Signed-off-by: Tristan Schmelcher <tschmelcher@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
James Hogan [Fri, 12 Jul 2013 10:26:11 +0000 (10:26 +0000)]
MIPS: KVM: Mark KVM_GUEST (T&E KVM) as BROKEN_ON_SMP
Make KVM_GUEST depend on BROKEN_ON_SMP so that it cannot be enabled with
SMP.
SMP kernels use ll/sc instructions for an atomic section in the tlb fill
handler, with a tlbp instruction contained in the middle. This cannot be
emulated with trap & emulate KVM because the tlbp instruction traps and
the eret to return to the guest code clears the LLbit which makes the sc
instruction always fail.
Aaro Koskinen [Mon, 15 Jul 2013 07:21:57 +0000 (07:21 +0000)]
MIPS: tlbex: fix broken build in v3.11-rc1
Commit 6ba045f9fbdafb48da42aa8576ea7a3980443136 (MIPS: Move generated code
to .text for microMIPS) deleted tlbmiss_handler_setup_pgd_array, but some
references were not converted. Fix that to enable building a MIPS kernel.
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Acked-by: Jayachandran C. <jchandra@broadcom.com> Acked-by: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5589/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Jayachandran C [Wed, 17 Jul 2013 10:27:26 +0000 (10:27 +0000)]
MIPS: Netlogic: Add XLP PIC irqdomain
Add a legacy irq domain for the XLP PIC interrupts. This will be used
when interrupts are assigned from the device tree. This change is required
after commit c5cdc67 "irqdomain: Remove temporary MIPS workaround code".
Signed-off-by: Jayachandran C <jchandra@broadcom.com> Cc: linux-mips@linux-mips.org Cc: Jayachandran C <jchandra@broadcom.com>
Patchwork: https://patchwork.linux-mips.org/patch/5597/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The on-chip USB controller on Netlogic XLP does not suppport
DMA beyond 32-bit physical address. Set the coherent_dma_mask
of the USB in its PCI fixup to support this.
Tony Wu [Thu, 18 Jul 2013 09:45:47 +0000 (09:45 +0000)]
MIPS: tlbex: Fix typo in r3000 tlb store handler
commit 6ba045f (MIPS: Move generated code to .text for microMIPS)
causes a panic at boot. The handler builder should test against
handle_tlbs_end, not handle_tlbs.
Signed-off-by: Tony Wu <tung7970@gmail.com> Acked-by: Jayachandran C. <jchandra@broadcom.com> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5600/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
MIPS: BMIPS: Fix thinko to release slave TP from reset
Commit 4df715aa ["MIPS: BMIPS: support booting from physical CPU other
than 0"] introduced a thinko which will prevents slave CPUs from being
released from reset on systems where we boot from TP0. The problem is
that we are checking whether the slave CPU logical CPU map is 0, which
is never true for systems booting from TP0, so we do not release the
slave TP from reset and we are just stuck. Fix this by properly checking
that the CPU we intend to boot really is the physical slave CPU (logical
and physical value being 1).
s390/zcrypt: Alias for new zcrypt device driver base module
The zcrypt device driver has been split into base/bus module, api-module,
card modules and message type modules. The base module has been renamed
from z90crypt to ap.
A module alias (with the well-known z90crypt identifier) will be introduced
that enable users to use their existing way to load the zcrypt device driver.
Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pull networking fixes from David Miller:
"A couple interesting SKB fragment handling fixes, plus the usual small
bits here and there:
1) Fix 64-bit divide build failure on 32-bit platforms in mlx5, from
Tim Gardner.
2) Get rid of a stupid reimplementation on "%*phC" in our sysfs MAC
address printing helper.
3) Fix NETIF_F_SG capability advertisement in hyperv driver, if the
device can't do checksumming offloads then it shouldn't say it can
do SG either. From Haiyang Zhang.
4) bgmac needs to depend on PHYLIB, from Hauke Mehrtens.
5) Don't leak DMA mappings on mapping failures, from Neil Horman.
6) We need to reset the transport header of SKBs in ipv4 before we
attempt to perform early socket demux, just like ipv6 does. From
Eric Dumazet.
7) Add missing locking on vxlan device removal, from Stephen
Hemminger.
8) xen-netfront has to make two passes over an SKB to prepare it for
transfer. One pass calculates the number of slots needed, the
second massages the SKB and fills the slots. Unfortunately, the
first pass doesn't calculate the number of slots properly so we
can end up trying to build a MAX_SKB_FRAGS + 1 SKB which doesn't
work out so well. Fix from Jan Beulich with help and discussion
with several others.
9) Fix a similar problem in tun and macvtap, which have to split up
scatter-gather elements at PAGE_SIZE boundaries. Don't do
zerocopy if it would result in a > MAX_SKB_FRAGS skb. Fixes from
Jason Wang.
10) On receive, once we've decoded the VLAN state completely, clear
skb->vlan_tci. Otherwise demuxed tunnels underneath can trigger
the VLAN code again, corrupting the packet. Fix from Eric
Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
vlan: fix a race in egress prio management
vlan: mask vlan prio bits
macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
pkt_sched: sch_qfq: remove a source of high packet delay/jitter
xen-netfront: pull on receive skb may need to happen earlier
vxlan: add necessary locking on device removal
hyperv: Fix the NETIF_F_SG flag setting in netvsc
net: Fix sysfs_format_mac() code duplication.
be2net: Fix to avoid hardware workaround when not needed
macvtap: do not assume 802.1Q when send vlan packets
macvtap: fix the missing ret value of TUNSETQUEUE
ipv4: set transport header earlier
mlx5 core: Fix __udivdi3 when compiling for 32 bit arches
bgmac: add dependency to phylib
net/irda: fixed style issues in irlan_eth
ethtool: fixed trailing statements in ethtool
ndisc: bool initializations should use true and false
atl1e: unmap partially mapped skb on dma error and free skb
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
"Trying again to get the fixes queue, including the fixed IDT alignment
patch.
The UEFI patch is by far the biggest issue at hand: it is currently
causing quite a few machines to boot. Which is sad, because the only
reason they would is because their BIOSes touch memory that has
already been freed. The other major issue is that we finally have
tracked down the root cause of a significant number of machines
failing to suspend/resume"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Make sure IDT is page aligned
x86, suspend: Handle CPUs which fail to #GP on RDMSR
x86/platform/ce4100: Add header file for reboot type
Revert "UEFI: Don't pass boot services regions to SetVirtualAddressMap()"
efivars: check for EFI_RUNTIME_SERVICES
Merge tag 'md-3.11-fixes' of git://neil.brown.name/md
Pull md bug fixes from NeilBrown:
"Sorry boss, back at work now boss. Here's them nice shiny patches ya
wanted. All nicely tagged and justified for -stable and everyfing:
Three bug fixes for md in 3.10
3.10 wasn't a good release for md. The bio changes left a couple of
bugs, and an md "fix" created another one.
These three patches appear to fix the issues and have been tagged for
-stable"
* tag 'md-3.11-fixes' of git://neil.brown.name/md:
md/raid1: fix bio handling problems in process_checks()
md: Remove recent change which allows devices to skip recovery.
md/raid10: fix two problems with RAID10 resync.
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"You'll be terribly disappointed in this, I'm not trying to sneak any
features in or anything, its mostly radeon and intel fixes, a couple
of ARM driver fixes"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (34 commits)
drm/radeon/dpm: add debugfs support for RS780/RS880 (v3)
drm/radeon/dpm/atom: fix broken gcc harder
drm/radeon/dpm/atom: restructure logic to work around a compiler bug
drm/radeon/dpm: fix atom vram table parsing
drm/radeon: fix an endian bug in atom table parsing
drm/radeon: add a module parameter to disable aspm
drm/rcar-du: Use the GEM PRIME helpers
drm/shmobile: Use the GEM PRIME helpers
uvesafb: Really allow mtrr being 0, as documented and warn()ed
radeon kms: do not flush uninitialized hotplug work
drm/radeon/dpm/sumo: handle boost states properly when forcing a perf level
drm/radeon: align VM PTBs (Page Table Blocks) to 32K
drm/radeon: allow selection of alignment in the sub-allocator
drm/radeon: never unpin UVD bo v3
drm/radeon: fix UVD fence emit
drm/radeon: add fault decode function for CIK
drm/radeon: add fault decode function for SI (v2)
drm/radeon: add fault decode function for cayman/TN (v2)
drm/radeon: use radeon device for request firmware
drm/radeon: add missing ttm_eu_backoff_reservation to radeon_bo_list_validate
...
Eric Dumazet [Thu, 18 Jul 2013 16:35:10 +0000 (09:35 -0700)]
vlan: fix a race in egress prio management
egress_priority_map[] hash table updates are protected by rtnl,
and we never remove elements until device is dismantled.
We have to make sure that before inserting an new element in hash table,
all its fields are committed to memory or else another cpu could
find corrupt values and crash.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 18 Jul 2013 14:19:26 +0000 (07:19 -0700)]
vlan: mask vlan prio bits
In commit 48cc32d38a52d0b68f91a171a8d00531edc6a46e
("vlan: don't deliver frames for unknown vlans to protocols")
Florian made sure we set pkt_type to PACKET_OTHERHOST
if the vlan id is set and we could find a vlan device for this
particular id.
But we also have a problem if prio bits are set.
Steinar reported an issue on a router receiving IPv6 frames with a
vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device,
because skb->vlan_tci is set.
Forwarded frame is completely corrupted : We can see (8100:4000)
being inserted in the middle of IPv6 source address :
It seems we are not really ready to properly cope with this right now.
We can probably do better in future kernels :
vlan_get_ingress_priority() should be a netdev property instead of
a per vlan_dev one.
For stable kernels, lets clear vlan_tci to fix the bugs.
Reported-by: Steinar H. Gunderson <sesse@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Thu, 18 Jul 2013 02:55:16 +0000 (10:55 +0800)]
macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
We try to linearize part of the skb when the number of iov is greater than
MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than
one pages, so zerocopy_sg_fromiovec() may still fail and may break the guest
network.
Solve this problem by calculate the pages needed for iov before trying to do
zerocopy and switch to use copy instead of zerocopy if it needs more than
MAX_SKB_FRAGS.
This is done through introducing a new helper to count the pages for iov, and
call uarg->callback() manually when switching from zerocopy to copy to notify
vhost.
Jason Wang [Thu, 18 Jul 2013 02:55:15 +0000 (10:55 +0800)]
tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
We try to linearize part of the skb when the number of iov is greater than
MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than
one pages, so zerocopy_sg_fromiovec() may still fail and may break the guest
network.
Solve this problem by calculate the pages needed for iov before trying to do
zerocopy and switch to use copy instead of zerocopy if it needs more than
MAX_SKB_FRAGS.
This is done through introducing a new helper to count the pages for iov, and
call uarg->callback() manually when switching from zerocopy to copy to notify
vhost.
Paolo Valente [Tue, 16 Jul 2013 06:52:30 +0000 (08:52 +0200)]
pkt_sched: sch_qfq: remove a source of high packet delay/jitter
QFQ+ inherits from QFQ a design choice that may cause a high packet
delay/jitter and a severe short-term unfairness. As QFQ, QFQ+ uses a
special quantity, the system virtual time, to track the service
provided by the ideal system it approximates. When a packet is
dequeued, this quantity must be incremented by the size of the packet,
divided by the sum of the weights of the aggregates waiting to be
served. Tracking this sum correctly is a non-trivial task, because, to
preserve tight service guarantees, the decrement of this sum must be
delayed in a special way [1]: this sum can be decremented only after
that its value would decrease also in the ideal system approximated by
QFQ+. For efficiency, QFQ+ keeps track only of the 'instantaneous'
weight sum, increased and decreased immediately as the weight of an
aggregate changes, and as an aggregate is created or destroyed (which,
in its turn, happens as a consequence of some class being
created/destroyed/changed). However, to avoid the problems caused to
service guarantees by these immediate decreases, QFQ+ increments the
system virtual time using the maximum value allowed for the weight
sum, 2^10, in place of the dynamic, instantaneous value. The
instantaneous value of the weight sum is used only to check whether a
request of weight increase or a class creation can be satisfied.
Unfortunately, the problems caused by this choice are worse than the
temporary degradation of the service guarantees that may occur, when a
class is changed or destroyed, if the instantaneous value of the
weight sum was used to update the system virtual time. In fact, the
fraction of the link bandwidth guaranteed by QFQ+ to each aggregate is
equal to the ratio between the weight of the aggregate and the sum of
the weights of the competing aggregates. The packet delay guaranteed
to the aggregate is instead inversely proportional to the guaranteed
bandwidth. By using the maximum possible value, and not the actual
value of the weight sum, QFQ+ provides each aggregate with the worst
possible service guarantees, and not with service guarantees related
to the actual set of competing aggregates. To see the consequences of
this fact, consider the following simple example.
Suppose that only the following aggregates are backlogged, i.e., that
only the classes in the following aggregates have packets to transmit:
one aggregate with weight 10, say A, and ten aggregates with weight 1,
say B1, B2, ..., B10. In particular, suppose that these aggregates are
always backlogged. Given the weight distribution, the smoothest and
fairest service order would be:
A B1 A B2 A B3 A B4 A B5 A B6 A B7 A B8 A B9 A B10 A B1 A B2 ...
QFQ+ would provide exactly this optimal service if it used the actual
value for the weight sum instead of the maximum possible value, i.e.,
11 instead of 2^10. In contrast, since QFQ+ uses the latter value, it
serves aggregates as follows (easy to prove and to reproduce
experimentally):
A B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 A A A A A A A A A A B1 B2 ... B10 A A ...
By replacing 10 with N in the above example, and by increasing N, one
can increase at will the maximum packet delay and the jitter
experienced by the classes in aggregate A.
This patch addresses this issue by just using the above
'instantaneous' value of the weight sum, instead of the maximum
possible value, when updating the system virtual time. After the
instantaneous weight sum is decreased, QFQ+ may deviate from the ideal
service for a time interval in the order of the time to serve one
maximum-size packet for each backlogged class. The worst-case extent
of the deviation exhibited by QFQ+ during this time interval [1] is
basically the same as of the deviation described above (but, without
this patch, QFQ+ suffers from such a deviation all the time). Finally,
this patch modifies the comment to the function qfq_slot_insert, to
make it coherent with the fact that the weight sum used by QFQ+ can
now be lower than the maximum possible value.
[1] P. Valente, "Extending WF2Q+ to support a dynamic traffic mix",
Proceedings of AAA-IDEA'05, June 2005.
Signed-off-by: Paolo Valente <paolo.valente@unimore.it> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'driver-core-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core patches from Greg KH:
"Here are some driver core patches for 3.11-rc2. They aren't really
bugfixes, but a bunch of new helper macros for drivers to properly
create attribute groups, which drivers and subsystems need to fix up a
ton of race issues with incorrectly creating sysfs files (binary and
normal) after userspace has been told that the device is present.
Also here is the ability to create binary files as attribute groups,
to solve that race condition, which was impossible to do before this,
so that's my fault the drivers were broken.
The majority of the .c changes is indenting and moving code around a
bit. It affects no existing code, but allows the large backlog of 70+
patches that I already have created to start flowing into the
different subtrees, instead of having to live in my driver-core tree,
causing merge nightmares in linux-next for the next few months.
These were finalized too late for the -rc1 merge window, which is why
they were didn't make that pull request, testing and review from
others didn't happen until a few weeks ago, and then there's the whole
distraction of the past few days, which prevented these from getting
to you sooner, sorry about that.
Oh, and there's a bugfix for the documentation build warning in here
as well. All of these have been in linux-next this week, with no
reported problems"
* tag 'driver-core-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver-core: fix new kernel-doc warning in base/platform.c
sysfs: use file mode defines from stat.h
sysfs: add more helper macro's for (bin_)attribute(_groups)
driver core: add default groups to struct class
driver core: Introduce device_create_groups
sysfs: prevent warning when only using binary attributes
sysfs: add support for binary attributes in groups
driver core: device.h: add RW and RO attribute macros
sysfs.h: add BIN_ATTR macro
sysfs.h: add ATTRIBUTE_GROUPS() macro
sysfs.h: add __ATTR_RW() macro
* acpi-fixes:
ACPI / video: ignore BIOS initial backlight value for Fujitsu E753
PNP / ACPI: avoid garbage in resource name
ACPI / memhotplug: Fix a stale pointer in error path
ACPI / scan: Always call acpi_bus_scan() for bus check notifications
ACPI / scan: Do not try to attach scan handlers to devices having them
Lan Tianyu [Tue, 16 Jul 2013 02:07:21 +0000 (10:07 +0800)]
ACPI / video: ignore BIOS initial backlight value for Fujitsu E753
The BIOS of FUjitsu E753 reports an incorrect initial backlight value
for WIN8 compatible OS, causing backlight to be dark during startup.
This change causes the incorrect initial value from BIOS to be ignored.
References: https://bugzilla.kernel.org/show_bug.cgi?id=60161 Reported-and-tested-by: Jan Hinnerk Stosch <janhinnerk.stosch@gmail.com> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Cc: 3.7+ <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge branch 'cpuinit_phase2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Pull phase two of __cpuinit removal from Paul Gortmaker:
"With the __cpuinit infrastructure removed earlier, this group of
commits only removes the function/data tagging that was done with the
various (now no-op) __cpuinit related prefixes.
Now that the dust has settled with yesterday's v3.11-rc1, there
hopefully shouldn't be any new users leaking back in tree, but I think
we can leave the harmless no-op stubs there for a release as a
courtesy to those who still have out of tree stuff and weren't paying
attention.
Although the commits are against the recent tag to allow for minor
context refreshes for things like yesterday's v3.11-rc1~ slab content,
the patches have been largely unchanged for weeks, aside from such
trivial updates.
For detail junkies, the largely boring and mostly irrelevant history
of the patches can be viewed at:
If nothing else, I guess it does at least demonstrate the level of
involvement required to shepherd such a treewide change to completion.
This is the same repository of patches that has been applied to the
end of the daily linux-next branches for the past several weeks"
* 'cpuinit_phase2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (28 commits)
block: delete __cpuinit usage from all block files
drivers: delete __cpuinit usage from all remaining drivers files
kernel: delete __cpuinit usage from all core kernel files
rcu: delete __cpuinit usage from all rcu files
net: delete __cpuinit usage from all net files
acpi: delete __cpuinit usage from all acpi files
hwmon: delete __cpuinit usage from all hwmon files
cpufreq: delete __cpuinit usage from all cpufreq files
clocksource+irqchip: delete __cpuinit usage from all related files
x86: delete __cpuinit usage from all x86 files
score: delete __cpuinit usage from all score files
xtensa: delete __cpuinit usage from all xtensa files
openrisc: delete __cpuinit usage from all openrisc files
m32r: delete __cpuinit usage from all m32r files
hexagon: delete __cpuinit usage from all hexagon files
frv: delete __cpuinit usage from all frv files
cris: delete __cpuinit usage from all cris files
metag: delete __cpuinit usage from all metag files
tile: delete __cpuinit usage from all tile files
sh: delete __cpuinit usage from all sh files
...
Merge tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Except for a slightly big OMAP changes, all rest are small, mostly
boring changes; all either 3.11 regression fixes or stable materials.
- ASoC OMAP fixes due to non-DT OMAP4 removals
- Other ASoC driver changes (sglt5000, wm8978, wm8948, samsung)
- Fix missing locking for snd_pcm_stop() calls in many drivers
- Fix the blocking request_module() in OSS sequencer
- Fix old OSS vwsnd driver builds
- Add a new HD-audio HDMI codec ID"
* tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (23 commits)
ALSA: seq-oss: Initialize MIDI clients asynchronously
ALSA: hda - Add new GPU codec ID to snd-hda
staging: line6: Fix unlocked snd_pcm_stop() call
[media] saa7134: Fix unlocked snd_pcm_stop() call
ASoC: s6000: Fix unlocked snd_pcm_stop() call
ASoC: atmel: Fix unlocked snd_pcm_stop() call
ALSA: pxa2xx: Fix unlocked snd_pcm_stop() call
ALSA: usx2y: Fix unlocked snd_pcm_stop() call
ALSA: ua101: Fix unlocked snd_pcm_stop() call
ALSA: 6fire: Fix unlocked snd_pcm_stop() call
ALSA: atiixp: Fix unlocked snd_pcm_stop() call
ALSA: asihpi: Fix unlocked snd_pcm_stop() call
sound: oss/vwsnd: Always define vwsnd_mutex
sound: oss/vwsnd: Add missing inclusion of linux/delay.h
ASoC: wm8978: enable symmetric rates
ASoC: omap-mcbsp: Use different method for DMA request when booted with DT
ASoC: omap-dmic: Do not use platform_get_resource_byname() for DMA
ASoC: omap-mcpdm: Do not use platform_get_resource_byname() for DMA
ASoC: omap-pcm: Request the DMA channel differently when DT is involved
ASoC: Samsung: Set RFS and BFS in slave mode
...
Michael Holzheu [Thu, 18 Jul 2013 10:18:27 +0000 (12:18 +0200)]
s390/kdump: Allow copy_oldmem_page() copy to virtual memory
The kdump mmap patch series (git commit 83086978c63afd7c73e1c) changed the
requirements for copy_oldmem_page(). Now this function is used for copying
to virtual memory.
So implement vmalloc support for the s390 version of copy_oldmem_page().
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Michael Holzheu [Thu, 18 Jul 2013 10:17:57 +0000 (12:17 +0200)]
s390/kdump: Disable mmap for s390
The kdump mmap patch series (git commit 83086978c63afd7c73e1c) directly
map the PT_LOADs to memory. On s390 this does not work because the
copy_from_oldmem() function swaps [0,crashkernel size] with
[crashkernel base, crashkernel base+crashkernel size]. The swap
int copy_from_oldmem() was done in order correctly implement /dev/oldmem.
s390/bpf,jit: address randomize and write protect jit code
This is the s390 variant of 314beb9b "x86: bpf_jit_comp: secure bpf
jit against spraying attacks".
With this change the whole jit code and literal pool will be write
protected after creation. In addition the start address of the jit
code won't be always on a page boundary anymore.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
KVM: MMU: avoid fast page fault fixing mmio page fault
Currently, fast page fault incorrectly tries to fix mmio page fault when
the generation number is invalid (spte.gen != kvm.gen). It then returns
to guest to retry the fault since it sees the last spte is nonpresent.
This causes an infinite loop.
Since fast page fault only works for direct mmu, the issue exists when
1) tdp is enabled. It is only triggered only on AMD host since on Intel host
the mmio page fault is recognized as ept-misconfig whose handler call
fault-page path with error_code = 0
2) guest paging is disabled. Under this case, the issue is hardly discovered
since paging disable is short-lived and the sptes will be invalid after
memslot changed for 150 times
Fix it by filtering out MMIO page faults in page_fault_can_be_fast.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>