git.karo-electronics.de Git - linux-beck.git/log

Merge branch 'misc-4.6' into for-chris-4.6

# Conflicts:
# fs/btrfs/file.c

Merge branch 'cleanups-4.6' into for-chris-4.6

Merge branch 'foreign/liubo/replace-lockup' into for-chris-4.6

Merge branch 'foreign/josef/space-updates' into for-chris-4.6

Merge branch 'foreign/zhaolei/reada' into for-chris-4.6

Merge branch 'foreign/qu/norecovery-v7' into for-chris-4.6

Merge branch 'dev/rename-keys' into for-chris-4.6

Merge branch 'dev/gfp-flags' into for-chris-4.6

Merge branch 'chandan/prep-subpage-blocksize' into for-chris-4.6

# Conflicts:
# fs/btrfs/file.c

Btrfs: fix lockdep deadlock warning due to dev_replace

xfstests btrfs/011 triggers a lockdep deadlock warning:

[ 1226.649039] =========================================================
[ 1226.649039] [ INFO: possible irq lock inversion dependency detected ]
[ 1226.649039] 4.1.0+ #270 Not tainted
[ 1226.649039] ---------------------------------------------------------
[ 1226.652955] kswapd0/46 just changed the state of lock:
[ 1226.652955]  (&delayed_node->mutex){+.+.-.}, at: [<ffffffff81458735>] __btrfs_release_delayed_node+0x45/0x1d0
[ 1226.652955] but this lock took another, RECLAIM_FS-unsafe lock in the past:
[ 1226.652955]  (&fs_info->dev_replace.lock){+.+.+.}

and interrupts could create inverse lock ordering between them.

[ 1226.652955]
other info that might help us debug this:
[ 1226.652955] Chain exists of:
  &delayed_node->mutex --> &found->groups_sem --> &fs_info->dev_replace.lock

[ 1226.652955]  Possible interrupt unsafe locking scenario:

[ 1226.652955]        CPU0                    CPU1
[ 1226.652955]        ----                    ----
[ 1226.652955]   lock(&fs_info->dev_replace.lock);
[ 1226.652955]                                local_irq_disable();
[ 1226.652955]                                lock(&delayed_node->mutex);
[ 1226.652955]                                lock(&found->groups_sem);
[ 1226.652955]   <Interrupt>
[ 1226.652955]     lock(&delayed_node->mutex);
[ 1226.652955]
*** DEADLOCK ***

Commit 084b6e7c7607 ("btrfs: Fix a lockdep warning when running xfstest.") tried
to fix a similar problem that produced exactly the same warning, but even with
that fix we still run into this one.

The above lock chain comes from
btrfs_commit_transaction
  ->btrfs_run_delayed_items
    ...
    ->__btrfs_update_delayed_inode
      ...
      ->__btrfs_cow_block
         ...
         ->find_free_extent
            ->cache_block_group
              ->load_free_space_cache
                ->btrfs_readpages
                  ->submit_one_bio
                    ...
                    ->__btrfs_map_block
                      ->btrfs_dev_replace_lock

However, under high memory pressure, a task holding dev_replace.lock can be
interrupted by kswapd, which then tries to release memory occupied by
superblocks, inodes and dentries. In doing so it may call evict_inode, which
leads to

[ 1226.652955]  [<ffffffff81458735>] __btrfs_release_delayed_node+0x45/0x1d0
[ 1226.652955]  [<ffffffff81459e74>] btrfs_remove_delayed_node+0x24/0x30
[ 1226.652955]  [<ffffffff8140c5fe>] btrfs_evict_inode+0x34e/0x700

delayed_node->mutex may be acquired in __btrfs_release_delayed_node(), which
leads to an ABBA deadlock.

To fix this, we can use the "blocking rwlock" scheme already used for
extent_buffer, but things are simpler here since we only need to convert the
read side from a spinning lock to a blocking lock.

With this, btrfs/011 no longer produces warnings in dmesg.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: change max_inline default to 2048

The current practical default is ~4k on x86_64 (the actual logic is more
complex; this is simplified for brevity). Inlined files land in the metadata
group and thus consume space that could be needed for real metadata.

The inlining brings some usability surprises:

1) total space consumption measured on various filesystems and btrfs
   with DUP metadata was visibly higher for btrfs, because the inlined
   data is duplicated along with the metadata

2) inlined data may exhaust the metadata space, which is more precious
   when the entire device space is allocated to chunks (i.e. balance
   cannot make the space more compact)

3) performance suffers a bit as the inlined blocks are duplicated and
   stored far away on the device.

Proposed fix: set the default to 2048

This mainly fixes 1): the total filesystem space consumption will be on
par with other filesystems.

It partially fixes 2), as more data is pushed to the data block groups.

The effect on 3) depends on the actual small file size distribution.

The change is independent of the metadata block group type (though it's
most visible with DUP) and of the system page size, as these parameters
are not trivial to find out, compared to the file size.
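A simplified model of the inlining decision, for illustration only: it assumes a file is inlined when its data starts at offset 0, fits in one sector, and is no larger than max_inline. The real btrfs check has further conditions (compression, item size limits) that are omitted here, and would_inline() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified inlining check. */
bool would_inline(uint64_t start, uint64_t file_size,
		  uint64_t sectorsize, uint64_t max_inline)
{
	if (start != 0)                 /* only data at offset 0 is inlined */
		return false;
	if (file_size > sectorsize)     /* must fit in a single sector */
		return false;
	if (file_size > max_inline)     /* capped by the max_inline setting */
		return false;
	return true;
}
```

With a 4096-byte sector, a 3000-byte file passes this check under a ~4k limit but is written to a data block group once max_inline is 2048.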

Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove error message from search ioctl for nonexistent tree

Let's remove the error message that appears when the tree_id is not
present. This can happen with the quota tree and has been observed in
practice. The applications are supposed to handle -ENOENT and we don't
need to report that in the system log as it's not a fatal error.

Reported-by: Vlastimil Babka <vbabka@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: avoid uninitialized variable warning

With CONFIG_SMP and CONFIG_PREEMPT both disabled, gcc decides
to partially inline the get_state_failrec() function but cannot
figure out that means the failrec pointer is always valid
if the function returns success, which causes a harmless
warning:

fs/btrfs/extent_io.c: In function 'clean_io_failure':
fs/btrfs/extent_io.c:2131:4: error: 'failrec' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This marks get_state_failrec() and set_state_failrec() both
as 'noinline', which avoids the warning in all cases for me,
and seems less ugly than adding a fake initialization.
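The pattern can be sketched as follows; get_failrec() and use_failrec() are hypothetical stand-ins for get_state_failrec() and its caller, and noinline is written as the GCC/Clang attribute that the kernel macro roughly corresponds to. The out-parameter is written only on success, which is the fact gcc failed to prove once the function was partially inlined.

```c
#include <stddef.h>

struct failrec {
	int mirror;
};

/* Kept out of line so callers see an opaque call. */
__attribute__((noinline))
int get_failrec(struct failrec *store, struct failrec **out)
{
	if (!store)
		return -1;
	*out = store;       /* only written on success */
	return 0;
}

int use_failrec(struct failrec *store)
{
	struct failrec *rec;    /* deliberately not initialised */

	if (get_failrec(store, &rec))
		return -1;
	return rec->mirror;     /* valid: set whenever 0 was returned */
}
```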

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 47dc196ae719 ("btrfs: use proper type for failrec in extent_state")
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: fix memory leak of fs_info in block group cache

When starting up Linux with a btrfs filesystem, I got many memory leak
messages from kmemleak, such as:

unreferenced object 0xffff880066882000 (size 4096):
  comm "modprobe", pid 730, jiffies 4294690024 (age 196.599s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8174d52e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811d09aa>] kmem_cache_alloc_trace+0xea/0x1e0
    [<ffffffffa03620fb>] btrfs_alloc_dummy_fs_info+0x6b/0x2a0 [btrfs]
    [<ffffffffa03624fc>] btrfs_alloc_dummy_block_group+0x5c/0x120 [btrfs]
    [<ffffffffa0360aa9>] btrfs_test_free_space_cache+0x39/0xed0 [btrfs]
    [<ffffffffa03b5a74>] trace_raw_output_xfs_attr_class+0x54/0xe0 [xfs]
    [<ffffffff81002122>] do_one_initcall+0xb2/0x1f0
    [<ffffffff811765aa>] do_init_module+0x5e/0x1e9
    [<ffffffff810fec09>] load_module+0x20a9/0x2690
    [<ffffffff810ff439>] SyS_finit_module+0xb9/0xf0
    [<ffffffff81757daf>] entry_SYSCALL_64_fastpath+0x12/0x76
    [<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff8800573f8000 (size 10256):
  comm "modprobe", pid 730, jiffies 4294690185 (age 196.460s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8174d52e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff8119ca6e>] kmalloc_order+0x5e/0x70
    [<ffffffff8119caa4>] kmalloc_order_trace+0x24/0x90
    [<ffffffffa03620b3>] btrfs_alloc_dummy_fs_info+0x23/0x2a0 [btrfs]
    [<ffffffffa03624fc>] btrfs_alloc_dummy_block_group+0x5c/0x120 [btrfs]
    [<ffffffffa036603d>] run_test+0xfd/0x320 [btrfs]
    [<ffffffffa0366f34>] btrfs_test_free_space_tree+0x94/0xee [btrfs]
    [<ffffffffa03b5aab>] trace_raw_output_xfs_attr_class+0x8b/0xe0 [xfs]
    [<ffffffff81002122>] do_one_initcall+0xb2/0x1f0
    [<ffffffff811765aa>] do_init_module+0x5e/0x1e9
    [<ffffffff810fec09>] load_module+0x20a9/0x2690
    [<ffffffff810ff439>] SyS_finit_module+0xb9/0xf0
    [<ffffffff81757daf>] entry_SYSCALL_64_fastpath+0x12/0x76
    [<ffffffffffffffff>] 0xffffffffffffffff

This patch makes the block group cache use the fs_info stored in
btrfs_root directly instead of allocating a new one.

Fixes: d0bd456074 ("Btrfs: add fragment=* debug mount option")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Continue write in case of can_not_nocow

btrfs fails xfstests btrfs/080 when mounted with -o nodatacow.

It can be reproduced with the following script:
  DEV=/dev/vdg
  MNT=/mnt/tmp

  umount $DEV &>/dev/null
  mkfs.btrfs -f $DEV
  mount -o nodatacow $DEV $MNT

  dd if=/dev/zero of=$MNT/test bs=1 count=2048 &
  btrfs subvolume snapshot -r $MNT $MNT/test_snap &
  wait

We can see dd fail with ENOSPC.

Reason:
  __btrfs_buffered_write() should fall back to a COW write when NOCOW is
  impossible, and the current code is designed with that logic in mind.
  But check_can_nocow() has two kinds of return value (0 and <0) when
  NOCOW is not possible, and the current code only continues the write
  in the first case; the second case happens while a snapshot of the
  subvolume is being taken.

Fix:
  Continue the write when check_can_nocow() returns either 0 or <0.
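A minimal sketch of the decision described above, assuming only the return convention stated in this message (>0 means NOCOW is possible; 0 or <0 means it is not). choose_write_path() and the enum are hypothetical; the actual change lives inside __btrfs_buffered_write().

```c
/* Hypothetical model of the write-path choice. */
enum write_path { WRITE_NOCOW, WRITE_COW };

/*
 * can_nocow_ret:
 *   > 0  the range can be written in place (NOCOW)
 *     0  NOCOW is not possible
 *   < 0  NOCOW is not possible, e.g. while a snapshot is being taken
 */
enum write_path choose_write_path(int can_nocow_ret)
{
	if (can_nocow_ret > 0)
		return WRITE_NOCOW;
	/* Both 0 and negative values fall back to a regular COW write
	 * instead of failing the write. */
	return WRITE_COW;
}
```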

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>

btrfs: drop null testing before destroy functions

Cleanup.

kmem_cache_destroy() already handles a NULL argument, so drop the
redundant NULL test before calling it.
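The same idiom in user space: like kmem_cache_destroy() (and free()), the hypothetical obj_cache_destroy() below accepts NULL, so callers need no guard of their own.

```c
#include <stdlib.h>

/* Hypothetical cache type, used only to illustrate the pattern. */
struct obj_cache {
	void *slab;
};

/* Accept NULL and do nothing, as kmem_cache_destroy() does. */
void obj_cache_destroy(struct obj_cache *c)
{
	if (!c)
		return;
	free(c->slab);
	free(c);
}

/* Before: if (caches[i]) obj_cache_destroy(caches[i]);  After: */
void destroy_caches(struct obj_cache **caches, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		obj_cache_destroy(caches[i]);
		caches[i] = NULL;
	}
}
```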

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: fix build warning

We were getting a build warning:
fs/btrfs/extent-tree.c:7021:34: warning: ‘used_bg’ may be used
uninitialized in this function

It is not a valid warning: used_bg is never used uninitialized, since
locked is initially false, so we can never be in the section where
'used_bg' is used. But gcc is not able to work that out, so initialize
it at declaration to silence the warning.

Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: use proper type for failrec in extent_state

We use the private member of extent_state to store the failrec and play
pointless pointer games.

Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Replace CURRENT_TIME by current_fs_time()

CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_fs_time() instead.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: linux-btrfs@vger.kernel.org
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove open-coded swap() in backref.c:__merge_refs

The kernel provides a swap() that does the same thing as this code.
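For reference, a user-space rendition of the swap() macro, written with __typeof__ (a GCC/Clang extension); order_refs() is a hypothetical caller, not code from backref.c.

```c
/* Swap two lvalues of the same type via a temporary. */
#define swap(a, b) \
	do { __typeof__(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)

struct ref {
	int level;
	unsigned long long count;
};

/* Hypothetical: keep the reference with the larger count in *a. */
void order_refs(struct ref *a, struct ref *b)
{
	if (a->count < b->count)
		swap(*a, *b);
}
```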

Signed-off-by: Dave Jones <dsj@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove redundant error check

While running btrfs_mksubvol(), d_really_is_positive() is called twice:
first in btrfs_mksubvol() and again inside btrfs_may_create(). So remove
the first one.

Signed-off-by: Byongho Lee <bhlee.kernel@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: simplify expression in btrfs_calc_trans_metadata_size()

Simplify expression in btrfs_calc_trans_metadata_size().
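The simplification rests on the identity nodesize + nodesize * (BTRFS_MAX_LEVEL - 1) = nodesize * BTRFS_MAX_LEVEL. Assuming the expression had that shape before the cleanup (the diff is not shown in this log), the two forms can be checked against each other; BTRFS_MAX_LEVEL is taken as 8 here and both function names are hypothetical.

```c
#include <stdint.h>

#define BTRFS_MAX_LEVEL 8	/* assumed value for this sketch */

/* Assumed form of the expression before the cleanup. */
uint64_t calc_size_old(uint64_t nodesize, unsigned num_items)
{
	return (nodesize + nodesize * (BTRFS_MAX_LEVEL - 1)) * 2 * num_items;
}

/* Same value: nodesize * (1 + (L - 1)) == nodesize * L. */
uint64_t calc_size_new(uint64_t nodesize, unsigned num_items)
{
	return nodesize * BTRFS_MAX_LEVEL * 2 * num_items;
}
```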

Signed-off-by: Byongho Lee <bhlee.kernel@gmail.com>
Reviewed-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: check reserved when deciding to background flush

We will sometimes start background flushing the various enospc related things
(delayed nodes, delalloc, etc) if we are getting close to reserving all of our
available space.  We don't want to do this however when we are actually using
this space as it causes unneeded thrashing.  We currently try to do this by
checking bytes_used >= thresh, but bytes_used is only part of the equation; we
need to include bytes_reserved as well, since it represents space that is very
likely to become bytes_used in the future.

My tracing tool keeps count of the number of times we kick off the async
flusher; the following are counts for the entire run of generic/027:

          No patch   Patch
avg:      5385       5009
median:   5500       4916

Without the patch the median is higher than the average, and with the patch
it is lower; overall it cuts the flushing by anywhere from 5-10%, which in
the case of actual ENOSPC is quite helpful.  Thanks,
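The adjusted check can be sketched as below. The struct is a hypothetical subset of the space accounting, need_async_flush() is an illustrative name, and the 98% threshold is an assumption for the example; the point is that bytes_reserved is added to bytes_used before deciding the space is simply full.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the space accounting fields. */
struct space_info {
	uint64_t total_bytes;
	uint64_t bytes_used;
	uint64_t bytes_reserved;
};

/* `used` is everything counted against the space (used, reserved,
 * pinned, may_use, ...). The threshold is assumed to be 98%. */
bool need_async_flush(const struct space_info *s, uint64_t used)
{
	uint64_t thresh = s->total_bytes / 100 * 98;

	/* Space that is used, or about to be, cannot be reclaimed;
	 * flushing then only causes thrashing. The old check looked
	 * at bytes_used alone. */
	if (s->bytes_used + s->bytes_reserved >= thresh)
		return false;

	return used >= thresh;
}
```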

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: add transaction space reservation tracepoints

There are a few places where we add to trans->bytes_reserved but don't have the
corresponding trace point. With these added my tool no longer sees transaction
leaks.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: fix truncate_space_check

truncate_space_check is using btrfs_csum_bytes_to_leaves() but forgetting to
multiply by nodesize so we get an actual byte count. We need a tracepoint here
so that we have the matching reserve for the release that will come later. Also
add a comment to make clear what the intent of truncate_space_check is.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: change how we update the global block rsv

I'm writing a tool to visualize the enospc system in order to help debug enospc
bugs and I found weird data and ran it down to when we update the global block
rsv.  We add all of the remaining free space to the block rsv, do a trace event,
then remove the extra and do another trace event.  This makes my visualization
look silly and is unintuitive code as well.  Fix this to only add the amount we
are missing, or to free only the excess.  This is less clean to read but more
explicit in what it is doing, and it only emits events for values that make
sense.  Thanks,
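A sketch of adjusting a reserve by only the difference, so exactly one event is emitted per real change. The types, trace_rsv_change() and the free-space counter are hypothetical simplifications of the global block reserve.

```c
#include <stdint.h>

/* Hypothetical, simplified reserve. */
struct block_rsv {
	uint64_t size;      /* target size */
	uint64_t reserved;  /* currently held */
};

int trace_events = 0;   /* counts emitted "trace" events */

void trace_rsv_change(int64_t delta)
{
	(void)delta;
	trace_events++;
}

/* Move only the missing amount in, or only the excess out. */
void update_global_rsv(struct block_rsv *rsv, uint64_t *free_space)
{
	if (rsv->reserved < rsv->size) {
		uint64_t need = rsv->size - rsv->reserved;
		uint64_t take = need < *free_space ? need : *free_space;

		if (take) {
			rsv->reserved += take;
			*free_space -= take;
			trace_rsv_change((int64_t)take);
		}
	} else if (rsv->reserved > rsv->size) {
		uint64_t excess = rsv->reserved - rsv->size;

		rsv->reserved -= excess;
		*free_space += excess;
		trace_rsv_change(-(int64_t)excess);
	}
}
```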

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: reada: ignore creating reada_extent for a non-existent device

For a non-existent device, the old code skips adding it to the device's
reada queue.

To solve the problem of an unfinished wait on raid5/6,
commit 5fbc7c59fd22 ("Btrfs: fix unfinished readahead thread for
raid5/6 degraded mounting")
added an exception for the first stripe: in short, the first
stripe is always processed whether the device exists or not.

There is a better way to handle this: skip creating the reada_extent
for a non-existent device, which makes the code simpler and more
effective.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: reada: avoid undone reada extents in btrfs_reada_wait

The reada background works are not designed to finish all jobs
completely; they stop in the following cases:
1: When a device reaches its workload limit (MAX_IN_FLIGHT)
2: When the total reads reach the max limit (10000)
3: When no device has more queued jobs, which often happens in the DUP case

If all background works exit with jobs remaining,
btrfs_reada_wait() will wait indefinitely.

The above problem rarely happened with the old code, because:
1: Every work queues 2x new works.
   So many works reduced the chance of undone jobs.
2: A work keeps looping up to 10000 times when there are no jobs.
   This shortened the window with no thread running.

But after the above behaviours were fixed, "undone reada extents"
happened frequently.

Fix:
Check that we have at least one thread running if there are undone jobs
in btrfs_reada_wait().

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: reada: limit max works count

Reada creates 2 works for each level of the tree recursively.

For a tree with many levels, the number of created works
is 2^level_of_tree.
We don't actually need that many works in parallel, so this patch limits
the maximum number of works to BTRFS_MAX_MIRRORS * 2.

The per-fs works_counter will also be used by btrfs_reada_wait() to
check whether there are background workers.
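A user-space sketch of a capped works counter, using C11 atomics. MAX_MIRRORS is assumed to be 3 here (check BTRFS_MAX_MIRRORS in the kernel version at hand), and the reada_work_* and reada_wait_must_kick() names are hypothetical. The last helper expresses the btrfs_reada_wait() rule from "avoid undone reada extents": undone extents with no worker left means the waiter has to start one.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_MIRRORS 3                    /* assumed value */
#define MAX_READA_WORKS (MAX_MIRRORS * 2)

struct reada_ctx {
	atomic_int works_cnt;            /* running background works */
};

/* Account for one more work, refusing above the cap. */
bool reada_work_get(struct reada_ctx *c)
{
	int cur = atomic_load(&c->works_cnt);

	while (cur < MAX_READA_WORKS) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&c->works_cnt, &cur, cur + 1))
			return true;
	}
	return false;
}

void reada_work_put(struct reada_ctx *c)
{
	atomic_fetch_sub(&c->works_cnt, 1);
}

/* Undone extents but no running work: nobody will finish them. */
bool reada_wait_must_kick(struct reada_ctx *c, int undone_extents)
{
	return undone_extents > 0 && atomic_load(&c->works_cnt) == 0;
}
```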

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: reada: simplify dev->reada_in_flight processing

There is no need to decrease dev->reada_in_flight both inside
__readahead_hook() and in reada_extent_put().
reada_extent_put() never gets the chance to decrease dev->reada_in_flight
when freeing, because a reada_extent holds an additional refcount while it
is scheduled to a device.

We can put the inc and dec operations for dev->reada_in_flight in one
place instead, to make the logic simple and safe, and replace the useless
reada_extent->scheduled_for with a bool flag.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: reada: Fix a debug code typo

Remove a duplicated copy of the loop to fix a typo in the zone iteration.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: reada: Jump into cleanup in direct way for __readahead_hook()

The current code sets nritems to 0 so that the for loop is skipped, and
also sets the generation value, which is not necessary.
Jumping to cleanup directly is a better choice.
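The control-flow change in miniature; process_block() and its parameters are hypothetical. The error path jumps straight to the shared cleanup label rather than zeroing the item count to make the loop a no-op.

```c
#include <stdbool.h>

int process_block(const int *items, int nritems, bool read_ok,
		  int *sum, int *cleanups)
{
	int ret = 0;

	*sum = 0;
	if (!read_ok) {
		ret = -1;
		goto cleanup;   /* instead of forcing nritems = 0 */
	}

	for (int i = 0; i < nritems; i++)
		*sum += items[i];

cleanup:
	(*cleanups)++;          /* shared release path runs either way */
	return ret;
}
```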

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: reada: Use fs_info instead of root in __readahead_hook's argument

What __readahead_hook() needs is exactly fs_info; there is no need to convert
fs_info to root in the caller and convert it back in __readahead_hook().

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: reada: Pass reada_extent into __readahead_hook directly

reada_start_machine_dev() already has the reada_extent pointer; passing
it into __readahead_hook() directly instead of searching the radix tree
makes the code run faster.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: reada: move reada_extent_put to place after __readahead_hook()

We can't release reada_extent earlier than __readahead_hook(), because
__readahead_hook() still needs to use it; we must hold a refcount
to avoid it being freed.

Actually this is not a problem after my patch named
"Avoid many times of empty loop", which ensures the reada_extent at this
point includes at least one reada_extctl, and that keeps an additional
refcount on the reada_extent.

But we still need this patch to make the code logically clean.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: reada: Remove level argument in severial functions

level is not used in several functions; remove it from their arguments,
and remove the related code that computes its value.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: reada: bypass adding extent when all zone failed

When adding all dev_zones for a reada_extent fails, the extent
will never be selected to run, and stays in memory
forever.

We should bypass this extent to avoid the above case.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: reada: add all reachable mirrors into reada device list

If a device is not reachable, we should skip it and continue adding the
next one, instead of breaking out of the loop at the bad device.
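A sketch of the loop using continue instead of break; struct dev and add_to_mirrors() are hypothetical. Returning the number of devices that accepted the extent also lets a caller drop an extent when none did, which is what the "bypass adding extent when all zone failed" change is about.

```c
#include <stdbool.h>

/* Hypothetical device model with only the fields the loop needs. */
struct dev {
	bool present;   /* false for a missing/unreachable device */
	int queued;     /* extents queued on this device */
};

/* Queue on every reachable mirror; skip, don't stop at, bad devices. */
int add_to_mirrors(struct dev *devs, int nr_mirrors)
{
	int added = 0;

	for (int i = 0; i < nr_mirrors; i++) {
		if (!devs[i].present)
			continue;       /* the old code used break here */
		devs[i].queued++;
		added++;
	}
	return added;
}
```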

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: reada: Move is_need_to_readahead contition earlier

Move the is_need_to_readahead condition earlier to avoid a useless loop
that gathers the related data for readahead.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: reada: Avoid many times of empty loop

We can see the following loop (10000 times) in the trace log:
[   75.416137] ZL_DEBUG: reada_start_machine_dev:730: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
[   75.417413] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1
[   75.418611] ZL_DEBUG: __readahead_hook:129: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
[   75.419793] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1

[   75.421016] ZL_DEBUG: reada_start_machine_dev:730: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
[   75.422324] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1
[   75.423661] ZL_DEBUG: __readahead_hook:129: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
[   75.424882] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1

...(10000 times)

[  124.101672] ZL_DEBUG: reada_start_machine_dev:730: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
[  124.102850] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1
[  124.104008] ZL_DEBUG: __readahead_hook:129: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
[  124.105121] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1

Reason:
If more than one user triggers reada on the same extent, the first task
finishes setting up the reada data structures and calls reada_start_machine()
to start, while the second task has only added a ref_count but has not yet
added its reada_extctl struct. The reada_extent then cannot finish
all jobs, and will be selected in __reada_start_machine() 10000
times (the total loop count in __reada_start_machine()).

Fix:
For a reada_extent without jobs, we don't need to run it; just return
0 to let the caller break.
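A minimal model of the early return; struct extent, start_one() and run_machine() are hypothetical names. An extent with no attached jobs makes start_one() return 0, and the caller stops instead of spinning through its whole iteration budget.

```c
/* Hypothetical model: jobs counts attached reada_extctl entries. */
struct extent {
	int jobs;
	int reads_issued;
};

/* Returns 1 if a read was issued, 0 if there was nothing to do. */
int start_one(struct extent *re)
{
	if (re->jobs == 0)
		return 0;       /* nothing attached yet: let the caller stop */
	re->reads_issued++;
	re->jobs--;
	return 1;
}

/* Stop at the first 0 instead of looping `limit` times. */
int run_machine(struct extent *re, int limit)
{
	int iterations = 0;

	for (int i = 0; i < limit; i++) {
		iterations++;
		if (!start_one(re))
			break;
	}
	return iterations;
}
```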

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: reada: Add missed segment checking in reada_find_zone

When rechecking that the zone is in the tree, we still need to check
that the zone includes our logical address.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: reada: reduce additional fs_info->reada_lock in reada_find_zone

We can avoid an additional lock acquisition and one pair of
kref_get/put by combining two conditions.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: reada: Fix in-segment calculation for reada

reada_zone->end is the (inclusive) end position of the segment:
end = start + cache->key.offset - 1;

So we need to use "<=" in the condition that checks whether a position
is in the segment.

The problem happened rarely, because a logical position rarely points
into the last 4k of a block group, but we need to fix it to make the
logic correct.
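The off-by-one can be shown with a minimal zone model whose end is inclusive, as described above; struct zone and both helpers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct zone {
	uint64_t start;
	uint64_t end;   /* inclusive: start + length - 1 */
};

/* Misses the very last byte of the zone. */
bool in_zone_buggy(const struct zone *z, uint64_t logical)
{
	return logical >= z->start && logical < z->end;
}

/* Correct for an inclusive end. */
bool in_zone_fixed(const struct zone *z, uint64_t logical)
{
	return logical >= z->start && logical <= z->end;
}
```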

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Linux 4.5-rc4

Merge tag 'char-misc-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
"Here are 3 fixes for some reported issues.  Two nvmem driver fixes,
  and one mei fix.  All have been in linux-next just fine"

* tag 'char-misc-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  nvmem: qfprom: Specify LE device endianness
  nvmem: core: return error for non word aligned access
  mei: validate request value in client notify request ioctl

Merge tag 'driver-core-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fix from Greg KH:
"Here is one driver core, well klist, fix for 4.5-rc4.

  It fixes a problem found in the scsi device list traversal that
  probably also could be triggered by other subsystems.

  The fix has been in linux-next for a while with no reported problems"

* tag 'driver-core-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  klist: fix starting point removed bug in klist iterators

Merge tag 'tty-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
"Here are a number of small tty and serial driver fixes for 4.5-rc4
  that resolve some reported issues.

  One of them got reverted as it wasn't correct based on testing, and
  all have been in linux-next for a while"

* tag 'tty-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "8250: uniphier: allow modular build with 8250 console"
  pty: make sure super_block is still valid in final /dev/tty close
  pty: fix possible use after free of tty->driver_data
  tty: Add support for PCIe WCH382 2S multi-IO card
  serial/omap: mark wait_for_xmitr as __maybe_unused
  serial: omap: Prevent DoS using unprivileged ioctl(TIOCSRS485)
  8250: uniphier: allow modular build with 8250 console
  tty: Drop krefs for interrupted tty lock

Merge tag 'usb-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull PHY fixes from Greg KH:
"Here are a couple of PHY driver fixes for 4.5-rc4.

  A few small phy issues.  All have been in linux-next with no reported
  issues"

* tag 'usb-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  phy: twl4030-usb: Fix unbalanced pm_runtime_enable on module reload
  phy: twl4030-usb: Relase usb phy on unload
  phy: core: fix wrong err handle for phy_power_on
  phy: Restrict phy-hi6220-usb to HiSilicon arm64

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf tooling fixes from Thomas Gleixner:
"Another round of fixes for the perf tooling side:

   - Prevent a NULL pointer dereference in tracepoint error handling

   - Fix a thread handling bug in the intel_pt error handling code

   - Search both .eh_frame and .debug_frame sections as toolchains seem
     to have random choices of storing the CFI information

   - Fix the perf stat interval output values, which got broken when
     fixing the overall output"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf stat: Fix interval output values
  perf probe: Search both .eh_frame and .debug_frame sections for probe location
  perf tools: Fix thread lifetime related segfaut in intel_pt
  perf tools: tracepoint_error() can receive e=NULL, robustify it

Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull lockdep fix from Thomas Gleixner:
"A single fix for the stack trace caching logic in lockdep, where the
duplicate avoidance managed to store no back trace at all"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/lockdep: Fix stack trace caching logic

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
"A single fix preventing a 32bit overflow in timespec/val to cputime
conversions on 32bit machines"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cputime: Prevent 32bit overflow in time[val|spec]_to_cputime()

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irqchip fixes from Thomas Gleixner:
"Another set of ARM SoC related irqchip fixes:
   - Plug a memory leak in gicv3-its
   - Limit features to the root gic interrupt controller
   - Add a missing barrier in the gic-v3 IAR access
   - Another compile test fix for sun4i"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3: Make sure read from ICC_IAR1_EL1 is visible on redestributor
  irqchip/gic: Only set the EOImodeNS bit for the root controller
  irqchip/gic: Only populate set_affinity for the root controller
  irqchip/gicv3-its: Fix memory leak in its_free_tables()
  irqchip/sun4i: Fix compilation outside of arch/arm

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
"Two small fixlets for x86:

   - Prevent a KASAN false positive in thread_saved_pc()

   - Fix a 32-bit truncation problem in the x86 numa code"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/numa: Fix 32-bit memblock range truncation bug on 32-bit NUMA kernels
  x86: Fix KASAN false positives in thread_saved_pc()

Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:
"Here's the first round of MIPS fixes after the merge window:

   - Detect Octeon III's PCI correctly.
   - Fix return value of the MT7620 probing function.
   - Wire up the copy_file_range syscall.
   - Fix 64k page support on 32 bit kernels.
   - Fix the early Coherency Manager probe.
   - Allow only hardware-supported page sizes to be selected for R6000.
   - Fix corner cases for the RDHWR instruction emulation on old hardware.
   - Fix FPU handling corner cases.
   - Remove stale entry for BCM33xx from the MAINTAINERS file.
   - 32 and 64 bit ELF headers are different, handle them correctly"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  mips: Differentiate between 32 and 64 bit ELF header
  MIPS: Octeon: Update OCTEON_FEATURE_PCIE for Octeon III
  MIPS: pci-mt7620: Fix return value check in mt7620_pci_probe()
  MIPS: Fix early CM probing
  MIPS: Wire up copy_file_range syscall.
  MIPS: Fix 64k page support for 32 bit kernels.
  MIPS: R6000: Don't allow 64k pages for R6000.
  MIPS: traps.c: Correct microMIPS RDHWR emulation
  MIPS: traps.c: Don't emulate RDHWR in the CpU #0 exception handler
  MAINTAINERS: Remove stale entry for BCM33xx chips
  MIPS: Fix FPU disable with preemption
  MIPS: Properly disable FPU in start_thread()
  MIPS: Fix buffer overflow in syscall_get_arguments()

Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:
"A couple of ARM fixes from Linus for the ICST clock generator code"

[ "Linus" here is Linus Walleij.  Name-stealer.

       Linus "there can be only one" Torvalds ]

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8519/1: ICST: try other dividends than 1
  ARM: 8517/1: ICST: avoid arithmetic overflow in icst_hz()

Merge branch 'component' of git://ftp.arm.linux.org.uk/~rmk/linux-arm

Pull component helper fixes from Russell King:
"A few fixes for problems people have encountered with the recent
  update to the component helpers"

* 'component' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  component: remove device from master match list on failed add
  component: Detach components when deleting master struct
  component: fix crash on x86_64 with hda audio drivers

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull more rdma fixes from Doug Ledford:
"I think we are getting pretty close to done now.  There are four
  one-off fixes in this update:

   - fix ipoib multicast joins
   - fix mlx4 error handling
   - fix mlx5 size computation
   - fix a thinko in core code"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/mlx5: Fix RC transport send queue overhead computation
  IB/ipoib: fix for rare multicast join race condition
  IB/core: Fix reading capability mask of the port info class
  net/mlx4: fix some error handling in mlx4_multi_func_init()

Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending

Pull SCSI target fixes from Nicholas Bellinger:
"This includes the long awaited series to address a set of bugs around
  active I/O remote-port LUN_RESET, as well as properly handling this
  same case with concurrent fabric driver session disconnect ->
  reconnect.

  Note this set of LUN_RESET bug-fixes has been surviving extended
  testing on both v4.5-rc1 and v3.14.y code over the last weeks, and is
  CC'ed for stable as it's something folks using multiple ESX connected
  hosts with slow backends can certainly trigger.

  The highlights also include:

   - Fix WRITE_SAME/DISCARD emulation 4k sector conversion in
     target/iblock (Mike Christie)

   - Fix TMR abort interaction and AIO type TMR response in qla2xxx
     target (Quinn Tran + Swapnil Nagle)

   - Fix >= v3.17 stale descriptor pointer regression in qla2xxx target
     (Quinn Tran)

   - Fix >= v4.5-rc1 return regression with unmap_zeros_data_store new
     configfs store handler (nab)

   - Add CPU affinity flag + convert qla2xxx to use bit (Quinn + HCH +
     Bart)"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  qla2xxx: use TARGET_SCF_USE_CPUID flag to indiate CPU Affinity
  target/transport: add flag to indicate CPU Affinity is observed
  target: Fix incorrect unmap_zeroes_data_store return
  qla2xxx: Use ATIO type to send correct tmr response
  qla2xxx: Fix stale pointer access.
  target/user: Fix cast from pointer to phys_addr_t
  target: Drop legacy se_cmd->task_stop_comp + REQUEST_STOP usage
  target: Fix race with SCF_SEND_DELAYED_TAS handling
  target: Fix remote-port TMR ABORT + se_cmd fabric stop
  target: Fix TAS handling for multi-session se_node_acls
  target: Fix LUN_RESET active TMR descriptor handling
  target: Fix LUN_RESET active I/O handling for ACK_KREF
  qla2xxx: Fix TMR ABORT interaction issue between qla2xxx and TCM
  qla2xxx: Fix warning reported by static checker
  target: Fix WRITE_SAME/DISCARD conversion to linux 512b sectors

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal

Pull thermal management fixes from Eduardo Valentin:
"Specifics in this pull request:

   - Compilation fixes for the SPEAR and U8500 thermal drivers.
   - RCAR thermal driver now recognizes OF-thermal based thermal zones.
   - Small code rework on OF-thermal.
   - These changes have been CI tested using the KernelCI bot [1,2].  \o/

  I am taking over on Rui's behalf while he is out.  Happy New Chinese
  Year!

  [1] - https://kernelci.org/build/evalenti/kernel/v4.5-rc3-16-ga53b8394ec3c/
  [2] - https://kernelci.org/boot/all/job/evalenti/kernel/v4.5-rc3-16-ga53b8394ec3c/"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
  thermal: cpu_cooling: fix out of bounds access in time_in_idle
  thermal: allow u8500-thermal driver to be a module
  thermal: allow spear-thermal driver to be a module
  thermal: spear: use __maybe_unused for PM functions
  thermal: rcar: enable to use thermal-zone on DT
  thermal: of: use for_each_available_child_of_node for child iterator

Merge tag 'sound-fix-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull another sound fix from Takashi Iwai:
"This contains a fix for the double-free of the usb-audio MIDI device on
  probe failure"

* tag 'sound-fix-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: usb-audio: avoid freeing umidi object twice

Merge tag 'arc-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:
"I've been sitting on some of these fixes for a while.

   - Corner case of returning to delay slot from interrupt
   - Changing default interrupt priority level
   - Kconfig'ize support for super pages
   - Other minor fixes"

* tag 'arc-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: mm: Introduce explicit super page size support
  ARCv2: intc: Allow interruption by lowest priority interrupt
  ARCv2: Check for LL-SC livelock only if LLSC is enabled
  ARC: shrink cpuinfo by not saving full timer BCR
  ARCv2: clocksource: Rename GRTC -> GFRC ...
  ARCv2: STAR 9000950267: Handle return from intr to Delay Slot #2

ALSA: usb-audio: avoid freeing umidi object twice

The 'umidi' object will be freed on the error path by snd_usbmidi_free()
when tearing down the rawmidi interface. So we shouldn't try to free it
in snd_usbmidi_create() after having registered the rawmidi interface.

Found by KASAN.

Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>
Acked-by: Clemens Ladisch <clemens@ladisch.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
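The ownership rule behind this fix can be sketched in userspace C. This is a minimal sketch, not the ALSA code: struct umidi_like, umidi_like_create(), iface_teardown() and the frees counter are hypothetical stand-ins, with iface_teardown() playing the role of the rawmidi teardown that ends in snd_usbmidi_free(). Once the interface is registered, its teardown owns the object, so the error path must not free it a second time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the 'umidi' object. */
struct umidi_like {
	bool iface_registered;
};

static int frees; /* counts how many times an object was released */

static void umidi_like_free(struct umidi_like *u)
{
	frees++;
	free(u);
}

/* Models the interface teardown: once registered, it owns and frees u. */
static void iface_teardown(struct umidi_like *u)
{
	if (u->iface_registered)
		umidi_like_free(u);
}

/* 'fail_after_register' simulates a probe error after registration. */
static struct umidi_like *umidi_like_create(bool fail_after_register)
{
	struct umidi_like *u = calloc(1, sizeof(*u));

	if (!u)
		return NULL;

	u->iface_registered = true; /* the interface now owns u */

	if (fail_after_register) {
		iface_teardown(u); /* frees u exactly once */
		return NULL;       /* so no umidi_like_free(u) here */
	}
	return u;
}
```

Calling umidi_like_free() after iface_teardown() on the failure path would reproduce the double free that KASAN reported.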

Merge tag 'pci-v4.5-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
"These are some Renesas binding updates for PCI host controllers, a
  Broadcom fix for a regression we added in v4.5-rc1, and a fix for an
  AER use-after-free problem that can cause memory corruption.

  Summary:

  AER:
    Flush workqueue on device remove to avoid use-after-free (Sebastian Andrzej Siewior)

  Broadcom iProc host bridge driver:
    Allow multiple devices except on PAXC (Ray Jui)

  Renesas R-Car host bridge driver:
    Add gen2 device tree support for r8a7793 (Simon Horman)
    Add device tree support for r8a7793 (Simon Horman)"

* tag 'pci-v4.5-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: rcar: Add device tree support for r8a7793
  PCI: rcar: Add gen2 device tree support for r8a7793
  PCI: iproc: Allow multiple devices except on PAXC
  PCI/AER: Flush workqueue on device remove to avoid use-after-free

Merge branch 'akpm' (patches from Andrew)

Merge fixes from Andrew Morton:
"10 fixes"

The lockdep hlist conversion is in the locking tree too, waiting for the
next merge window.  Andrew thought it should go in now.  I'll take it,
since it fixes a real problem and looks trivially correct (famous last
words).

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  arch/x86/Kconfig: CONFIG_X86_UV should depend on CONFIG_EFI
  mm: fix pfn_t vs highmem
  kernel/locking/lockdep.c: convert hash tables to hlists
  mm,thp: fix spellos in describing __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
  mm,thp: khugepaged: call pte flush at the time of collapse
  mm/backing-dev.c: fix error path in wb_init()
  mm, dax: check for pmd_none() after split_huge_pmd()
  vsprintf: kptr_restrict is okay in IRQ when 2
  mm: fix filemap.c kernel doc warning
  ubsan: cosmetic fix to Kconfig text

IB/mlx5: Fix RC transport send queue overhead computation

Fix the RC QPs send queue overhead computation to take into account
two additional segments in the WQE which are needed for registration
operations.

The ATOMIC and UMR segments can't coexist, so choose the larger of the
two.

The commit 9e65dc371b5c ("IB/mlx5: Fix RC transport send queue overhead
computation") was intended to update the RC transport, as its commit
message states, but added the code to the UC transport.

Fixes: 9e65dc371b5c ("IB/mlx5: Fix RC transport send queue overhead computation")
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
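A sketch of the sizing rule described above: the control segment plus the larger of two mutually exclusive extensions. It assumes the atomic path carries an atomic segment plus a remote-address segment and the registration path carries a UMR control segment plus an mkey segment. The sizes are placeholder values for illustration, not the real mlx5 structure sizes, and rc_sq_overhead() is a hypothetical helper rather than the driver's function.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder segment sizes in bytes; NOT the real mlx5 struct sizes. */
enum {
	CTRL_SEG_SZ     = 16,
	RADDR_SEG_SZ    = 16,
	ATOMIC_SEG_SZ   = 16,
	UMR_CTRL_SEG_SZ = 48,
	MKEY_SEG_SZ     = 64,
};

static size_t max_sz(size_t a, size_t b)
{
	return a > b ? a : b;
}

/* Per-WQE overhead for an RC send queue in this sketch: the control
 * segment plus whichever mutually exclusive extension is larger. */
static size_t rc_sq_overhead(void)
{
	size_t atomic_path = ATOMIC_SEG_SZ + RADDR_SEG_SZ;  /* 32 */
	size_t umr_path = UMR_CTRL_SEG_SZ + MKEY_SEG_SZ;    /* 112 */

	return CTRL_SEG_SZ + max_sz(atomic_path, umr_path);
}
```

Summing both extensions would over-reserve, because a single WQE never carries both.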

IB/ipoib: fix for rare multicast join race condition

A narrow window for a race condition still exists between the
multicast join thread and the *dev_flush workers.
A kernel crash caused by prolonged erratic link state changes
was observed (most likely due to faulty cabling):

[167275.656270] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[167275.665973] IP: [<ffffffffa05f8f2e>] ipoib_mcast_join+0xae/0x1d0 [ib_ipoib]
[167275.674443] PGD 0
[167275.677373] Oops: 0000 [#1] SMP
...
[167275.977530] Call Trace:
[167275.982225]  [<ffffffffa05f92f0>] ? ipoib_mcast_free+0x200/0x200 [ib_ipoib]
[167275.992024]  [<ffffffffa05fa1b7>] ipoib_mcast_join_task+0x2a7/0x490 [ib_ipoib]
[167276.002149]  [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[167276.010754]  [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[167276.019088]  [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[167276.027737]  [<ffffffff810a5aef>] kthread+0xcf/0xe0

The faulting dereference was here:
ipoib_mcast_join() {
..............
      rec.qkey      = priv->broadcast->mcmember.qkey;
                                       ^^^^^^^
.....
}

The proposed patch prevents the multicast join task from continuing
if a link state change is detected.

Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Changes from v4:
- as suggested by Doug Ledford, optimized spinlock usage,
i.e. ipoib_mcast_join() is called with lock held.
Changes from v3:
- sync with priv->lock before flag check.
Changes from v2:
- Move check for OPER_UP flag state to mcast_join() to
ensure no event worker is in progress.
- minor style fixes.
Changes from v1:
- No need to lock again if error detected.
Signed-off-by: Doug Ledford <dledford@redhat.com>
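The check-under-lock pattern in this changelog can be modelled in userspace with a pthread mutex. Everything here is hypothetical: struct priv_like, join(), join_locked() and flush() stand in for priv, ipoib_mcast_join() and the *dev_flush workers, and the oper_up field stands in for the OPER_UP flag. The point is that the flag test and the dereference happen under the same lock the flush path takes, so a flush cannot slip in between them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a link-state flag and a pointer the join path
 * dereferences, both guarded by one lock. */
struct priv_like {
	pthread_mutex_t lock;
	bool oper_up;                   /* stands in for OPER_UP */
	const unsigned *broadcast_qkey; /* stands in for priv->broadcast */
};

/* Caller must hold p->lock. Returns -1 if the link went down. */
static int join_locked(struct priv_like *p, unsigned *qkey)
{
	if (!p->oper_up || !p->broadcast_qkey)
		return -1; /* link state changed: stop, don't dereference */
	*qkey = *p->broadcast_qkey;
	return 0;
}

static int join(struct priv_like *p, unsigned *qkey)
{
	pthread_mutex_lock(&p->lock);
	int ret = join_locked(p, qkey);
	pthread_mutex_unlock(&p->lock);
	return ret;
}

/* What a flush worker does, under the same lock. */
static void flush(struct priv_like *p)
{
	pthread_mutex_lock(&p->lock);
	p->oper_up = false;
	p->broadcast_qkey = NULL;
	pthread_mutex_unlock(&p->lock);
}
```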

Merge tag 'mmc-v4.5-rc2' of git://git.linaro.org/people/ulf.hansson/mmc

Pull MMC fixes from Ulf Hansson:
"Here are some mmc fixes intended for v4.5 rc4.

  MMC core:
   - Fix a sysfs ABI regression
   - Return an error in a specific error path dealing with mmc ioctls

  MMC host:
   - sdhci-pci|acpi: Fix card detect race for Intel BXT/APL
   - sh_mmcif: Correct TX DMA channel allocation
   - mmc_spi: Fix error handling for dma mapping errors
   - sdhci-of-at91: Fix an imbalance in the runtime PM usage count
   - pxamci: Fix the device-tree probe deferral path
   - pxamci: Fix read-only GPIO polarity"

* tag 'mmc-v4.5-rc2' of git://git.linaro.org/people/ulf.hansson/mmc:
  Revert "mmc: block: don't use parameter prefix if built as module"
  mmc: sdhci-acpi: Fix card detect race for Intel BXT/APL
  mmc: sdhci-pci: Fix card detect race for Intel BXT/APL
  mmc: sdhci: Allow override of get_cd() called from sdhci_request()
  mmc: sdhci: Allow override of mmc host operations
  mmc: sh_mmcif: Correct TX DMA channel allocation
  mmc: block: return error on failed mmc_blk_get()
  mmc: pxamci: fix the device-tree probe deferral path
  mmc: mmc_spi: add checks for dma mapping error
  mmc: sdhci-of-at91: fix pm runtime unbalanced issue in error path
  mmc: pxamci: fix again read-only gpio detection polarity

Merge tag 'sound-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"In this rc we've got more volume than in the previous rc, unsurprisingly;
  the majority of the ASoC updates are for Intel drivers, and the other
  major change is the continued plumbing of ALSA timer bugs revealed by
  the syzkaller fuzzer.  Hopefully both settle down now.

  Other than that, HD-audio received a couple of code fixes as well as
  the usual quirks, and various small fixes are found for FireWire
  devices, ASoC codecs and drivers"

* tag 'sound-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (50 commits)
  ASoC: arizona: fref must be limited in pseudo-fractional mode
  ASoC: sigmadsp: Fix missleading return value
  ALSA: timer: Fix race at concurrent reads
  ALSA: firewire-digi00x: Drop bogus const type qualifier on dot_scrt()
  ALSA: hda - Fix bad dereference of jack object
  ALSA: timer: Fix race between stop and interrupt
  ALSA: timer: Fix wrong instance passed to slave callbacks
  ASoC: Intel: Add module tags for common match module
  ASoC: Intel: Load the atom DPCM driver only
  ASoC: Intel: Create independent acpi match module
  ASoC: Intel: Revert "ASoC: Intel: fix ACPI probe regression with Atom DPCM driver"
  ALSA: dummy: Implement timer backend switching more safely
  ALSA: hda - Fix speaker output from VAIO AiO machines
  Revert "ALSA: hda - Fix noise on Gigabyte Z170X mobo"
  ALSA: firewire-tascam: remove needless member for control and status message
  ALSA: firewire-tascam: remove a flag for controller
  ALSA: firewire-tascam: add support for FW-1804
  ALSA: firewire-tascam: fix NULL pointer dereference when model identification fails
  ALSA: hda - Fix static checker warning in patch_hdmi.c
  ASoC: Intel: Skylake: Remove autosuspend delay
  ...

Merge tag 'fbdev-fixes-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux

Pull fbdev fixes from Tomi Valkeinen:
- fix omap2plus_defconfig to enable omapfb as it was in v4.4
- ocfb: fix timings for margins
- s6e8ax0, da8xx-fb: fix compile warnings
- mmp: fix build failure caused by bad printk parameters
- imxfb: fix clock issue which kept the display off

* tag 'fbdev-fixes-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
  video: fbdev: imxfb: Provide a reset mechanism
  fbdev: mmp: print IRQ resource using %pR format string
  fbdev: da8xx-fb: remove incorrect type cast
  fbdev: s6e8ax0: avoid unused function warnings
  ocfb: fix tgdel and tvdel timing parameters
  ARM: omap2plus_defconfig: update display configs

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"A set of seven fixes:

  Two regression fixes for the new hisi_sas arm driver, a blacklist
  entry for the Marvell console (without it, the console was causing a
  reset cascade), a race fix in the WRITE_SAME/DISCARD routines, a retry
  fix for the rdac driver, without which it would prematurely return
  EIO, and a couple of fixes for the hyper-v storvsc driver"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled
  SCSI: Add Marvell Console to VPD blacklist
  scsi_dh_rdac: always retry MODE SELECT on command lock violation
  storvsc: Use the specified target ID in device lookup
  storvsc: Install the storvsc specific timeout handler for FC devices
  hisi_sas: fix v1 hw check for slot error
  hisi_sas: add dependency for HAS_IOMEM

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm amd fixes from Dave Airlie:
"Been pretty quiet.

  This is an amdgpu fixes pull from AMD, a bunch of powerplay stability
  fixes, race fix, hibernate fix, and a possible circular locking fix"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (21 commits)
  drm/amdgpu: fix issue with overlapping userptrs
  drm/radeon: hold reference to fences in radeon_sa_bo_new
  drm/amdgpu: remove unnecessary forward declaration
  drm/amdgpu: hold reference to fences in amdgpu_sa_bo_new (v2)
  drm/amdgpu: fix s4 resume
  drm/amdgpu/cz: plumb pg flags through to powerplay
  drm/amdgpu/tonga: plumb pg flags through to powerplay
  drma/dmgpu: move cg and pg flags into shared headers
  drm/amdgpu: remove unused cg defines
  drm/amdgpu: add a cgs interface to fetch cg and pg flags
  drm/amd/powerplay/tonga: disable vce pg
  drm/amd/powerplay/tonga: disable uvd pg
  drm/amd/powerplay/cz: disable vce pg
  drm/amd/powerplay/cz: disable uvd pg
  drm/amdgpu: be consistent with uvd cg flags
  drm/amdgpu: clean up vce pg flags for cz/st
  drm/amdgpu: handle vce pg flags properly
  drm/amdgpu: handle uvd pg flags properly
  drm/amdgpu/dpm/ci: switch over to the common pcie caps interface
  drm/amdgpu/cik: don't mess with aspm if gpu is root bus
  ...

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull crypto fix from James Morris.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  EVM: Use crypto_memneq() for digest comparisons

Merge branch 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"This has a few fixes from Filipe, along with a readdir fix from Dave
  that we've been testing for some time"

* 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: properly set the termination value of ctx->pos in readdir
  Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
  Btrfs: remove no longer used function extent_read_full_page_nolock()
  Btrfs: fix page reading in extent_same ioctl leading to csum errors
  Btrfs: fix invalid page accesses in extent_same (dedup) ioctl

Merge tag 'xfs-fixes-for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs

Pull xfs fix from Dave Chinner:
"This contains a fix for an endian conversion issue in the new CRC
  validation in log recovery that was discovered on a ppc64 platform"

* tag 'xfs-fixes-for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
  xfs: fix endianness error when checking log block crc on big endian platforms

btrfs: Introduce new mount option alias for nologreplay

Introduce a new mount option alias, "norecovery", for nologreplay, to
keep "norecovery" behavior consistent with other filesystems.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Introduce new mount option to disable tree log replay

Introduce a new mount option, "nologreplay", which works together with
the "ro" mount option to get a truly read-only mount, like "norecovery"
in ext* and xfs.

Since parse_options() now needs to check the new flags at remount time,
add a new parameter to parse_options().

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: Introduce new mount option usebackuproot to replace recovery

The current "recovery" mount option only tries to use the backup root.
However, the word "recovery" is too generic and may confuse some users.

This introduces a new, more specific mount option, "usebackuproot", to
replace the "recovery" mount option.
"recovery" will be kept for compatibility reasons, but will be
deprecated.

Also, since "usebackuproot" only affects mount behavior and has nothing
to do with the filesystem after open_ctree(), clear the flag once the
mount has succeeded.

This provides the basis for a later unified "norecovery" mount option.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[ dropped usebackuproot from show_mount, added note about 'recovery' to
docs ]
Signed-off-by: David Sterba <dsterba@suse.com>
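How the three btrfs mount option changes above fit together can be sketched as a tiny option parser. The OPT_* flags, parse_token() and opts_valid() are hypothetical and are not btrfs's parse_options(). The ro requirement for nologreplay is our reading of "co-operate with ro" above; we believe btrfs rejects nologreplay on a read-write mount, but treat that as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical option flags for this sketch; not btrfs's real names. */
#define OPT_RDONLY        (1u << 0)
#define OPT_NOLOGREPLAY   (1u << 1)
#define OPT_USEBACKUPROOT (1u << 2)

/* Map one option token to a flag. "norecovery" is an alias of
 * "nologreplay"; "recovery" is the deprecated spelling of
 * "usebackuproot". Unknown tokens map to 0. */
static unsigned parse_token(const char *tok)
{
	if (!strcmp(tok, "ro"))
		return OPT_RDONLY;
	if (!strcmp(tok, "nologreplay") || !strcmp(tok, "norecovery"))
		return OPT_NOLOGREPLAY;
	if (!strcmp(tok, "usebackuproot") || !strcmp(tok, "recovery"))
		return OPT_USEBACKUPROOT;
	return 0;
}

/* Skipping log replay only gives a truly read-only mount together with
 * "ro", so reject it otherwise (the same check applies on remount). */
static bool opts_valid(unsigned flags)
{
	return !(flags & OPT_NOLOGREPLAY) || (flags & OPT_RDONLY);
}
```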

Merge tag 'asoc-fix-v4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v4.5

A rather large batch of fixes here, almost all in the Intel driver.
The changes that got merged in this merge window for Skylake were rather
large, and as well as the issues you'd expect in a large block of new
code, they created some problems for older processors which needed
fixing up. Things are largely settling down now, hopefully.

EVM: Use crypto_memneq() for digest comparisons

This patch fixes vulnerability CVE-2016-2085. The problem exists
because the evm_verify_hmac() function includes a use of memcmp().
Unfortunately, this allows timing side channel attacks; specifically
a MAC forgery complexity drop from 2^128 to 2^12. This patch changes
the memcmp() to the cryptographically safe crypto_memneq().

Reported-by: Xiaofei Rex Guo <xiaofei.rex.guo@intel.com>
Signed-off-by: Ryan Ware <ware@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
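Why memcmp() leaks timing, and what a constant-time replacement looks like, can be sketched in userspace C. ct_memneq() is a hypothetical illustration in the spirit of crypto_memneq(), not the kernel implementation. It always scans every byte and ORs the differences together, whereas a typical memcmp() may return at the first mismatching byte and so lets an attacker learn how many leading bytes of a forged MAC were correct.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 0 if the buffers are equal, nonzero otherwise. The loop never
 * exits early, so its running time does not depend on where the first
 * mismatch is. The volatile accesses discourage the compiler from
 * turning the loop back into an early-exit comparison. */
static int ct_memneq(const void *a, const void *b, size_t n)
{
	const volatile unsigned char *x = a;
	const volatile unsigned char *y = b;
	unsigned char diff = 0;

	for (size_t i = 0; i < n; i++)
		diff |= (unsigned char)(x[i] ^ y[i]);

	return diff != 0;
}
```

Like crypto_memneq(), it only answers "equal or not" and gives no ordering, which is all a digest check needs.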

ARC: mm: Introduce explicit super page size support

MMUv4 supports 2 concurrent page sizes: Normal and Super [4K to 16M]

So far Linux supported a single super page size for a given Normal page,
depending on the software page walking address split.
e.g. we had an 11:8:13 address split for an 8K page, which meant the super
page was 2^(8+13) = 2M (given that THP size has to be PMD_SHIFT)

Now we turn this around by allowing multiple super page sizes in Kconfig
(currently 2M and 16M only) and forcing the page walker address split to
PGDIR_SHIFT and PAGE_SHIFT.

For configs without super pages, things are the same as before, and
PGDIR_SHIFT can be hacked to get a non-default address split.

The motivation for this change is a customer who needs a 16M super page
and an 8K Normal page combo.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
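The address split arithmetic from this message, as a small C sketch. It assumes a 32-bit virtual address split into pgd:pmd:page index bits, and that a super page covers everything below the pgd index, so its size is 2^(pmd_bits + page_bits). super_page_size() and pmd_bits_for() are hypothetical helpers, not ARC code.

```c
#include <assert.h>

/* Size of a super page for a given pmd:page split of the address. */
static unsigned long super_page_size(unsigned pmd_bits, unsigned page_bits)
{
	return 1UL << (pmd_bits + page_bits);
}

/* Inverse: pmd index bits needed for a chosen super page size
 * (given as a shift) on top of a given normal page shift. */
static unsigned pmd_bits_for(unsigned super_shift, unsigned page_shift)
{
	return super_shift - page_shift;
}
```

With the old 11:8:13 split, super_page_size(8, 13) gives 2M. For the 16M super page with 8K normal pages that motivated the change, pmd_bits_for(24, 13) gives 11 pmd bits, leaving 8 bits for the pgd index.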

Merge tag 'phy-for-4.5-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/kishon/linux-phy into usb-linus

Kishon writes:

phy: for 4.5-rc

*) Fix error handling code in phy core [phy_power_on()]
*) phy-twl4030-usb fixes for unloading the module
*) Restrict phy-hi6220-usb to HiSilicon arm64

Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>

arch/x86/Kconfig: CONFIG_X86_UV should depend on CONFIG_EFI

arch/x86/built-in.o: In function `uv_bios_call':
(.text+0xeba00): undefined reference to `efi_call'

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix pfn_t vs highmem

The pfn_t type uses an unsigned long to store a pfn + flags value.  On a
64-bit platform the upper 12 bits of an unsigned long are never used for
storing the value of a pfn.  However, this is not true on highmem
platforms, where all 32 bits of a pfn value are used to address a 44-bit
physical address space.  A pfn_t therefore needs to store a 64-bit value.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=112211
Fixes: 01c8f1c44b83 ("mm, dax, gpu: convert vm_insert_mixed to pfn_t")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Stuart Foster <smf.linux@ntlworld.com>
Reported-by: Julian Margetson <runaway@candw.ms>
Tested-by: Julian Margetson <runaway@candw.ms>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
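The sizing argument can be checked with a little arithmetic. This sketch assumes 4 KiB pages (PAGE_SHIFT 12) and the 44-bit physical address space mentioned above, and models a flag kept in the top bit of the storage word. The *_SKETCH constants, PFN_FLAG_TOP32/PFN_FLAG_TOP64 and the helpers are hypothetical names, not the pfn_t definitions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12 /* assumed 4 KiB pages */
#define PHYS_BITS_SKETCH  44 /* assumed physical address width */

/* Bits needed to hold a pfn: address bits minus the page offset. */
static unsigned pfn_bits(void)
{
	return PHYS_BITS_SKETCH - PAGE_SHIFT_SKETCH; /* 32 */
}

/* Hypothetical flag bit kept at the top of the storage word. */
#define PFN_FLAG_TOP32 (UINT32_C(1) << 31)
#define PFN_FLAG_TOP64 (UINT64_C(1) << 63)

/* Does a valid pfn already occupy the flag bit? */
static int flag_collides_32(uint32_t pfn)
{
	return (pfn & PFN_FLAG_TOP32) != 0;
}

static int flag_collides_64(uint64_t pfn)
{
	return (pfn & PFN_FLAG_TOP64) != 0;
}
```

The highest pfn needs all 32 bits, so a 32-bit unsigned long has no spare bit for flags, while a 64-bit word still leaves its upper bits free.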

kernel/locking/lockdep.c: convert hash tables to hlists

Mike said:

: CONFIG_UBSAN_ALIGNMENT breaks x86-64 kernel with lockdep enabled, i.e.
: a kernel with CONFIG_UBSAN_ALIGNMENT fails to load without even any error
: message.
:
: The problem is that ubsan callbacks use spinlocks and might be called
: before lockdep is initialized.  Particularly this line in the
: reserve_ebda_region function causes problem:
:
: lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES);
:
: If I put lockdep_init() before the reserve_ebda_region call in
: x86_64_start_reservations kernel loads well.

Fix this ordering issue permanently: change lockdep so that it uses
hlists for the hash tables.  Unlike a list_head, an hlist_head is in its
initialized state when it is all-zeroes, so lockdep is ready for
operation immediately upon boot - lockdep_init() need not have run.

The patch will also save some memory.

lockdep_init() and lockdep_initialized can be done away with now - a 4.6
patch has been prepared to do this.

Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
Suggested-by: Mike Krinkin <krinkin.m.u@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
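The difference between the two list heads can be shown in userspace C with simplified re-creations of struct list_head and struct hlist_head, not the kernel headers. A zero-filled list_head has next == NULL instead of pointing at itself, so it is not a valid empty list until INIT_LIST_HEAD() runs. A zero-filled hlist_head is already empty, so a table in .bss is usable with no init call.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified re-creations of the kernel list heads. */
struct list_head { struct list_head *next, *prev; };
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* Empty only once INIT_LIST_HEAD has made it point to itself. */
static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Empty when 'first' is NULL, i.e. when it is all zeroes. */
static int hlist_empty(const struct hlist_head *h)
{
	return h->first == NULL;
}

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Zero-initialised, like tables in .bss before any init code runs. */
static struct list_head early_list;
static struct hlist_head early_hash[4];
```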

mm,thp: fix spellos in describing __HAVE_ARCH_FLUSH_PMD_TLB_RANGE

[akpm@linux-foundation.org: s/threshhold/threshold/]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm,thp: khugepaged: call pte flush at the time of collapse

This showed up on ARC when running LMBench bw_mem tests as an Overlapping
TLB Machine Check Exception, triggered by an STLB entry (2M page)
overlapping an NTLB entry (regular 8K page).

bw_mem 2m touches a large chunk of vaddr, creating NTLB entries.  In the
interim khugepaged kicks in, collapsing the contiguous ptes into a
single pmd.  pmdp_collapse_flush()->flush_pmd_tlb_range() is called to
flush out NTLB entries for the ptes.  On ARC this (by design) can only
shoot down STLB entries (for the pmd).  The stray NTLB entries cause the
overlap with the subsequent STLB entry for the collapsed page.  So make
pmdp_collapse_flush() call the pte flush interface, not the pmd flush.

Note that originally all thp flush call sites in generic code called
flush_tlb_range() leaving it to architecture to implement the flush for
pte and/or pmd.  Commit 12ebc1581ad11454 changed this by calling a new
opt-in API flush_pmd_tlb_range() which made the semantics more explicit
but failed to distinguish the pte vs pmd flush in generic code, which is
what this patch fixes.

Note that ARC could be fixed without touching the generic
pmdp_collapse_flush() by defining an ARC version, but that defeats the
purpose of the generic version; plus, semantically this is the right
thing to do.

Fixes STAR 9000961194: LMBench on AXS103 triggering duplicate TLB
exceptions with super pages

Fixes: 12ebc1581ad11454 ("mm,thp: introduce flush_pmd_tlb_range")
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [4.4]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/backing-dev.c: fix error path in wb_init()

We need to use post-decrement to get percpu_counter_destroy() called on
&wb->stat[0]. Moreover, the pre-decrement would cause infinite
out-of-bounds accesses if the setup code failed at i==0.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
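The unwind idiom at stake, as a small sketch. item_init(), item_destroy(), init_all() and NR_ITEMS are hypothetical stand-ins for the percpu_counter_init()/percpu_counter_destroy() loop over wb->stat[]. The same reasoning applies to the mlx4_multi_func_init() fix further down in this log.

```c
#include <assert.h>

#define NR_ITEMS 4

static int destroyed[NR_ITEMS]; /* times each slot was torn down */

/* Hypothetical init/destroy pair; init fails at index 'fail_at'. */
static int item_init(int i, int fail_at)
{
	return i == fail_at ? -1 : 0;
}

static void item_destroy(int i)
{
	destroyed[i]++;
}

/* Initialise slots 0..NR_ITEMS-1; if slot i fails, undo 0..i-1.
 * 'while (i--)' tests the old value, so the body sees i-1 down to 0
 * and does nothing when i == 0. The buggy 'while (--i)' skips slot 0,
 * and from i == 0 it goes to -1 and keeps indexing out of bounds. */
static int init_all(int fail_at)
{
	int i;

	for (i = 0; i < NR_ITEMS; i++) {
		if (item_init(i, fail_at))
			goto out_destroy;
	}
	return 0;

out_destroy:
	while (i--)
		item_destroy(i);
	return -1;
}
```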

mm, dax: check for pmd_none() after split_huge_pmd()

DAX implements split_huge_pmd() by clearing pmd.  This simple approach
reduces memory overhead, as we don't need to deposit page table on huge
page mapping to make split_huge_pmd() never-fail.  PTE table can be
allocated and populated later on page fault from backing store.

But one side effect is that we have to check whether the pmd is
pmd_none() after split_huge_pmd().  In most places we do this already to
deal with parallel MADV_DONTNEED.

But I found two call sites which are not affected by MADV_DONTNEED (due
to down_write(mmap_sem)) but need the check to work with DAX properly.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vsprintf: kptr_restrict is okay in IRQ when 2

The kptr_restrict flag, when set to 1, only prints the kernel address
when the user has CAP_SYSLOG.  When it is set to 2, the kernel address
is always printed as zero.  When set to 1, this needs to check whether
or not we're in IRQ.

However, when set to 2, this check is unnecessary, and produces
confusing results in dmesg.  Thus, only make sure we're not in IRQ when
mode 1 is used, but not mode 2.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
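The mode logic after this change, reduced to a hypothetical helper. kptr_decide() and enum kptr_out are not vsprintf code. irq_ctx stands for hardirq, softirq or NMI context, may_see stands for the CAP_SYSLOG check (the real check involves more than one flag), and KPTR_ERROR models the error marker printed instead of a pointer. The key point is that mode 2 never depends on context.

```c
#include <assert.h>
#include <stdbool.h>

enum kptr_out { KPTR_SHOW, KPTR_ZERO, KPTR_ERROR };

/* Simplified model of the %pK decision after this change. */
static enum kptr_out kptr_decide(int kptr_restrict, bool irq_ctx,
				 bool may_see)
{
	switch (kptr_restrict) {
	case 0:
		return KPTR_SHOW;           /* no restriction */
	case 1:
		if (irq_ctx)
			return KPTR_ERROR;  /* caller's creds unreliable here */
		return may_see ? KPTR_SHOW : KPTR_ZERO;
	case 2:
	default:
		return KPTR_ZERO;           /* always zero; context irrelevant */
	}
}
```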

mm: fix filemap.c kernel doc warning

Add missing kernel-doc notation for function parameter 'gfp_mask' to fix
kernel-doc warning.

mm/filemap.c:1898: warning: No description found for parameter 'gfp_mask'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ubsan: cosmetic fix to Kconfig text

When enabling UBSAN_SANITIZE_ALL, the kernel image size increases
significantly (~3x), so it is worth adding a note about this in Kconfig.

Also fix a typo.

Signed-off-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'gpio-v4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
- Probe error path fix for the Altera driver
- irqchip ofnode pointer added to the DaVinci driver
- controller instance number correction for DaVinci

* tag 'gpio-v4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: davinci: Fix the number of controllers allocated
  gpio: davinci: Add the missing of-node pointer
  gpio: gpio-altera: Remove gpiochip on probe failure.

Merge tag 'platform-drivers-x86-v4.5-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86

Pull x86 platform driver fixes from Darren Hart:
"Just two small fixes for the 4.5-rc cycle:

  intel_scu_ipcutil:
   - underflow in scu_reg_access()

  intel-hid:
   - fix incorrect entries in intel_hid_keymap"

* tag 'platform-drivers-x86-v4.5-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
  intel_scu_ipcutil: underflow in scu_reg_access()
  intel-hid: fix incorrect entries in intel_hid_keymap

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix BPF handling of branch offset adjustments on backjumps, from
    Daniel Borkmann.

2) Make sure selinux knows about SOCK_DESTROY netlink messages, from
    Lorenzo Colitti.

3) Fix openvswitch tunnel mtu regression, from David Wragg.

4) Fix ICMP handling of TCP sockets in syn_recv state, from Eric
    Dumazet.

5) Fix SCTP user hmacid byte ordering bug, from Xin Long.

6) Fix recursive locking in ipv6 addrconf, from Subash Abhinov
    Kasiviswanathan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  bpf: fix branch offset adjustment on backjumps after patching ctx expansion
  vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices
  geneve: Relax MTU constraints
  vxlan: Relax MTU constraints
  flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  of: of_mdio: Add marvell, 88e1145 to whitelist of PHY compatibilities.
  selinux: nlmsgtab: add SOCK_DESTROY to the netlink mapping tables
  sctp: translate network order to host order when users get a hmacid
  enic: increment devcmd2 result ring in case of timeout
  tg3: Fix for tg3 transmit queue 0 timed out when too many gso_segs
  net:Add sysctl_max_skb_frags
  tcp: do not drop syn_recv on all icmp reports
  ipv6: fix a lockdep splat
  unix: correctly track in-flight fds in sending process user_struct
  update be2net maintainers' email addresses
  dwc_eth_qos: Reset hardware before PHY start
  ipv6: addrconf: Fix recursive spin lock call

IB/core: Fix reading capability mask of the port info class

When checking a specific attribute in a bit mask, bitwise AND must be
used, not logical AND; fix that.

Fixes: 145d9c541032 ('IB/core: Display extended counter set if available')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
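A minimal illustration of this bug class. The 16-bit mask and the bit names and values are hypothetical, not the real ClassPortInfo capability bits: with logical AND, any nonzero mask appears to "have" every capability.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits for illustration only. */
#define CAP_BIT_A            (1u << 0)
#define CAP_BIT_EXT_COUNTERS (1u << 9)

/* Wrong: true whenever both operands are nonzero. */
static int has_cap_wrong(uint16_t mask, uint16_t bit)
{
	return mask && bit;
}

/* Right: bitwise AND isolates the requested bit. */
static int has_cap(uint16_t mask, uint16_t bit)
{
	return (mask & bit) != 0;
}
```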

net/mlx4: fix some error handling in mlx4_multi_func_init()

The while loop after err_slaves should use post-decrement; otherwise
we'll fail to do the kfrees for i==0, and will run into out-of-bounds
accesses if the setup above failed already at i==0.

[I'm not sure why one even bothers populating the ->vlan_filter array:
mlx4.h isn't #included by anything outside
drivers/net/ethernet/mellanox/mlx4/, and "git grep -C2 -w vlan_filter
drivers/net/ethernet/mellanox/mlx4/" seems to suggest that the
vlan_filter elements aren't used at all.]

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Doug Ledford <dledford@redhat.com>

Revert "mmc: block: don't use parameter prefix if built as module"

This reverts commit 829b6962f7e3cfc06f7c5c26269fd47ad48cf503.

Revert this change as it causes a sysfs path to change and therefore
introduces an ABI regression. More precisely, Android's vold is no longer
able to access /sys/module/mmcblk/parameters/perdev_minors, since the
path changes to "/sys/module/mmc_block/..."

Fixes: 829b6962f7e3 ("mmc: block: don't use parameter prefix if built as module")
Reported-by: John Stultz <john.stultz@linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

btrfs: teach print_leaf about temporary item subtypes

Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: teach print_leaf about permanent item subtypes

Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: switch dev stats item to the permanent item key

Signed-off-by: David Sterba <dsterba@suse.com>