Hannes Reinecke [Fri, 13 Jun 2014 12:01:45 +0000 (14:01 +0200)]
scsi_error: set DID_TIME_OUT correctly
Any callbacks in scsi_timeout_out() might return BLK_EH_RESET_TIMER,
in which case we should leave the result alone and not set
DID_TIME_OUT, as the command didn't actually timeout.
Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ewan D. Milne <emilne@redhat.com>
After scsi_try_to_abort_cmd returns, the eh_abort_handler may have
already found that the command has completed in the device, causing
the host_byte to be nonzero (e.g. it could be DID_ABORT). When
this happens, ORing DID_TIME_OUT into the host byte will corrupt
the result field and initiate an unwanted command retry.
Cc: stable@vger.kernel.org Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
[Fix all instances according to review comments. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de>
Linus Torvalds [Tue, 24 Jun 2014 00:05:28 +0000 (17:05 -0700)]
Merge tag 'compress-3.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull compress bugfixes from Greg KH:
"Here are two bugfixes for some compression functions that resolve some
errors when uncompressing some pathalogical data. Both were found by
Don A Bailey"
* tag 'compress-3.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
lz4: ensure length does not wrap
lzo: properly check for overruns
Linus Torvalds [Mon, 23 Jun 2014 23:48:14 +0000 (16:48 -0700)]
Merge branch 'akpm' (patches from Andrew Morton)
Merge fixes from Andrew Morton:
"The nmi patch and watchdog patch aren't actually fixes - they're
features which needed a few last-minutes touchups.
Otherwise, a rather large batch of fixes - ocfs2 review takes a while
and I got distracted and missed last week's batch"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (31 commits)
ocfs2/dlm: do not purge lockres that is queued for assert master
ocfs2: do not return DLM_MIGRATE_RESPONSE_MASTERY_REF to avoid endless,loop during umount
ocfs2: manually do the iput once ocfs2_add_entry failed in ocfs2_symlink and ocfs2_mknod
ocfs2: fix a tiny race when running dirop_fileop_racer
ocfs2/dlm: fix misuse of list_move_tail() in dlm_run_purge_list()
ocfs2: refcount: take rw_lock in ocfs2_reflink
ocfs2: revert "ocfs2: fix NULL pointer dereference when dismount and ocfs2rec simultaneously"
ocfs2: fix deadlock when two nodes are converting same lock from PR to EX and idletimeout closes conn
ocfs2: should add inode into orphan dir after updating entry in ocfs2_rename()
mm: fix crashes from mbind() merging vmas
checkpatch: reduce false positives when checking void function return statements
ia64: arch/ia64/include/uapi/asm/fcntl.h needs personality.h
DMA, CMA: fix possible memory leak
slab: fix oops when reading /proc/slab_allocators
shmem: fix faulting into a hole while it's punched
mm: let mm_find_pmd fix buggy race with THP fault
mm: thp: fix DEBUG_PAGEALLOC oops in copy_page_rep()
kernel/watchdog.c: print traces for all cpus on lockup detection
nmi: provide the option to issue an NMI back trace to every cpu but current
Documentation/accounting/getdelays.c: add missing null-terminate after strncpy call
...
Xue jiufei [Mon, 23 Jun 2014 20:22:09 +0000 (13:22 -0700)]
ocfs2/dlm: do not purge lockres that is queued for assert master
When workqueue is delayed, it may occur that a lockres is purged while it
is still queued for master assert. it may trigger BUG() as follows.
N1 N2
dlm_get_lockres()
->dlm_do_master_requery
is the master of lockres,
so queue assert_master work
dlm_thread() start running
and purge the lockres
dlm_assert_master_worker()
send assert master message
to other nodes
receiving the assert_master
message, set master to N2
dlmlock_remote() send create_lock message to N2, but receive DLM_IVLOCKID,
if it is RECOVERY lockres, it triggers the BUG().
Another BUG() is triggered when N3 become the new master and send
assert_master to N1, N1 will trigger the BUG() because owner doesn't
match. So we should not purge lockres when it is queued for assert
master.
Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
jiangyiwen [Mon, 23 Jun 2014 20:22:09 +0000 (13:22 -0700)]
ocfs2: do not return DLM_MIGRATE_RESPONSE_MASTERY_REF to avoid endless,loop during umount
The following case may lead to endless loop during umount.
node A node B node C node D
umount volume,
migrate lockres1
to B
want to lock lockres1,
send
MASTER_REQUEST_MSG
to C
init block mle
send
MIGRATE_REQUEST_MSG
to C
find a block
mle, and then
return
DLM_MIGRATE_RESPONSE_MASTERY_REF
to B
set C in refmap
umount successfully
try to umount, endless
loop occurs when migrate
lockres1 since C is in
refmap
So we can fix this endless loop case by only returning
DLM_MIGRATE_RESPONSE_MASTERY_REF if it has a mastery mle when receiving
MIGRATE_REQUEST_MSG.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: jiangyiwen <jiangyiwen@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Xue jiufei <xuejiufei@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
jiangyiwen [Mon, 23 Jun 2014 20:22:09 +0000 (13:22 -0700)]
ocfs2: manually do the iput once ocfs2_add_entry failed in ocfs2_symlink and ocfs2_mknod
When the call to ocfs2_add_entry() failed in ocfs2_symlink() and
ocfs2_mknod(), iput() will not be called during dput(dentry) because no
d_instantiate(), and this will lead to umount hung.
Signed-off-by: jiangyiwen <jiangyiwen@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yiwen Jiang [Mon, 23 Jun 2014 20:22:09 +0000 (13:22 -0700)]
ocfs2: fix a tiny race when running dirop_fileop_racer
When running dirop_fileop_racer we found a dead lock case.
2 nodes, say Node A and Node B, mount the same ocfs2 volume. Create
/race/16/1 in the filesystem, and let the inode number of dir 16 is less
than the inode number of dir race.
Node A Node B
mv /race/16/1 /race/
right after Node A has got the
EX mode of /race/16/, and tries to
get EX mode of /race
ls /race/16/
In this case, Node A has got the EX mode of /race/16/, and wants to get EX
mode of /race/. Node B has got the PR mode of /race/, and wants to get
the PR mode of /race/16/. Since EX and PR are mutually exclusive, dead
lock happens.
This patch fixes this case by locking in ancestor order before trying
inode number order.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xue jiufei [Mon, 23 Jun 2014 20:22:08 +0000 (13:22 -0700)]
ocfs2/dlm: fix misuse of list_move_tail() in dlm_run_purge_list()
When a lockres in purge list but is still in use, it should be moved to
the tail of purge list. dlm_thread will continue to check next lockres in
purge list. However, code list_move_tail(&dlm->purge_list,
&lockres->purge) will do *no* movements, so dlm_thread will purge the same
lockres in this loop again and again. If it is in use for a long time,
other lockres will not be processed.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It crashes at
BUG_ON(create && (ext_flags & OCFS2_EXT_REFCOUNTED));
in ocfs2_direct_IO_get_blocks.
ocfs2_direct_IO_get_blocks is expecting the OCFS2_EXT_REFCOUNTED be removed in
ocfs2_prepare_inode_for_write() if it was there. But no cluster lock is taken
during the time before (or inside) ocfs2_prepare_inode_for_write() and after
ocfs2_direct_IO_get_blocks().
It can happen in this case:
Node A(which crashes) Node B
------------------------ ---------------------------
ocfs2_file_aio_write
ocfs2_prepare_inode_for_write
ocfs2_inode_lock
...
ocfs2_inode_unlock
#no refcount found
.... ocfs2_reflink
ocfs2_inode_lock
...
ocfs2_inode_unlock
#now, refcount flag set on extent
...
flush change to disk
ocfs2_direct_IO_get_blocks
ocfs2_get_clusters
#extent map miss
#buffer_head miss
read extents from disk
found refcount flag on extent
crash..
Fix:
Take rw_lock in ocfs2_reflink path
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xue jiufei [Mon, 23 Jun 2014 20:22:08 +0000 (13:22 -0700)]
ocfs2: revert "ocfs2: fix NULL pointer dereference when dismount and ocfs2rec simultaneously"
75f82eaa502c ("ocfs2: fix NULL pointer dereference when dismount and
ocfs2rec simultaneously") may cause umount hang while shutting down
truncate log.
The situation is as followes:
ocfs2_dismout_volume
-> ocfs2_recovery_exit
-> free osb->recovery_map
-> ocfs2_truncate_shutdown
-> lock global bitmap inode
-> ocfs2_wait_for_recovery
-> check whether osb->recovery_map->rm_used is zero
Because osb->recovery_map is already freed, rm_used can be any other
values, so it may yield umount hang.
Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Two node cluster and both nodes hold a lock at PR level and both want to
convert to EX at the same time. Master node 1 has sent BAST and then
closes the connection due to idletime out. Node 0 receives BAST, sends
unlock req with cancel flag but gets error -ENOTCONN. The problem is
this error is ignored in dlm_send_remote_unlock_request() on the
**incorrect** assumption that the master is dead. See NOTE in comment
why it returns DLM_NORMAL. Upon getting DLM_NORMAL, node 0 proceeds to
sends convert (without cancel flg) which fails with -ENOTCONN. waits 5
sec and resends.
This time gets DLM_IVLOCKID from the master since lock not found in
grant, it had been moved to converting queue in response to conv PR->EX
req. No way out.
Node 1 (master) Node 0
============== ======
lock mode PR PR
convert PR -> EX
mv grant -> convert and que BAST
...
<-------- convert PR -> EX
convert que looks like this: ((node 1, PR -> EX) (node 0, PR -> EX))
...
BAST (want PR -> NL)
------------------>
...
idle timout, conn closed
...
In response to BAST,
sends unlock with cancel convert flag
gets -ENOTCONN. Ignores and
sends remote convert request
gets -ENOTCONN, waits 5 Sec, retries
...
reconnects
<----------------- convert req goes through on next try
does not find lock on grant que
status DLM_IVLOCKID
------------------>
...
No way out. Fix is to keep retrying unlock with cancel flag until it
succeeds or the master dies.
Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
alex chen [Mon, 23 Jun 2014 20:22:07 +0000 (13:22 -0700)]
ocfs2: should add inode into orphan dir after updating entry in ocfs2_rename()
There are two files a and b in dir /mnt/ocfs2.
node A node B
mv a b
In ocfs2_rename(), after calling
ocfs2_orphan_add(), the inode of
file b will be added into orphan
dir.
If ocfs2_update_entry() fails,
ocfs2_rename return error and mv
operation fails. But file b still
exists in the parent dir.
ocfs2_queue_orphan_scan
-> ocfs2_queue_recovery_completion
-> ocfs2_complete_recovery
-> ocfs2_recover_orphans
The inode of the file b will be
put with iput().
ocfs2_evict_inode
-> ocfs2_delete_inode
-> ocfs2_wipe_inode
-> ocfs2_remove_inode
OCFS2_VALID_FL in the inode
i_flags will be cleared.
The file b still can be accessed
on node B.
ls /mnt/ocfs2
When first read the file b with
ocfs2_read_inode_block(). It will
validate the inode using
ocfs2_validate_inode_block().
Because OCFS2_VALID_FL not set in
the inode i_flags, so the file
system will be readonly.
So we should add inode into orphan dir after updating entry in
ocfs2_rename().
Signed-off-by: alex.chen <alex.chen@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Mon, 23 Jun 2014 20:22:07 +0000 (13:22 -0700)]
mm: fix crashes from mbind() merging vmas
In v2.6.34 commit 9d8cebd4bcd7 ("mm: fix mbind vma merge problem")
introduced vma merging to mbind(), but it should have also changed the
convention of passing start vma from queue_pages_range() (formerly
check_range()) to new_vma_page(): vma merging may have already freed
that structure, resulting in BUG at mm/mempolicy.c:1738 and probably
worse crashes.
fs/notify/fanotify/fanotify_user.c: In function 'SYSC_fanotify_init':
fs/notify/fanotify/fanotify_user.c:726: error: implicit declaration of function 'personality'
fs/notify/fanotify/fanotify_user.c:726: error: 'PER_LINUX32' undeclared (first use in this function)
fs/notify/fanotify/fanotify_user.c:726: error: (Each undeclared identifier is reported only once
fs/notify/fanotify/fanotify_user.c:726: error: for each function it appears in.)
Joonsoo Kim [Mon, 23 Jun 2014 20:22:06 +0000 (13:22 -0700)]
slab: fix oops when reading /proc/slab_allocators
Commit b1cb0982bdd6 ("change the management method of free objects of
the slab") introduced a bug on slab leak detector
('/proc/slab_allocators'). This detector works like as following
decription.
1. traverse all objects on all the slabs.
2. determine whether it is active or not.
3. if active, print who allocate this object.
but that commit changed the way how to manage free objects, so the logic
determining whether it is active or not is also changed. In before, we
regard object in cpu caches as inactive one, but, with this commit, we
mistakenly regard object in cpu caches as active one.
This intoduces kernel oops if DEBUG_PAGEALLOC is enabled. If
DEBUG_PAGEALLOC is enabled, kernel_map_pages() is used to detect who
corrupt free memory in the slab. It unmaps page table mapping if object
is free and map it if object is active. When slab leak detector check
object in cpu caches, it mistakenly think this object active so try to
access object memory to retrieve caller of allocation. At this point,
page table mapping to this object doesn't exist, so oops occurs.
Following is oops message reported from Dave.
It blew up when something tried to read /proc/slab_allocators
(Just cat it, and you should see the oops below)
To fix the problem, I introduce an object status buffer on each slab.
With this, we can track object status precisely, so slab leak detector
would not access active object and no kernel oops would occur. Memory
overhead caused by this fix is only imposed to CONFIG_DEBUG_SLAB_LEAK
which is mainly used for debugging, so memory overhead isn't big
problem.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reported-by: Dave Jones <davej@redhat.com> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Mon, 23 Jun 2014 20:22:06 +0000 (13:22 -0700)]
shmem: fix faulting into a hole while it's punched
Trinity finds that mmap access to a hole while it's punched from shmem
can prevent the madvise(MADV_REMOVE) or fallocate(FALLOC_FL_PUNCH_HOLE)
from completing, until the reader chooses to stop; with the puncher's
hold on i_mutex locking out all other writers until it can complete.
It appears that the tmpfs fault path is too light in comparison with its
hole-punching path, lacking an i_data_sem to obstruct it; but we don't
want to slow down the common case.
Extend shmem_fallocate()'s existing range notification mechanism, so
shmem_fault() can refrain from faulting pages into the hole while it's
punched, waiting instead on i_mutex (when safe to sleep; or repeatedly
faulting when not).
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I believe this comes about because, whereas collapsing and splitting THP
functions take anon_vma lock in write mode (which excludes concurrent
rmap walks), faulting THP functions (write protection and misplaced
NUMA) do not - and mostly they do not need to.
But they do use a pmdp_clear_flush(), set_pmd_at() sequence which, for
an instant (indeed, for a long instant, given the inter-CPU TLB flush in
there), leaves *pmd neither present not trans_huge.
Which can confuse a concurrent rmap walk, as when removing migration
ptes, seen in the dumped trace. Although that rmap walk has a 4k page
to insert, anon_vmas containing THPs are in no way segregated from
4k-page anon_vmas, so the 4k-intent mm_find_pmd() does need to cope with
that instant when a trans_huge pmd is temporarily absent.
I don't think we need strengthen the locking at the THP end: it's easily
handled with an ACCESS_ONCE() before testing both conditions.
And since mm_find_pmd() had only one caller who wanted a THP rather than
a pmd, let's slightly repurpose it to fail when it hits a THP or
non-present pmd, and open code split_huge_page_address() again.
Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Bob Liu <bob.liu@oracle.com> Cc: Christoph Lameter <cl@gentwo.org> Cc: Dave Jones <davej@redhat.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Mon, 23 Jun 2014 20:22:05 +0000 (13:22 -0700)]
mm: thp: fix DEBUG_PAGEALLOC oops in copy_page_rep()
Trinity has for over a year been reporting a CONFIG_DEBUG_PAGEALLOC oops
in copy_page_rep() called from copy_user_huge_page() called from
do_huge_pmd_wp_page().
I believe this is a DEBUG_PAGEALLOC false positive, due to the source
page being split, and a tail page freed, while copy is in progress; and
not a problem without DEBUG_PAGEALLOC, since the pmd_same() check will
prevent a miscopy from being made visible.
Fix by adding get_user_huge_page() and put_user_huge_page(): reducing to
the usual get_page() and put_page() on head page in the usual config;
but get and put references to all of the tail pages when
DEBUG_PAGEALLOC.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Dave Jones <davej@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aaron Tomlin [Mon, 23 Jun 2014 20:22:05 +0000 (13:22 -0700)]
kernel/watchdog.c: print traces for all cpus on lockup detection
A 'softlockup' is defined as a bug that causes the kernel to loop in
kernel mode for more than a predefined period to time, without giving
other tasks a chance to run.
Currently, upon detection of this condition by the per-cpu watchdog
task, debug information (including a stack trace) is sent to the system
log.
On some occasions, we have observed that the "victim" rather than the
actual "culprit" (i.e. the owner/holder of the contended resource) is
reported to the user. Often this information has proven to be
insufficient to assist debugging efforts.
To avoid loss of useful debug information, for architectures which
support NMI, this patch makes it possible to improve soft lockup
reporting. This is accomplished by issuing an NMI to each cpu to obtain
a stack trace.
If NMI is not supported we just revert back to the old method. A sysctl
and boot-time parameter is available to toggle this feature.
[dzickus@redhat.com: add CONFIG_SMP in certain areas]
[akpm@linux-foundation.org: additional CONFIG_SMP=n optimisations]
[mq@suse.cz: fix warning] Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jan Moskyto Matejka <mq@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aaron Tomlin [Mon, 23 Jun 2014 20:22:05 +0000 (13:22 -0700)]
nmi: provide the option to issue an NMI back trace to every cpu but current
Sometimes it is preferred not to use the trigger_all_cpu_backtrace()
routine when one wants to avoid capturing a back trace for current. For
instance if one was previously captured recently.
This patch provides a new routine namely
trigger_allbutself_cpu_backtrace() which offers the flexibility to issue
an NMI to every cpu but current and capture a back trace accordingly.
Patch x86 and sparc to support new routine.
[dzickus@redhat.com: add stub in #else clause]
[dzickus@redhat.com: don't print message in single processor case, wrap with get/put_cpu based on Oleg's suggestion]
[sfr@canb.auug.org.au: undo C99ism] Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
However, if the percpu_pagelist_fraction sysctl is set by the user, it
is also impossible to restore it to the kernel default since the user
cannot write 0 to the sysctl.
This patch allows the user to write 0 to restore the default behavior.
It still requires a fraction equal to or larger than 8, however, as
stated by the documentation for sanity. If a value in the range [1, 7]
is written, the sysctl will return EINVAL.
This successfully solves the divide by zero issue at the same time.
Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Oleg Drokin <green@linuxhacker.ru> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
What happened is after updating the watchdog_thresh, the lockup detector
is restarted to utilize the new value. Part of this process involved
disabling preemption. Once preemption was disabled, perf tried to
allocate a new event (as part of the restart). This caused the above
BUG_ON as you can't sleep with preemption disabled.
The preemption restriction seemed agressive as we are not doing anything
on that particular cpu, but with all the online cpus (which are
protected by the get_online_cpus lock). Remove the restriction and the
BUG_ON goes away.
Signed-off-by: Don Zickus <dzickus@redhat.com> Acked-by: Michal Hocko <mhocko@suse.cz> Reported-by: Peter Wu <peter@lekensteyn.nl> Tested-by: Peter Wu <peter@lekensteyn.nl> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [3.13+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove one inactive maintainer, add two new ones and update my email
address. Plus add Andrew. And fix the glob to include files like
mm/slab_common.c
Signed-off-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Mon, 23 Jun 2014 20:22:03 +0000 (13:22 -0700)]
hugetlb: fix copy_hugetlb_page_range() to handle migration/hwpoisoned entry
There's a race between fork() and hugepage migration, as a result we try
to "dereference" a swap entry as a normal pte, causing kernel panic.
The cause of the problem is that copy_hugetlb_page_range() can't handle
"swap entry" family (migration entry and hwpoisoned entry) so let's fix
it.
Hugh Dickins [Mon, 23 Jun 2014 20:22:03 +0000 (13:22 -0700)]
tmpfs: ZERO_RANGE and COLLAPSE_RANGE not currently supported
I was well aware of FALLOC_FL_ZERO_RANGE and FALLOC_FL_COLLAPSE_RANGE
support being added to fallocate(); but didn't realize until now that I
had been too stupid to future-proof shmem_fallocate() against new
additions. -EOPNOTSUPP instead of going on to ordinary fallocation.
David Rientjes [Mon, 23 Jun 2014 20:22:03 +0000 (13:22 -0700)]
mm, hotplug: probe interface is available on several platforms
Documentation/memory-hotplug.txt incorrectly states that the memory
driver "probe" interface is only supported on powerpc and is vague about
its application on x86. Clarify the platforms that make this interface
available if memory hotplug is enabled.
Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Petr Tesarik [Mon, 23 Jun 2014 20:22:03 +0000 (13:22 -0700)]
kexec: save PG_head_mask in VMCOREINFO
To allow filtering of huge pages, makedumpfile must be able to identify
them in the dump. This can be done by checking the appropriate page
flag, so communicate its value to makedumpfile through the VMCOREINFO
interface.
There's only one small catch. Depending on how many page flags are
available on a given architecture, this bit can be called PG_head or
PG_compound.
I sent a similar patch back in 2012, but Eric Biederman did not like
using an #ifdef. So, this time I'm adding a common symbol
(PG_head_mask) instead.
See https://lkml.org/lkml/2012/11/28/91 for the previous version.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Shaohua Li <shli@kernel.org> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Srivatsa S. Bhat [Mon, 23 Jun 2014 20:22:02 +0000 (13:22 -0700)]
CPU hotplug, smp: flush any pending IPI callbacks before CPU offline
There is a race between the CPU offline code (within stop-machine) and
the smp-call-function code, which can lead to getting IPIs on the
outgoing CPU, *after* it has gone offline.
Specifically, this can happen when using
smp_call_function_single_async() to send the IPI, since this API allows
sending asynchronous IPIs from IRQ disabled contexts. The exact race
condition is described below.
During CPU offline, in stop-machine, we don't enforce any rule in the
_DISABLE_IRQ stage, regarding the order in which the outgoing CPU and
the other CPUs disable their local interrupts. Due to this, we can
encounter a situation in which an IPI is sent by one of the other CPUs
to the outgoing CPU (while it is *still* online), but the outgoing CPU
ends up noticing it only *after* it has gone offline.
CPU 1 CPU 2
(Online CPU) (CPU going offline)
Enter _PREPARE stage Enter _PREPARE stage
Enter _DISABLE_IRQ stage
=
Got a device interrupt, and | Didn't notice the IPI
the interrupt handler sent an | since interrupts were
IPI to CPU 2 using | disabled on this CPU.
smp_call_function_single_async() |
=
Enter _DISABLE_IRQ stage
Enter _RUN stage Enter _RUN stage
=
Busy loop with interrupts | Invoke take_cpu_down()
disabled. | and take CPU 2 offline
=
Enter _EXIT stage Enter _EXIT stage
Re-enable interrupts Re-enable interrupts
The pending IPI is noted
immediately, but alas,
the CPU is offline at
this point.
This of course, makes the smp-call-function IPI handler code running on
CPU 2 unhappy and it complains about "receiving an IPI on an offline
CPU".
One real example of the scenario on CPU 1 is the block layer's
complete-request call-path:
However, if we look closely, the block layer does check that the target
CPU is online before firing the IPI. So in this case, it is actually
the unfortunate ordering/timing of events in the stop-machine phase that
leads to receiving IPIs after the target CPU has gone offline.
In reality, getting a late IPI on an offline CPU is not too bad by
itself (this can happen even due to hardware latencies in IPI
send-receive). It is a bug only if the target CPU really went offline
without executing all the callbacks queued on its list. (Note that a
CPU is free to execute its pending smp-call-function callbacks in a
batch, without waiting for the corresponding IPIs to arrive for each one
of those callbacks).
So, fixing this issue can be broken up into two parts:
1. Ensure that a CPU goes offline only after executing all the
callbacks queued on it.
2. Modify the warning condition in the smp-call-function IPI handler
code such that it warns only if an offline CPU got an IPI *and* that
CPU had gone offline with callbacks still pending in its queue.
Achieving part 1 is straight-forward - just flush (execute) all the
queued callbacks on the outgoing CPU in the CPU_DYING stage[1],
including those callbacks for which the source CPU's IPIs might not have
been received on the outgoing CPU yet. Once we do this, an IPI that
arrives late on the CPU going offline (either due to the race mentioned
above, or due to hardware latencies) will be completely harmless, since
the outgoing CPU would have executed all the queued callbacks before
going offline.
Overall, this fix (parts 1 and 2 put together) additionally guarantees
that we will see a warning only when the *IPI-sender code* is buggy -
that is, if it queues the callback _after_ the target CPU has gone
offline.
[1]. The CPU_DYING part needs a little more explanation: by the time we
execute the CPU_DYING notifier callbacks, the CPU would have already
been marked offline. But we want to flush out the pending callbacks at
this stage, ignoring the fact that the CPU is offline. So restructure
the IPI handler code so that we can by-pass the "is-cpu-offline?" check
in this particular case. (Of course, the right solution here is to fix
CPU hotplug to mark the CPU offline _after_ invoking the CPU_DYING
notifiers, but this requires a lot of audit to ensure that this change
doesn't break any existing code; hence lets go with the solution
proposed above until that is done).
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Suggested-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <mgalbraith@suse.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sachin Kamat <sachin.kamat@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Miao [Mon, 23 Jun 2014 20:22:02 +0000 (13:22 -0700)]
mm: nommu: per-thread vma cache fix
mm could be removed from current task struct, using previous vma->vm_mm
It will crash on blackfin after updated to Linux 3.15. The commit "mm:
per-thread vma caching" caused the crash. mm could be removed from
current task struct before
Given some pathologically compressed data, lz4 could possibly decide to
wrap a few internal variables, causing unknown things to happen. Catch
this before the wrapping happens and abort the decompression.
The lzo decompressor can, if given some really crazy data, possibly
overrun some variable types. Modify the checking logic to properly
detect overruns before they happen.
Reported-by: "Don A. Bailey" <donb@securitymouse.com> Tested-by: "Don A. Bailey" <donb@securitymouse.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus Torvalds [Sun, 22 Jun 2014 05:01:15 +0000 (19:01 -1000)]
Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c new drivers from Wolfram Sang:
"Here is a pull request from i2c hoping for the "new driver" rule.
Originally, I wanted to send this request during the merge window, but
code checkers with very recent additions complained, so a few fixups
were needed. So, some more time went by and I merged rc1 to get a
stable base"
So the "new driver" rule is really about drivers that people absolutely
need for the kernel to work on new hardware, which is not so much the
case for i2c. So I considered not pulling this, but eventually
relented.
Just for FYI: the whole (and only) point of "new drivers" is not that
new drivers cannot regress things (they can, and they have - by
triggering badly tested code on machines that never triggered that code
before), but because they can bring to life machines that otherwise
wouldn't be useful at all without the drivers.
So the new driver rule is for essential things that actual consumers
would care about, ie devices like networking or disk drivers that matter
to normal people (not server people - they run old kernels anyway, so
mainlining new drivers is irrelevant for them).
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: sun6-p2wi: fix call to snprintf
i2c: rk3x: add NULL entry to the end of_device_id array
i2c: sun6i-p2wi: use proper return value in probe
i2c: sunxi: add P2WI (Push/Pull 2 Wire Interface) controller support
i2c: sunxi: add P2WI DT bindings documentation
i2c: rk3x: add driver for Rockchip RK3xxx SoC I2C adapter
Linus Torvalds [Sun, 22 Jun 2014 02:40:30 +0000 (16:40 -1000)]
Merge tag 'locks-v3.16-2' of git://git.samba.org/jlayton/linux
Pull file locking fixes from Jeff Layton:
"File locking related bugfixes
Nothing too earth-shattering here. A fix for a potential regression
due to a patch in pile #1, and the addition of a memory barrier to
prevent a race condition between break_deleg and generic_add_lease"
* tag 'locks-v3.16-2' of git://git.samba.org/jlayton/linux:
locks: set fl_owner for leases back to current->files
locks: add missing memory barrier in break_deleg
Linus Torvalds [Sun, 22 Jun 2014 02:38:16 +0000 (16:38 -1000)]
Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild fixes from Michal Marek:
"There are three fixes for regressions caused by the relative paths
series: deb-pkg, tar-pkg and *docs did not work with O=.
Plus, there is a fix for the linux-headers deb package and a fixed
typo. These are not regression fixes but are safe enough"
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kbuild: fix a typo in a kbuild document
builddeb: fix missing headers in linux-headers package
Documentation: Fix DocBook build with relative $(srctree)
kbuild: Fix tar-pkg with relative $(objtree)
deb-pkg: Fix for relative paths
Linus Torvalds [Sun, 22 Jun 2014 00:21:43 +0000 (14:21 -1000)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"This fixes some lockups in btrfs reported with rc1. It probably has
some performance impact because it is backing off our spinning locks
more often and switching to a blocking lock. I'll be able to nail
that down next week, but for now I want to get the lockups taken care
of.
Otherwise some more stack reduction and assorted fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix wrong error handle when the device is missing or is not writeable
Btrfs: fix deadlock when mounting a degraded fs
Btrfs: use bio_endio_nodec instead of open code
Btrfs: fix NULL pointer crash when running balance and scrub concurrently
btrfs: Skip scrubbing removed chunks to avoid -ENOENT.
Btrfs: fix broken free space cache after the system crashed
Btrfs: make free space cache write out functions more readable
Btrfs: remove unused wait queue in struct extent_buffer
Btrfs: fix deadlocks with trylock on tree nodes
Linus Torvalds [Sun, 22 Jun 2014 00:20:38 +0000 (14:20 -1000)]
Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from Bruce Fields:
"Fixes for a new regression from the xdr encoding rewrite, and a
delegation problem we've had for a while (made somewhat more annoying
by the vfs delegation support added in 3.13)"
* 'for-3.16' of git://linux-nfs.org/~bfields/linux:
NFSD: fix bug for readdir of pseudofs
NFSD: Don't hand out delegations for 30 seconds after recalling them.
Linus Torvalds [Sat, 21 Jun 2014 17:07:17 +0000 (07:07 -1000)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"This is larger than usual: the main reason are the ARM symbol lookup
speedups that came in late and were hard to resist.
There's also a kprobes fix and various tooling fixes, plus the minimal
re-enablement of the mmap2 support interface"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
x86/kprobes: Fix build errors and blacklist context_track_user
perf tests: Add test for closing dso objects on EMFILE error
perf tests: Add test for caching dso file descriptors
perf tests: Allow reuse of test_file function
perf tests: Spawn child for each test
perf tools: Add dso__data_* interface descriptons
perf tools: Allow to close dso fd in case of open failure
perf tools: Add file size check and factor dso__data_read_offset
perf tools: Cache dso data file descriptor
perf tools: Add global count of opened dso objects
perf tools: Add global list of opened dso objects
perf tools: Add data_fd into dso object
perf tools: Separate dso data related variables
perf tools: Cache register accesses for unwind processing
perf record: Fix to honor user freq/interval properly
perf timechart: Reflow documentation
perf probe: Improve error messages in --line option
perf probe: Improve an error message of perf probe --vars mode
perf probe: Show error code and description in verbose mode
perf probe: Improve error message for unknown member of data structure
...
Linus Torvalds [Sat, 21 Jun 2014 17:06:02 +0000 (07:06 -1000)]
Merge branch 'locking-urgent-for-linus.patch' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull rtmutex fixes from Thomas Gleixner:
"Another three patches to make the rtmutex code more robust. That's
the last urgent fallout from the big futex/rtmutex investigation"
* 'locking-urgent-for-linus.patch' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rtmutex: Plug slow unlock race
rtmutex: Detect changes in the pi lock chain
rtmutex: Handle deadlock detection smarter
Linus Torvalds [Sat, 21 Jun 2014 16:47:01 +0000 (06:47 -1000)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 patches from Martin Schwidefsky:
"A couple of bug fixes, a debug change for qdio, an update for the
default config, and one small extension.
The watchdog module based on diagnose 0x288 is converted to the
watchdog API and it now works under LPAR as well"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/ccwgroup: use ccwgroup_ungroup wrapper
s390/ccwgroup: fix an uninitialized return code
s390/ccwgroup: obtain extra reference for asynchronous processing
qdio: Keep device-specific dbf entries
s390/compat: correct ucontext layout for high gprs
s390/cio: set device name as early as possible
s390: update default configuration
s390: avoid format strings leaking into names
s390/airq: silence lockdep warning
s390/watchdog: add support for LPAR operation (diag288)
s390/watchdog: use watchdog API
s390/sclp_vt220: Enable ASCII console per default
s390/qdio: replace shift loop by ilog2
s390/cio: silence lockdep warning
s390/uaccess: always load the kernel ASCE after task switch
s390/ap_bus: Make modules parameters visible in sysfs
Linus Torvalds [Sat, 21 Jun 2014 16:45:54 +0000 (06:45 -1000)]
Merge tag 'for-linus' of git://github.com/gxt/linux
Pull UniCore32 bug fixes from Guan Xuetao:
"This includes bugfixes to make unicore32 successfully build under
defconfig, and some changes for allmodconfig (though not finished)"
* tag 'for-linus' of git://github.com/gxt/linux:
unicore32: Remove ARCH_HAS_CPUFREQ config option
UniCore32: Change git tree location information in MAINTAINERS
arch: unicore32: ksyms: export '__cpuc_coherent_kern_range' to avoid compiling failure
arch: unicore32: ksyms: export 'pm_power_off' to avoid compiling failure.
arch: unicore32: ksyms: export additional find_first_*() to avoid compiling failure
arch:unicore32:mm: add devmem_is_allowed() to support STRICT_DEVMEM
unicore32: include: asm: add missing ')' for PAGE_* macros in pgtable.h
arch/unicore32/kernel/setup.c: add generic 'screen_info' to avoid compiling failure
drivers: scsi: mvsas: fix compiling issue by adding 'MVS_' for "enum pci_interrupt_cause"
arch: unicore32: kernel: ksyms: remove 'bswapsi2' and 'muldi3' to avoid compiling failure
arch/unicore32/kernel/ksyms.c: remove 2 export symbols to avoid compiling failure
drivers/rtc/rtc-puv3.c: remove "&dev->" for typo issue MIME-Version: 1.0
drivers/rtc/rtc-puv3.c: use dev_dbg() instead of dev_debug() for typo issue
arch/unicore32/include/asm/io.h: add readl_relaxed() generic definition
arch/unicore32/include/asm/ptrace.h: add generic definition for profile_pc()
arch/unicore32/mm/alignment.c: include "asm/pgtable.h" to avoid compiling error
arch/unicore32/kernel/clock.c: add readl() and writel() for 'PM_' macros
arch/unicore32/kernel/module.c: use __vmalloc_node_range() instead of __vmalloc_area()
arch/unicore32/kernel/ksyms.c: remove several undefined exported symbols
Linus Torvalds [Sat, 21 Jun 2014 16:43:19 +0000 (06:43 -1000)]
Merge tag 'char-misc-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc driver fixes from Greg KH:
"Here are 3 patches, one a revert of the UIO patch you objected to in
3.16-rc1 and that no one wanted to defend, a w1 driver bugfix, and a
MAINTAINERS update for the vmware balloon driver.
All of these, except for the MAINTAINERS update which just got added,
have been in linux-next just fine"
* tag 'char-misc-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
MAINTAINERS: add entry for VMware Balloon driver
w1: mxc_w1: Fix incorrect "presence" status
Revert "uio: fix vma io range check in mmap"
Linus Torvalds [Sat, 21 Jun 2014 16:42:40 +0000 (06:42 -1000)]
Merge tag 'staging-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are a few fixes for staging and iio drivers that resolve issues
reported in 3.16-rc1.
All have been in linux-next just fine"
* tag 'staging-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
imx-drm: parallel-display: Fix DPMS default state.
staging: android: timed_output: fix use after free of dev
staging: comedi: addi_apci_1564: add addi_watchdog dependency
staging: rtl8723au: Reference correct firmwarefiles with MODULE_FIRMWARE()
staging: rtl8723au: Request correct firmware file for A-cut parts
iio: adc: checking for NULL instead of IS_ERR() in probe
iio: adc: at91: signedness bug in at91_adc_get_trigger_value_by_name()
iio: mxs-lradc: fix divider
iio: Fix endianness issue in ak8975_read_axis()
staging/iio: IIO_SIMPLE_DUMMY_BUFFER neds IIO_BUFFER
twl4030-madc: Request processed values in twl4030_get_madc_conversion
staging: iio: tsl2x7x_core: fix proximity treshold
iio: Fix two mpl3115 issues in measurement conversion
iio: hid-sensors: Get feature report from sensor hub after changing power state
Linus Torvalds [Sat, 21 Jun 2014 16:41:42 +0000 (06:41 -1000)]
Merge tag 'tty-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial bugfixes from Greg KH:
"Here are some tty / serial driver bugfixes for 3.16-rc2 that resolve
some reported issues. The samsung driver build error itself has been
reported by a bunch of people, sorry about that one. The others are
all tiny and everyone seems to like them in linux-next so far"
* tag 'tty-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty/serial: fix 8250 early console option passing to regular console
tty: Correct INPCK handling
serial: Fix IGNBRK handling
serial: samsung: Fix build error
Linus Torvalds [Sat, 21 Jun 2014 16:41:07 +0000 (06:41 -1000)]
Merge tag 'usb-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some USB fixes for 3.16-rc2 that resolve some reported
issues. All of these have been in linux-next for a while with no
problems"
* tag 'usb-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: usbtest: add a timeout for scatter-gather tests
USB: EHCI: avoid BIOS handover on the HASEE E200
usb: fix hub-port pm_runtime_enable() vs runtime pm transitions
usb: quiet peer failure warning, disable poweroff
usb: improve "not suspended yet" message in hub_suspend()
xhci: Fix sleeping with IRQs disabled in xhci_stop_device()
usb: fix ->update_hub_device() vs hdev->maxchild
Linus Torvalds [Fri, 20 Jun 2014 04:58:57 +0000 (18:58 -1000)]
Merge tag 'pm+acpi-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
"These are fixes mostly (ia64 regression related to the ACPI
enumeration of devices, cpufreq regressions, fix for I2C controllers
included in Intel SoCs, mvebu cpuidle driver fix related to sysfs)
plus additional kernel command line arguments from Kees to make it
possible to build kernel images with hibernation and the kernel
address space randomization included simultaneously, a new ACPI
battery driver quirk for a system with a broken BIOS and a couple of
ACPI core cleanups.
Specifics:
- Fix for an ia64 regression introduced during the 3.11 cycle by a
commit that modified the hardware initialization ordering and made
device discovery fail on some systems.
- Fix for a build problem on systems where the cpufreq-cpu0 driver is
built-in and the cpu-thermal driver is modular from Arnd Bergmann.
- Fix for a recently introduced computational mistake in the
intel_pstate driver that leads to excessive rounding errors from
Doug Smythies.
- Fix for a failure code path in cpufreq_update_policy() that fails
to unlock the locks acquired previously from Aaron Plattner.
- Fix for the cpuidle mvebu driver to use shorter state names which
will prevent the sysfs interface from returning mangled strings.
From Gregory Clement.
- ACPI LPSS driver fix to make sure that the I2C controllers included
in BayTrail SoCs are not held in the reset state while they are
being probed from Mika Westerberg.
- New kernel command line arguments making it possible to build
kernel images with hibernation and kASLR included at the same time
and to select which of them will be used via the command line (they
are still functionally mutually exclusive, though). From Kees
Cook.
- ACPI battery driver quirk for Acer Aspire V5-573G that fails to
send battery status change notifications timely from Alexander
Mezin.
- Two ACPI core cleanups from Christoph Jaeger and Fabian Frederick"
* tag 'pm+acpi-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: mvebu: Fix the name of the states
cpufreq: unlock when failing cpufreq_update_policy()
intel_pstate: Correct rounding in busy calculation
ACPI: use kstrto*() instead of simple_strto*()
ACPI / processor replace __attribute__((packed)) by __packed
ACPI / battery: add quirk for Acer Aspire V5-573G
ACPI / battery: use callback for setting up quirks
ACPI / LPSS: Take I2C host controllers out of reset
x86, kaslr: boot-time selectable with hibernation
PM / hibernate: introduce "nohibernate" boot parameter
cpufreq: cpufreq-cpu0: fix CPU_THERMAL dependency
ACPI / ia64 / sba_iommu: Restore the working initialization ordering
Linus Torvalds [Fri, 20 Jun 2014 04:49:37 +0000 (18:49 -1000)]
Merge tag 'sound-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The significant part here is a few security fixes for ALSA core
control API by Lars. Besides that, there are a few fixes for ASoC
sigmadsp (again by Lars) for building properly, and small fixes for
ASoC rsnd, MMP, PXA and FSL, in addition to a fix for bogus WARNING in
i915/HD-audio binding"
* tag 'sound-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: control: Make sure that id->index does not overflow
ALSA: control: Handle numid overflow
ALSA: control: Don't access controls outside of protected regions
ALSA: control: Fix replacing user controls
ALSA: control: Protect user controls against concurrent access
drm/i915, HD-audio: Don't continue probing when nomodeset is given
ASoC: fsl: Fix build problem
ASoC: rsnd: fixup index of src/dst mod when capture
ASoC: fsl_spdif: Fix integer overflow when calculating divisors
ASoC: fsl_spdif: Fix incorrect usage of regmap_read()
ASoC: dapm: Make sure register value is in sync with DAPM kcontrol state
ASoC: sigmadsp: Split regmap and I2C support into separate modules
ASoC: MMP audio needs sram support
ASoC: pxa: add I2C dependencies as needed
Linus Torvalds [Fri, 20 Jun 2014 04:40:36 +0000 (18:40 -1000)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"This looks bigger than it is, as one of the nouveau firmware fixes
("drm/gf100-/gr: report class data to host on fwmthd failure")
regenerates a bunch of the firmware files after changing the assembly
by a few lines, without that, its more of a
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (33 commits)
drm: fix uninitialized acquire_ctx fields (v2)
drm/radeon: Fix radeon_irq_kms_pflip_irq_get/put() imbalance
Revert "drm/radeon: remove drm_vblank_get|put from pflip handling"
drm/radeon: improve dvi_mode_valid
drm/radeon: update mode_valid testing for DP
drm/radeon: Use dce5/6 hdmi deep color clock setup also on dce8+
drm/nouveau/disp: fix oops in destructor with headless cards
drm/gf117/i2c: no aux channels on this chipset
drm/nouveau/doc: update the thermal documentation
drm/nouveau/pwr: fix typo in fifo wrap handling
drm/nv50/disp: fix a potential oops in supervisor handling
drm/nouveau/disp/dp: don't touch link config after success
drm/nouveau/kms: reference vblank for crtc during pageflip.
drm/gk104/fb/ram: fixups from an earlier search+replace
drm/nv50/gr: remove an unneeded write while initialising PGRAPH
drm/nv50/gr: fix overlap while zeroing zcull regions
drm/gf100-/gr: report class data to host on fwmthd failure
drm/gk104/ibus: increase various random timeouts
drm/gk104/clk: only touch divider for mode we'll be using
drm/radeon: Bypass hw lut's for > 8 bpc framebuffer scanout.
...
Linus Torvalds [Fri, 20 Jun 2014 03:56:43 +0000 (17:56 -1000)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A smaller collection of fixes for the block core that would be nice to
have in -rc2. This pull request contains:
- Fixes for races in the wait/wakeup logic used in blk-mq from
Alexander. No issues have been observed, but it is definitely a
bit flakey currently. Alternatively, we may drop the cyclic
wakeups going forward, but that needs more testing.
- Some cleanups from Christoph.
- Fix for an oops in null_blk if queue_mode=1 and softirq completions
are used. From me.
- A fix for a regression caused by the chunk size setting. It
inadvertently used max_hw_sectors instead of max_sectors, which is
incorrect, and causes hangs on btrfs multi-disk setups (where hw
sectors apparently isn't set). From me.
- Removal of WQ_POWER_EFFICIENT in the kblockd creation. This was a
recent addition as well, but it actually breaks blk-mq which relies
on strict scheduling. If the workqueue power_efficient mode is
turned on, this breaks blk-mq. From Matias.
- null_blk module parameter description fix from Mike"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: bitmap tag: fix races in bt_get() function
blk-mq: bitmap tag: fix race on blk_mq_bitmap_tags::wake_cnt
blk-mq: bitmap tag: fix races on shared ::wake_index fields
block: blk_max_size_offset() should check ->max_sectors
null_blk: fix softirq completions for queue_mode == 1
blk-mq: merge blk_mq_drain_queue and __blk_mq_drain_queue
blk-mq: properly drain stopped queues
block: remove WQ_POWER_EFFICIENT from kblockd
null_blk: fix name and description of 'queue_mode' module parameter
block: remove elv_abort_queue and blk_abort_flushes
Linus Torvalds [Fri, 20 Jun 2014 03:53:20 +0000 (17:53 -1000)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"A first set of bug fixes that didn't make it for the merge window, and
two Kconfig cleanups that still make sense at this point.
Unfortunately, one of the two cleanups caused an unintended change in
the original version, so we had to revert one part of it and do some
more testing to ensure the rest is really fine. There was also a
last-minute rebase of the patches to remove another bad commit"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: use menuconfig for sub-arch menus
ARM: multi_v7_defconfig: re-enable SDHCI drivers
ARM: EXYNOS: Fix compilation warning
ARM: exynos: move sysram info to exynos.c
ARM: dts: Specify the NAND ECC scheme explicitly on Armada 385 DB board
ARM: dts: Specify the NAND ECC scheme explicitly on Armada 375 DB board
ARM: exynos: cleanup kconfig option display
misc: vexpress: fix error handling vexpress_syscfg_regmap_init()
ARM: Remove ARCH_HAS_CPUFREQ config option
ARM: integrator: fix section mismatch problem
ARM: mvebu: DT: fix OpenBlocks AX3-4 RAM size
ARM: samsung: make SAMSUNG_DMADEV optional
remoteproc: da8xx: don't select CMA on no-MMU
bus/arm-cci: add dependency on OF && CPU_V7
ARM: keystone requires ARM_PATCH_PHYS_VIRT
ARM: omap2: fix am43xx dependency on l2x0 cache
Stephen Boyd [Tue, 3 Jun 2014 18:30:03 +0000 (11:30 -0700)]
unicore32: Remove ARCH_HAS_CPUFREQ config option
This config exists entirely to hide the cpufreq menu from the
kernel configuration unless a platform has selected it. Nothing
is actually built if this config is 'Y' and it just leads to more
patches that add a select under a platform Kconfig so that some
other CPUfreq option can be chosen. Let's remove the option so
that we can always enable CPUfreq drivers on unicore32 platforms.
Guan Xuetao [Mon, 26 May 2014 23:53:10 +0000 (07:53 +0800)]
UniCore32: Change git tree location information in MAINTAINERS
UniCore32 git repo has moved to github.
Branch 'unicore32' is used for prepared patches, and automatically merged to linux-next.
Branch 'unicore32-working' is used for development.
Chen Gang [Tue, 27 May 2014 00:08:06 +0000 (08:08 +0800)]
arch: unicore32: ksyms: export '__cpuc_coherent_kern_range' to avoid compiling failure
flush_icache_range() is '__cpuc_coherent_kern_range' under unicore32,
and lkdtm.ko needs it. At present, '__cpuc_coherent_kern_range' is
still used by unicore32, so export it to avoid compiling failure.
The related error (with allmodconfig under unicore32):
Chen Gang [Wed, 21 May 2014 23:08:23 +0000 (07:08 +0800)]
arch/unicore32/kernel/setup.c: add generic 'screen_info' to avoid compiling failure
Add generic 'screen_info' just like another architectures have done
(e.g. tile, sh, score, ia64, hexagon, and cris).
The related error (with allmodconfig under unicore32):
LD init/built-in.o
drivers/built-in.o: In function `vgacon_save_screen':
powercap_sys.c:(.text+0x21788): undefined reference to `screen_info'
drivers/built-in.o: In function `vgacon_resize':
powercap_sys.c:(.text+0x21b54): undefined reference to `screen_info'
drivers/built-in.o: In function `vgacon_switch':
powercap_sys.c:(.text+0x21cb4): undefined reference to `screen_info'
drivers/built-in.o: In function `vgacon_init':
powercap_sys.c:(.text+0x2296c): undefined reference to `screen_info'
drivers/built-in.o: In function `vgacon_startup':
powercap_sys.c:(.text+0x22e80): undefined reference to `screen_info'
Chen Gang [Fri, 9 May 2014 01:19:39 +0000 (09:19 +0800)]
drivers: scsi: mvsas: fix compiling issue by adding 'MVS_' for "enum pci_interrupt_cause"
The direct cause is IRQ_SPI is already defined as a macro in unicore32
architecture (also, blackfin and mips architectures define it). The
related error (unicore32 with allmodconfig)
CC [M] drivers/scsi/mvsas/mv_94xx.o
In file included from drivers/scsi/mvsas/mv_94xx.c:27:
drivers/scsi/mvsas/mv_94xx.h:176: error: expected identifier before numeric constant
And IRQ_SAS_A and IRQ_SAS_B are used as 'u32' (although "enum
pci_interrupt_cause" is not used directly, now).
All together, need add 'MVS_' for "enum pci_interrupt_cause".
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Xuetao Guan <gxt@mprc.pku.edu.cn> Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
Chen Gang [Wed, 21 May 2014 00:57:53 +0000 (08:57 +0800)]
arch/unicore32/kernel/ksyms.c: remove 2 export symbols to avoid compiling failure
'csum_partial' and 'csum_partial_copy_from_user' have already been
exported in "lib/", so need not export them again, or it will cause
compiling error.
The related error (with allmodconfig under unicore32):
LD vmlinux.o
lib/built-in.o:(___ksymtab+csum_partial+0x0): multiple definition of `__ksymtab_csum_partial'
arch/unicore32/kernel/built-in.o:(___ksymtab+csum_partial+0x0): first defined here
lib/built-in.o:(___ksymtab+csum_partial_copy_from_user+0x0): multiple definition of `__ksymtab_csum_partial_copy_from_user'
arch/unicore32/kernel/built-in.o:(___ksymtab+csum_partial_copy_from_user+0x0): first defined here
make: *** [vmlinux] Error 1
Chen Gang [Sat, 3 May 2014 05:09:02 +0000 (13:09 +0800)]
drivers/rtc/rtc-puv3.c: remove "&dev->" for typo issue MIME-Version: 1.0
It is only a typo issue, the related commit:
"1fbc4c4 drivers/rtc/rtc-puv3.c: use dev_dbg() instead of pr_debug()"
The related error (for unicore32 with allmodconfig):
CC [M] drivers/rtc/rtc-puv3.o
drivers/rtc/rtc-puv3.c: In function 'puv3_rtc_setalarm':
drivers/rtc/rtc-puv3.c:143: error: 'struct device' has no member named 'dev'
Chen Gang [Sat, 3 May 2014 05:07:57 +0000 (13:07 +0800)]
drivers/rtc/rtc-puv3.c: use dev_dbg() instead of dev_debug() for typo issue
It is only a typo issue, the related commit:
"1fbc4c4 drivers/rtc/rtc-puv3.c: use dev_dbg() instead of pr_debug()"
The related error (unicore32 with allmodconfig):
CC [M] drivers/rtc/rtc-puv3.o
drivers/rtc/rtc-puv3.c: In function 'puv3_rtc_setpie':
drivers/rtc/rtc-puv3.c:74: error: implicit declaration of function 'dev_debug'
Need generic definition for readl_relaxed(), like other architectures
have done. Or can not pass compiling with allmodconfig, the related
error:
CC [M] drivers/message/fusion/mptbase.o
drivers/message/fusion/mptbase.c: In function 'mpt_send_handshake_request':
drivers/message/fusion/mptbase.c:1224: error: implicit declaration of function 'readl_relaxed'
Chen Gang [Mon, 24 Mar 2014 12:54:11 +0000 (20:54 +0800)]
arch/unicore32/include/asm/ptrace.h: add generic definition for profile_pc()
Add generic definition just like another architectures have done, or
can not pass compiling with allmodconfig, the related error:
CC kernel/profile.o
kernel/profile.c: In function 'profile_tick':
kernel/profile.c:419: error: implicit declaration of function 'profile_pc'
make[1]: *** [kernel/profile.o] Error 1
make: *** [kernel] Error 2
Chen Gang [Mon, 24 Mar 2014 12:17:44 +0000 (20:17 +0800)]
arch/unicore32/mm/alignment.c: include "asm/pgtable.h" to avoid compiling error
Need include "asm/pgtable.h" to include "asm-generic/pgtable-nopmd.h",
so can let 'pmd_t' defined. The related error with allmodconfig:
CC arch/unicore32/mm/alignment.o
In file included from arch/unicore32/mm/alignment.c:24:
arch/unicore32/include/asm/tlbflush.h:135: error: expected .). before .*. token
arch/unicore32/include/asm/tlbflush.h:154: error: expected .). before .*. token
In file included from arch/unicore32/mm/alignment.c:27:
arch/unicore32/mm/mm.h:15: error: expected .=., .,., .;., .sm. or ._attribute__. before .*. token
arch/unicore32/mm/mm.h:20: error: expected .=., .,., .;., .sm. or ._attribute__. before .*. token
arch/unicore32/mm/mm.h:25: error: expected .=., .,., .;., .sm. or ._attribute__. before .*. token
make[1]: *** [arch/unicore32/mm/alignment.o] Error 1
make: *** [arch/unicore32/mm] Error 2
Chen Gang [Fri, 14 Mar 2014 01:19:39 +0000 (09:19 +0800)]
arch/unicore32/kernel/clock.c: add readl() and writel() for 'PM_' macros
Add readl() and writel() for 'PM_' macros, just like another areas have
done within unicored32, or will cause compiling issue.
The related error (allmodconfig for unicored32):
CC arch/unicore32/kernel/clock.o
arch/unicore32/kernel/clock.c: In function 'clk_set_rate':
arch/unicore32/kernel/clock.c:182: warning: initialization makes integer from pointer without a cast
arch/unicore32/kernel/clock.c:204: error: lvalue required as left operand of assignment
arch/unicore32/kernel/clock.c:206: error: lvalue required as left operand of assignment
arch/unicore32/kernel/clock.c:207: error: invalid operands to binary & (have 'void *' and 'long unsigned int')
make[1]: *** [arch/unicore32/kernel/clock.o] Error 1
make: *** [arch/unicore32/kernel] Error 2
Chen Gang [Fri, 14 Mar 2014 00:49:27 +0000 (08:49 +0800)]
arch/unicore32/kernel/module.c: use __vmalloc_node_range() instead of __vmalloc_area()
__vmalloc_area() has already been removed from upstream kernel, need
use __vmalloc_node_range() instead of.
The related commit: "d0a2126 mm: unify module_alloc code for vmalloc".
The related error (allmodconfig for unicore32):
CC arch/unicore32/kernel/module.o
arch/unicore32/kernel/module.c: In function 'module_alloc' :
arch/unicore32/kernel/module.c:34: error: implicit declaration of function '__vmalloc_area'
arch/unicore32/kernel/module.c:34: warning: return makes pointer from integer without a cast
make[1]: *** [arch/unicore32/kernel/module.o] Error 1
make: *** [arch/unicore32/kernel] Error 2
Others are only within some architectures (not kernel wide).
The related error with allmodconfig for unicode32:
CC arch/unicore32/kernel/ksyms.o
arch/unicore32/kernel/ksyms.c:29: error: ._backtrace. undeclared here (not in a function)
arch/unicore32/kernel/ksyms.c:29: error: type defaults to .nt. in declaration of ._backtrace.
arch/unicore32/kernel/ksyms.c:38: error: .sum_partial_copy_nocheck. undeclared here (not in a function)
arch/unicore32/kernel/ksyms.c:38: error: type defaults to .nt. in declaration of .sum_partial_copy_nocheck.
arch/unicore32/kernel/ksyms.c:39: error: ._csum_ipv6_magic. undeclared here (not in a function)
arch/unicore32/kernel/ksyms.c:39: error: type defaults to .nt. in declaration of ._csum_ipv6_magic.
arch/unicore32/kernel/ksyms.c:43: error: ._raw_readsb. undeclared here (not in a function)
arch/unicore32/kernel/ksyms.c:43: error: type defaults to .nt. in declaration of ._raw_readsb.
arch/unicore32/kernel/ksyms.c:46: error: ._raw_readsw. undeclared here (not in a function)
arch/unicore32/kernel/ksyms.c:46: error: type defaults to .nt. in declaration of ._raw_readsw.
arch/unicore32/kernel/ksyms.c:49: error: ._raw_readsl. undeclared here (not in a function)
arch/unicore32/kernel/ksyms.c:49: error: type defaults to .nt. in declaration of ._raw_readsl.
arch/unicore32/kernel/ksyms.c:52: error: ._raw_writesb. undeclared here (not in a function)
arch/unicore32/kernel/ksyms.c:52: error: type defaults to .nt. in declaration of ._raw_writesb.
arch/unicore32/kernel/ksyms.c:55: error: ._raw_writesw. undeclared here (not in a function)
arch/unicore32/kernel/ksyms.c:55: error: type defaults to .nt. in declaration of ._raw_writesw.
arch/unicore32/kernel/ksyms.c:58: error: ._raw_writesl. undeclared here (not in a function)
arch/unicore32/kernel/ksyms.c:58: error: type defaults to .nt. in declaration of ._raw_writesl.
arch/unicore32/kernel/ksyms.c:79: error: ._get_user_1. undeclared here (not in a function)
arch/unicore32/kernel/ksyms.c:79: error: type defaults to .nt. in declaration of ._get_user_1.
arch/unicore32/kernel/ksyms.c:80: error: ._get_user_2. undeclared here (not in a function)
arch/unicore32/kernel/ksyms.c:80: error: type defaults to .nt. in declaration of ._get_user_2.
arch/unicore32/kernel/ksyms.c:81: error: ._get_user_4. undeclared here (not in a function)
arch/unicore32/kernel/ksyms.c:81: error: type defaults to .nt. in declaration of ._get_user_4.
arch/unicore32/kernel/ksyms.c:83: error: ._put_user_1. undeclared here (not in a function)
arch/unicore32/kernel/ksyms.c:83: error: type defaults to .nt. in declaration of ._put_user_1.
arch/unicore32/kernel/ksyms.c:84: error: ._put_user_2. undeclared here (not in a function)
arch/unicore32/kernel/ksyms.c:84: error: type defaults to .nt. in declaration of ._put_user_2.
arch/unicore32/kernel/ksyms.c:85: error: ._put_user_4. undeclared here (not in a function)
arch/unicore32/kernel/ksyms.c:85: error: type defaults to .nt. in declaration of ._put_user_4.
arch/unicore32/kernel/ksyms.c:86: error: ._put_user_8. undeclared here (not in a function)
arch/unicore32/kernel/ksyms.c:86: error: type defaults to .nt. in declaration of ._put_user_8.
Miao Xie [Thu, 19 Jun 2014 02:42:55 +0000 (10:42 +0800)]
Btrfs: fix wrong error handle when the device is missing or is not writeable
The original bio might be submitted, so we shoud increase bi_remaining to
account for it when we deal with the error that the device is missing or
is not writeable, or we would skip the endio handle.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
Miao Xie [Thu, 19 Jun 2014 02:42:54 +0000 (10:42 +0800)]
Btrfs: fix deadlock when mounting a degraded fs
The deadlock happened when we mount degraded filesystem, the reproduced
steps are following:
# mkfs.btrfs -f -m raid1 -d raid1 <dev0> <dev1>
# echo 1 > /sys/block/`basename <dev0>`/device/delete
# mount -o degraded <dev1> <mnt>
The reason was that the counter -- bi_remaining was wrong. If the missing
or unwriteable device was the last device in the mapping array, we would
not submit the original bio, so we shouldn't increase bi_remaining of it
in btrfs_end_bio(), or we would skip the final endio handle.
Fix this problem by adding a flag into btrfs bio structure. If we submit
the original bio, we will set the flag, and we increase bi_remaining counter,
or we don't.
Though there is another way to fix it -- decrease bi_remaining counter of the
original bio when we make sure the original bio is not submitted, this method
need add more check and is easy to make mistake.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
Qu Wenruo [Thu, 19 Jun 2014 02:42:51 +0000 (10:42 +0800)]
btrfs: Skip scrubbing removed chunks to avoid -ENOENT.
When run scrub with balance, sometimes -ENOENT will be returned, since
in scrub_enumerate_chunks() will search dev_extent in *COMMIT_ROOT*, but
btrfs_lookup_block_group() will search block group in *MEMORY*, so if a
chunk is removed but not committed, -ENOENT will be returned.
However, there is no need to stop scrubbing since other chunks may be
scrubbed without problem.
So this patch changes the behavior to skip removed chunks and continue
to scrub the rest.
Miao Xie [Thu, 19 Jun 2014 02:42:50 +0000 (10:42 +0800)]
Btrfs: fix broken free space cache after the system crashed
When we mounted the filesystem after the crash, we got the following
message:
BTRFS error (device xxx): block group xxxx has wrong amount of free space
BTRFS error (device xxx): failed to load free space cache for block group xxx
It is because we didn't update the metadata of the allocated space (in extent
tree) until the file data was written into the disk. During this time, there was
no information about the allocated spaces in either the extent tree nor the
free space cache. when we wrote out the free space cache at this time (commit
transaction), those spaces were lost. In fact, only the free space that is
used to store the file data had this problem, the others didn't because
the metadata of them is updated in the same transaction context.
There are many methods which can fix the above problem
- track the allocated space, and write it out when we write out the free
space cache
- account the size of the allocated space that is used to store the file
data, if the size is not zero, don't write out the free space cache.
The first one is complex and may make the performance drop down.
This patch chose the second method, we use a per-block-group variant to
account the size of that allocated space. Besides that, we also introduce
a per-block-group read-write semaphore to avoid the race between
the allocation and the free space cache write out.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
Miao Xie [Thu, 19 Jun 2014 02:42:49 +0000 (10:42 +0800)]
Btrfs: make free space cache write out functions more readable
This patch makes the free space cache write out functions more readable,
and beisdes that, it also reduces the stack space that the function --
__btrfs_write_out_cache uses from 194bytes to 144bytes.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
Filipe Manana [Mon, 16 Jun 2014 12:14:25 +0000 (13:14 +0100)]
Btrfs: remove unused wait queue in struct extent_buffer
The lock_wq wait queue is not used anywhere, therefore just remove it.
On a x86_64 system, this reduced sizeof(struct extent_buffer) from 320
bytes down to 296 bytes, which means a 4Kb page can now be used for
13 extent buffers instead of 12.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
Chris Mason [Thu, 19 Jun 2014 21:16:52 +0000 (14:16 -0700)]
Btrfs: fix deadlocks with trylock on tree nodes
The Btrfs tree trylock function is poorly named. It always takes
the spinlock and backs off if the blocking lock is held. This
can lead to surprising lockups because people expect it to really be a
trylock.
This commit makes it a pure trylock, both for the spinlock and the
blocking lock. It also reworks the nested lock handling slightly to
avoid taking the read lock while a spinning write lock might be held.
Rob Herring [Thu, 12 Jun 2014 17:52:44 +0000 (12:52 -0500)]
tty/serial: fix 8250 early console option passing to regular console
In the conversion to generic early console, the passing of options from
the early 8250 console to the regular ttyS console was broken. This
resulted in the baud rate changing when switching consoles during boot.
This feature allows specifying a single console option on the kernel
command line rather than both an early console and regular serial tty
console. It would be nice to generalize this feature. However, it only
works if the correct baud rate can be probed early which is not the
case on many platforms which have non-standard UART clock rates. So for
now, this is left as an 8250 specific feature.
Reported-and-tested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rob Herring <robh@kernel.org> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Mon, 16 Jun 2014 12:10:42 +0000 (08:10 -0400)]
tty: Correct INPCK handling
If INPCK is not set, input parity detection should be disabled. This means
parity errors should not be received from the tty driver, and the data
received should be treated normally.
SUS v3, 11.2.2, General Terminal Interface - Input Modes, states:
"If INPCK is set, input parity checking shall be enabled. If INPCK is
not set, input parity checking shall be disabled, allowing output parity
generation without input parity errors. Note that whether input parity
checking is enabled or disabled is independent of whether parity detection
is enabled or disabled (see Control Modes). If parity detection is enabled
but input parity checking is disabled, the hardware to which the terminal
is connected shall recognize the parity bit, but the terminal special file
shall not check whether or not this bit is correctly set."
Ignore parity errors reported by the tty driver when INPCK is not set, and
handle the received data normally.
Fixes: Bugzilla #71681, 'Improvement of n_tty_receive_parity_error from n_tty.c' Reported-by: Ivan <athlon_@mail.ru> Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Hurley [Mon, 16 Jun 2014 12:10:41 +0000 (08:10 -0400)]
serial: Fix IGNBRK handling
If IGNBRK is set without either BRKINT or PARMRK set, some uart
drivers send a 0x00 byte for BREAK without the TTYBREAK flag to the
line discipline, when it should send either nothing or the TTYBREAK flag
set. This happens because the read_status_mask masks out the BI
condition, which uart_insert_char() then interprets as a normal 0x00 byte.
SUS v3 is clear regarding the meaning of IGNBRK; Section 11.2.2, General
Terminal Interface - Input Modes, states:
"If IGNBRK is set, a break condition detected on input shall be ignored;
that is, not put on the input queue and therefore not read by any
process."
Fix read_status_mask to include the BI bit if IGNBRK is set; the
lsr status retains the BI bit if a BREAK is recv'd, which is
subsequently ignored in uart_insert_char() when masked with the
ignore_status_mask.
Linus Torvalds [Thu, 19 Jun 2014 17:53:27 +0000 (07:53 -1000)]
Merge tag 'stable/for-linus-3.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull Xen fixes from David Vrabel:
"Xen regression and PVH fixes for 3.16-rc1
- fix dom0 PVH memory setup on latest unstable Xen releases
- fix 64-bit x86 PV guest boot failure on Xen 3.1 and earlier
- fix resume regression on non-PV (auto-translated physmap) guests"
* tag 'stable/for-linus-3.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/grant-table: fix suspend for non-PV guests
x86/xen: no need to explicitly register an NMI callback
Revert "xen/pvh: Update E820 to work with PVH (v2)"
x86/xen: fix memory setup for PVH dom0
Linus Torvalds [Thu, 19 Jun 2014 17:51:45 +0000 (07:51 -1000)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
"These are primarily bug fixes with a performance improvement patch for
the GHASH crypto algorithm (which went in during this merging window)
and dts/defconfig/Kconfig updates.
- ftrace_return_addr() macro fix for arm (introduced earlier via the
arm64 tree)
- stack alignment exception entry code fix
- GHASH crypto algorithm fix and performance improvement
- CMA buffer limited to 32-bit (until a better way to describe the
system topology in DT)
- UAPI sigcontext.h build fix
- __kernel_old_{gid,uid}_t definitions fix (affecting 32-bit LTP)
- ptrace fixes (kernel fault and 32-bit arm core dump)
- pte_mknotpresent() fix
- dts updates (APM SoC)
- defconfig and Kconfig update"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: remove broken &= operator from pmd_mknotpresent
arm64: fix build error in sigcontext.h
arm64: dts: Add more serial port nodes in APM X-Gene device tree
arm64/dma: Removing ARCH_HAS_DMA_GET_REQUIRED_MASK macro
arm64: ptrace: fix empty registers set in prstatus of aarch32 process core
arm64: uid16: fix __kernel_old_{gid,uid}_t definitions
arm64: ptrace: change fs when passing kernel pointer to regset code
arm64: Limit the CMA buffer to 32-bit if ZONE_DMA
arm/ftrace: fix ftrace_return_addr() to ftrace_return_address()
arm64/crypto: improve performance of GHASH algorithm
arm64/crypto: fix data corruption bug in GHASH algorithm
arm64: defconfig update for LTP
arm64: ftrace: Fix comment typo 'CONFIG_FUNCTION_GRAPH_FP_TEST'
arm64: add ARCH_HAS_OPP to allow enabling OPP library
arm64: restore alphabetic order in Kconfig
arm64: Bug fix in stack alignment exception
Dave Airlie [Thu, 19 Jun 2014 00:54:35 +0000 (10:54 +1000)]
Merge tag 'drm-intel-fixes-2014-06-17' of git://anongit.freedesktop.org/drm-intel into drm-next
First round of fixes for 3.16-rc, mostly cc: stable, and the vt/vgacon
fixes from Daniel [1] to avoid hangs and unclaimed register errors on
module load/reload.
* tag 'drm-intel-fixes-2014-06-17' of git://anongit.freedesktop.org/drm-intel:
drm/i915/bdw: remove erroneous chv specific workarounds from bdw code
drm/i915: fix possible refcount leak when resetting forcewake
drm/i915: Reorder semaphore deadlock check
drm/i95: Initialize active ring->pid to -1
drm/i915: set backlight duty cycle after backlight enable for gen4
drm/i915: Avoid div-by-zero when pixel_multiplier is zero
drm/i915: Disable FBC by default also on Haswell and later
drm/i915: Kick out vga console
drm/i915: Fixup global gtt cleanup
vt: Don't ignore unbind errors in vt_unbind
vt: Fix up unregistration of vt drivers
vt: Fix replacement console check when unbinding
Rob Clark [Sat, 7 Jun 2014 14:55:39 +0000 (10:55 -0400)]
drm: fix uninitialized acquire_ctx fields (v2)
The acquire ctx will typically be declared on the stack, which means we
could have garbage values for any uninitialized field. In this case, it
was triggering WARN_ON()s because 'contended' had garbage value.
Go ahead and use memset() to be more future-proof.
v2: now with extra brown paper bag
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com>