git.karo-electronics.de Git - karo-tx-linux.git/log

staging/lustre/fld: prepare FLD module for client server split

Split FLD server from client; fld_{handler,index}.c are not compiled
unless server support is enabled. Do not include dt_object.h or
lustre_mdt.h in lustre_fld.h and fix the minor breakages caused by
this elsewhere. Generally clean up includes in lustre/fld.

Signed-off-by: Liu Xuezhao <xuezhao.liu@emc.com>
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-1330
Lustre-change: http://review.whamcloud.com/2675
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/lnet: remove empty file lnet/lnet/api-errno.c

The file is empty. We can just remove it.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2335
Lustre-change: http://review.whamcloud.com/5880
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Li Wei <wei.g.li@intel.com>
Reviewed-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/ldlm: Fix flock deadlock detection race

Deadlock isn't detected if 2 threads are trying to
grant 2 locks which deadlock on each other.
They call ldlm_flock_deadlock() simultaneously
and the deadlock isn't detected.

The solution is to add the lock to the blocking list before
calling ldlm_flock_deadlock().

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-1602
Lustre-change: http://review.whamcloud.com/3277
Signed-off-by: Andriy Skulysh <Andriy_Skulysh@xyratex.com>
Reviewed-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Reviewed-by: Bruce Korb <bruce_korb@xyratex.com>
Reviewed-by: Keith Mannthey <keith.mannthey@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/llite: call simple_setattr() from ll_md_setattr()

This partially reverts the change from "LU-2482 layout: introduce new
layout for released files" by calling simple_setattr() from
ll_md_setattr() without ATTR_SIZE set. Doing so avoids failed
assertions in osc_page_delete(). Disable truncates on released files
and modify sanity 229 accordingly.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3448
Lustre-change: http://review.whamcloud.com/6643
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: jacques-Charles Lafoucriere <jacques-charles.lafoucriere@cea.fr>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/mdt: duplicate link names in directory

When creating a hard link to a file, the MDT/MDD/OSD code does
not verify whether the target link name already exists in the
directory. The ZFS ZAP code checks for duplicate entries. The
add_dirent_to_buf() function in ldiskfs only checks entries for
duplicates while it is traversing the leaf block looking for free
space. Even if it scanned the whole leaf block, this would not
work for non-htree directories since there is no guarantee that
the name is being inserted into the same leaf block.

To fix this, link should check that the target object doesn't exist,
as other create operations do.

Add sanity.sh test_31o with multiple threads racing to link a new
name into the directory, while ensuring that there is a free entry
in the leaf block that is large enough to hold the duplicate name.
This needs to be racy, because otherwise the client VFS will see
the existing name and not send the RPC to the MDS, hiding the bug.

Add DLDLMRES/PLDLMRES macros for printing the whole lock resource
name (including the name hash) in LDLM_DEBUG() messages in a format
similar to DFID/PFID so they can be found in debug logs more easily.
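
A minimal sketch of the format-string/argument-list macro pair described above, assuming a resource name made of four 64-bit words. The names ex_res_name, EX_DRES, EX_PRES and ex_res_format are hypothetical, and the exact output format of the real DLDLMRES macro is not given here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a lock resource name: four 64-bit words. */
struct ex_res_name {
	uint64_t name[4];
};

/* Format string and matching argument list, in the style of DFID/PFID.
 * The layout chosen here is an assumption made for illustration. */
#define EX_DRES "[0x%llx:0x%llx:0x%llx].%llx"
#define EX_PRES(r)                            \
	(unsigned long long)(r)->name[0],     \
	(unsigned long long)(r)->name[1],     \
	(unsigned long long)(r)->name[2],     \
	(unsigned long long)(r)->name[3]

/* Render a resource name into buf; returns the snprintf() result. */
static int ex_res_format(char *buf, size_t len, const struct ex_res_name *r)
{
	assert(buf != NULL && r != NULL);
	return snprintf(buf, len, "res: " EX_DRES, EX_PRES(r));
}
```

Keeping the format and the arguments in paired macros means every debug message prints the resource the same way, which is what makes the lines easy to grep for.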

This patch picks the client-side change of the original patch, which only
contains the DLM printk part.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2901
Lustre-change: http://review.whamcloud.com/6591
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/crypto: add crc32c module loading to libcfs

This patch adds automatic module loading for crc32c
when libcfs is starting.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2212
Lustre-change: http://review.whamcloud.com/4372
Signed-off-by: Alexander Boyko <alexander_boyko@xyratex.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/ptlrpc: Race between start and stop service threads

When ptlrpc_start_thread fails to create a new thread, it will
finalize and free a struct ptlrpc_thread created and used here.
Considering this, it can be a problem when ptlrpc_svcpt_stop_thread
runs and handles the struct ptlrpc_thread right after or right
before the failure of cfs_create_thread. This situation lets
both ptlrpc_start_thread and ptlrpc_svcpt_stop_threads
access the freed ptlrpc_thread and causes an OS panic. Or it may
happen that ptlrpc_svcpt_stop_threads waits forever on an
already-freed waitq.

This patch adds error handling to ptlrpc_start_thread to fix
this problem.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2889
Lustre-change: http://review.whamcloud.com/5552
Signed-off-by: Hiroya Nozaki <nozaki.hiroya@jp.fujitsu.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Nikitas Angelinas <nikitas_angelinas@xyratex.com>
Reviewed-by: Keith Mannthey <keith.mannthey@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/osc: Check return code for lu_kmem_init

lu_kmem_init can fail and has a return code.
Check for this return code where lu_kmem_init is called.

This issue was found during 2GB VM Racer testing.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3063
Lustre-change: http://review.whamcloud.com/6514
Signed-off-by: Keith Mannthey <keith.mannthey@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/lfsck: LFSCK 1.5 technical debts (3)

Original patch resolves some LFSCK 1.5 technical debts, including:
1) Check and remove repeated linkea entries.
2) Merge some "goto" branches to make the code more readable.
3) Some comments about object's nlink inconsistency processing.

This patch picks the obd flags change.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2915
Lustre-change: http://review.whamcloud.com/6344
Signed-off-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/llite: force lvb_data update after layout change

When a file is restored, the layout lock is first
associated with the released layout, and after the restore
it has to be associated with the new layout. This patch
forces an lvb_data update in ll_layout_fetch() even if one
is present (the case for a released->normal state change).

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3299
Lustre-change: http://review.whamcloud.com/6291
Signed-off-by: JC Lafoucriere <jacques-charles.lafoucriere@cea.fr>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/fid: prepare FID module for client server split

Split FID server from client; fid_{handler,store,lib}.c are not
compiled unless server support is enabled. Generally clean up
includes in lustre/fid/ and reduce the need for client code to
directly or indirectly include {dt,md}_object.h.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-1330
Lustre-change: http://review.whamcloud.com/2673
Signed-off-by: Liu Xuezhao <xuezhao.liu@emc.com>
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre: fix 'code maintainability' errors

Fix 'code maintainability' issues found by Coverity version 6.5.1:
Unused pointer value (UNUSED_VALUE)
Pointer returned by function is never used.
Missing varargs init or cleanup (VARARGS)
va_end was not called for variable.
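
As an illustration of the VARARGS class of defect, here is a minimal sketch of a variadic function that pairs va_start() with va_end(). The function ex_sum_ints is hypothetical and is not taken from the patched code.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum `count` int arguments; va_start() is always paired with va_end(). */
static int ex_sum_ints(int count, ...)
{
	va_list ap;
	int total = 0;

	assert(count >= 0);
	va_start(ap, count);
	for (int i = 0; i < count; i++)
		total += va_arg(ap, int);
	va_end(ap);	/* the call a VARARGS defect report says is missing */

	return total;
}
```

The same rule applies on early-return paths: any exit taken after va_start() needs to pass through va_end() first.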

Signed-off-by: Sebastien Buisson <sebastien.buisson@bull.net>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3107
Lustre-change: http://review.whamcloud.com/5944
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: John Hammond <johnlockwoodhammond@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/llapi: add user space method for lov_user_md

Move lov_mds_md_size from obd_lov.h to lustre_idl.h
to have it close to the lov_mds_md definition.

Add lov_user_md_size() to compute the lum size so that
llapi and user space utils do not use kernel internal
definitions/methods.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3345
Lustre-change: http://review.whamcloud.com/6345
Signed-off-by: JC Lafoucriere <jacques-charles.lafoucriere@cea.fr>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/mdt: add macros for fid string len

Add 2 macros for the length of a FID string 0xSEQ:0xOID:0xVER
and its braced version (FID_NOBRACE_LEN and FID_LEN).
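
A minimal sketch of how such length macros can be derived, assuming a FID with a 64-bit sequence and 32-bit object id and version printed in hex. The EX_ names and struct ex_fid are hypothetical stand-ins, not the definitions from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical FID layout: 64-bit sequence, 32-bit object id and version. */
struct ex_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

/* "0x" + 16 hex digits, ':', "0x" + 8, ':', "0x" + 8 = 40 characters. */
#define EX_FID_NOBRACE_LEN 40
/* Two more characters for the surrounding '[' and ']'. */
#define EX_FID_LEN (EX_FID_NOBRACE_LEN + 2)

/* Format a FID with braces; returns the number of characters needed. */
static int ex_fid_format(char *buf, size_t len, const struct ex_fid *fid)
{
	assert(buf != NULL && fid != NULL);
	return snprintf(buf, len, "[0x%llx:0x%x:0x%x]",
			(unsigned long long)fid->seq,
			(unsigned int)fid->oid, (unsigned int)fid->ver);
}
```

A buffer of EX_FID_LEN + 1 bytes then holds the longest possible string plus its terminating NUL.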

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2782
Lustre-change: http://review.whamcloud.com/5299
Signed-off-by: JC Lafoucriere <jacques-charles.lafoucriere@cea.fr>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/layout: introduce new layout for released files

Released files now have a standard layout (with generation, pool, ...)
and a stripe count 0 and lmm_pattern flag LOV_PATTERN_F_RELEASED.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2482
Lustre-change: http://review.whamcloud.com/4816
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Johann Lombardi <johann.lombardi@intel.com>
Reviewed-by: Keith Mannthey <keith.mannthey@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/md: fix lu_ucred.c boilerplate

In preparing Ie3a3cd99 (LU-1330 obdclass: splits server-side object
stack from client) the lu_ucred infrastructure was put in its own
file. Fixup the boilerplate of this file to give the proper path,
short description, and authors.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-1330
Lustre-change: http://review.whamcloud.com/5910
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/dlmlock: compress out unused space

* lustre/include/lustre_dlm.h: Remove all bit fields and the unused
  weighing callback procedure.  respell LDLM_AST_DISCARD_DATA as
  LDLM_FL_AST_DISCARD_DATA to match other flags.
* .gitignore: ignore emacs temporary files
* autogen.sh: rebuild the lock bits, if autogen is available.
* contrib/bit-masks/lustre_dlm_flags.def: define the ldlm_lock flags
* contrib/bit-masks/lustre_dlm_flags.tpl: template for emitting text
* contrib/bit-masks/Makefile: construct the .c and .h files
  The .c file is for constructing a crash extension and is not
  preserved.
* contrib/bit-masks/.gitignore: ignore built products
* lustre/contrib/wireshark/packet-lustre.c: use built files instead
  of local versions of the defines.

In the rest of the modified sources, replace flag field references
with bit mask references.

* lustre/osc/osc_lock.c: removed osc_lock_weigh, too

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2771
Lustre-change: http://review.whamcloud.com/5312
Signed-off-by: Bruce Korb <bruce_korb@xyratex.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-by: Keith Mannthey <Keith.Mannthey@intel.com>
Reviewed-by: Keith Mannthey <keith.mannthey@intel.com>
Reviewed-by: <bruce.korb@gmail.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre: Make quota namespace refcounting consistent

It seems quota namespace is needlessly referenced on connect,
but that's not necessary as it could not go away until entire
obd goes away.
On the other hand this extra reference disturbs other logic
depending on empty namespace having zero refcount, so this patch
drops such extra referencing.

This picks client side change of the original patch.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2924
Lustre-change: http://review.whamcloud.com/6234
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre: Only wake up ldlm_poold as frequently as the check interval

We used to wake up ldlm_poold every second, but that's overkill.
We should just see how much time is left until the next closest recalc
interval hits and sleep that long.
This will make the "per-second" client grant statistic not actually
per-second, but I don't think we need any precision in that code.
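
The idea can be sketched as follows, assuming each pool records when it was last recalculated and how often it should be. The struct ex_pool and ex_time_to_next_recalc names are hypothetical and the real ldlm_poold bookkeeping may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-pool bookkeeping, both fields in seconds. */
struct ex_pool {
	long last_recalc;	/* time of the previous recalc */
	long period;		/* desired interval between recalcs */
};

/* Return how long the daemon may sleep before the nearest pool is due,
 * capped at max_sleep when no pool is due sooner. */
static long ex_time_to_next_recalc(const struct ex_pool *pools, size_t n,
				   long now, long max_sleep)
{
	long sleep = max_sleep;

	assert(pools != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		long left = pools[i].last_recalc + pools[i].period - now;

		if (left < 0)
			left = 0;	/* already overdue: recalc immediately */
		if (left < sleep)
			sleep = left;
	}
	return sleep;
}
```

The daemon would sleep for the returned value instead of a fixed one second, so idle systems wake up far less often.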

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2924
Lustre-change: http://review.whamcloud.com/5793
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Reviewed-by: Hiroya Nozaki <nozaki.hiroya@jp.fujitsu.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/ldlm: split client namespaces into active and inactive

The main reason behind this is ldlm_poold walks all namespaces currently
no matter if there are any locks or not. On large systems this could take
quite a bit of time, esp. since ldlm_poold is currently woken up once per
second.

Now every time a client namespace loses its last resource, it is placed
into an inactive list that ldlm_poold does not touch, since walking it is pointless.
On creation of a first resource in a namespace it is placed back into
the active list.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2924
Lustre-change: http://review.whamcloud.com/5624
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-by: Hiroya Nozaki <nozaki.hiroya@jp.fujitsu.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/mdc: layout lock rpc must not take rpc_lock

When a client issues an RPC to get a layout lock, it
must not hold rpc_lock, because in case of a restore
the RPC can block for a long time.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3200
Lustre-change: http://review.whamcloud.com/6115
Signed-off-by: JC Lafoucriere <jacques-charles.lafoucriere@cea.fr>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/ptlrpc: Translate between host and network errnos

Lustre puts system errors (e.g., ENOTCONN) on wire as numbers
essentially specific to senders' architectures.  While this is fine
for x86-only sites, where receivers share the same error number
definition with senders, problems will arise, however, for sites
involving multiple architectures with different error number
definitions.  For instance, an ENOTCONN reply from a sparc server will
be put on wire as -57, which, for an x86 client, means EBADSLT
instead.

To solve the problem, this patch defines a set of network errors for
on-wire or on-disk uses.  These errors correspond to a subset of the
x86 system errors and share the same number definition, maintaining
compatibility with existing x86 clients and servers.

Then, either error numbers could be translated at run time, or all
host errors going on wire could be replaced with network errors in the
code.  This patch does the former by introducing both generic and
field-specific translation routines and calling them at proper places,
so that translations for existing fields are transparent.
(Personally, I tend to think the latter way might be worthwhile, as it
is more straightforward conceptually.  Do we really need so many
different errors?  Should errors returned by kernel routines really be
passed up and eventually put on wire?  There could even be security
implications in that.)
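
A minimal sketch of run-time translation, assuming a sparc-like host where ENOTCONN is 57 and a wire format that uses the x86 Linux value 107. The table, the EX_ constants and the function names are hypothetical, and a real table would cover many more codes.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed numbers: the host (sparc-like) uses 57 for ENOTCONN, while the
 * x86-based wire value is 107. */
#define EX_HOST_ENOTCONN 57
#define EX_NET_ENOTCONN  107

struct ex_errno_map {
	int host;
	int net;
};

static const struct ex_errno_map ex_map[] = {
	{ EX_HOST_ENOTCONN, EX_NET_ENOTCONN },
};

#define EX_MAP_SIZE (sizeof(ex_map) / sizeof(ex_map[0]))

/* Translate a negative host errno to its on-wire value. Codes without a
 * table entry pass through unchanged, which is a simplification. */
static int ex_hton_errno(int host_err)
{
	assert(host_err <= 0);
	for (size_t i = 0; i < EX_MAP_SIZE; i++)
		if (ex_map[i].host == -host_err)
			return -ex_map[i].net;
	return host_err;
}

/* Translate a negative on-wire errno back to the host's numbering. */
static int ex_ntoh_errno(int net_err)
{
	assert(net_err <= 0);
	for (size_t i = 0; i < EX_MAP_SIZE; i++)
		if (ex_map[i].net == -net_err)
			return -ex_map[i].host;
	return net_err;
}
```

With this in place, the sender converts just before a reply goes on the wire and the receiver converts right after unpacking, so the rest of the code keeps using host errnos.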

Thanks to Fujitsu for the original idea and their contributions that make
this available upstream.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2743
Lustre-change: http://review.whamcloud.com/5577
Signed-off-by: Li Wei <wei.g.li@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Hiroya Nozaki <nozaki.hiroya@jp.fujitsu.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/ptlrpc: race in pinger (use-after-free situation)

The race is the result of a use-after-free situation:

~ ptlrpc_stop_pinger()          ~ ptlrpc_pinger_main()
---------------------------------------------------------------
thread_set_flags(SVC_STOPPING)
cfs_waitq_signal(pinger_thread) ...
...                             thread_set_flags(SVC_STOPPED)
l_wait_event(thread_is_stopped)
OBD_FREE_PTR(pinger_thread)
...                             cfs_waitq_signal(pinger_thread)
---------------------------------------------------------------

The memory used by pinger_thread might have been freed and
reallocated to something else by the time ptlrpc_pinger_main()
used it in cfs_waitq_signal().

Signed-off-by: Li Wei <wei.g.li@intel.com>
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3032
Lustre-change: http://review.whamcloud.com/6040
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/ldlm: print FID in lvbo_init(), lvbo_update

Print the namespace and OBD device name, as well as the first two
lock resource fields (typically the FID) if there is an error with
loading the object from disk. This will be more important with
FID-on-OST and also the MDS. Using fid_extract_from_res_name() isn't
possible in the LDLM code, since the lock resource may not be a FID.

Make fid_extract_quota_resid() argument order and name consistent
with other fid_*_res() functions, with FID first and resource second.

Fix a bug in ofd_lvbo_init() where NULL lvb is accessed on error.

Print FID in ofd_lvbo_update() CDEBUG() and CERROR() messages.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2193
Lustre-change: http://review.whamcloud.com/4501
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/llite: check ll_prep_md_op_data() using IS_ERR()

In ll_file_ioctl() and ll_swap_layouts() check the result of
ll_prep_md_op_data() using IS_ERR().
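
A userspace sketch of the error-pointer convention this check relies on. The ex_ helpers imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(), and ex_prep_op_data is a hypothetical stand-in for a function that returns either a valid pointer or an encoded errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define EX_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value. */
static inline void *ex_err_ptr(long err)
{
	assert(err < 0 && err >= -EX_MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* Recover the errno from an error pointer. */
static inline long ex_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* True if p lies in the top EX_MAX_ERRNO values of the address space. */
static inline int ex_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-EX_MAX_ERRNO;
}

struct ex_op_data {
	int opcode;
};

/* Hypothetical helper: returns an error pointer on failure, never NULL. */
static struct ex_op_data *ex_prep_op_data(int opcode)
{
	struct ex_op_data *op;

	if (opcode < 0)
		return ex_err_ptr(-EINVAL);
	op = malloc(sizeof(*op));
	if (op == NULL)
		return ex_err_ptr(-ENOMEM);
	op->opcode = opcode;
	return op;
}

static int ex_do_ioctl(int opcode)
{
	struct ex_op_data *op = ex_prep_op_data(opcode);

	if (ex_is_err(op))	/* a NULL check would miss this failure */
		return (int)ex_ptr_err(op);
	free(op);
	return 0;
}
```

The point of the sketch is that a caller testing only for NULL would treat the encoded errno as a valid pointer and dereference it.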

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3283
Lustre-change: http://review.whamcloud.com/6275
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/llite: A not locked mutex can be unlocked.

Under memory pressure, a mutex that is not locked can be unlocked
in ll_file_open(). This is not allowed, and the subsequent
behavior is undefined.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3157
Lustre-change: http://review.whamcloud.com/6028
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: John Hammond <johnlockwoodhammond@gmail.com>
Reviewed-by: Nikitas Angelinas <nikitas_angelinas@xyratex.com>
Reviewed-by: Sebastien Buisson <sebastien.buisson@bull.net>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/llite: check alloc in ll_file_data_get, ll_dir_ioctl

In ll_file_data_get() and ll_dir_ioctl() return error on failed
allocations.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2753
Lustre-change: http://review.whamcloud.com/5845
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Sebastien Buisson <sebastien.buisson@bull.net>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre: fix 'program hangs' errors

Fix 'program hangs' defects found by Coverity version 6.5.1:
Missing unlock (LOCK)
Returning without unlocking.
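
A minimal sketch of the usual fix pattern for this class of defect: route every exit through a single unlock label. The toy struct ex_mutex is a single-threaded stand-in used only to make the control flow visible; it is not a real lock and none of these names come from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Toy lock: tracks only whether it is held, to expose missing unlocks. */
struct ex_mutex {
	int held;
};

static void ex_mutex_lock(struct ex_mutex *m)
{
	assert(!m->held);
	m->held = 1;
}

static void ex_mutex_unlock(struct ex_mutex *m)
{
	assert(m->held);
	m->held = 0;
}

/* Add delta to *value under the lock; negative deltas are rejected. */
static int ex_update(struct ex_mutex *m, int *value, int delta)
{
	int rc = 0;

	ex_mutex_lock(m);
	if (delta < 0) {
		/* A bare "return -EINVAL;" here would leave the lock held. */
		rc = -EINVAL;
		goto out;
	}
	*value += delta;
out:
	ex_mutex_unlock(m);
	return rc;
}
```

With a real mutex, the leaked lock would make the next caller block forever, which is why Coverity files this under 'program hangs'.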

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3054
Lustre-change: http://review.whamcloud.com/5870
Signed-off-by: Sebastien Buisson <sebastien.buisson@bull.net>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/llite: missing last bit in ll_have_md_lock

Missing the last bit during INODELOCK check in ll_have_md_lock.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3385
Lustre-change: http://review.whamcloud.com/6438
Signed-off-by: wang di <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/llite: use READ, WRITE around ll_rw_stats_tally()

In vvp_io_write_start() the stats function ll_rw_stats_tally() was
incorrectly called with a rw argument of 0. Correct this and use the
macros READ and WRITE in and around ll_rw_stats_tally() for clarity.
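
A minimal sketch of why the named macros matter, assuming a tally function that indexes its counters by direction. EX_READ and EX_WRITE mirror the kernel's READ (0) and WRITE (1); the struct and function names are hypothetical.

```c
#include <assert.h>

/* Same values the kernel uses for its READ and WRITE macros. */
#define EX_READ  0
#define EX_WRITE 1

struct ex_rw_stats {
	unsigned long calls[2];
	unsigned long bytes[2];
};

/* Account one I/O of `count` bytes in the given direction. */
static void ex_rw_stats_tally(struct ex_rw_stats *st, unsigned long count,
			      int rw)
{
	assert(rw == EX_READ || rw == EX_WRITE);
	st->calls[rw]++;
	st->bytes[rw] += count;
}

static void ex_io_write_start(struct ex_rw_stats *st, unsigned long count)
{
	/* Passing a literal 0 here would silently count the write as a read. */
	ex_rw_stats_tally(st, count, EX_WRITE);
}
```

Using the macro at the call site makes the direction visible in review, where a bare 0 or 1 is easy to get wrong.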

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3384
Lustre-change: http://review.whamcloud.com/6447
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre: remove bogus ifndef EXPORT_SYMBOL

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/libcfs: drop bogus Kconfig default

Commit 4b5b4c7222 ("staging/lustre/libcfs: restore LINVRNT") added
"default false" to this Kconfig file. It was obviously meant to use
"default n" here. But we might as well drop this line, as a Kconfig bool
defaults to 'n' anyway.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/libcfs: removed dead code from libcfs_string

Confirmed by cscope that the functions are not used anymore. A fresh compilation does not yield any errors.

Signed-off-by: Dragos Foianu <dragos.foianu@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre: remove the second argument of ll_kmap_atomic()

kmap_atomic() takes only one argument now, so just remove the second.

Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre: drop CONFIG_BROKEN

This reverts commit 0ad1ea69545b1965be4c93ee03fdc685c6beb23d

I didn't use git revert because it can not be done cleanly.
Hopefully it will be the last time we do it...

Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/llite: readdir convert to iterate

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/llite: fix for d_compare API change

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre: fix for invalidatepage() API change

somehow this got dropped during merge window...

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre: fix build warning on 32bit system

Building on a 32bit system, I got warnings like the ones below:
drivers/staging/lustre/lustre/llite/../include/lprocfs_status.h:666:7: note: expected ‘long unsigned int *’ but argument is of type ‘size_t *’
char *lprocfs_find_named_value(const char *buffer, const char *name,

drivers/staging/lustre/lustre/lov/lov_io.c: In function ‘lov_io_rw_iter_init’:
include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default]
(void)(((typeof((n)) *)0) == ((uint64_t *)0)); \

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre: fix build error on non-x86 platforms

dump_trace() is only available on X86. Without it, Lustre's own
watchdog is broken. We can only dump current task's stack.

On the client side, this code is much less likely to hit deadlocks, and
it's probably OK to drop this altogether, since we hardly have any
ptlrpc threads on clients; the most notable ones are ldlm cb threads,
which should not really be blocking on the client anyway.

Remove libcfs watchdog for now, until the upstream kernel watchdog
can detect distributed deadlocks and dump other kernel threads.

Cc: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre: fix build when CONFIG_UIDGID_STRICT_TYPE_CHECKS is on

kuid_t/kgid_t are wrapped when CONFIG_UIDGID_STRICT_TYPE_CHECKS is on.
The Lustre build is broken because we always treat them as plain __u32.
This patch fixes it. Internally, Lustre always uses __u32 uid/gid and
converts to kuid_t/kgid_t when necessary.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/llite: fix build error if CONFIG_FS_POSIX_ACL is off

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre/llite: add missing include file for prefetchw

Got the errors below on an s390 build:
CC [M] drivers/staging/lustre/lustre/llite/dir.o
drivers/staging/lustre/lustre/llite/dir.c: In function 'll_dir_filler':
drivers/staging/lustre/lustre/llite/dir.c:225:3: error: implicit declaration of function 'prefetchw' [-Werror=implicit-function-declaration]

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre: fix build on s390

As reported by Fengguang:
In file included from drivers/staging/lustre/lustre/obdclass/../include/lustre/lustre_idl.h:99:0,
    from drivers/staging/lustre/lustre/obdclass/../include/lprocfs_status.h:46,
    from drivers/staging/lustre/lustre/obdclass/../include/obd_support.h:42,
    from drivers/staging/lustre/lustre/obdclass/../include/obd_class.h:40,
    from drivers/staging/lustre/lustre/obdclass/lu_object.c:53:
drivers/staging/lustre/lustre/obdclass/../include/lustre/lustre_user.h:356:10: error: field 'lmd_st' has incomplete type
drivers/staging/lustre/lustre/obdclass/../include/lustre/lustre_user.h:361:10: error: field 'lmd_st' has incomplete type

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre: fix build error when !CONFIG_SMP

Three functions cfs_cpu_ht_nsiblings, cfs_cpt_cpumask and
cfs_cpt_table_print are missing if !CONFIG_SMP.

cpumask_t/nodemask_t/__read_mostly/____cacheline_aligned
are redefined.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre: remove HIPQUAD

Stephen Rothwell reported the error below on powerpc:

In file included from drivers/staging/lustre/include/linux/libcfs/libcfs.h:203:0,
                 from drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h:67,
                 from drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c:41:
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c: In function 'kiblnd_dev_need_failover':
drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h:215:16: error: implicit declaration of function 'NIPQUAD' [-Werror=implicit-function-declaration]
  static struct libcfs_debug_msg_data msgdata;      \
                ^
We should just remove HIPQUAD and replace it with %pI4h.
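
For illustration only, a userspace sketch of what a host-order dotted-quad
conversion produces. %pI4h is the kernel's printk extension for an IPv4
address held in host byte order and takes a pointer to the address; the
function name sketch_format_ipv4_host below is hypothetical and is not
part of the patch.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Userspace illustration: format a host-byte-order IPv4 address as a
 * dotted quad, most significant byte first. Returns the snprintf() result.
 */
static int sketch_format_ipv4_host(uint32_t addr, char *buf, size_t len)
{
	return snprintf(buf, len, "%u.%u.%u.%u",
			(unsigned int)((addr >> 24) & 0xff),
			(unsigned int)((addr >> 16) & 0xff),
			(unsigned int)((addr >> 8) & 0xff),
			(unsigned int)(addr & 0xff));
}
```

Letting the format specifier do this work removes the need for a
per-platform macro that splits the address into four arguments.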

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre: only build if configured as module

Lustre's internal dependencies need to be cleaned up. Currently,
libcfs acts as the basis of all other modules, while other
modules in the lustre/ directory in turn depend on lnet modules.
This creates a dependency loop that needs to be fixed. Hopefully
we will remove libcfs in the end. So just disable built-in builds
for now.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre: don't assert ln_refcount in LNetGetId

If LNetNIInit() fails, we'll get zero ln_refcount. So fail
LNetGetId() properly instead of asserting.

We can get to it when socklnd fails to scan network interfaces,
which is possible if Lustre is builtin.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre: don't assert module owner

It can well be NULL if Lustre is builtin.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre: fix Lustre code link order

Change the Makefiles so that the link order matches the Lustre module
dependencies, so that when Lustre is built into the kernel the same
dependencies hold. Otherwise we'll crash the kernel if Lustre is
built in, due to missing internal dependencies.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging/lustre: replace num_physpages with totalram_pages

The global variable num_physpages is going away. Replace it
with totalram_pages.

Cc: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Linux 3.11-rc2

Merge tag 'acpi-video-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI video support fixes from Rafael Wysocki:
"I'm sending a separate pull request for this as it may be somewhat
  controversial.  The breakage addressed here is not really new and the
  fixes may not satisfy all users of the affected systems, but we've had
  so much back and forth dance in this area over the last several weeks
  that I think it's time to actually make some progress.

  The source of the problem is that about a year ago we started to tell
  BIOSes that we're compatible with Windows 8, which we really need to
  do, because some systems shipping with Windows 8 are tested with it
  and nothing else, so if we tell their BIOSes that we aren't compatible
  with Windows 8, we expose our users to untested BIOS/AML code paths.

  However, as it turns out, some Windows 8-specific AML code paths are
  not tested either, because Windows 8 actually doesn't use the ACPI
  methods containing them, so if we declare Windows 8 compatibility and
  attempt to use those ACPI methods, things break.  That occurs mostly
  in the backlight support area where in particular the _BCM and _BQC
  methods are plain unusable on some systems if the OS declares Windows
  8 compatibility.

  [ The additional twist is that they actually become usable if the OS
    says it is not compatible with Windows 8, but that may cause
    problems to show up elsewhere ]

  Investigation carried out by Matthew Garrett indicates that what
  Windows 8 does about backlight is to leave backlight control up to
  individual graphics drivers.  At least there's evidence that it does
  that if the Intel graphics driver is used, so we've decided to follow
  Windows 8 in that respect and allow i915 to control backlight (Daniel
  likes that part).

  The first commit from Aaron Lu makes ACPICA export the variable from
  which we can infer whether or not the BIOS believes that we are
  compatible with Windows 8.

  The second commit from Matthew Garrett prepares the ACPI video driver
  by making it initialize the ACPI backlight even if it is not going to
  be used afterward (that is needed for backlight control to work on
  Thinkpads).

  The third commit implements the actual workaround making i915 take
  over backlight control if the firmware thinks it's dealing with
  Windows 8 and is based on the work of multiple developers, including
  Matthew Garrett, Chun-Yi Lee, Seth Forshee, and Aaron Lu.

  The final commit from Aaron Lu makes us follow Windows 8 by informing
  the firmware through the _DOS method that it should not carry out
  automatic brightness changes, so that brightness can be controlled by
  GUI.

  Hopefully, this approach will allow us to avoid using blacklists of
  systems that should not declare Windows 8 compatibility just to avoid
  backlight control problems in the future.

   - Change from Aaron Lu makes ACPICA export a variable which can be
     used by driver code to determine whether or not the BIOS believes
     that we are compatible with Windows 8.

   - Change from Matthew Garrett makes the ACPI video driver initialize
     the ACPI backlight even if it is not going to be used afterward
     (that is needed for backlight control to work on Thinkpads).

   - Fix from Rafael J Wysocki implements Windows 8 backlight support
     workaround making i915 take over backlight control if the firmware
     thinks it's dealing with Windows 8.  Based on the work of multiple
     developers including Matthew Garrett, Chun-Yi Lee, Seth Forshee,
     and Aaron Lu.

   - Fix from Aaron Lu makes the kernel follow Windows 8 by informing
     the firmware through the _DOS method that it should not carry out
     automatic brightness changes, so that brightness can be controlled
     by GUI"

* tag 'acpi-video-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / video: no automatic brightness changes by win8-compatible firmware
  ACPI / video / i915: No ACPI backlight if firmware expects Windows 8
  ACPI / video: Always call acpi_video_init_brightness() on init
  ACPICA: expose OSI version

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext[34] tmpfile bugfix from Ted Ts'o:
"Fix regression caused by commit af51a2ac36d1f which added ->tmpfile()
  support (along with a similar fix for ext3)"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext3: fix a BUG when opening a file with O_TMPFILE flag
  ext4: fix a BUG when opening a file with O_TMPFILE flag

ext3: fix a BUG when opening a file with O_TMPFILE flag

When we try to open a file with O_TMPFILE flag, we will trigger a bug.
The root cause is that in ext3_orphan_add() we check ->i_nlink == 0 and
this check always fails because we set ->i_nlink = 1 in
inode_init_always().  We can use the following program to trigger it:

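#define _GNU_SOURCE	/* for O_TMPFILE */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int fd;

	/* argv[1] is a directory on the filesystem under test */
	fd = open(argv[1], O_TMPFILE, 0666);
	if (fd < 0) {
		perror("open ");
		return -1;
	}
	close(fd);
	return 0;
}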

The oops message looks like this:

kernel: kernel BUG at fs/ext3/namei.c:1992!
kernel: invalid opcode: 0000 [#1] SMP
kernel: Modules linked in: ext4 jbd2 crc16 cpufreq_ondemand ipv6 dm_mirror dm_region_hash dm_log dm_mod parport_pc parport serio_raw sg dcdbas pcspkr i2c_i801 ehci_pci ehci_hcd button acpi_cpufreq mperf e1000e ptp pps_core ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core ext3 jbd sd_mod ahci libahci libata scsi_mod uhci_hcd
kernel: CPU: 0 PID: 2882 Comm: tst_tmpfile Not tainted 3.11.0-rc1+ #4
kernel: Hardware name: Dell Inc. OptiPlex 780 /0V4W66, BIOS A05 08/11/2010
kernel: task: ffff880112d30050 ti: ffff8801124d4000 task.ti: ffff8801124d4000
kernel: RIP: 0010:[<ffffffffa00db5ae>] [<ffffffffa00db5ae>] ext3_orphan_add+0x6a/0x1eb [ext3]
kernel: RSP: 0018:ffff8801124d5cc8  EFLAGS: 00010202
kernel: RAX: 0000000000000000 RBX: ffff880111510128 RCX: ffff8801114683a0
kernel: RDX: 0000000000000000 RSI: ffff880111510128 RDI: ffff88010fcf65a8
kernel: RBP: ffff8801124d5d18 R08: 0080000000000000 R09: ffffffffa00d3b7f
kernel: R10: ffff8801114683a0 R11: ffff8801032a2558 R12: 0000000000000000
kernel: R13: ffff88010fcf6800 R14: ffff8801032a2558 R15: ffff8801115100d8
kernel: FS:  00007f5d172b5700(0000) GS:ffff880117c00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
kernel: CR2: 00007f5d16df15d0 CR3: 0000000110b1d000 CR4: 00000000000407f0
kernel: Stack:
kernel: 000000000000000c ffff8801048a7dc8 ffff8801114685a8 ffffffffa00b80d7
kernel: ffff8801124d5e38 ffff8801032a2558 ffff88010ce24d68 0000000000000000
kernel: ffff88011146b300 ffff8801124d5d44 ffff8801124d5d78 ffffffffa00db7e1
kernel: Call Trace:
kernel: [<ffffffffa00b80d7>] ? journal_start+0x8c/0xbd [jbd]
kernel: [<ffffffffa00db7e1>] ext3_tmpfile+0xb2/0x13b [ext3]
kernel: [<ffffffff821076f8>] path_openat+0x11f/0x5e7
kernel: [<ffffffff821c86b4>] ? list_del+0x11/0x30
kernel: [<ffffffff82065fa2>] ?  __dequeue_entity+0x33/0x38
kernel: [<ffffffff82107cd5>] do_filp_open+0x3f/0x8d
kernel: [<ffffffff82112532>] ? __alloc_fd+0x50/0x102
kernel: [<ffffffff820f9296>] do_sys_open+0x13b/0x1cd
kernel: [<ffffffff820f935c>] SyS_open+0x1e/0x20
kernel: [<ffffffff82398c02>] system_call_fastpath+0x16/0x1b
kernel: Code: 39 c7 0f 85 67 01 00 00 0f b7 03 25 00 f0 00 00 3d 00 40 00 00 74 18 3d 00 80 00 00 74 11 3d 00 a0 00 00 74 0a 83 7b 48 00 74 04 <0f> 0b eb fe 49 8b 85 50 03 00 00 4c 89 f6 48 c7 c7 c0 99 0e a0
kernel: RIP  [<ffffffffa00db5ae>] ext3_orphan_add+0x6a/0x1eb [ext3]
kernel: RSP <ffff8801124d5cc8>

Here we couldn't call clear_nlink() directly because in d_tmpfile() we
will call inode_dec_link_count() to decrease ->i_nlink.  So this commit
tries to call d_tmpfile() before ext3_orphan_add() to fix this problem.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>

ext4: fix a BUG when opening a file with O_TMPFILE flag

When we try to open a file with O_TMPFILE flag, we will trigger a bug.
The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and
this check always fails because we set ->i_nlink = 1 in
inode_init_always().  We can use the following program to trigger it:

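#define _GNU_SOURCE	/* for O_TMPFILE */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int fd;

	/* argv[1] is a directory on the filesystem under test */
	fd = open(argv[1], O_TMPFILE, 0666);
	if (fd < 0) {
		perror("open ");
		return -1;
	}
	close(fd);
	return 0;
}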

The oops message looks like this:

kernel BUG at fs/ext4/namei.c:2572!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: dlci bridge stp hidp cmtp kernelcapi l2tp_ppp l2tp_netlink l2tp_core sctp libcrc32c rfcomm tun fuse nfnetlink can_raw ipt_ULOG can_bcm x25 scsi_transport_iscsi ipx p8023 p8022 appletalk phonet psnap vmw_vsock_vmci_transport af_key vmw_vmci rose vsock atm can netrom ax25 af_rxrpc irda pppoe pppox ppp_generic slhc bluetooth nfc rfkill rds caif_socket caif crc_ccitt af_802154 llc2 llc snd_hda_codec_realtek snd_hda_intel snd_hda_codec serio_raw snd_pcm pcspkr edac_core snd_page_alloc snd_timer snd soundcore r8169 mii sr_mod cdrom pata_atiixp radeon backlight drm_kms_helper ttm
CPU: 1 PID: 1812571 Comm: trinity-child2 Not tainted 3.11.0-rc1+ #12
Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
task: ffff88007dfe69a0 ti: ffff88010f7b6000 task.ti: ffff88010f7b6000
RIP: 0010:[<ffffffff8125ce69>]  [<ffffffff8125ce69>] ext4_orphan_add+0x299/0x2b0
RSP: 0018:ffff88010f7b7cf8  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8800966d3020 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88007dfe70b8 RDI: 0000000000000001
RBP: ffff88010f7b7d40 R08: ffff880126a3c4e0 R09: ffff88010f7b7ca0
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801271fd668
R13: ffff8800966d2f78 R14: ffff88011d7089f0 R15: ffff88007dfe69a0
FS:  00007f70441a3740(0000) GS:ffff88012a800000(0000) knlGS:00000000f77c96c0
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000002834000 CR3: 0000000107964000 CR4: 00000000000007e0
DR0: 0000000000780000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Stack:
0000000000002000 00000020810b6dde 0000000000000000 ffff88011d46db00
ffff8800966d3020 ffff88011d7089f0 ffff88009c7f4c10 ffff88010f7b7f2c
ffff88007dfe69a0 ffff88010f7b7da8 ffffffff8125cfac ffff880100000004
Call Trace:
[<ffffffff8125cfac>] ext4_tmpfile+0x12c/0x180
[<ffffffff811cba78>] path_openat+0x238/0x700
[<ffffffff8100afc4>] ? native_sched_clock+0x24/0x80
[<ffffffff811cc647>] do_filp_open+0x47/0xa0
[<ffffffff811db73f>] ? __alloc_fd+0xaf/0x200
[<ffffffff811ba2e4>] do_sys_open+0x124/0x210
[<ffffffff81010725>] ? syscall_trace_enter+0x25/0x290
[<ffffffff811ba3ee>] SyS_open+0x1e/0x20
[<ffffffff816ca8d4>] tracesys+0xdd/0xe2
[<ffffffff81001001>] ? start_thread_common.constprop.6+0x1/0xa0
Code: 04 00 00 00 89 04 24 31 c0 e8 c4 77 04 00 e9 43 fe ff ff 66 25 00 d0 66 3d 00 80 0f 84 0e fe ff ff 83 7b 48 00 0f 84 04 fe ff ff <0f> 0b 49 8b 8c 24 50 07 00 00 e9 88 fe ff ff 0f 1f 84 00 00 00

Here we couldn't call clear_nlink() directly because in d_tmpfile() we
will call inode_dec_link_count() to decrease ->i_nlink.  So this commit
tries to call d_tmpfile() before ext4_orphan_add() to fix this problem.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Tested-by: Darrick J. Wong <darrick.wong@oracle.com>
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>

Merge tag 'staging-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging tree fixes from Greg KH:
"Here are a few iio driver fixes for 3.11-rc2.  They are still spread
  across drivers/iio and drivers/staging/iio so they are coming in
  through this tree.

  I've also removed the drivers/staging/csr/ driver as the developers
  who originally sent it to me have moved on to other companies, and CSR
  still will not send us the specs for the device, making the driver
  pretty much obsolete and impossible to fix up.  Deleting it now
  prevents people from sending in lots of tiny coding style fixes that
  will never go anywhere.

  It also helps to offset the large lustre filesystem merge that
  happened in 3.11-rc1 in the overall 3.11.0 diffstat.  :)"

* tag 'staging-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: csr: remove driver
  iio: lps331ap: Fix wrong in_pressure_scale output value
  iio staging: fix lis3l02dq, read error handling
  staging:iio:ad7291: add missing .driver_module to struct iio_info
  iio: ti_am335x_adc: add missing .driver_module to struct iio_info
  iio: mxs-lradc: Remove useless check in read_raw
  iio: mxs-lradc: Fix misuse of iio->trig
  iio: inkern: fix iio_convert_raw_to_processed_unlocked
  iio: Fix iio_channel_has_info
  iio:trigger: device_unregister->device_del to avoid double free
  iio: dac: ad7303: fix error return code in ad7303_probe()

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
"The sget() one is a long-standing bug and will need to go into -stable
  (in fact, it had been originally caught in RHEL6), the other two are
  3.11-only"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: constify dentry parameter in d_count()
  livelock avoidance in sget()
  allow O_TMPFILE to work with O_WRONLY

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bugfixes from Ted Ts'o:
"Fixes for 3.11-rc2, sent at 5pm, in the professional style.  :-)"

I'm not sure I like this new level of "professionalism".
9-5, people, 9-5.

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: call ext4_es_lru_add() after handling cache miss
  ext4: yield during large unlinks
  ext4: make the extent_status code more robust against ENOMEM failures
  ext4: simplify calculation of blocks to free on error
  ext4: fix error handling in ext4_ext_truncate()

Merge tag 'nfs-for-3.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
- Fix a regression against NFSv4 FreeBSD servers when creating a new
   file
- Fix another regression in rpc_client_register()

* tag 'nfs-for-3.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Fix a regression against the FreeBSD server
  SUNRPC: Fix another issue with rpc_client_register()

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next

Pull btrfs fixes from Josef Bacik:
"I'm playing the role of Chris Mason this week while he's on vacation.
  There are a few critical fixes for btrfs here, all regressions and
  have been tested well"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next:
  Btrfs: fix wrong write offset when replacing a device
  Btrfs: re-add root to dead root list if we stop dropping it
  Btrfs: fix lock leak when resuming snapshot deletion
  Btrfs: update drop progress before stopping snapshot dropping

vfs: constify dentry parameter in d_count()

so that it can be used in places like d_compare/d_hash
without causing a compiler warning.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

livelock avoidance in sget()

Eric Sandeen has found a nasty livelock in sget() - take a mount(2) about
to fail.  The superblock is on ->fs_supers, ->s_umount is held exclusive,
->s_active is 1.  Along comes two more processes, trying to mount the same
thing; sget() in each is picking that superblock, bumping ->s_count and
trying to grab ->s_umount.  ->s_active is 3 now.  Original mount(2)
finally gets to deactivate_locked_super() on failure; ->s_active is 2,
superblock is still ->fs_supers because shutdown will *not* happen until
->s_active hits 0.  ->s_umount is dropped and now we have two processes
chasing each other:
s_active = 2, A acquired ->s_umount, B blocked
A sees that the damn thing is stillborn, does deactivate_locked_super()
s_active = 1, A drops ->s_umount, B gets it
A restarts the search and finds the same superblock.  And bumps its ->s_active.
s_active = 2, B holds ->s_umount, A blocked on trying to get it
... and we are in the earlier situation with A and B switched places.

The root cause, of course, is that ->s_active should not grow until we'd
got MS_BORN.  Then failing ->mount() will have deactivate_locked_super()
shut the damn thing down.  Fortunately, it's easy to do - the key point
is that grab_super() is called only for superblocks currently on ->fs_supers,
so it can bump ->s_count and grab ->s_umount first, then check MS_BORN and
bump ->s_active; we must never increment ->s_count for superblocks past
->kill_sb(), but grab_super() is never called for those.

The bug is pretty old; we would've caught it by now, if not for accidental
exclusion between sget() for block filesystems; the things like cgroup or
e.g. mtd-based filesystems don't have anything of that sort, so they get
bitten.  The right way to deal with that is obviously to fix sget()...
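
The reordering described above can be modelled in a few lines. This is a
single-threaded userspace sketch, not the kernel code: struct sketch_super
and sketch_grab_super are hypothetical names, the born flag stands in for
MS_BORN, and all locking (->s_umount, the list lock) is reduced to
comments.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the superblock fields discussed above. */
struct sketch_super {
	int s_count;	/* passive references */
	int s_active;	/* active references */
	bool born;	/* models the MS_BORN flag */
};

/*
 * Take a temporary passive reference first, and only take an active
 * reference once the superblock is known to be born.
 * Returns true if an active reference was taken.
 */
static bool sketch_grab_super(struct sketch_super *s)
{
	bool got_active = false;

	s->s_count++;		/* passive reference keeps the struct alive */
	/* the real code would acquire ->s_umount here */
	if (s->born) {
		s->s_active++;	/* only a born superblock gains an active ref */
		got_active = true;
	}
	/* the real code would drop ->s_umount on failure */
	s->s_count--;		/* temporary passive reference is dropped */
	return got_active;
}
```

In this model a stillborn superblock never has its s_active raised by
latecomers, so the failing mount's final drop takes it to zero and the
chase between A and B cannot start.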

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

allow O_TMPFILE to work with O_WRONLY

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml

Pull UML fixes from Richard Weinberger:
"Special thanks goes to Toralf Föster for continuously testing UML and
  reporting issues!"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: remove dead code
  um: siginfo cleanup
  uml: Fix which_tmpdir failure when /dev/shm is a symlink, and in other edge cases
  um: Fix wait_stub_done() error handling
  um: Mark stub pages mapping with VM_PFNMAP
  um: Fix return value of strnlen_user()

Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:
"MIPS fixes for 3.11.  Half of them are for Netlogic; the remainder
  touches things across arch/mips.

  Nothing really dramatic and by rc1 standards MIPS will be in fairly
  good shape with this applied.  Tested by building all MIPS defconfigs,
  of which four platforms won't build with this pull request applied.
  And yes, it also boots on my favorite test systems"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: kvm: Kconfig: Drop HAVE_KVM dependency from VIRTUALIZATION
  MIPS: Octeon: Fix DT pruning bug with pip ports
  MIPS: KVM: Mark KVM_GUEST (T&E KVM) as BROKEN_ON_SMP
  MIPS: tlbex: fix broken build in v3.11-rc1
  MIPS: Netlogic: Add XLP PIC irqdomain
  MIPS: Netlogic: Fix USB block's coherent DMA mask
  MIPS: tlbex: Fix typo in r3000 tlb store handler
  MIPS: BMIPS: Fix thinko to release slave TP from reset
  MIPS: Delete dead invocation of exception_exit().

Merge tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64

Pull arm64 fixes from Catalin Marinas:
- Post -rc1 update to the common reboot infrastructure.
- Fixes (user cache maintenance fault handling, !COMPAT compilation,
   CPU online and interrupt handling).

* tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
  arm64: use common reboot infrastructure
  arm64: mm: don't treat user cache maintenance faults as writes
  arm64: add '#ifdef CONFIG_COMPAT' for aarch32_break_handler()
  arm64: Only enable local interrupts after the CPU is marked online

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:
"An update for the BPF jit to the latest and greatest, two patches to
  get kdump working again, the random-abort ptrace extension for
  transactional execution, the z90crypt module alias for ap and a tiny
  cleanup"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/zcrypt: Alias for new zcrypt device driver base module
  s390/kdump: Allow copy_oldmem_page() copy to virtual memory
  s390/kdump: Disable mmap for s390
  s390/bpf,jit: add pkt_type support
  s390/bpf,jit: address randomize and write protect jit code
  s390/bpf,jit: use generic jit dumper
  s390/bpf,jit: call module_free() from any context
  s390/qdio: remove unused variable
  s390/ptrace: PTRACE_TE_ABORT_RAND

Btrfs: fix wrong write offset when replacing a device

Miao Xie reported the following issue:

The filesystem was corrupted after we did a device replace.

Steps to reproduce:
# mkfs.btrfs -f -m single -d raid10 <device0>..<device3>
# mount <device0> <mnt>
# btrfs replace start -rfB 1 <device4> <mnt>
# umount <mnt>
# btrfsck <device4>

The reason for the issue is that we changed the write offset by mistake,
introduced by commit 625f1c8dc.

We read the data from the source device at first, and then write the
data into the corresponding place of the new device. In order to
implement the "-r" option, the source location is remapped using
btrfs_map_block(). The read takes place on the mapped location, and
the write needs to take place on the unmapped location. Currently
the write is using the mapped location, and this commit changes it
back by undoing the change to the write address that the aforementioned
commit added by mistake.

Reported-by: Miao Xie <miaox@cn.fujitsu.com>
Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: re-add root to dead root list if we stop dropping it

If we stop dropping a root for whatever reason we need to add it back to the
dead root list so that we will re-start the dropping next transaction commit.
The other case where this happens is if we recover a drop, because we will
add a root without adding it to the fs radix tree, so we can leak its root
and commit root extent buffer; adding this to the dead root list makes this
cleanup happen.
Thanks,

Cc: stable@vger.kernel.org
Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix lock leak when resuming snapshot deletion

We aren't setting path->locks[level] when we resume a snapshot deletion which
means we won't unlock the buffer when we free the path. This causes deadlocks
if we happen to re-allocate the block before we've evicted the extent buffer
from cache. Thanks,
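
A minimal userspace model of why the missing bookkeeping leaks the lock.
All names here (sketch_buffer, sketch_path, sketch_attach_locked,
sketch_release_path) are hypothetical, and the record_lock parameter
models whether locks[level] gets set; this is not the btrfs code.

```c
#include <stddef.h>

#define SKETCH_MAX_LEVEL 8

/* Hypothetical stand-in for an extent buffer with a lock. */
struct sketch_buffer {
	int locked;
};

/* Hypothetical stand-in for a tree path: one node and one lock flag per level. */
struct sketch_path {
	struct sketch_buffer *nodes[SKETCH_MAX_LEVEL];
	int locks[SKETCH_MAX_LEVEL];
};

/* Lock a buffer and attach it to the path; record_lock models setting locks[level]. */
static void sketch_attach_locked(struct sketch_path *path, int level,
				 struct sketch_buffer *buf, int record_lock)
{
	buf->locked = 1;
	path->nodes[level] = buf;
	path->locks[level] = record_lock;
}

/* Release the path: only levels whose lock was recorded get unlocked. */
static void sketch_release_path(struct sketch_path *path)
{
	for (int i = 0; i < SKETCH_MAX_LEVEL; i++) {
		if (path->nodes[i] && path->locks[i])
			path->nodes[i]->locked = 0;
		path->nodes[i] = NULL;
		path->locks[i] = 0;
	}
}
```

Attaching with record_lock set to 0 leaves the buffer locked after the
path is released, which is the leak; attaching with 1 lets the release
unlock it.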

Cc: stable@vger.kernel.org
Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: update drop progress before stopping snapshot dropping

Alex pointed out a problem and fix that exists in the drop one snapshot at a
time patch.  If we decide we need to exit for whatever reason (umount for
example) we will just exit the snapshot dropping without updating the drop
progress.  So the next time we go to resume we will BUG_ON() because we can't
find the extent we left off at because we never updated it.  This patch fixes
the problem.

Cc: stable@vger.kernel.org
Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fix from Paolo Bonzini:
"This single patch fixes a regression caused by one of the
  optimizations introduced in 3.11, which is generally visible only on
  AMD processors"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: MMU: avoid fast page fault fixing mmio page fault

Merge tag 'pm+acpi-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI fixes from Rafael Wysocki:
"These are fixes collected over the last week, most importantly two
  cpufreq reverts fixing regressions introduced in 3.10, an autosleep
  fix preventing systems using it from crashing during shutdown and two
  ACPI scan fixes related to hotplug.

  Specifics:

   - Two cpufreq commits from the 3.10 cycle introduced regressions.
     The first of them was buggy (it did way much more than it needed to
     do) and the second one attempted to fix an issue introduced by the
     first one.  Fixes from Srivatsa S Bhat revert both.

   - If autosleep triggers during system shutdown and the shutdown
     callbacks of some device drivers have been called already, it may
     crash the system.  Fix from Liu Shuo prevents that from happening
     by making try_to_suspend() check system_state.

   - The ACPI memory hotplug driver doesn't clear its driver_data on
     errors which may cause a NULL pointer dereference to happen later.
     Fix from Toshi Kani.

   - The ACPI namespace scanning code should not try to attach scan
     handlers to device objects that have them already, which may
     confuse things quite a bit, and it should rescan the whole
     namespace branch starting at the given node after receiving a bus
     check notify event even if the device at that particular node has
     been discovered already.  Fixes from Rafael J Wysocki.

   - New ACPI video blacklist entry for a system whose initial backlight
     setting from the BIOS doesn't make sense.  From Lan Tianyu.

   - Garbage string output avoidance for ACPI PNP from Liu Shuo.

   - Two Kconfig fixes for issues introduced recently in the s3c24xx
     cpufreq driver (when moving the driver to drivers/cpufreq) from
     Paul Bolle.

   - Trivial comment fix in pm_wakeup.h from Chanwoo Choi"

* tag 'pm+acpi-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / video: ignore BIOS initial backlight value for Fujitsu E753
  PNP / ACPI: avoid garbage in resource name
  cpufreq: Revert commit 2f7021a8 to fix CPU hotplug regression
  cpufreq: s3c24xx: fix "depends on ARM_S3C24XX" in Kconfig
  cpufreq: s3c24xx: rename CONFIG_CPU_FREQ_S3C24XX_DEBUGFS
  PM / Sleep: Fix comment typo in pm_wakeup.h
  PM / Sleep: avoid 'autosleep' in shutdown progress
  cpufreq: Revert commit a66b2e to fix suspend/resume regression
  ACPI / memhotplug: Fix a stale pointer in error path
  ACPI / scan: Always call acpi_bus_scan() for bus check notifications
  ACPI / scan: Do not try to attach scan handlers to devices having them

arm64: use common reboot infrastructure

Commit 7b6d864b48d9 (reboot: arm: change reboot_mode to use enum
reboot_mode) changed the way reboot is handled on arm, which has a
direct impact on arm64 as we share the reset driver on the VE platform.

The obvious fix is to move arm64 to use the same infrastructure.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[catalin.marinas@arm.com: removed reboot_mode = REBOOT_HARD default setting]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

arm64: mm: don't treat user cache maintenance faults as writes

On arm64, cache maintenance faults appear as data aborts with the CM
bit set in the ESR. The WnR bit, usually used to distinguish between
faulting loads and stores, always reads as 1 and (slightly confusingly)
the instructions are treated as reads by the architecture.

This patch fixes our fault handling code to treat cache maintenance
faults in the same way as loads.
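
The check described above can be sketched as a small predicate. This is a
standalone illustration rather than the patch: the macro and function
names are hypothetical, and the bit positions (WnR at bit 6 and CM at
bit 8 of the syndrome) are stated here as assumptions about the data
abort encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions in the data abort syndrome. */
#define SKETCH_ESR_WNR	(UINT32_C(1) << 6)	/* write-not-read */
#define SKETCH_ESR_CM	(UINT32_C(1) << 8)	/* cache maintenance */

/*
 * Treat a fault as a write only if WnR is set and the fault did not
 * come from a cache maintenance instruction, for which WnR always
 * reads as 1.
 */
static bool sketch_is_write_fault(uint32_t esr)
{
	return (esr & SKETCH_ESR_WNR) && !(esr & SKETCH_ESR_CM);
}
```

A syndrome with both bits set is classified as a read here, which matches
the load-like treatment the message describes.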

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

arm64: add '#ifdef CONFIG_COMPAT' for aarch32_break_handler()

If 'COMPAT' is not defined, aarch32_break_handler() fails to compile.
It can work independently of 'COMPAT', so remove the dummy definition.

The related error:

  arch/arm64/kernel/debug-monitors.c:249:5: error: redefinition of ‘aarch32_break_handler’
  In file included from arch/arm64/kernel/debug-monitors.c:29:0:
  /root/linux-next/arch/arm64/include/asm/debug-monitors.h:89:12: note: previous definition of ‘aarch32_break_handler’ was here

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

arm64: Only enable local interrupts after the CPU is marked online

There is a slight chance that (timer) interrupts are triggered before a
secondary CPU has been marked online with implications on softirq thread
affinity.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Kirill Tkhai <tkhai@yandex.ru>

MIPS: kvm: Kconfig: Drop HAVE_KVM dependency from VIRTUALIZATION

Virtualization does not always need KVM capabilities so drop the
dependency. The KVM symbol already depends on HAVE_KVM.

Fixes the following problem on a randconfig:
warning: (REMOTEPROC && RPMSG) selects VIRTUALIZATION which has unmet direct
dependencies (HAVE_KVM)

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Acked-by: Steven J. Hill <Steven.Hill@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5443/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

um: remove dead code

"me" is not used.

Signed-off-by: Richard Weinberger <richard@nod.at>

um: siginfo cleanup

Currently we use both struct siginfo and siginfo_t.
Let's use struct siginfo internally to avoid the ongoing
compiler warnings. We are allowed to do so because
struct siginfo and siginfo_t are equivalent.

Signed-off-by: Richard Weinberger <richard@nod.at>

MIPS: Octeon: Fix DT pruning bug with pip ports

During the pruning of the device tree octeon_fdt_pip_iface() is called
for each PIP interface and every port at or beyond the port count is
removed from the device tree. However, the count was set to the return
value of cvmx_helper_interface_enumerate() which doesn't actually return
the count but just returns zero on success. This effectively removed
*all* ports from the tree.

Use cvmx_helper_ports_on_interface() instead to fix this. This
successfully restores the 3 ports of my ERLite-3 and fixes the "kernel
assigns random MAC addresses" issue.
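
The status-versus-count confusion can be sketched as follows. This is
not the Octeon code: iface_enumerate(), iface_port_count() and
prune_ports() are hypothetical stand-ins, and the sketch assumes that
pruning drops every port whose index is at or above the count (the
reading under which a count of zero removes all ports).

```c
#include <assert.h>

#define PORTS_ON_IFACE 3	/* hypothetical: ports actually present */

/* Stand-in for an "enumerate" helper: 0 means success, not a count. */
static int iface_enumerate(int iface)
{
	(void)iface;
	return 0;
}

/* Stand-in for a "ports on interface" helper: returns the port count. */
static int iface_port_count(int iface)
{
	(void)iface;
	return PORTS_ON_IFACE;
}

/* Keep ports with index below count; return how many survive pruning. */
static int prune_ports(int count, int max_ports)
{
	int kept = 0;

	for (int p = 0; p < max_ports; p++)
		if (p < count)
			kept++;
	return kept;
}

/* Usage: the buggy and the fixed way of obtaining the count. */
static void example_prune(void)
{
	assert(prune_ports(iface_enumerate(0), 16) == 0);	/* bug: all pruned */
	assert(prune_ports(iface_port_count(0), 16) == 3);	/* fix: 3 kept */
}
```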

Signed-off-by: Faidon Liambotis <paravoid@debian.org>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Acked-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5587/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

uml: Fix which_tmpdir failure when /dev/shm is a symlink, and in other edge cases

which_tmpdir did the wrong thing if /dev/shm was a symlink (e.g., to /run/shm),
if there were multiple mounts on top of each other, if the mount(s) were
obscured by a later mount, or if /dev/shm was a prefix of another mount point.
This fixes these cases. Applies to 3.9.6.
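
One of the edge cases, /dev/shm being a prefix of another mount point,
comes down to matching on a path component boundary rather than on raw
characters. A minimal sketch, assuming a hypothetical helper
path_is_at_or_under() and a directory argument without a trailing slash;
it is not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * True if path is dir itself or lies beneath it. A plain prefix match
 * would wrongly accept "/dev/shmfoo" for dir "/dev/shm".
 */
static bool path_is_at_or_under(const char *path, const char *dir)
{
	size_t n = strlen(dir);

	if (strncmp(path, dir, n) != 0)
		return false;
	/* The match must end at the end of path or at a '/' separator. */
	return path[n] == '\0' || path[n] == '/';
}

/* Usage: exact match, child, sibling with a shared prefix, and parent. */
static void example_path(void)
{
	assert(path_is_at_or_under("/dev/shm", "/dev/shm"));
	assert(path_is_at_or_under("/dev/shm/x", "/dev/shm"));
	assert(!path_is_at_or_under("/dev/shmfoo", "/dev/shm"));
	assert(!path_is_at_or_under("/dev", "/dev/shm"));
}
```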

Signed-off-by: Tristan Schmelcher <tschmelcher@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>

um: Fix wait_stub_done() error handling

If we die within a stub handler, the only way to reliably
kill the (obviously) dying uml guest process is to kill
its host twin on the host side.

Signed-off-by: Richard Weinberger <richard@nod.at>

um: Mark stub pages mapping with VM_PFNMAP

Ensure that a process cannot destroy its stub pages by
using MADV_DONTNEED and friends.

Reported-by: toralf.foerster@gmx.de
Signed-off-by: Richard Weinberger <richard@nod.at>

um: Fix return value of strnlen_user()

In case of an error it must not return -EFAULT.
Return 0 like all other archs do.

Reported-by: toralf.foerster@gmx.de
Signed-off-by: Richard Weinberger <richard@nod.at>

MIPS: KVM: Mark KVM_GUEST (T&E KVM) as BROKEN_ON_SMP

Make KVM_GUEST depend on BROKEN_ON_SMP so that it cannot be enabled with
SMP.

SMP kernels use ll/sc instructions for an atomic section in the tlb fill
handler, with a tlbp instruction contained in the middle. This cannot be
emulated with trap & emulate KVM because the tlbp instruction traps and
the eret to return to the guest code clears the LLbit which makes the sc
instruction always fail.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/5588/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: tlbex: fix broken build in v3.11-rc1

Commit 6ba045f9fbdafb48da42aa8576ea7a3980443136 (MIPS: Move generated code
to .text for microMIPS) deleted tlbmiss_handler_setup_pgd_array, but some
references were not converted. Fix that to enable building a MIPS kernel.

Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Acked-by: Jayachandran C. <jchandra@broadcom.com>
Acked-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5589/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Netlogic: Add XLP PIC irqdomain

Add a legacy irq domain for the XLP PIC interrupts. This will be used
when interrupts are assigned from the device tree. This change is required
after commit c5cdc67 "irqdomain: Remove temporary MIPS workaround code".

Signed-off-by: Jayachandran C <jchandra@broadcom.com>
Cc: linux-mips@linux-mips.org
Cc: Jayachandran C <jchandra@broadcom.com>
Patchwork: https://patchwork.linux-mips.org/patch/5597/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Netlogic: Fix USB block's coherent DMA mask

The on-chip USB controller on Netlogic XLP does not support
DMA beyond 32-bit physical addresses. Set the coherent_dma_mask
of the USB in its PCI fixup to reflect this.
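
What a 32-bit mask means for an address can be sketched in plain C. The
helpers dma_bit_mask() and dma_addr_ok() are hypothetical names used
only for illustration; the fixup itself is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low n bits set, for 1 <= n <= 64. */
static uint64_t dma_bit_mask(unsigned int n)
{
	assert(n >= 1 && n <= 64);
	return n == 64 ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1;
}

/* True if a device limited to mask can address phys. */
static bool dma_addr_ok(uint64_t phys, uint64_t mask)
{
	return (phys & ~mask) == 0;
}

/* Usage: the last 32-bit page is reachable, the first one above is not. */
static void example_dma_mask(void)
{
	uint64_t mask = dma_bit_mask(32);

	assert(mask == UINT64_C(0xffffffff));
	assert(dma_addr_ok(UINT64_C(0xfffff000), mask));
	assert(!dma_addr_ok(UINT64_C(0x100000000), mask));
}
```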

Signed-off-by: Ganesan Ramalingam <ganesanr@broadcom.com>
Signed-off-by: Jayachandran C. <jchandra@broadcom.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5596/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: tlbex: Fix typo in r3000 tlb store handler

commit 6ba045f (MIPS: Move generated code to .text for microMIPS)
causes a panic at boot. The handler builder should test against
handle_tlbs_end, not handle_tlbs.

Signed-off-by: Tony Wu <tung7970@gmail.com>
Acked-by: Jayachandran C. <jchandra@broadcom.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5600/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: BMIPS: Fix thinko to release slave TP from reset

Commit 4df715aa ["MIPS: BMIPS: support booting from physical CPU other
than 0"] introduced a thinko which prevents slave CPUs from being
released from reset on systems where we boot from TP0. The problem is
that we are checking whether the slave CPU logical CPU map is 0, which
is never true for systems booting from TP0, so we do not release the
slave TP from reset and we are just stuck. Fix this by properly checking
that the CPU we intend to boot really is the physical slave CPU (logical
and physical value being 1).

Signed-off-by: Florian Fainelli <florian@openwrt.org>
Cc: linux-mips@linux-mips.org
Cc: blogic@openwrt.org
Cc: jogo@openwrt.org
Cc: cernekee@gmail.com
Cc: Florian Fainelli <florian@openwrt.org>
Patchwork: https://patchwork.linux-mips.org/patch/5598/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

s390/zcrypt: Alias for new zcrypt device driver base module

The zcrypt device driver has been split into base/bus module, api-module,
card modules and message type modules. The base module has been renamed
from z90crypt to ap.
Introduce a module alias (with the well-known z90crypt identifier) that
lets users keep loading the zcrypt device driver the way they already do.

Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:
"A couple interesting SKB fragment handling fixes, plus the usual small
  bits here and there:

   1) Fix 64-bit divide build failure on 32-bit platforms in mlx5, from
      Tim Gardner.

   2) Get rid of a stupid reimplementation of "%*phC" in our sysfs MAC
      address printing helper.

   3) Fix NETIF_F_SG capability advertisement in hyperv driver, if the
      device can't do checksumming offloads then it shouldn't say it can
      do SG either.  From Haiyang Zhang.

   4) bgmac needs to depend on PHYLIB, from Hauke Mehrtens.

   5) Don't leak DMA mappings on mapping failures, from Neil Horman.

   6) We need to reset the transport header of SKBs in ipv4 before we
      attempt to perform early socket demux, just like ipv6 does.  From
      Eric Dumazet.

   7) Add missing locking on vxlan device removal, from Stephen
      Hemminger.

   8) xen-netfront has to make two passes over an SKB to prepare it for
      transfer.  One pass calculates the number of slots needed, the
      second massages the SKB and fills the slots.  Unfortunately, the
      first pass doesn't calculate the number of slots properly so we
      can end up trying to build a MAX_SKB_FRAGS + 1 SKB which doesn't
      work out so well.  Fix from Jan Beulich with help and discussion
      with several others.

   9) Fix a similar problem in tun and macvtap, which have to split up
      scatter-gather elements at PAGE_SIZE boundaries.  Don't do
      zerocopy if it would result in a > MAX_SKB_FRAGS skb.  Fixes from
      Jason Wang.

  10) On receive, once we've decoded the VLAN state completely, clear
      skb->vlan_tci.  Otherwise demuxed tunnels underneath can trigger
      the VLAN code again, corrupting the packet.  Fix from Eric
      Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  vlan: fix a race in egress prio management
  vlan: mask vlan prio bits
  macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
  tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
  pkt_sched: sch_qfq: remove a source of high packet delay/jitter
  xen-netfront: pull on receive skb may need to happen earlier
  vxlan: add necessary locking on device removal
  hyperv: Fix the NETIF_F_SG flag setting in netvsc
  net: Fix sysfs_format_mac() code duplication.
  be2net: Fix to avoid hardware workaround when not needed
  macvtap: do not assume 802.1Q when send vlan packets
  macvtap: fix the missing ret value of TUNSETQUEUE
  ipv4: set transport header earlier
  mlx5 core: Fix __udivdi3 when compiling for 32 bit arches
  bgmac: add dependency to phylib
  net/irda: fixed style issues in irlan_eth
  ethtool: fixed trailing statements in ethtool
  ndisc: bool initializations should use true and false
  atl1e: unmap partially mapped skb on dma error and free skb

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Peter Anvin:
"Trying again to get the fixes queue, including the fixed IDT alignment
  patch.

  The UEFI patch is by far the biggest issue at hand: it is currently
  causing quite a few machines to fail to boot.  Which is sad, because the only
  reason they would is because their BIOSes touch memory that has
  already been freed.  The other major issue is that we finally have
  tracked down the root cause of a significant number of machines
  failing to suspend/resume"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Make sure IDT is page aligned
  x86, suspend: Handle CPUs which fail to #GP on RDMSR
  x86/platform/ce4100: Add header file for reboot type
  Revert "UEFI: Don't pass boot services regions to SetVirtualAddressMap()"
  efivars: check for EFI_RUNTIME_SERVICES

Merge tag 'md-3.11-fixes' of git://neil.brown.name/md

Pull md bug fixes from NeilBrown:
"Sorry boss, back at work now boss.  Here's them nice shiny patches ya
  wanted.  All nicely tagged and justified for -stable and everyfing:

  Three bug fixes for md in 3.10

  3.10 wasn't a good release for md.  The bio changes left a couple of
  bugs, and an md "fix" created another one.

  These three patches appear to fix the issues and have been tagged for
  -stable"

* tag 'md-3.11-fixes' of git://neil.brown.name/md:
  md/raid1: fix bio handling problems in process_checks()
  md: Remove recent change which allows devices to skip recovery.
  md/raid10: fix two problems with RAID10 resync.

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"You'll be terribly disappointed in this, I'm not trying to sneak any
  features in or anything, it's mostly radeon and intel fixes, a couple
  of ARM driver fixes"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (34 commits)
  drm/radeon/dpm: add debugfs support for RS780/RS880 (v3)
  drm/radeon/dpm/atom: fix broken gcc harder
  drm/radeon/dpm/atom: restructure logic to work around a compiler bug
  drm/radeon/dpm: fix atom vram table parsing
  drm/radeon: fix an endian bug in atom table parsing
  drm/radeon: add a module parameter to disable aspm
  drm/rcar-du: Use the GEM PRIME helpers
  drm/shmobile: Use the GEM PRIME helpers
  uvesafb: Really allow mtrr being 0, as documented and warn()ed
  radeon kms: do not flush uninitialized hotplug work
  drm/radeon/dpm/sumo: handle boost states properly when forcing a perf level
  drm/radeon: align VM PTBs (Page Table Blocks) to 32K
  drm/radeon: allow selection of alignment in the sub-allocator
  drm/radeon: never unpin UVD bo v3
  drm/radeon: fix UVD fence emit
  drm/radeon: add fault decode function for CIK
  drm/radeon: add fault decode function for SI (v2)
  drm/radeon: add fault decode function for cayman/TN (v2)
  drm/radeon: use radeon device for request firmware
  drm/radeon: add missing ttm_eu_backoff_reservation to radeon_bo_list_validate
  ...

vlan: fix a race in egress prio management

egress_priority_map[] hash table updates are protected by rtnl,
and we never remove elements until device is dismantled.

We have to make sure that before inserting a new element in the hash
table, all its fields are committed to memory, or else another cpu could
find corrupt values and crash.
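
The ordering requirement can be sketched in portable C11. This is not
the kernel code: struct prio_map, bucket_head, publish() and lookup()
are hypothetical names, a single bucket stands in for the hash table,
and C11 release/acquire ordering stands in for whatever barrier
primitive the patch uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct prio_map {
	uint32_t priority;
	uint16_t vlan_qos;
	struct prio_map *next;
};

/* Head of one hash bucket; entries are only ever added, never removed. */
static _Atomic(struct prio_map *) bucket_head;

/* Writer (assumed serialized by a lock): fill every field, then publish. */
static void publish(struct prio_map *np, uint32_t prio, uint16_t qos)
{
	np->priority = prio;
	np->vlan_qos = qos;
	np->next = atomic_load_explicit(&bucket_head, memory_order_relaxed);
	/* Release: the field stores above become visible before the pointer. */
	atomic_store_explicit(&bucket_head, np, memory_order_release);
}

/* Lockless reader: the acquire load pairs with the release in publish(). */
static const struct prio_map *lookup(uint32_t prio)
{
	const struct prio_map *mp =
		atomic_load_explicit(&bucket_head, memory_order_acquire);

	for (; mp; mp = mp->next)
		if (mp->priority == prio)
			return mp;
	return NULL;
}

/* Usage: publish two entries once, then look them up. */
static void example_publish(void)
{
	static struct prio_map a, b;

	publish(&a, 1, 0x2000);
	publish(&b, 2, 0x4000);
	assert(lookup(1) == &a);
	assert(lookup(2) == &b);
	assert(lookup(3) == NULL);
}
```

Without the release ordering, a reader on another cpu could observe the
new head pointer before the stores to priority, vlan_qos or next, which
is the corruption described above.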

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: mask vlan prio bits

In commit 48cc32d38a52d0b68f91a171a8d00531edc6a46e
("vlan: don't deliver frames for unknown vlans to protocols")
Florian made sure we set pkt_type to PACKET_OTHERHOST
if the vlan id is set and we could find a vlan device for this
particular id.

But we also have a problem if prio bits are set.

Steinar reported an issue on a router receiving IPv6 frames with a
vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device,
because skb->vlan_tci is set.

Forwarded frame is completely corrupted : We can see (8100:4000)
being inserted in the middle of IPv6 source address :

16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 >
9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64
       0x0000:  0000 0029 8000 c7c3 7103 0001 a0ae e651
       0x0010:  0000 0000 ccce 0b00 0000 0000 1011 1213
       0x0020:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223
       0x0030:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233

It seems we are not really ready to properly cope with this right now.

We can probably do better in future kernels :
vlan_get_ingress_priority() should be a netdev property instead of
a per vlan_dev one.

For stable kernels, let's clear vlan_tci to fix the bugs.
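
For reference, a small sketch of how the reported tag 0x4000 splits into
id 0 and prio 2 under the 802.1Q TCI layout (3 priority bits, 1 DEI/CFI
bit, 12 VLAN id bits). The helper names tci_prio() and tci_vid() are
made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define VLAN_PRIO_MASK  0xe000u	/* top 3 bits: priority */
#define VLAN_PRIO_SHIFT 13
#define VLAN_VID_MASK   0x0fffu	/* low 12 bits: VLAN id */

/* Extract the 3-bit priority from a TCI value. */
static unsigned int tci_prio(uint16_t tci)
{
	return (tci & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT;
}

/* Extract the 12-bit VLAN id from a TCI value. */
static unsigned int tci_vid(uint16_t tci)
{
	return tci & VLAN_VID_MASK;
}

/* Usage: the tag from the report has no VLAN id but a non-zero prio. */
static void example_tci(void)
{
	uint16_t tci = 0x4000;

	assert(tci_vid(tci) == 0);
	assert(tci_prio(tci) == 2);
	assert(tci != 0);	/* so a "vlan_tci is set" test still fires */
}
```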

Reported-by: Steinar H. Gunderson <sesse@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS

We try to linearize part of the skb when the number of iov is greater than
MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than
one page, so zerocopy_sg_fromiovec() may still fail and may break the guest
network.

Solve this problem by calculating the pages needed for iov before trying to do
zerocopy and switching to copy instead of zerocopy if it needs more than
MAX_SKB_FRAGS.

This is done by introducing a new helper to count the pages for iov, and
calling uarg->callback() manually when switching from zerocopy to copy to
notify vhost.
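
The page counting idea can be sketched in userspace C. This is an
illustration under stated assumptions, not the helper added by the
patch: iov_page_count() and can_zerocopy() are hypothetical names,
4 KiB pages are assumed, and MAX_FRAGS is a stand-in value for
MAX_SKB_FRAGS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

#define PG_SHIFT  12				/* assumed 4 KiB pages */
#define PG_SIZE   ((size_t)1 << PG_SHIFT)
#define MAX_FRAGS 17				/* stand-in for MAX_SKB_FRAGS */

/* Count the pages touched by all segments, each rounded out to pages. */
static size_t iov_page_count(const struct iovec *iov, size_t n)
{
	size_t pages = 0;

	for (size_t i = 0; i < n; i++) {
		size_t off = (uintptr_t)iov[i].iov_base & (PG_SIZE - 1);

		pages += (off + iov[i].iov_len + PG_SIZE - 1) >> PG_SHIFT;
	}
	return pages;
}

/* Zerocopy only if every page fits in the fragment array; else copy. */
static int can_zerocopy(const struct iovec *iov, size_t n)
{
	return iov_page_count(iov, n) <= MAX_FRAGS;
}

/* Usage: 9 short segments that each straddle a page boundary. */
static void example_iov(void)
{
	struct iovec v[9];

	for (size_t i = 0; i < 9; i++) {
		/* 16 bytes before a page end; the address is never dereferenced. */
		v[i].iov_base = (void *)(uintptr_t)0x1ff0;
		v[i].iov_len = 0x20;
	}
	assert(iov_page_count(v, 1) == 2);
	assert(iov_page_count(v, 9) == 18);
	assert(!can_zerocopy(v, 9));	/* 9 segments, but 18 pages */
}
```

Counting segments alone would accept the 9-entry vector in the example,
even though it needs 18 page fragments.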

We can do further optimization on top.

This bug was introduced by b92946e2919134ebe2a4083e4302236295ea2a73
(macvtap: zerocopy: validate vectors before building skb).

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>