Dan Carpenter [Tue, 15 Sep 2015 06:54:33 +0000 (09:54 +0300)]
staging: wilc1000: off by one in get_handler_from_id()
The > should be >= here or we read beyond the end of the array.
Fixes: d42ab0838d04 ('staging: wilc1000: use id value as argument')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
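The boundary condition can be sketched in isolation. The table, its size,
and the function name below are hypothetical stand-ins, not the wilc1000
code:

```c
#include <stddef.h>

#define NUM_HANDLERS 4	/* hypothetical table size */

static const char *handlers[NUM_HANDLERS] = { "h0", "h1", "h2", "h3" };

/* Valid indices are 0 .. NUM_HANDLERS - 1, so id == NUM_HANDLERS is rejected. */
static const char *lookup_handler(int id)
{
	if (id < 0 || id >= NUM_HANDLERS)	/* ">" would admit id == NUM_HANDLERS */
		return NULL;
	return handlers[id];
}
```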
Olaf Weber [Mon, 14 Sep 2015 22:41:35 +0000 (18:41 -0400)]
staging/lustre/ptlrpc: make ptlrpcd threads cpt-aware
On NUMA systems, the placement of worker threads relative to the
memory they use greatly affects performance. The CPT mechanism can be
used to constrain a number of Lustre thread types, and this change
makes it possible to configure the placement of ptlrpcd threads in a
similar manner.
To simplify the code changes, the global structures used to manage
ptlrpcd threads are changed to one per CPT. In particular this means
there will be one ptlrpcd recovery thread per CPT.
To prevent ptlrpcd threads from wandering all over the system, all
ptlrpcd threads are bound to a CPT. Note that some CPT configuration
is always created, but the defaults are not likely to be correct for
a NUMA system. After discussing the options with Liang Zhen we
decided that we would not bind ptlrpcd threads to specific CPUs,
and rather trust the kernel scheduler to migrate ptlrpcd threads.
With all ptlrpcd threads bound to a CPT, but not to specific CPUs,
the load policy mechanism can be radically simplified:
- PDL_POLICY_LOCAL and PDL_POLICY_ROUND are currently identical.
- PDL_POLICY_ROUND, if fully implemented, would cost us the locality
we are trying to achieve, so most or all calls using this policy
would have to be changed to PDL_POLICY_LOCAL.
- PDL_POLICY_PREFERRED is not used, and cannot be implemented without
binding ptlrpcd threads to individual CPUs.
- PDL_POLICY_SAME is rarely used, and cannot be implemented without
binding ptlrpcd threads to individual CPUs.
The partner mechanism is also updated, because now all ptlrpcd
threads are "bound" threads. The only difference between the various
bind policies, PDB_POLICY_NONE, PDB_POLICY_FULL, PDB_POLICY_PAIR, and
PDB_POLICY_NEIGHBOR, is the number of partner threads. The bind
policy is replaced with a tunable that directly specifies the size of
the groups of ptlrpcd partner threads.
Ensure that the ptlrpc_request_set for a ptlrpcd thread is created on
the same CPT that the thread will work on. When threads are bound to
specific nodes and/or CPUs in a NUMA system, it pays to ensure that
the data structures used by these threads are also on the same node.
Visible changes:
* ptlrpcd thread names include the CPT number, for example
"ptlrpcd_02_07". In this case the "07" is relative to the CPT, and
not a CPU number.
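As an illustration of the naming scheme only: the format string and the
helper name below are assumptions inferred from the "ptlrpcd_02_07"
example, not taken from the patch.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Build a name of the form ptlrpcd_<cpt>_<index>, where index counts
 * threads within the CPT and is not a CPU number.
 * Returns the snprintf() result.
 */
static int format_ptlrpcd_name(char *buf, size_t len, int cpt, int index)
{
	return snprintf(buf, len, "ptlrpcd_%02d_%02d", cpt, index);
}
```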
Tunables added:
* ptlrpcd_cpts (string): A CPT string describing the CPU partitions
that ptlrpcd threads should run on. Used to make ptlrpcd threads
run on a subset of all CPTs.
* ptlrpcd_per_cpt_max (int): The maximum number of ptlrpcd threads
to run in a CPT.
* ptlrpcd_partner_group_size (int): The desired number of threads
in each ptlrpcd partner thread group. Default is 2, corresponding
to the old PDB_POLICY_PAIR. A negative value makes all ptlrpcd
threads in a CPT partners of each other.
Tunables obsoleted:
* max_ptlrpcds: The new ptlrpcd_per_cpt_max can be used to obtain the
same effect.
* ptlrpcd_bind_policy: The new ptlrpcd_partner_group_size can be used
to obtain the same effect.
Internal interface changes:
* pdb_policy_t and related code have been removed. Groups of partner
ptlrpcd threads are still created, and all threads in a partner
group are bound on the same CPT. The ptlrpcd threads bound to a
CPT are typically divided into several partner groups. The partner
groups on a CPT all have an equal number of ptlrpcd threads.
* pdl_policy_t and related code have been removed. Since ptlrpcd
threads are not bound to a specific CPU, all the code that avoids
scheduling on the current CPU (or attempts to do so) has been
removed as non-functional. A simplified form of PDL_POLICY_LOCAL
is kept as the only load policy.
* LIOD_BIND and related code have been removed. All ptlrpcd threads
are now bound to a CPT, and no additional binding policy is
implemented.
* ptlrpc_prep_set(): Changed to allocate a ptlrpc_request_set
on the current CPT.
* ptlrpcd(): If an error is encountered before entering the main loop,
store the error in pc_error before exiting.
* ptlrpcd_start(): Check pc_error to verify that the ptlrpcd thread
has successfully entered its main loop.
* ptlrpcd_init(): Initialize the struct ptlrpcd_ctl for all threads
for a CPT before starting any of them. This closes a race during
startup where a partner thread could reference a non-initialized
struct ptlrpcd_ctl.
staging/lustre/o2iblnd: leak cmid in kiblnd_dev_need_failover
The cmid created by kiblnd_dev_need_failover() should always be
destroyed. The current implementation does not do so and leaks the
cmid when this function detects a device failover.
Li Xi [Mon, 14 Sep 2015 22:41:32 +0000 (18:41 -0400)]
staging/lustre/osc: use global osc_rq_pool to reduce memory usage
The per-osc request pools consume a lot of memory if there are
hundreds of OSCs on one client. This will be a critical problem
if the client doesn't have sufficient memory for both OSCs and
applications.
This patch replaces per-osc request pools with a global pool,
osc_rq_pool. The total memory usage is 5MB by default and can be
set with an OSC module parameter:
"options osc osc_reqpool_mem_max=POOL_SIZE". The unit of POOL_SIZE
is MB. If cl_max_rpcs_in_flight is the same for all OSCs, the
memory usage of the OSC pool can be calculated as:
Min(POOL_SIZE * 1M,
(cl_max_rpcs_in_flight + 2) * OSC number * OST_MAXREQSIZE)
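A small sketch of the estimate above. The function and parameter names are
hypothetical, "1M" is assumed to mean 2^20 bytes, and OST_MAXREQSIZE is
passed in as a parameter because its value is not given here:

```c
#include <stdint.h>

/* Estimated OSC request pool usage in bytes, following the Min() expression. */
static uint64_t osc_pool_usage(uint64_t pool_size_mb,
			       uint64_t max_rpcs_in_flight,
			       uint64_t osc_count,
			       uint64_t ost_maxreqsize)
{
	/* Upper bound set by osc_reqpool_mem_max (assumption: MB = 2^20 bytes). */
	uint64_t cap = pool_size_mb * 1024 * 1024;
	/* What the OSCs would ask for if the pool were unbounded. */
	uint64_t demand = (max_rpcs_in_flight + 2) * osc_count * ost_maxreqsize;

	return demand < cap ? demand : cap;
}
```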
Also, this patch changes the allocation logic of OSC write requests.
Allocation from osc_rq_pool is only tried after normal allocation
fails.
Signed-off-by: Wu Libin <lwu@ddn.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-on: http://review.whamcloud.com/15422
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6770
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ann Koehler [Mon, 14 Sep 2015 22:41:30 +0000 (18:41 -0400)]
staging/lustre/obdclass: Eliminate hash bucket scans in lu_cache_shrink
The lu_cache_shrink slab shrinker is too slow, accounting for > 90% of
the time spent in shrink_slab when allocating huge pages. Most of its
time is spent iterating over the buckets in each site's object hash
table to compute the number of freeable objects. This iteration is
eliminated by adding an lru length count to the lu_site struct. A
percpu counter is used to maintain the lru length, so that the
lu_site does not need to be locked when an object is accessed through
the hash table. A counter is updated whenever an object is added to
or deleted from any of the hash table buckets. The number of freeable
objects is the sum of the counter values across all cpus.
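The counting idea can be modelled in user space as follows. This is a
single-threaded sketch with hypothetical names and a fixed CPU count; it is
not the kernel percpu counter API and not the Lustre code:

```c
#include <assert.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count */

struct lru_len_counter {
	/* One slot per CPU; an individual slot may go negative. */
	long per_cpu[NR_CPUS_MODEL];
};

/* Update only the slot of the CPU doing the add or delete: no shared lock. */
static void lru_len_add(struct lru_len_counter *c, int cpu, long delta)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	c->per_cpu[cpu] += delta;
}

/* The shrinker reads the total by summing all slots, with no bucket scan. */
static long lru_len_sum(const struct lru_len_counter *c)
{
	long sum = 0;

	for (int i = 0; i < NR_CPUS_MODEL; i++)
		sum += c->per_cpu[i];
	return sum;
}
```

An object added on one CPU and removed on another leaves one slot negative,
but the sum across slots stays correct.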
Signed-off-by: Ann Koehler <amk@cray.com>
Reviewed-on: http://review.whamcloud.com/14066
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6365
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ll_iget_for_nfs() can call unbalanced iput() causing memory
leaks. This patch removes this iput() call.
Also, avoid unhashing disconnected dentries in
d_lustre_invalidate(), which is another source of memory
leaks.
One of the symptoms of the leak is the following crash pattern:
LustreError: 14812:0:(lu_object.c:1251:lu_device_fini())
ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
LustreError: 14812:0:(lu_object.c:1251:lu_device_fini()) LBUG
Pid: 14812, comm: umount
Isaac Huang [Mon, 14 Sep 2015 22:41:28 +0000 (18:41 -0400)]
staging/lustre/o2iblnd: wrong uses of kib_tx_t::tx_nfrags
The kib_tx_t::tx_nfrags field is the number of entries in
the kib_tx_t::tx_frags array, rather than the number of DMA
mapped entries. So kiblnd_send/kiblnd_recv should
use kib_rdma_desc_t::rd_nfrags instead.
The LASSERT touches cl_client_cache->ccc_lru without any protection,
so this patch moves the LASSERT into the section protected by
cl_client_cache->ccc_lru_lock.
staging/lustre/o2iblnd: connection refcount fix for kiblnd_post_rx
kiblnd_post_rx() can't refer to rx::rx_conn anymore after
ib_post_recv() because this rx can be polled out by another thread
which may drop this rx and destroy rx::rx_conn.
This patch fixes this issue by taking an extra refcount on connection
before calling ib_post_recv().
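The pattern can be sketched with a user-space model. All names below are
hypothetical, and post_recv_model() merely simulates the worst case in which
another thread completes the rx immediately and drops its reference; it is
not ib_post_recv():

```c
struct conn_model {
	int refs;	/* outstanding references */
	int destroyed;	/* set when the last reference is dropped */
};

static void conn_get(struct conn_model *c)
{
	c->refs++;
}

static void conn_put(struct conn_model *c)
{
	if (--c->refs == 0)
		c->destroyed = 1;
}

/* Stand-in for the post: the rx completes at once and its reference is dropped. */
static int post_recv_model(struct conn_model *c)
{
	conn_put(c);
	return 0;
}

/* Returns 1 if the connection was still alive when touched after the post. */
static int post_rx_model(struct conn_model *c)
{
	int alive;

	conn_get(c);			/* extra reference pins conn across the post */
	(void)post_recv_model(c);
	alive = !c->destroyed;		/* safe: our own reference is still held */
	conn_put(c);			/* may be the final put */
	return alive;
}
```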
Andreas Dilger [Mon, 14 Sep 2015 22:41:21 +0000 (18:41 -0400)]
staging/lustre/ptlrpc: remove LUSTRE_MSG_MAGIC_V1 support
Remove the remains of LUSTRE_MSG_MAGIC_V1 support from ptlrpc.
It has not been supported since 1.8 and has not been functional since 2.0.
In lustre_msg_check_version(), return an error for unsupported RPC
versions so that the server will reject such RPCs early. Otherwise
the server only prints an error message and continues.
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-on: http://review.whamcloud.com/14007
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6349
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fan Yong [Mon, 14 Sep 2015 22:41:19 +0000 (18:41 -0400)]
staging/lustre/llite: cleanup open handle for client open failure
In the open case, the client-side open handling thread may hit an
error after the MDT grants the open. In that case, the client should
send a close RPC to the MDT as cleanup; otherwise, the open handle
on the MDT is leaked until the client unmounts or is evicted.
If LFSCK, while repairing some inconsistency such as a
multiple-referenced OST-object, marks LU_OBJECT_HEARD_BANSHEE on an
MDT-object that is opened by others, the leaked open handle still
references the MDT-object and blocks subsequent threads that want to
locate the object via FID.
Signed-off-by: Fan Yong <fan.yong@intel.com>
Reviewed-on: http://review.whamcloud.com/13709
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6301
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Due to some accounting bug, lsb_busy of a hash bucket can become
larger than the total number of objects in said bucket. A busy object
can be counted more than once. When that happens, a negative value is
returned by the shrinker callback.
Instead of trying (and failing) to count the busy objects, count the
objects that are not busy, i.e. the objects that are present on the
lsb_lru list. The number of busy objects is then the difference
between the number of objects in the hash and the objects on the
lsb_lru list.
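The arithmetic can be sketched as below. The struct and function names are
hypothetical models of a hash bucket, not the lu_object code. Because every
object on the lru list is also in the hash, the difference cannot go
negative:

```c
#include <stddef.h>

struct bucket_model {
	unsigned long in_hash;	/* all objects in this hash bucket */
	unsigned long on_lru;	/* subset that is idle, i.e. on the lru list */
};

/* Freeable objects: sum of the lru lengths over all buckets. */
static unsigned long freeable_objects(const struct bucket_model *b, size_t n)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += b[i].on_lru;
	return sum;
}

/* Busy objects are derived, never counted directly. */
static unsigned long busy_objects(const struct bucket_model *b)
{
	return b->in_hash - b->on_lru;
}
```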
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/12468
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5722
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The BIT() macro is already defined in bitops.h, so remove the duplicate
definitions. Users of the BIT() macro expect unsigned int/u32, so
add typecasts where this creates a build warning.
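A user-space sketch of why a cast can be needed, assuming the shared macro
has the form 1UL << nr so that its result is an unsigned long. The macro is
redefined locally for illustration and flag_mask() is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the assumed form; the result has type unsigned long. */
#define BIT(nr) (1UL << (nr))

/* A caller that wants a 32-bit mask narrows the value explicitly. */
static uint32_t flag_mask(unsigned int nr)
{
	assert(nr < 32);	/* the bit must fit in 32 bits */
	return (uint32_t)BIT(nr);
}
```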
Mike Rapoport [Sat, 12 Sep 2015 08:07:45 +0000 (11:07 +0300)]
staging: sm750fb: ddk750_hwi2c: reduce amount of CamelCase
Rename camel case variables deviceAddress, pBuffer and totalBytes to
addr, buf and total_bytes respectively in sm750_hw_i2c_{read,write}_data
functions.
Signed-off-by: Mike Rapoport <mike.rapoport@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>