git.karo-electronics.de Git - linux-beck.git/log
Mitko Haralanov [Tue, 8 Mar 2016 19:15:44 +0000 (11:15 -0800)]
IB/hfi1: Add SDMA cache eviction algorithm

This commit adds a cache eviction algorithm for the SDMA
user buffer cache.

Besides the interval RB tree used for node lookup, the cache
nodes are also arranged in a doubly-linked list. When a node is
used, it is moved to the beginning of the list. Less recently
used nodes naturally move to the tail of the list.

When the cache limit is reached, the eviction code starts
traversing the linked list in reverse, freeing buffers until
enough space has been freed to fit the new user buffer. This
guarantees that only the least recently used cache nodes will be removed
from the cache.
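
The list handling described above can be sketched in user-space C as
follows. This is only an illustration: the structure and function names
(cache_node, lru_cache, lru_touch, lru_make_room, lru_insert) are
hypothetical and are not taken from the driver, and the interval RB tree
used for lookup is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache node; the driver's real structures differ. */
struct cache_node {
    size_t npages;                  /* pages held by this buffer */
    struct cache_node *prev, *next; /* LRU list links */
};

struct lru_cache {
    struct cache_node *head; /* most recently used */
    struct cache_node *tail; /* least recently used */
    size_t used;             /* pages currently cached */
    size_t limit;            /* maximum pages allowed */
};

static void lru_unlink(struct lru_cache *c, struct cache_node *n)
{
    if (n->prev)
        n->prev->next = n->next;
    else
        c->head = n->next;
    if (n->next)
        n->next->prev = n->prev;
    else
        c->tail = n->prev;
    n->prev = n->next = NULL;
}

static void lru_push_front(struct lru_cache *c, struct cache_node *n)
{
    n->prev = NULL;
    n->next = c->head;
    if (c->head)
        c->head->prev = n;
    else
        c->tail = n;
    c->head = n;
}

/* Mark a node that is already on the list as just used. */
void lru_touch(struct lru_cache *c, struct cache_node *n)
{
    lru_unlink(c, n);
    lru_push_front(c, n);
}

/* Walk the list in reverse, evicting until `needed` pages fit.
 * Returns the number of pages freed. */
size_t lru_make_room(struct lru_cache *c, size_t needed)
{
    size_t freed = 0;

    assert(needed <= c->limit);
    while (c->tail && c->used + needed > c->limit) {
        struct cache_node *victim = c->tail;

        lru_unlink(c, victim);
        c->used -= victim->npages;
        freed += victim->npages;
    }
    return freed;
}

/* Insert a new node, evicting older ones first if necessary. */
void lru_insert(struct lru_cache *c, struct cache_node *n)
{
    lru_make_room(c, n->npages);
    lru_push_front(c, n);
    c->used += n->npages;
}
```

In this sketch the caller owns the node memory, so an evicted node is
only unlinked; a real cache would also unpin and free the buffer.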

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 8 Mar 2016 19:15:39 +0000 (11:15 -0800)]
IB/hfi1: Switch to using the pin query function

Use the new function to query whether the expected receive
user buffer can be pinned successfully. This requires that
a new variable be added to the hfi1_filedata structure used
to hold the number of pages pinned by the expected receive
code.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 8 Mar 2016 19:15:33 +0000 (11:15 -0800)]
IB/hfi1: Specify mm when releasing pages

This change adds a pointer to the process mm_struct when
calling hfi1_release_user_pages().

Previously, the function used the mm_struct of the current
process to adjust the number of pinned pages. However, in some
cases, namely when unpinning pages due to an MMU notifier call,
we do not want to drop into that code block as it will cause a deadlock
(the MMU notifiers take the process' mmap_sem prior to calling
the callbacks).

By allowing the caller to specify the pointer to the mm_struct,
the caller has finer control over that part of hfi1_release_user_pages().
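
One plausible reading of this change, sketched in user-space C. The
type mm_sketch and the function release_user_pages are hypothetical
stand-ins, and the convention that a NULL pointer skips the accounting
step is an assumption made for illustration, not something stated in
this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's mm_struct. */
struct mm_sketch {
    size_t pinned_pages; /* pages currently accounted as pinned */
};

/* Release npages. The pinned-page accounting is adjusted only when
 * the caller passes a non-NULL mm (assumed convention). */
void release_user_pages(struct mm_sketch *mm, size_t npages)
{
    /* ...unpinning of the pages themselves is omitted here... */

    if (mm) {
        /* A real implementation would take the mm's semaphore
         * around this update; a caller that already holds it
         * would have to skip this block to avoid a deadlock. */
        assert(mm->pinned_pages >= npages);
        mm->pinned_pages -= npages;
    }
}
```

A caller on a notifier path would pass NULL in this sketch, while an
ordinary teardown path would pass the owning process' mm.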

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 8 Mar 2016 19:15:28 +0000 (11:15 -0800)]
IB/hfi1: Add pin query function

System administrators can use the locked memory
ulimit setting to set the maximum amount of memory
a user can lock/pin. However, this setting alone is not
enough to guarantee good operation of the hfi1 driver
because the setting is not fine-grained enough to
account for the limit being used by multiple user
processes and caches.

Therefore, a better limiting algorithm is needed. This
is where the new hfi1_can_pin_pages() function and the
cache_size module parameter come in.

The function works by looking at the ulimit and cache_size
values to compute a cache size. The algorithm examines the
ulimit value and, if it is not "unlimited", computes a
per-cache limit based on the number of configured user
contexts.

After that, the lower of the two - cache_size and computed
per-cache limit - is used.
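
The selection logic can be sketched as follows. This is a user-space
illustration only: effective_cache_limit and can_pin are hypothetical
names, ULIMIT_UNLIMITED is a stand-in for an "unlimited" ulimit, and
dividing the ulimit evenly by the number of user contexts is an
assumption; the driver's actual formula may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ULIMIT_UNLIMITED UINT64_MAX /* stand-in for "unlimited" */

/* Effective per-cache limit, in the same unit as the inputs. */
uint64_t effective_cache_limit(uint64_t ulimit, uint64_t cache_size,
                               unsigned int nctxts)
{
    uint64_t per_cache;

    assert(nctxts > 0);
    if (ulimit == ULIMIT_UNLIMITED)
        return cache_size;

    per_cache = ulimit / nctxts; /* assumed: an even split */
    return per_cache < cache_size ? per_cache : cache_size;
}

/* Would pinning `request` more units stay within `limit`? */
bool can_pin(uint64_t pinned, uint64_t request, uint64_t limit)
{
    return request <= limit && pinned <= limit - request;
}
```

With a ulimit of 400, a cache_size of 256, and 4 contexts, the sketch
yields a limit of 100; with an unlimited ulimit it yields 256.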

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 8 Mar 2016 19:15:22 +0000 (11:15 -0800)]
IB/hfi1: Implement SDMA-side buffer caching

Add support for caching of user buffers used for SDMA
transfers. This change improves performance by
avoiding repeated pinning of the pages of buffers that
the application re-uses.

While the cost of the pinning operation has been made
heavier by adding the extra code to search the cache tree,
re-allocate pages arrays, and perform future cache evictions,
that cost will be amortized against the savings when the
same buffer is re-used. It is also worth noting that in
most cases, the cost of pinning should be much lower due
to the buffer already being in the cache.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 8 Mar 2016 19:15:16 +0000 (11:15 -0800)]
IB/hfi1: Adjust last address values for intervals

Last address values for intervals in the interval RB tree
nodes should be non-inclusive in order to avoid confusing
ranges.
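
A small user-space sketch of why a non-inclusive last address keeps
ranges unambiguous. The names range, make_range, and ranges_overlap are
hypothetical and do not come from the driver or the kernel interval
tree API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open range [start, last): `last` is one past the final byte. */
struct range {
    uint64_t start;
    uint64_t last;
};

struct range make_range(uint64_t addr, uint64_t len)
{
    struct range r = { addr, addr + len };

    assert(len > 0);
    return r;
}

/* Two half-open ranges overlap only if each starts before the
 * other ends, so adjacent buffers never match each other. */
bool ranges_overlap(struct range a, struct range b)
{
    return a.start < b.last && b.start < a.last;
}
```

With this convention a buffer covering 0x1000..0x1fff and one starting
at 0x2000 do not overlap, and a range's length is simply last - start.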

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 8 Mar 2016 19:15:10 +0000 (11:15 -0800)]
IB/hfi1: Add filter callback

This commit adds a filter callback, which can be used to filter
out interval RB nodes matching a certain interval down to a
single one.

This is needed for the upcoming SDMA-side caching where buffers
will need to be filtered by their virtual address.
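
The idea can be sketched in user-space C with a linear scan standing in
for the interval tree walk. All names here (node, filter_fn, search,
match_start_addr) are hypothetical; the driver's callback signature is
not shown in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct node {
    uint64_t addr; /* start virtual address */
    uint64_t len;  /* length in bytes */
};

/* Returns true if `n` is the node the caller is looking for. */
typedef bool (*filter_fn)(const struct node *n, uint64_t addr,
                          uint64_t len);

static bool overlaps(const struct node *n, uint64_t addr, uint64_t len)
{
    return n->addr < addr + len && addr < n->addr + n->len;
}

/* Return the first overlapping node accepted by the filter, or the
 * first overlapping node at all when no filter is given. */
const struct node *search(const struct node *nodes, size_t count,
                          uint64_t addr, uint64_t len, filter_fn filter)
{
    assert(len > 0);
    for (size_t i = 0; i < count; i++) {
        if (!overlaps(&nodes[i], addr, len))
            continue;
        if (!filter || filter(&nodes[i], addr, len))
            return &nodes[i];
    }
    return NULL;
}

/* Example filter: accept only a node starting at the same address. */
bool match_start_addr(const struct node *n, uint64_t addr, uint64_t len)
{
    (void)len;
    return n->addr == addr;
}
```

Without a filter, several nodes may match the same interval and the
caller simply gets the first one; the filter narrows that set to one.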

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 8 Mar 2016 19:15:04 +0000 (11:15 -0800)]
IB/hfi1: Remove compare callback

Interval RB trees provide their own searching function,
which also takes care of determining the path through
the tree that should be taken.

This makes the compare callback unnecessary.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 8 Mar 2016 19:14:59 +0000 (11:14 -0800)]
IB/hfi1: Add MMU tracing

Add a new tracepoint type for the MMU functions and calls
to that tracepoint to allow tracing of MMU functionality.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 8 Mar 2016 19:14:53 +0000 (11:14 -0800)]
IB/hfi1: Use interval RB trees

The interval RB trees can handle RB nodes which
hold ranged information. This is exactly the usage
for the buffer cache implemented in the expected
receive code path.

Convert the MMU/RB functions to use the interval RB
tree API. This will help with future users of the
caching API, as well.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 8 Mar 2016 19:14:48 +0000 (11:14 -0800)]
IB/hfi1: Notify remove MMU/RB callback of calling context

Tell the remove MMU/RB callback if it's being called as
part of a memory invalidation or not. This can be important
in preventing a deadlock if the remove callback attempts to
take the mmap_sem semaphore because the kernel's MMU
invalidation functions have already taken it.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 8 Mar 2016 19:14:42 +0000 (11:14 -0800)]
IB/hfi1: Remove the use of add/remove RB function pointers

The usage of function pointers for RB node insertion
and removal in the expected receive code path was
meant to be a small performance optimization. However,
maintaining it, especially with the new MMU API, would
become more troublesome as the API is extended.

Since the performance optimization is minor, remove the
function pointers and replace with direct calls.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 8 Mar 2016 19:14:36 +0000 (11:14 -0800)]
IB/hfi1: Allow remove MMU callbacks to free nodes

In order to allow the remove MMU callbacks to free the
RB nodes, it is necessary to prevent any references to
the nodes after the remove callback has been called.

Therefore, remove the node from the tree prior to calling
the callback. In other words, the MMU/RB API now guarantees
that all RB node operations it performs will be done prior
to calling the remove callback and that the RB node will
not be touched afterwards.
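
The ordering guarantee can be illustrated with a singly-linked list in
user-space C. The names node, remove_cb, remove_key, and free_node are
hypothetical; a list stands in for the RB tree.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int key;
    struct node *next;
};

typedef void (*remove_cb)(struct node *n);

/* Unlink the node first, then notify. After cb() returns, the node
 * is never dereferenced again, so cb() is free to release it.
 * Returns 1 if a node was removed, 0 otherwise. */
int remove_key(struct node **head, int key, remove_cb cb)
{
    assert(head != NULL);
    for (struct node **pp = head; *pp; pp = &(*pp)->next) {
        struct node *n = *pp;

        if (n->key != key)
            continue;
        *pp = n->next;  /* all list operations happen here */
        n->next = NULL;
        if (cb)
            cb(n);      /* may free n */
        return 1;
    }
    return 0;
}

/* Example callback that frees the node it is handed. */
void free_node(struct node *n)
{
    free(n);
}
```

Had the callback been invoked before the unlink, reading n->next
afterwards would be a use-after-free whenever the callback frees the
node.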

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 8 Mar 2016 19:14:31 +0000 (11:14 -0800)]
IB/hfi1: Prevent NULL pointer dereference

Prevent a potential NULL pointer dereference (found
by code inspection) when unregistering an MMU handler.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 8 Mar 2016 19:14:25 +0000 (11:14 -0800)]
IB/hfi1: Allow MMU function execution in IRQ context

Future users of the MMU/RB functions might be searching or
manipulating the MMU RB trees in interrupt context. Therefore,
the MMU/RB functions need to be able to run in interrupt
context. This requires that we use the IRQ-aware API for
spin locks.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Tue, 8 Mar 2016 19:14:20 +0000 (11:14 -0800)]
IB/hfi1: Re-factor MMU notification code

The MMU notification code added to the
expected receive side has been re-factored and
split into its own file. This was done in
order to make the code more general and, therefore,
usable by other parts of the driver.

The caching behavior remains the same. However,
RB tree operations (insertions, deletions, and
searches) and MMU invalidation processing are
now handled by functions in the mmu_rb.[ch]
files.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Alex Estrin [Mon, 7 Mar 2016 19:35:51 +0000 (11:35 -0800)]
IB/rdmavt: Post receive for QP in ERR state

According to the IB spec, posting a WR to the receive queue must
complete with an error if the QP is in the Error state.
Please refer to C10-42 and C10-97.2.1.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Mon, 7 Mar 2016 19:35:46 +0000 (11:35 -0800)]
IB/hfi1: Enable adaptive pio by default

Set the piothreshold to the agreed-upon default of 256B.

Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Mon, 7 Mar 2016 19:35:41 +0000 (11:35 -0800)]
IB/hfi1: Fix adaptive pio packet corruption

The adaptive pio heuristic missed a case that causes a corrupted
packet on the wire.

The case arises when SDMA egress has been chosen for a pio-able packet
and then encounters a ring space wait, so the packet is queued. The sge
cursor has already been incremented as part of the packet build out for SDMA.

After the send engine restarts, the heuristic might now choose pio based
on the sdma count being zero and start the mmio copy using the already
incremented sge cursor.

Fix this by forcing SDMA egress when the SDMA descriptor has already
been built.

Additionally, the code to wait for a QP's pio count to reach zero when
switching to SDMA was missing. Add it.

There is also an issue with UD QPs, in that the different SLs can pick
a different egress send context. For now, just ensure that UD/GSI QPs
always go through SDMA.

Reviewed-by: Vennila Megavannan <vennila.megavannan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Mon, 7 Mar 2016 19:35:35 +0000 (11:35 -0800)]
IB/hfi1: Fix panic in adaptive pio

The following panic occurs while running ib_send_bw -a with
adaptive pio turned on:

[ 8551.143596] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 8551.152986] IP: [<ffffffffa0902a94>] pio_wait.isra.21+0x34/0x190 [hfi1]
[ 8551.160926] PGD 80db21067 PUD 80bb45067 PMD 0
[ 8551.166431] Oops: 0000 [#1] SMP
[ 8551.276725] task: ffff880816bf15c0 ti: ffff880812ac0000 task.ti: ffff880812ac0000
[ 8551.285705] RIP: 0010:[<ffffffffa0902a94>] pio_wait.isra.21+0x34/0x190 [hfi1]
[ 8551.296462] RSP: 0018:ffff880812ac3b58  EFLAGS: 00010282
[ 8551.303029] RAX: 000000000000002d RBX: 0000000000000000 RCX: 0000000000000800
[ 8551.311633] RDX: ffff880812ac3c08 RSI: 0000000000000000 RDI: ffff8800b6665e40
[ 8551.320228] RBP: ffff880812ac3ba0 R08: 0000000000001000 R09: ffffffffa09039a0
[ 8551.328820] R10: ffff880817a0c000 R11: 0000000000000000 R12: ffff8800b6665e40
[ 8551.337406] R13: ffff880817a0c000 R14: ffff8800b6665800 R15: ffff8800b6665e40
[ 8551.355640] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8551.362674] CR2: 0000000000000000 CR3: 000000080abe8000 CR4: 00000000001406e0
[ 8551.371262] Stack:
[ 8551.374119]  ffff880812ac3bf0 ffff88080cf54010 ffff880800000800 ffff880812ac3c08
[ 8551.383036]  ffff8800b6665800 ffff8800b6665e40 0000000000000202 ffffffffa08e7b80
[ 8551.391941]  00000001007de431 ffff880812ac3bc8 ffffffffa0904645 ffff8800b6665800
[ 8551.400859] Call Trace:
[ 8551.404214]  [<ffffffffa08e7b80>] ? hfi1_del_timers_sync+0x30/0x30 [hfi1]
[ 8551.412417]  [<ffffffffa0904645>] hfi1_verbs_send+0x215/0x330 [hfi1]
[ 8551.420154]  [<ffffffffa08ec126>] hfi1_do_send+0x166/0x350 [hfi1]
[ 8551.427618]  [<ffffffffa055a533>] rvt_post_send+0x533/0x6a0 [rdmavt]
[ 8551.435367]  [<ffffffffa050760f>] ib_uverbs_post_send+0x30f/0x530 [ib_uverbs]
[ 8551.443999]  [<ffffffffa0501367>] ib_uverbs_write+0x117/0x380 [ib_uverbs]
[ 8551.452269]  [<ffffffff815810ab>] ? sock_recvmsg+0x3b/0x50
[ 8551.459071]  [<ffffffff81581152>] ? sock_read_iter+0x92/0xe0
[ 8551.466068]  [<ffffffff81212857>] __vfs_write+0x37/0x100
[ 8551.472692]  [<ffffffff81213532>] ? rw_verify_area+0x52/0xd0
[ 8551.479682]  [<ffffffff81213782>] vfs_write+0xa2/0x1a0
[ 8551.486089]  [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70
[ 8551.493891]  [<ffffffff812146c5>] SyS_write+0x55/0xc0
[ 8551.500220]  [<ffffffff816ae0ee>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 8551.531284] RIP  [<ffffffffa0902a94>] pio_wait.isra.21+0x34/0x190 [hfi1]
[ 8551.539508]  RSP <ffff880812ac3b58>
[ 8551.544110] CR2: 0000000000000000

The priv s_sendcontext pointer was not set up properly. Fix with this
patch by using the s_sendcontext and eliminating its send engine use.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Mon, 7 Mar 2016 19:35:30 +0000 (11:35 -0800)]
IB/hfi1: Fix PIO wakeup timing hole

There is a timing hole if there are more than
PIO_WAIT_BATCH_SIZE waiters. This code will dispatch the first
batch but leave the others in the queue. If the restarted waiters
don't in turn wait on a buffer, there is a hang.

Fix by forcing a return when the QP queue is non-empty.

Reviewed-by: Vennila Megavannan <vennila.megavannan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Mon, 7 Mar 2016 19:35:24 +0000 (11:35 -0800)]
IB/hfi1: Fix ordering of trace for accuracy

The positioning of the sdma ibhdr trace was
causing an extra trace message when the tx send
returned -EBUSY.

Move the trace to just before the return
and handle negative return values to avoid
any trace.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Mon, 7 Mar 2016 19:35:19 +0000 (11:35 -0800)]
IB/hfi1: Add unique trace point for pio and sdma send

This allows for separately enabling pio and sdma
tracepoints to cut the volume of trace information.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Mon, 7 Mar 2016 19:35:14 +0000 (11:35 -0800)]
IB/hfi1: Fix issues with qp_stats print

The changes help correlate QPs between the trace output and the
qp_stats information.

The changes include adding a space after QP and clarifying that the second
QP is actually the remote QP.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Mon, 7 Mar 2016 19:35:08 +0000 (11:35 -0800)]
IB/hfi1: Report pid in qp_stats to aid debug

Tracking user/QP ownership is needed to debug issues with
user ULPs like OpenMPI.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Easwar Hariharan [Mon, 7 Mar 2016 19:35:03 +0000 (11:35 -0800)]
IB/hfi1: Improve LED beaconing

The current LED beaconing code is unclear and uses the timer handler to
turn off the timer. This patch simplifies the code by removing the
special semantics of timeon = timeoff = 0 being interpreted as a request
to turn off the beaconing.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Kaike Wan [Sat, 5 Mar 2016 16:50:49 +0000 (08:50 -0800)]
IB/hfi1: Don't call cond_resched in atomic mode when sending packets

This patch fixes the problem where the driver might reschedule in atomic
mode when sending packets. This is due to the fact that the call to
cond_resched() in hfi1_do_send() might occur in atomic mode and a check is
required to avoid the warning message:
    "kernel: BUG: scheduling while atomic: swapper/2/0/0x10000100."

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Sat, 5 Mar 2016 16:50:43 +0000 (08:50 -0800)]
IB/hfi1: Add adaptive cacheless verbs copy

The kernel memcpy is faster than a cacheless copy.  However,
if too much of the L3 cache is overwritten by one-time copies
then overall bandwidth suffers.  Implement an adaptive scheme
where full page copies are tracked and if the number of unique
entries is larger than a threshold, verbs will use a cacheless
copy.  Tracked entries are gradually cleaned, allowing memcpy to
resume once the larger copies have stopped.
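
A rough user-space sketch of such an adaptive scheme. Everything here
is an assumption made for illustration: the table size, the threshold
value, the cleaning policy, and the names copy_tracker,
track_page_copy, and clean_entries are hypothetical and are not taken
from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TRACK_SLOTS 16 /* hypothetical table size */
#define THRESHOLD    4 /* hypothetical switch-over point */

struct copy_tracker {
    uint64_t page[TRACK_SLOTS]; /* tracked page addresses, 0 = empty */
    size_t unique;              /* number of non-empty slots */
};

/* Record a full-page copy to `page_addr`. Returns true when the
 * caller should use a cacheless copy instead of memcpy. */
bool track_page_copy(struct copy_tracker *t, uint64_t page_addr)
{
    size_t free_slot = TRACK_SLOTS;

    assert(page_addr != 0);
    for (size_t i = 0; i < TRACK_SLOTS; i++) {
        if (t->page[i] == page_addr)
            return t->unique > THRESHOLD; /* already tracked */
        if (t->page[i] == 0 && free_slot == TRACK_SLOTS)
            free_slot = i;
    }
    if (free_slot < TRACK_SLOTS) {
        t->page[free_slot] = page_addr;
        t->unique++;
    }
    return t->unique > THRESHOLD;
}

/* Periodic cleaning: drop up to `n` tracked entries so that memcpy
 * can resume once large one-time copies have stopped. */
void clean_entries(struct copy_tracker *t, size_t n)
{
    for (size_t i = 0; i < TRACK_SLOTS && n > 0; i++) {
        if (t->page[i] != 0) {
            t->page[i] = 0;
            t->unique--;
            n--;
        }
    }
}
```

In this sketch a caller would invoke clean_entries() from a timer; the
message does not say how the driver schedules its cleaning.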

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jubin John [Sat, 5 Mar 2016 16:50:38 +0000 (08:50 -0800)]
IB/hfi1: Handle host handshake timeout

Host handshake timeout can occur during the verify capability
state. This is an LNI-related failure and should be
handled in the same way as other LNI failures.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Sat, 5 Mar 2016 16:50:33 +0000 (08:50 -0800)]
IB/hfi1: Add ASIC flag view/clear

Different OSes using parts of the same hardware may leave
cross-device flags set.  Export a debugfs file to view and
clear these flags if needed.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Sat, 5 Mar 2016 16:50:27 +0000 (08:50 -0800)]
IB/hfi1: Hold i2c resource across debugfs open/close

External i2c firmware updates are done in multiple steps and
cannot have other things done in between.  For debugfs files,
acquire the resource on open and release it on close.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Sat, 5 Mar 2016 16:50:22 +0000 (08:50 -0800)]
IB/hfi1: Reduce hardware mutex timeout

The hardware mutex is now held only long enough to set
or clear flags.  Reduce the timeout to something more
reasonable.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Sat, 5 Mar 2016 16:50:17 +0000 (08:50 -0800)]
IB/hfi1: Remove unused HFI1_DO_INIT_ASIC flag

The HFI1_DO_INIT_ASIC flag is no longer used.  Remove
the flag and the code that sets it.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Sat, 5 Mar 2016 16:50:11 +0000 (08:50 -0800)]
IB/hfi1: Change thermal init to use resource reservation

Use the resource reservation system to flag that the ASIC
thermal has been initialized.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Sat, 5 Mar 2016 16:50:06 +0000 (08:50 -0800)]
IB/hfi1: Change QSFP functions to use resource reservation

Remove the mutex guarding each operation in favor of the ASIC
resource acquire/release.  Push the resource acquire/release
above each operation call to allow exclusive access across
multiple operations.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Sat, 5 Mar 2016 16:50:01 +0000 (08:50 -0800)]
IB/hfi1: Change SBus handling to use resource reservation

The SBus resource includes SBUS, PCIE, and THERM registers.
Change SBus handling to use the new ASIC resource reservation system.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Sat, 5 Mar 2016 16:49:55 +0000 (08:49 -0800)]
IB/hfi1: Change EPROM handling to use resource reservation

Change EPROM handling to use the new ASIC resource reservation system.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Sat, 5 Mar 2016 16:49:50 +0000 (08:49 -0800)]
IB/hfi1: Add ASIC resource reservation functions

The ASIC block is a shared hardware resource between two devices
on the chip.  Add functions to acquire and release these resources
in a way that is safe for both multiple users on the same OS
and multiple users on different OSes, while holding the hardware
mutex as little as possible.

Reservations are noted in a scratch register in the shared region.
There are two types of reservations: per-HFI dynamic and permanent.
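
The dynamic, per-HFI case can be sketched in user-space C with plain
variables standing in for the scratch register and the hardware mutex.
The bit layout, the DYN_BITS value, and the names shared_asic,
acquire_resource, and release_resource are hypothetical; permanent
reservations are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated shared state; in hardware these would be registers. */
struct shared_asic {
    uint64_t scratch; /* reservation bits */
    bool hw_mutex;    /* stand-in for the hardware mutex */
};

#define DYN_BITS 8 /* hypothetical: dynamic bits per HFI */

/* Bit for dynamic resource `res` as held by HFI `hfi` (0 or 1). */
static uint64_t dyn_bit(unsigned int hfi, unsigned int res)
{
    assert(hfi < 2 && res < DYN_BITS);
    return 1ull << (hfi * DYN_BITS + res);
}

static bool hw_mutex_trylock(struct shared_asic *a)
{
    if (a->hw_mutex)
        return false;
    a->hw_mutex = true;
    return true;
}

static void hw_mutex_unlock(struct shared_asic *a)
{
    a->hw_mutex = false;
}

/* Returns true on success, false if the resource or mutex is busy. */
bool acquire_resource(struct shared_asic *a, unsigned int hfi,
                      unsigned int res)
{
    bool ok = false;

    if (!hw_mutex_trylock(a))
        return false;
    /* The mutex is held only while the bits are read and updated. */
    if (!(a->scratch & (dyn_bit(0, res) | dyn_bit(1, res)))) {
        a->scratch |= dyn_bit(hfi, res);
        ok = true;
    }
    hw_mutex_unlock(a);
    return ok;
}

/* Clear this HFI's reservation bit; false if the mutex is busy. */
bool release_resource(struct shared_asic *a, unsigned int hfi,
                      unsigned int res)
{
    if (!hw_mutex_trylock(a))
        return false;
    a->scratch &= ~dyn_bit(hfi, res);
    hw_mutex_unlock(a);
    return true;
}
```

The reservation itself lives in the scratch bits, so the resource can
stay reserved across several operations without the mutex being held.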

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Sat, 5 Mar 2016 16:49:45 +0000 (08:49 -0800)]
IB/hfi1: Add shared ASIC structure

Create a shared structure to exist between devices that share the
same ASIC.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Sat, 5 Mar 2016 16:49:39 +0000 (08:49 -0800)]
IB/hfi1: Remove ASIC block clear

The ASIC block is shared between two HFIs. Individual devices
should not initialize registers there. Retain the power-on values.
Individual users set registers as needed with one exception.
Clear sbus fast mode on "slow" calls.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Harish Chegondi [Sat, 5 Mar 2016 16:49:34 +0000 (08:49 -0800)]
IB/hfi1: Replace kmalloc and memcpy with a kmemdup

This change was recommended by the Coccinelle tool when I ran the command:
-bash-4.2$ make coccicheck MODE=patch M=drivers/infiniband/hw/hfi1/
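
The shape of the transformation, shown with a user-space analog in
which malloc() stands in for kmalloc(). The helper mem_dup is a
hypothetical stand-in modeled loosely on the kernel's kmemdup(); it
takes no allocation flags.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Before: allocation and copy written out as two separate steps. */
void *dup_two_step(const void *src, size_t len)
{
    void *p = malloc(len);

    if (!p)
        return NULL;
    memcpy(p, src, len);
    return p;
}

/* After: a single helper call. Returns NULL on allocation failure. */
void *mem_dup(const void *src, size_t len)
{
    void *p;

    assert(src != NULL);
    p = malloc(len);
    if (p)
        memcpy(p, src, len);
    return p;
}
```

Both functions behave the same; the second form removes the chance of
the allocation size and the copy size drifting apart at a call site.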

Reviewed-by: Jubin John <jubin.john@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Harish Chegondi [Sat, 5 Mar 2016 16:49:29 +0000 (08:49 -0800)]
IB/hfi1: Move constant to the right in bitwise operations

Implement changes recommended by the Coccinelle tool to move constants to
the right in bitwise operations:

-bash-4.2$ make coccicheck MODE=report M=drivers/infiniband/hw/hfi1/

drivers/infiniband/hw/hfi1/pio.c:765:4-16: Move constant to right.
drivers/infiniband/hw/hfi1/rc.c:2503:19-29: Move constant to right.
drivers/infiniband/hw/hfi1/chip.c:9813:11-22: Move constant to right.
drivers/infiniband/hw/hfi1/chip.c:14468:29-40: Move constant to right.

Reviewed-by: Jubin John <jubin.john@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Harish Chegondi [Sat, 5 Mar 2016 16:49:24 +0000 (08:49 -0800)]
IB/hfi1: Add the break statement that was removed in an earlier patch

The break statement was unintentionally removed in
commit 41ca419abc0ca7ee65d765408cdc1a7fed2897a3
("staging/rdma/hfi1: Remove hfi1 MR and hfi1 specific qp type")

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jubin John [Fri, 26 Feb 2016 21:33:33 +0000 (13:33 -0800)]
staging/rdma/hfi1: Fix memory leaks

Fix 3 memory leaks reported by the LeakCheck tool in the KEDR framework.

The following resources were allocated memory during their respective
initializations but not freed during cleanup:
1. SDMA map elements
2. PIO map elements
3. HW send context to SW index map

This patch fixes the memory leaks by freeing the allocated memory in the
cleanup path.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Fix reporting of LED status in Get(LedInfo) and Get(PortInfo)
Easwar Hariharan [Fri, 26 Feb 2016 21:33:28 +0000 (13:33 -0800)]
staging/rdma/hfi1: Fix reporting of LED status in Get(LedInfo) and Get(PortInfo)

The LedInfo SMA attribute is redefined to control the LED beaconing
state machine instead of the LED directly. In accordance, we now
return the state of LED beaconing, represented by whether the beaconing
timer is active, instead of the state of the LED itself for SMA queries
Get(LedInfo) and Get(PortInfo). While we are at it, we fix the beaconing
timer control code so that the state of the timer is accurately updated.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Check interrupt registers mapping
Kaike Wan [Fri, 26 Feb 2016 21:33:23 +0000 (13:33 -0800)]
staging/rdma/hfi1: Check interrupt registers mapping

This patch tests the interrupt registers when the driver has no access to
its upstream component. In this case, it is highly likely that it is
running in a virtual machine (e.g., a Qemu-kvm guest). If the interrupt
registers are not mapped properly by the virtual machine monitor, an
error message will be printed and the probing will be terminated. This
will help the user identify the issue. On the other hand, if the driver
is running in a host or has access to its upstream component in some
other VM, it will do nothing.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Avoid using upstream component if it is not accessible
Kaike Wan [Fri, 26 Feb 2016 21:33:18 +0000 (13:33 -0800)]
staging/rdma/hfi1: Avoid using upstream component if it is not accessible

When the hfi1 device is assigned to a VM (e.g., KVM), the hfi1 driver has
no access to the upstream component and therefore cannot use it to perform
some operations, such as secondary bus reset. As a result, the hfi1 driver
cannot perform the PCIe Gen3 transition. Instead, those operations should
be done in the host environment, preferably during the Option ROM
initialization. Similarly, the hfi1 driver cannot support ASPM or tune
the PCIe capability under this circumstance.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Fix header size calculation for RC/UC QPs with GRH enabled
Jianxin Xiong [Fri, 26 Feb 2016 21:33:13 +0000 (13:33 -0800)]
staging/rdma/hfi1: Fix header size calculation for RC/UC QPs with GRH enabled

There is a header size counter in both the QP structure and the txreq
structure. The counter in the txreq structure is not updated properly
for RC and UC queue pairs with GRH enabled, which causes SDMA
sends to fail. This patch fixes the RC and UC path.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/rdmavt: Check lkey_table_size value before use
Jubin John [Fri, 26 Feb 2016 21:33:08 +0000 (13:33 -0800)]
IB/rdmavt: Check lkey_table_size value before use

The lkey_table_size driver specific parameter value is used before its
value is sanity checked and restricted to RVT_MAX_LKEY_TABLE_BITS.

This causes a vmalloc allocation failure for large values. Fix this
by moving the value check before the first usage of the value.
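
A minimal userspace sketch of the ordering this fix describes: clamp the requested size first, then allocate. The structure, function name, and the limit of 16 bits are stand-ins chosen for illustration; the real limit is RVT_MAX_LKEY_TABLE_BITS, and the kernel code uses vmalloc rather than calloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in limit for illustration; rdmavt uses RVT_MAX_LKEY_TABLE_BITS. */
#define EXAMPLE_MAX_LKEY_TABLE_BITS 16u

/* Hypothetical table type, not the rdmavt structure. */
struct example_lkey_table {
	unsigned int bits;
	void **slots;
};

static int example_lkey_table_init(struct example_lkey_table *t,
				   unsigned int requested_bits)
{
	/* Check and restrict the value before its first use. */
	if (requested_bits > EXAMPLE_MAX_LKEY_TABLE_BITS)
		requested_bits = EXAMPLE_MAX_LKEY_TABLE_BITS;

	/* Only now is the value used to size the allocation. */
	t->bits = requested_bits;
	t->slots = calloc((size_t)1 << requested_bits, sizeof(*t->slots));
	return t->slots ? 0 : -1;
}
```

With the check placed after the allocation instead, an oversized request would reach the allocator unrestricted, which is the failure the commit message reports.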

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Fix counter read for cp
Dean Luick [Thu, 18 Feb 2016 19:13:01 +0000 (11:13 -0800)]
staging/rdma/hfi1: Fix counter read for cp

A cp or cat of /sys/kernel/debug/hfi1/hfi1_0/port1counters
produces the following message:

hfi1 0000:81:00.0: hfi1_0: index not supported
hfi1 0000:81:00.0: hfi1_0: read_cntrs does not support indexing

Fix by removing the file position logic and the associated messages
and make the file positioning the responsibility of the caller.

The port counter read function argument is changed to the per port
data structure since the counters are relative to the port and not
the device.

Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Guard i2c access against cp
Dean Luick [Thu, 18 Feb 2016 19:12:51 +0000 (11:12 -0800)]
staging/rdma/hfi1: Guard i2c access against cp

An attempt to cp or cat /sys/kernel/debug/hfi1/hfi1_0/i2c1
produces this message:

hfi1 0000:81:00.0: hfi1_0: IB0:1 I2C failed even retrying

Fix the issue by explicitly rejecting a simple cat/cp with an
-EINVAL error return.

Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/rdamvt: fix cross build with rdmavt
Mike Marciniszyn [Thu, 18 Feb 2016 19:12:42 +0000 (11:12 -0800)]
IB/rdamvt: fix cross build with rdmavt

The new check routine causes a larger than supported frame size
on s390.

Changing the check routine to noinline fixes the issue.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Disclose more information when i2c fails
Dean Luick [Thu, 18 Feb 2016 19:12:34 +0000 (11:12 -0800)]
staging/rdma/hfi1: Disclose more information when i2c fails

Improve logging messages when there are i2c failures.
Clean up i2c read error handling.

Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Fix debugfs access race
Dean Luick [Thu, 18 Feb 2016 19:12:25 +0000 (11:12 -0800)]
staging/rdma/hfi1: Fix debugfs access race

Debugfs access races with the driver being ready.  Make sure the
driver is ready before debugfs files appear and that debugfs files are
gone before the driver starts tearing down.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Cleanup comments and logs in PHY code
Easwar Hariharan [Thu, 18 Feb 2016 19:12:16 +0000 (11:12 -0800)]
staging/rdma/hfi1: Cleanup comments and logs in PHY code

This is a set of minor fixes including comment and log message cleanups
and improvements to the PHY layer code.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Fix xmit discard error weight
Dean Luick [Thu, 18 Feb 2016 19:12:08 +0000 (11:12 -0800)]
staging/rdma/hfi1: Fix xmit discard error weight

Count only the errors that apply to xmit discards.  Update
the comment to better explain the limitations of the count.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: fix 0-day syntax error
Mike Marciniszyn [Thu, 18 Feb 2016 19:11:59 +0000 (11:11 -0800)]
staging/rdma/hfi1: fix 0-day syntax error

Setting CONFIG_HFI1_DEBUG_SDMA_ORDER causes a syntax error:
sdma.c: In function ‘complete_tx’:
sdma.c:370: error: ‘txp’ undeclared (first use in this function)
sdma.c:370: error: (Each undeclared identifier is reported only once
sdma.c:370: error: for each function it appears in.)

Adjust code under ifdef to reference the tx properly.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Fix header
Jubin John [Mon, 15 Feb 2016 04:22:17 +0000 (20:22 -0800)]
staging/rdma/hfi1: Fix header

Fix the header by moving the copyright notice out of the license text
and to the top of the header. Also, update the copyright date.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Remove else after break
Jubin John [Mon, 15 Feb 2016 04:22:09 +0000 (20:22 -0800)]
staging/rdma/hfi1: Remove else after break

Remove else after break to fix checkpatch warning:
WARNING: else is not generally useful after a break or return

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Add braces on all arms of statement
Jubin John [Mon, 15 Feb 2016 04:22:00 +0000 (20:22 -0800)]
staging/rdma/hfi1: Add braces on all arms of statement

Add braces on all arms of statements to fix checkpatch check:
CHECK: braces {} should be used on all arms of this statement

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Fix code alignment
Jubin John [Mon, 15 Feb 2016 04:21:52 +0000 (20:21 -0800)]
staging/rdma/hfi1: Fix code alignment

Fix code alignment to fix checkpatch check:
CHECK: Alignment should match open parenthesis

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Fix block comments
Jubin John [Mon, 15 Feb 2016 04:21:43 +0000 (20:21 -0800)]
staging/rdma/hfi1: Fix block comments

Fix block comments with proper formatting to fix checkpatch warnings:
WARNING: Block comments use * on subsequent lines
WARNING: Block comments use a trailing */ on a separate line

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Add comment for spinlock_t definition
Jubin John [Mon, 15 Feb 2016 04:21:34 +0000 (20:21 -0800)]
staging/rdma/hfi1: Add comment for spinlock_t definition

Add comments describing the spinlock for spinlock_t definitions to
fix checkpatch check:
CHECK: spinlock_t definition without comment

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Remove void function return statement
Jubin John [Mon, 15 Feb 2016 04:21:26 +0000 (20:21 -0800)]
staging/rdma/hfi1: Remove void function return statement

Remove return statement at the end of a void function to fix
checkpatch warning:
WARNING: void function return statements are not generally useful

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Use pointer instead of struct name
Jubin John [Mon, 15 Feb 2016 04:21:16 +0000 (20:21 -0800)]
staging/rdma/hfi1: Use pointer instead of struct name

Use sizeof(*p) instead of sizeof(struct foo) to fix checkpatch check:
CHECK: Prefer alloc(sizeof(*p)...) over alloc(sizeof(struct foo)...)
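
A small sketch of the pattern checkpatch prefers, written for userspace with malloc. The struct and function names are hypothetical; the point is that sizeof(*n) stays correct even if the type of n changes later.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical type, for illustration only. */
struct example_node {
	int id;
	struct example_node *next;
};

static struct example_node *example_node_alloc(int id)
{
	/* Size comes from the pointer itself, not a repeated type name. */
	struct example_node *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	n->id = id;
	n->next = NULL;
	return n;
}
```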

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Remove CamelCase
Jubin John [Mon, 15 Feb 2016 04:21:07 +0000 (20:21 -0800)]
staging/rdma/hfi1: Remove CamelCase

Remove CamelCase to fix checkpatch check:
CHECK: Avoid CamelCase: <PLS_CONFIGPHY_ESTcOMM_LOCAL_COMPLETE>

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Fix misspellings
Jubin John [Mon, 15 Feb 2016 04:20:58 +0000 (20:20 -0800)]
staging/rdma/hfi1: Fix misspellings

Fix misspelled word based on checkpatch check:
CHECK: 'ffoo' may be misspelled - perhaps 'foo'?

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Split multiple assignments
Jubin John [Mon, 15 Feb 2016 04:20:50 +0000 (20:20 -0800)]
staging/rdma/hfi1: Split multiple assignments

Split multiple assignments into individual assignments to fix
checkpatch check:
CHECK: multiple assignments should be avoided

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Use BIT_ULL macro
Jubin John [Mon, 15 Feb 2016 04:20:42 +0000 (20:20 -0800)]
staging/rdma/hfi1: Use BIT_ULL macro

Use BIT_ULL macro to fix checkpatch check:
CHECK: Prefer using the BIT_ULL macro
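
For illustration, the sketch below compares an open-coded 64-bit shift with a macro form. EXAMPLE_BIT_ULL is a userspace stand-in assumed to behave like the kernel's BIT_ULL(); the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in assumed to mirror the kernel's BIT_ULL(). */
#define EXAMPLE_BIT_ULL(nr) (1ULL << (nr))

/* Open-coded form that checkpatch flags. */
static uint64_t mask_open_coded(unsigned int nr)
{
	return 1ULL << nr;
}

/* Macro form that checkpatch prefers. */
static uint64_t mask_with_macro(unsigned int nr)
{
	return EXAMPLE_BIT_ULL(nr);
}
```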

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Remove unnecessary parentheses
Jubin John [Mon, 15 Feb 2016 04:20:33 +0000 (20:20 -0800)]
staging/rdma/hfi1: Remove unnecessary parentheses

Remove unnecessary parentheses around addressof single $Lvals to fix
checkpatch check:
CHECK: Unnecessary parentheses around $var

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Add blank link after declarations
Jubin John [Mon, 15 Feb 2016 04:20:25 +0000 (20:20 -0800)]
staging/rdma/hfi1: Add blank link after declarations

Add blank line after declarations to fix checkpatch check:
CHECK: Please use a blank line after function/struct/union/enum
declarations

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Fix logical continuations
Jubin John [Mon, 15 Feb 2016 04:20:15 +0000 (20:20 -0800)]
staging/rdma/hfi1: Fix logical continuations

Move logical continuations to previous line to fix checkpatch check:
CHECK: Logical continuations should be on the previous line

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Remove blank line before close brace
Jubin John [Mon, 15 Feb 2016 04:20:06 +0000 (20:20 -0800)]
staging/rdma/hfi1: Remove blank line before close brace

Remove extra blank line before close brace to fix checkpatch check:
CHECK: Blank lines aren't necessary before a close brace '}'

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Remove blank line after an open brace
Jubin John [Mon, 15 Feb 2016 04:19:58 +0000 (20:19 -0800)]
staging/rdma/hfi1: Remove blank line after an open brace

Remove blank line after an open brace to fix checkpatch check:
CHECK: Blank lines aren't necessary after an open brace '{'

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Fix comparison to NULL
Jubin John [Mon, 15 Feb 2016 04:19:49 +0000 (20:19 -0800)]
staging/rdma/hfi1: Fix comparison to NULL

Convert pointer comparisons to NULL to !pointer
to fix checkpatch check:
CHECK: Comparison to NULL could be written "!pointer"

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Remove space after cast
Jubin John [Mon, 15 Feb 2016 04:19:41 +0000 (20:19 -0800)]
staging/rdma/hfi1: Remove space after cast

Remove the space after a cast to fix checkpatch check:
CHECK: No space is necessary after a cast

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Remove multiple blank lines
Jubin John [Mon, 15 Feb 2016 04:19:32 +0000 (20:19 -0800)]
staging/rdma/hfi1: Remove multiple blank lines

Remove multiple blank lines to fix checkpatch check:
CHECK: Please don't use multiple blank lines

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Add spaces around binary operators
Jubin John [Mon, 15 Feb 2016 04:19:24 +0000 (20:19 -0800)]
staging/rdma/hfi1: Add spaces around binary operators

Add spaces around binary operators.

Fixes checkpatch check:
CHECK: spaces preferred around that 'x'
where x is a binary operator

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: add cq head and tail information to qpstats
Vennila Megavannan [Sun, 14 Feb 2016 20:46:28 +0000 (12:46 -0800)]
staging/rdma/hfi1: add cq head and tail information to qpstats

This enables debugging issues related to the cq event signalling mechanism.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Vennila Megavannan <vennila.megavannan@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Add send context sw index
Jubin John [Sun, 14 Feb 2016 20:46:19 +0000 (12:46 -0800)]
staging/rdma/hfi1: Add send context sw index

Print the qp's send context sw index in the qpstats

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Determine actual operational VLs
Mike Marciniszyn [Sun, 14 Feb 2016 20:46:01 +0000 (12:46 -0800)]
staging/rdma/hfi1: Determine actual operational VLs

Use shared credits and dedicated credits for each VL to determine
the actual number of operational VLs.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Add qp to send context mapping for PIO
Jubin John [Sun, 14 Feb 2016 20:46:10 +0000 (12:46 -0800)]
staging/rdma/hfi1: Add qp to send context mapping for PIO

PIO send context mapping is changed from per-VL to QPN based.

qp to send context mapping is done using a mapping infrastructure
similar to the current vl to sdma engine mapping.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi: fix CQ completion order issue
Mike Marciniszyn [Sun, 14 Feb 2016 20:45:53 +0000 (12:45 -0800)]
staging/rdma/hfi: fix CQ completion order issue

The current implementation of the sdma_wait variable
has a timing hole that can cause a completion queue entry
to be returned from a pio send prior to an older
sdma packet's completion queue entry.

The sdma_wait variable used to be decremented prior to
calling the packet complete routine.  The hole is between the decrement
and the verbs completion, where a send engine using pio could return
an out-of-order completion in that window.

This patch closes the hole by allowing an API option to
specify an sdma_drained callback.  The atomic dec
is positioned after the complete callback to avoid the
window as long as the pio path doesn't execute when
there is a non-zero sdma count.
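
The ordering described above can be sketched in userspace with C11 atomics. This is only a model under stated assumptions: every name here (example_wait, example_sdma_complete, example_pio_allowed, and the two callbacks) is hypothetical, and the real driver code differs in structure and locking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical wait state; not the driver's iowait structure. */
struct example_wait {
	atomic_int sdma_busy;	/* outstanding SDMA sends */
	void (*sdma_drained)(struct example_wait *w);	/* optional callback */
	int drained_calls;
};

typedef void (*example_complete_fn)(void *ctx);

/* The pio path is only allowed while no SDMA sends are outstanding. */
static bool example_pio_allowed(struct example_wait *w)
{
	return atomic_load(&w->sdma_busy) == 0;
}

static void example_sdma_complete(struct example_wait *w,
				  example_complete_fn complete, void *ctx)
{
	/* Report the completion first ... */
	complete(ctx);
	/*
	 * ... and only then drop the count. atomic_fetch_sub() returns the
	 * previous value, so 1 means this call drained the last SDMA send.
	 */
	if (atomic_fetch_sub(&w->sdma_busy, 1) == 1 && w->sdma_drained)
		w->sdma_drained(w);
}

/* Demo callbacks used to observe the ordering. */
struct example_ctx {
	struct example_wait *w;
	bool pio_was_blocked;
};

static void example_on_complete(void *ctx)
{
	struct example_ctx *c = ctx;

	/* While the completion is reported, pio must still be held off. */
	c->pio_was_blocked = !example_pio_allowed(c->w);
}

static void example_on_drained(struct example_wait *w)
{
	w->drained_calls++;
}
```

Because the count is still non-zero while the completion is being reported, a pio send gated on example_pio_allowed() cannot complete ahead of the older SDMA packet in this model.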

Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/qib, staging/rdma/hfi1, IB/rdmavt: progress selection changes
Mike Marciniszyn [Sun, 14 Feb 2016 20:45:44 +0000 (12:45 -0800)]
IB/qib, staging/rdma/hfi1, IB/rdmavt: progress selection changes

The non-rdmavt versions of qib and hfi1 allow for a differing
heuristic to override a schedule progress in favor of a direct
call to the progress routine.

This patch adds that for both drivers and rdmavt.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Adaptive PIO for short messages
Mike Marciniszyn [Sun, 14 Feb 2016 20:45:36 +0000 (12:45 -0800)]
staging/rdma/hfi1: Adaptive PIO for short messages

The change requires a new pio_busy field in the iowait structure to
track the number of outstanding pios.  The new counter together
with the sdma counter serve as the basis for a packet by packet decision
as to which egress mechanism to use.  Since packets given to different
egress mechanisms are not ordered, this scheme will preserve the order.

The iowait drain/wait mechanisms are extended for a pio case.  An
additional qp wait flag is added for the PIO drain wait case.

Currently the only pio wait is for buffers, so the no_bufs_available()
routine name is changed to pio_wait() and a third argument is passed
with one of the two pio wait flags to generalize the routine.  A module
parameter is added to hold a configurable threshold. For now, the
module parameter is zero.

A heuristic routine is added to return the func pointer of the proper
egress routine to use.

The heuristic is as follows:
- SMI always uses pio
- GSI,UD qps <= threshold use pio
- UD qps > threshold use sdma
  o No coordination with sdma is required because order is not required
    and this qp pio count is not maintained for UD
- RC/UC ONLY packets <= threshold choose as follows:
  o If sdmas pending, use SDMA
  o Otherwise use pio and enable the pio tracking count at
    the time the pio buffer is allocated
- RC/UC ONLY packets > threshold use SDMA
  o If pios are pending, pio_wait() is called with the new wait
    flag to delay until the pios drain

The threshold is potentially reduced by the QP's mtu.
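
The heuristic above could be sketched roughly as follows. This is a userspace model, not the driver's routine: all type and function names are hypothetical, the mtu adjustment is assumed to be a simple min(), GSI packets above the threshold are assumed to follow the UD rule, and for RC/UC only ONLY packets are modeled.

```c
#include <assert.h>

/* All names below are hypothetical stand-ins, not driver identifiers. */
enum example_qp_type { EX_QP_SMI, EX_QP_GSI, EX_QP_UD, EX_QP_UC, EX_QP_RC };

enum example_egress {
	EX_USE_PIO,
	EX_USE_SDMA,
	EX_DRAIN_PIO_THEN_SDMA,	/* wait for pending pios, then use SDMA */
};

struct example_qp {
	enum example_qp_type type;
	unsigned int pmtu;	/* path MTU in bytes */
	unsigned int sdma_busy;	/* outstanding SDMA sends */
	unsigned int pio_busy;	/* outstanding pio sends (RC/UC only) */
};

static enum example_egress example_pick_egress(const struct example_qp *qp,
					       unsigned int len,
					       unsigned int threshold)
{
	/* Assumption: "reduced by the QP's mtu" is modeled as min(). */
	if (qp->pmtu < threshold)
		threshold = qp->pmtu;

	switch (qp->type) {
	case EX_QP_SMI:
		return EX_USE_PIO;
	case EX_QP_GSI:	/* Assumption: above the threshold, GSI follows UD. */
	case EX_QP_UD:
		/* Order is not required, so SDMA state is not consulted. */
		return len <= threshold ? EX_USE_PIO : EX_USE_SDMA;
	case EX_QP_UC:
	case EX_QP_RC:
		if (len <= threshold)
			return qp->sdma_busy ? EX_USE_SDMA : EX_USE_PIO;
		return qp->pio_busy ? EX_DRAIN_PIO_THEN_SDMA : EX_USE_SDMA;
	}
	return EX_USE_SDMA;	/* not reached for the types listed above */
}
```

With a threshold of zero, which the message gives as the current module parameter value, every non-SMI packet of non-zero length takes the SDMA path in this model.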

The sc_buffer_alloc() has two additional args (a callback, a void *)
which are exploited by the RC/UC cases to pass a new complete routine
and a qp *.

When the shadow ring completes the credit associated with a packet,
the new complete routine is called.  The verbs_pio_complete() will then
decrement the busy count and trigger any drain waiters in qp destroy
or reset.

Reviewed-by: Jubin John <jubin.john@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: use u8 for vl/sl
Mike Marciniszyn [Sun, 14 Feb 2016 20:45:27 +0000 (12:45 -0800)]
staging/rdma/hfi1: use u8 for vl/sl

The use should match the universal container size.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: fix panic in send engine
Mike Marciniszyn [Sun, 14 Feb 2016 20:45:18 +0000 (12:45 -0800)]
staging/rdma/hfi1: fix panic in send engine

The send engine wasn't correctly handling
pre-built packets, and worse, the pointer to
a packet state's txreq wasn't initialized correctly.

To fix:
- all waiters need to save any prebuilt packets
  (sdma waits already did)
- the progress routine needs to handle a QP's prebuilt packet
  and initialize the txreq pointer properly

To keep SDMA working, the dma send code needs to see if
a packet has been built already. If not, the code will build
it.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: avoid passing pmtu
Mike Marciniszyn [Sun, 14 Feb 2016 20:45:09 +0000 (12:45 -0800)]
staging/rdma/hfi1: avoid passing pmtu

It is in the qp.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Add s_sendcontext priv field
Jubin John [Sun, 14 Feb 2016 20:45:00 +0000 (12:45 -0800)]
staging/rdma/hfi1: Add s_sendcontext priv field

s_sendcontext will be used to map the QPs to the send contexts
for PIO.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: remove s_rdma_mr
Mike Marciniszyn [Sun, 14 Feb 2016 20:44:52 +0000 (12:44 -0800)]
staging/rdma/hfi1: remove s_rdma_mr

It can be conveyed in the verbs_txreq.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: Remove header memcpy from sdma send path.
Dennis Dalessandro [Sun, 14 Feb 2016 20:44:43 +0000 (12:44 -0800)]
staging/rdma/hfi1: Remove header memcpy from sdma send path.

Instead of writing the header into a buffer then copying it into another
buffer to be sent, remove that memcpy and instead build the header directly
into the tx request that will be sent.
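
A simplified sketch of the before and after shapes, assuming a hypothetical header layout and tx request type. None of these names or fields come from the hfi1 sources; the sketch only shows that filling the header in place yields the same bytes without the intermediate copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header and tx request; not the driver's structures. */
struct example_hdr {
	uint16_t lrh[4];
	uint32_t bth[3];
};

struct example_txreq {
	struct example_hdr hdr;
	size_t hdr_len;
};

static void example_fill_hdr(struct example_hdr *h, uint32_t qpn, uint32_t psn)
{
	memset(h, 0, sizeof(*h));
	h->bth[1] = qpn;
	h->bth[2] = psn;
}

/* Old shape: build in a scratch buffer, then copy into the request. */
static void example_build_with_copy(struct example_txreq *tx,
				    uint32_t qpn, uint32_t psn)
{
	struct example_hdr scratch;

	example_fill_hdr(&scratch, qpn, psn);
	memcpy(&tx->hdr, &scratch, sizeof(scratch));
	tx->hdr_len = sizeof(scratch);
}

/* New shape: build the header directly in the request. */
static void example_build_in_place(struct example_txreq *tx,
				   uint32_t qpn, uint32_t psn)
{
	example_fill_hdr(&tx->hdr, qpn, psn);
	tx->hdr_len = sizeof(tx->hdr);
}
```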

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vennila Megavannan <vennila.megavannan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: move txreq header code
Mike Marciniszyn [Sun, 14 Feb 2016 20:44:34 +0000 (12:44 -0800)]
staging/rdma/hfi1: move txreq header code

The patch separates the txreq defines into new files, one for
verbs and one for sdma.

The verbs_txreq implementation handles the setup and teardown
of the txreq cache, so the register routine is changed to call
the new init/exit routines.

This patch allows follow-up patches to enhance the send engine.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/rdmvt: close send engine struct holes
Mike Marciniszyn [Sun, 14 Feb 2016 20:44:26 +0000 (12:44 -0800)]
IB/rdmvt: close send engine struct holes

pahole noted the wasted 4 bytes after s_lock and r_lock.

Move s_flags and r_psn to fill the holes.
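
The effect pahole reports can be illustrated with two hypothetical structs. The field names and types are stand-ins, not the rdmavt QP layout; on ABIs where pointers are 8-byte aligned, a 4-byte member followed by a pointer leaves a 4-byte hole that another 4-byte member can fill.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with a hole after the 4-byte lock on LP64 ABIs. */
struct example_before {
	uint32_t lock;
	/* padding is inserted here when pointers need 8-byte alignment */
	void *head;
	uint32_t flags;
};

/* Same members, reordered so flags fills the space after lock. */
struct example_after {
	uint32_t lock;
	uint32_t flags;
	void *head;
};

/* Bytes saved by the reordering on the current ABI (may be zero). */
static size_t example_bytes_saved(void)
{
	return sizeof(struct example_before) - sizeof(struct example_after);
}
```

The exact saving depends on the target ABI, so the sketch computes it rather than assuming a fixed number.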

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agostaging/rdma/hfi1: add s_avail to qp_stats
Mike Marciniszyn [Sun, 14 Feb 2016 20:44:17 +0000 (12:44 -0800)]
staging/rdma/hfi1: add s_avail to qp_stats

This diagnostic capability was missed in the dual lock series.

Signed-off-by: Vennila Megavannan <vennila.megavannan@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years agoIB/qib: Destroy SMI AH before de-allocating the protection domain
Harish Chegondi [Sun, 14 Feb 2016 20:11:28 +0000 (12:11 -0800)]
IB/qib: Destroy SMI AH before de-allocating the protection domain

If SMI AH is not destroyed before de-allocating the PD, it would result in
non-zero PD use count when de-allocating the PD, triggering a WARN_ON() at
drivers/infiniband/core/verbs.c:284 ib_dealloc_pd+0x69/0xb0 [ib_core]()
when unloading the qib driver on systems with dual-port card.

This problem has always been there in qib and was detected only after the
commit 7dd78647a2c2 ("IB/core: Make ib_dealloc_pd return void") introduced
a WARN_ON in ib_dealloc_pd() that triggers if a PD's use count is non-zero
before de-allocating the PD.

Below is the call trace from the dmesg log.

[ 7264.966129] Call Trace:
[ 7264.969652]  [<ffffffff81338470>] dump_stack+0x44/0x64
[ 7264.976181]  [<ffffffff81086bb6>] warn_slowpath_common+0x86/0xc0
[ 7264.983656]  [<ffffffff81086cfa>] warn_slowpath_null+0x1a/0x20
[ 7264.990961]  [<ffffffffa025c2d9>] ib_dealloc_pd+0x69/0xb0 [ib_core]
[ 7264.998717]  [<ffffffffa0044de8>] ib_mad_port_close+0xb8/0x120 [ib_mad]
[ 7265.006866]  [<ffffffffa0044ebf>] ib_mad_remove_device+0x6f/0xc0 [ib_mad]
[ 7265.015224]  [<ffffffffa025fc87>] ib_unregister_device+0xa7/0x140 [ib_core]
[ 7265.023738]  [<ffffffffa04b5b79>] rvt_unregister_device+0x29/0x80 [rdmavt]
[ 7265.032181]  [<ffffffffa088d2a2>] qib_unregister_ib_device+0x22/0x210 [ib_qib]
[ 7265.040993]  [<ffffffffa085f73f>] qib_remove_one+0x1f/0x250 [ib_qib]
[ 7265.048823]  [<ffffffff8137a319>] pci_device_remove+0x39/0xc0
[ 7265.055984]  [<ffffffff81466a1a>] __device_release_driver+0x9a/0x140
[ 7265.063821]  [<ffffffff81466bc8>] driver_detach+0xb8/0xc0
[ 7265.070579]  [<ffffffff81465a15>] bus_remove_driver+0x55/0xd0
[ 7265.077717]  [<ffffffff8146732c>] driver_unregister+0x2c/0x50
[ 7265.084849]  [<ffffffff813789ba>] pci_unregister_driver+0x2a/0x80
[ 7265.092366]  [<ffffffffa08921bd>] qib_ib_cleanup+0x37/0x65 [ib_qib]
[ 7265.100068]  [<ffffffff811096d0>] SyS_delete_module+0x190/0x220
[ 7265.107379]  [<ffffffff816a7bae>] entry_SYSCALL_64_fastpath+0x12/0x71
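
The ordering problem described above can be sketched with a toy use-count model. All types and functions below (toy_pd, toy_ah and the toy_* helpers) are hypothetical and only mimic the idea that an address handle holds a reference on its protection domain; they are not the ib_core API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical protection domain with a simple use count. */
struct toy_pd {
	int usecnt;
};

/* Hypothetical address handle that pins the PD it was created on. */
struct toy_ah {
	struct toy_pd *pd;
};

static void toy_create_ah(struct toy_ah *ah, struct toy_pd *pd)
{
	ah->pd = pd;
	pd->usecnt++;		/* the AH takes a reference on the PD */
}

static void toy_destroy_ah(struct toy_ah *ah)
{
	if (ah->pd) {
		ah->pd->usecnt--;	/* drop the reference */
		ah->pd = NULL;
	}
}

/*
 * Returns true when the PD is still in use at de-allocation time, which
 * is the condition a WARN_ON-style check would complain about.
 */
static bool toy_dealloc_pd_would_warn(const struct toy_pd *pd)
{
	return pd->usecnt != 0;
}

/* Buggy teardown: the PD is de-allocated while the AH still exists. */
static bool teardown_wrong_order(void)
{
	struct toy_pd pd = { 0 };
	struct toy_ah ah = { NULL };
	bool warned;

	toy_create_ah(&ah, &pd);
	warned = toy_dealloc_pd_would_warn(&pd);
	toy_destroy_ah(&ah);
	return warned;
}

/* Fixed teardown: destroy the AH first, then de-allocate the PD. */
static bool teardown_fixed_order(void)
{
	struct toy_pd pd = { 0 };
	struct toy_ah ah = { NULL };

	toy_create_ah(&ah, &pd);
	toy_destroy_ah(&ah);
	return toy_dealloc_pd_would_warn(&pd);
}
```

In this model only the first ordering leaves a non-zero use count behind when the PD goes away.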

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years ago IB/rdmavt: Remove unnecessary exported functions
Dennis Dalessandro [Sun, 14 Feb 2016 20:11:20 +0000 (12:11 -0800)]
IB/rdmavt: Remove unnecessary exported functions

Remove exported functions which are no longer required now that the
functionality has moved into rdmavt. This also requires re-ordering some
of the functions since their prototypes no longer appear in a header
file. Rather than adding forward declarations, it is cleaner to
re-order the affected functions.

Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years ago IB/rdmavt: Remove signal_supported and comments
Dennis Dalessandro [Sun, 14 Feb 2016 20:11:12 +0000 (12:11 -0800)]
IB/rdmavt: Remove signal_supported and comments

Initially it was intended that rdmavt would support some signaling
between the underlying driver and itself. However, this turned out to be
unnecessary for qib and hfi1. If we need to add something like this
later to support another driver, we should do it then. As of now this
is essentially dead code, so remove it.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years ago IB/rdmavt: Remove RVT_FLAGs
Dennis Dalessandro [Sun, 14 Feb 2016 20:11:03 +0000 (12:11 -0800)]
IB/rdmavt: Remove RVT_FLAGs

While hfi1 and qib were still supporting bits and pieces of core verbs
components, there needed to be a way to convey whether rdmavt should handle
allocation and initialization of resources like the queue pair table. Now
that all of this has moved into rdmavt, there is no need for these flags.
They are no longer used in the drivers.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years ago IB/qib,rdmavt: Move smi_ah to qib
Dennis Dalessandro [Sun, 14 Feb 2016 20:10:55 +0000 (12:10 -0800)]
IB/qib,rdmavt: Move smi_ah to qib

Rdmavt adopted an smi_ah from qib which is not needed by hfi1. Move this
back to qib and get it out of the common library.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
8 years ago IB/qib: Setup notify free/create mad agent callbacks for rdmavt
Dennis Dalessandro [Sun, 14 Feb 2016 20:10:45 +0000 (12:10 -0800)]
IB/qib: Setup notify free/create mad agent callbacks for rdmavt

Qib needs to be notified when mad agents are created and freed because
there is some counter maintenance that needs to be performed. Add those
callbacks at registration time with rdmavt.
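
The callback pattern described above might look roughly like the following sketch. Every name here (toy_dev, toy_driver_ops, the drv_* and lib_* functions, and the per-port agent_count counter) is a hypothetical placeholder chosen for illustration; the real rdmavt structures and the counters qib maintains are not reproduced.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NUM_PORTS 2

struct toy_dev;

/* Hypothetical hooks a driver hands to the common library at registration. */
struct toy_driver_ops {
	void (*notify_create_mad_agent)(struct toy_dev *dev, int port_idx);
	void (*notify_free_mad_agent)(struct toy_dev *dev, int port_idx);
};

struct toy_dev {
	struct toy_driver_ops ops;
	int agent_count[TOY_NUM_PORTS];	/* illustrative per-port counter */
};

/* Driver side: bookkeeping performed when an agent appears or goes away. */
static void drv_on_create(struct toy_dev *dev, int port_idx)
{
	dev->agent_count[port_idx]++;
}

static void drv_on_free(struct toy_dev *dev, int port_idx)
{
	dev->agent_count[port_idx]--;
}

/* Driver side: install the callbacks once, at registration time. */
static void drv_register(struct toy_dev *dev)
{
	int i;

	for (i = 0; i < TOY_NUM_PORTS; i++)
		dev->agent_count[i] = 0;
	dev->ops.notify_create_mad_agent = drv_on_create;
	dev->ops.notify_free_mad_agent = drv_on_free;
}

/* Library side: invoke the hook only if the driver provided one. */
static void lib_create_agent(struct toy_dev *dev, int port_idx)
{
	if (dev->ops.notify_create_mad_agent)
		dev->ops.notify_create_mad_agent(dev, port_idx);
}

static void lib_free_agent(struct toy_dev *dev, int port_idx)
{
	if (dev->ops.notify_free_mad_agent)
		dev->ops.notify_free_mad_agent(dev, port_idx);
}
```

Keeping the hooks optional lets a driver that needs no such bookkeeping leave them unset.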

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>