This patch fixes the fairness issues in QP scheduling
- the timeout for cond_resched is changed to a ratio of
qp->timeout_jiffies
- workqueue_congested is used to determine if qp needs to
reschedule itself
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Vennila Megavannan <vennila.megavannan@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Wed, 3 Feb 2016 22:34:15 +0000 (14:34 -0800)]
staging/rdma/hfi1: Fix for generic I2C interface
The original I2C interface was geared for QSFP accesses. Modify
the interface to behave more like a generic I2C controller such
that reads and writes can accept multi-byte offsets. Removed
reads following writes and moved reset to top level.
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Pablo Cacho <pablo.cacho@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Wed, 3 Feb 2016 22:33:40 +0000 (14:33 -0800)]
staging/rdma/hfi1: Consolidate CPU/IRQ affinity support
This patch unifies the affinity support for CPU and IRQ allocations into
a single code base. The goal is to allow the driver to make intelligent
placement decision based on an overall view of processes and IRQs across
as much of the driver as possible.
Pulling all the scattered affinity code into a single code base lays the
ground work for accomplishing the above goal. For example, previous
implementations made user process placement decision solely based on
other user processes. This algorithm is limited as it did not take into
account IRQ placement and could result in overloading certain CPUs.
A single code base also provides a much easier way to maintain and debug
any performance issues related to affinity.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Ashutosh Dixit [Wed, 3 Feb 2016 22:33:06 +0000 (14:33 -0800)]
staging/rdma/hfi1: Add support for enabling/disabling PCIe ASPM
hfi1 HW has a high PCIe ASPM L1 exit latency and also advertises an
acceptable latency less than actual ASPM latencies. Additional
mechanisms than those provided by BIOS/OS are therefore required to
enable/disable ASPM for hfi1 to provide acceptable power/performance
trade offs. This patch adds this support.
By means of a module parameter ASPM can be either (a) always enabled
(power save mode) (b) always disabled (performance mode) (c)
enabled/disabled dynamically. The dynamic mode implements two
heuristics to alleviate possible problems with high ASPM L1 exit
latency. ASPM is normally enabled but is disabled if (a) there are any
active user space PSM contexts, or (b) for verbs, ASPM is disabled as
interrupt activity for a context starts to increase.
A few more points about the verbs implementation. In order to reduce
lock/cache contention between multiple verbs contexts, some processing
is done at the context layer before contending for device layer
locks. ASPM is disabled when two interrupts for a context happen
within 1 millisec. A timer is scheduled which will re-enable ASPM
after 1 second should the interrupt activity cease. Normally, every
interrupt, or interrupt-pair should push the timer out
further. However, since this might increase the processing load per
interrupt, pushing the timer out is postponed for half a second. If
after half a second we get two interrupts within 1 millisec the timer
is pushed out by another second.
Finally, the kernel ASPM API is not used in this patch. This is
because this patch does several non-standard things as SW workarounds
for HW issues. As mentioned above, it enables ASPM even when advertised
actual latencies are greater than acceptable latencies. Also, whereas
the kernel API only allows drivers to disable ASPM from driver probe,
this patch enables/disables ASPM directly from interrupt context. Due
to these reasons the kernel ASPM API was not used.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Wed, 3 Feb 2016 22:32:49 +0000 (14:32 -0800)]
staging/rdma/hfi1: Correctly set RcvCtxtCtrl register
The RcvCtxtCtrl register was being incorrectly set upon context
initialization and clean up resulting, in many cases, of contexts using
settings from previous contexts' initialization. This resulted in bad
and unexpected behavior. This was especially important for the TailUpd
bit, which requires special handling and if set incorrectly could lead
to severely degraded performance.
This patch fixes the handling of the RcvCtxtCtrl register, ensuring that
each context gets initialized with settings applicable only for that
context. It also ensures the proper setting for the TailUpd bit by
setting it to either 0 or 1 (as needed by the context's configuration)
explicitly.
staging/rdma/hfi1: Fix for 32-bit counter overflow in driver and hfi1stats
When 32-bit hardware counters overflow, hfi1stats misinterprets
the counters as being 64 bits causing the deltas for the
counters to be a huge number. This patch makes hfi1stats
aware that a counter is 32 bits by making the driver write
<counter name>,32 to debugfs.
Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Wed, 3 Feb 2016 22:32:23 +0000 (14:32 -0800)]
staging/rdma/hfi1: No firmware retry for simulation
Simulation has no firmware, so it will never move firmware
acquire to the FINAL state. Avoid that by skiping the TRY
state and moving directly to FINAL.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
staging/rdma/hfi1: Don't attempt to qualify or tune loopback plugs
Loopback plugs used for testing hardware don't need to be qualified to
bring the link up unlike production cables. This patch adds an exception
for loopback plugs to the QSFP and SerDes tuning algortihm.
Implement per-VL transmit counters. Not all errors can be
attributed to a particular VL, so make a best attempt.
o Extend the egress error bits used to count toward transmit
discard.
o When an egress error or send error occur, try to map back
to a VL.
o Implement a SDMA engine to VL (back) map.
o Add per-VL port transmit counters
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
The gen3 bump code must mark a firmware download failure as fatal.
Otherwise a later load attempt will fail with a NULL dereference.
Also:
o Only do a firmware back-off for RTL. There are no alternates for
FPGA or simulation.
o Rearrange OS firmware request order to match what is actually
loaded. This results in more coherent informational messages
in the case of missing firmware.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
staging/rdma/hfi1: Support external device configuration requests from 8051
This patch implements support for turning on and off the clock data
recovery mechanisms implemented in QSFP cable on request by the DC 8051
on a per-lane basis.
staging/rdma/hfi1: Get port type from configuration file
The current code employs a heuristic to guess the port type.
The canonical location to identify the port type of the
designed platform is from the platform configuration data.
This patch uses the previously fetched port type from the platform
configuration and removes the now obsolete heuristic routine
and its associated defines.
staging/rdma/hfi1: Add active and optical cable support
This patch qualifies and tunes active and optical cables for optimal
bit error rate and signal integrity settings. These settings are
fetched from the platform configuration data.
Based on attributes of the QSFP cable as read from the SFF-8636
compliant memory map, we select the appropriate settings from the
platform configuration data (examples: TX/RX equalization, enabling
cable high power, enabling TX/RX clock data recovery mechanisms, and RX
amplitude control) and apply them to the SERDES and QSFP cable.
The platform configuration data also contains system parameters such
as maximum power dissipation supported, and the cables are qualified
based on these parameters. As part of qualifying the cables, the
correct OfflineDisabledReasons are set for the appropriate scenarios.
Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Brent R Rothermel <brent.r.rothermel@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
staging/rdma/hfi1: Fix QSFP memory read/write across 128 byte boundary
The QSFP memory cache reads both lower and upper page 0H in one shot,
which leads to the address counter wrapping around to the beginning of
lower page 00H at byte 128, as defined by SFF-8636.
This patch fixes this by modifying the underlying QSFP read and writes
to avoid this wrap around.
Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Bryan Morgan [Wed, 3 Feb 2016 22:30:49 +0000 (14:30 -0800)]
staging/rdma/hfi1: HFI reports wrong offline disabled reason when cable removed
Removing QSFP cable should report 'No Local Media' instead of
'Transient' as reported by 'opaportinfo'.
Workaround is to change the state to
OPA_LINKDOWN_REASON_LOCAL_MEDIA_NOT_INSTALLED in cable handler.
With cable still removed, 'opaportinfo bounce' should not cause a
state change to Polling, as reported by 'opaportinfo'.
Resolution is to prevent physical state change from Offline->Polling.
Use a macro to mask lower nibble of OPA_LINKDOWN_REASON* as needed
for offline_disabled_reason.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Reported-by: Todd Rimmer <todd.rimmer@intel.com> Signed-off-by: Bryan Morgan <bryan.c.morgan@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
staging/rdma/hfi1: Remove modify queue pair from hfi1
In addition to removing the modify queue pair verb from hfi1 we also
remove ancillary functions which existed only for modify queue pair and
are also already present in hfi1.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
staging/rdma/hfi1: Use rdmavt version of post_send
This patch removes the post_send and post_one_send from the hfi1 driver.
The "posting" of sends will be done by rdmavt which will walk a WQE and
queue work. This patch will still provide the capability to schedule that
work as well as kick the progress. These are provided to the rdmavt layer.
Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Edward Mascarenhas <edward.mascarenhas@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
staging/rdma/hfi1: Remove CQ data structures and functions from hfi1
The completion queue is not a complex data structure and it can be removed
at the same time as its functions. Unlike the more complicated queue pair
which was done in multiple patches. This single patch removes all traces
of hfi1 specific completeion queues from the hfi1 driver.
Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
staging/rdma/hfi1: Remove qpdev and qpn table from hfi1
Another change on the way to removing queue pair functionality from
hfi1. This patch removes the private queue pair structure and the table
which holds the queue pair numbers in favor of using what is provided
by rdmavt.
Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
No need to keep providing the query pkey function. This is now being
done in rdmavt. Remove support from hfi1. The allocation and
maintenance of the list still resides in the driver.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Mmap data structure has already been moved to rdmavt and hfi1 supports
it. Now that the mmap functionality has also been moved to rdmavt its
time for hfi1 to use that as well.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
staging/rdma/hfi1: Remove driver specific members from hfi1 qp type
In preparation for moving the queue pair data structure to rdmavt the
members of the driver specific queue pairs which are not common need to be
pushed off to a private driver structure. This structure will be available
in the queue pair once moved to rdmavt as a void pointer. This patch while
not adding a lot of value in and of itself is a prerequisite to move the
queue pair out of the drivers and into rdmavt.
The driver specific, private queue pair data structure should condense as
more of the send side code moves to rdmavt.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch begins to make use of rdmavt by registering with it and
providing access to the header files. This is just the beginning of
rdmavt support in hfi1.
Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Harish Chegondi [Wed, 3 Feb 2016 22:20:52 +0000 (14:20 -0800)]
IB/qib: Remove modify_port and port_immutable functions
Delete code from query_port which has been moved into rvt_query_port
Create a call back function to shut down a port which may be called from
rvt_modify_port
Query gid is in rdmavt, but still relies on the driver to maintain the
guid table. Add the necessary driver call back and remove the existing
verb handler.
Harish Chegondi [Wed, 3 Feb 2016 22:20:19 +0000 (14:20 -0800)]
IB/qib: Remove qib_lookup_qpn and use rvt_lookup_qpn instead
Add calls to rcu_read_lock()/rcu_read_unlock() as rvt_lookup_qpn callers
must hold the rcu_read_lock before calling and keep the lock until the
returned qp is no longer in use.
Remove lookaside qp and some qp refcount atomics in the sdma send code
that is redundant with the s_dma_busy refcount, which will also stall
the state processing to the reset state.
Change the qpn hash function to hash_32 which is hash function used
in rvt_lookup_qpn. qpn_hash function would be eliminated in later patches.
Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Harish Chegondi [Wed, 3 Feb 2016 22:15:20 +0000 (14:15 -0800)]
IB/rdmavt: Add support for query_port, modify_port and get_port_immutable
rvt_query_port calls into the driver through a call back function
query_port_state to populate the rest of ib_port_attr elements.
rvt_modify_port calls into the driver if needed through a call back
function shut_down_port()
Addin query gid support. Rdmavt still relies on the driver to maintain
the gid table. Rdmavt simply calls into the driver to retrive the guid
for a particular port.
IB/rdmavt: Clean up distinction between port number and index
IB core uses 1 relative indexing for ports. All of our data structures
use 0 based indexing. Add an inline function that we can use whenever we
need to validate a legal value and try to convert a port number to a
port index at the entrance into rdmavt.
Try to follow the policy that when we are talking about a port from IB
core point of view we refer to it as a port number. When port is an
index into our arrays refer to it as a port index.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Harish Chegondi [Fri, 22 Jan 2016 21:07:42 +0000 (13:07 -0800)]
IB/qib: Use rdmavt version of post_send
This patch removes the post_send and post_one_send from the qib driver.
The "posting" of sends will be done by rdmavt which will walk a WQE and
queue work. This patch will still provide the capability to schedule that
work as well as kick the progress. These are provided to the rdmavt layer.
This patch adds rdmavt device structure allocation in rdamvt. The
ib_device alloc is now done in rdmavt instead of the driver. Drivers
need to tell rdmavt the number of ports when calling.
A side of effect of this patch is fixing a bug with port initialization
where the device structure port array was allocated over top of an
existing one.
Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Fri, 22 Jan 2016 21:04:38 +0000 (13:04 -0800)]
IB/rdmavt: add modify queue pair driver helpers
Low level drivers need to be able to check incoming attributes as well as be
able to adjust their private data on queue pair modification. Add 2 driver
callbacks, check_modify_qp and modify_qp, to facilitate this.
Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
There are a number of minor things that should be set by rdmavt rather
than by the drivers. Now that rdmavt has solidified in its design we can
go ahead and clean up this stuff.
Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch adds the simple post receive verbs call to rdmavt. The actual
interrupt handling and packet processing is still done in the low level
driver.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch adds support of tracing events using the kernels built-in event
tracing infrastructure. This can be extended to provide a wide range of
trace and debug capabilities which have a negligible impact on performance
when enabled. These should be preferred over the use of the rvt_pr*
functions.
Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Add in a post_send and post_one_send to rdmavt. The ULP will provide a WQE
to rdmavt which will then walk and queue each element. Rdmavt will then
queue the work to be done in the driver or kick the driver's progress
routine.
There needs to be a follow on patch which adds in another lock for the
head of the queue so that it can be added to and read from in parallel.
This will touch protocol handlers and require other changes in the
drivers. This will be done separately.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Harish Chegondi [Fri, 22 Jan 2016 20:56:52 +0000 (12:56 -0800)]
IB/qib: Remove create qp and create qp table functionality
Rely on rdmavt functions for creation of qp and qp table. Function to
allocate a qpn is still being provided by qib as the algorithm to allocate
a qpn in qib is different from that of the algorithm in rdmavt.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Harish Chegondi [Fri, 22 Jan 2016 20:56:21 +0000 (12:56 -0800)]
IB/qib: Use rdmavt pkey verbs function
Remove qib query pkey function which is no longer needed as this is now
being done in rdmavt. The allocation and maintenance of the list still
resides in the driver.