For OPA devices, SA will query the OPA classport info
instead of the IB defined classport info.
opa classport info exposes additional information and
capabilities that are specific to OPA devices.
Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/core: Move opa_class_port_info definition to header file
Both opa_vnic and the hfi driver use the same opa_classport_info
definition. We will also have ib_sa capable of querying opa class
port info and would need this definition. Move it to ib_mad.h
for everyone to use.
IB/SA: Modify SA to implicitly cache Class Port info
SA will query and cache class port info as part of
its initialization. SA will also invalidate and
refresh the cache based on specific events. Callers such
as IPoIB and CM can query the SA to get the classportinfo
information. Apart from making the caller code much simpler,
this change puts the onus on the SA to query and maintain
classportinfo much like how it maitains the address handle to the SM.
Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
The process_ecn intends to return a bool value. However it is doing
so incorrectly by ANDing the fecn mask. The fecn bit is bit 31. Bool is
not a native data type and is up to the compiler to implement how it
sees fit. It is conceivable that this upper bit gets washed out.
Fix by converting to a bool properly.
Cc: stable@vger.kernel.org Fixes: Commit fd2b562edca6 ("IB/hfi1: Pull FECN/BECN processing to a common place") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Fixes: 8764522e5251 ("staging/rdma/hfi1: Unexpected link up pkey values are not an error") Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Ingress and egress port P_Key checking should always be performed for
HFIs. This patch will enable ingress and egress P_Key checking when
the port is initialized and will ignore the P_Key information sent by
the FM in the port info structure which is meant to be used only by the
switch.
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Neel Desai <neel.desai@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Stuart Summers [Sun, 9 Apr 2017 17:16:53 +0000 (10:16 -0700)]
IB/hfi1: Cache neighbor secure data after link up
Secure data is transferred across the link during verify
cap. This includes Neighbor Guid, Type, and Port Number.
This transfer is not guaranteed to complete until the 8051
firmware has completed processing of the state_complete
frame. Move the consumption of this data from verify cap
handling to link up handling to ensure the data is finalized.
Additionally, do not notify the SM that the link is up until
after this data is actually available.
Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Stuart Summers <john.s.summers@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
This issue is made worse when the 8051 is busy and the reads take longer.
Fix by using a non-spinning lock procure.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Mike Marciszyn <mike.marciniszyn@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
The driver progress routines can call cond_resched() when
a timeslice is exhausted and irqs are enabled.
If the ULP had been holding a spin lock without disabling irqs and
the post send directly called the progress routine, the cond_resched()
could yield allowing another thread from the same ULP to deadlock
on that same lock.
Correct by replacing the current hfi1_do_send() calldown with a unique
one for post send and adding an argument to hfi1_do_send() to indicate
that the send engine is running in a thread. If the routine is not
running in a thread, avoid calling cond_resched().
CC: <stable@vger.kernel.org> # 4.7.x- Fixes: Commit 831464ce4b74 ("IB/hfi1: Don't call cond_resched in atomic mode when sending packets") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/hfi1: Return SC2VL mappings to FM with VL15 instead of ILLEGAL_VL
VL15 in the SC2VL table is used to indicate an invalid SC
for the FM, however, internally the driver remaps SCs from
VL15 to ILLEGAL_VL to prevent error counts. This mapping
confuses the FM when performing a sweep, making it return
a table mismatch error. Have SMA convert ILLEGAL_VL
to VL15 entries for the SC2VL table queries.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Michael J. Ruhl [Sun, 9 Apr 2017 17:15:44 +0000 (10:15 -0700)]
IB/hfi1: Correct MulticastMask/CollectiveMask info to SMA output
The FM uses the values of MulticastMask and CollectiveMask to
determine the number of bits for net masks. The current values of
0 and 0 are incorrect. The values should be 4 and 1. Updated the
necessary code to reflect the specified values.
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Michael J. Ruhl [Sun, 9 Apr 2017 17:15:32 +0000 (10:15 -0700)]
IB/core: If the MGID/MLID pair is not on the list return an error
A list of MGID/MLID pairs is built when doing a multicast attach. When
the multicast detach is called, the list is searched, and regardless of
the search outcome, the driver detach is called.
If an MGID/MLID pair is not on the list, driver detach should not be
called, and an error should be returned. Calling the driver without
removing an MGID/MLID pair from the list can leave the core and driver
out of sync.
Fixes: f4e401562c11 ("IB/uverbs: track multicast group membership for userspace QPs") Cc: stable@vger.kernel.org Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Sat, 22 Apr 2017 14:28:51 +0000 (17:28 +0300)]
IB/usnic: Simplify the code to balance loc/unlock calls
Simplify code in find_free_vf_and_create_qp_grp() to avoid sparse error
regarding call to unlock in the block other than lock was called.
drivers/infiniband/hw/usnic/usnic_ib_verbs.c:206:9: warning: context imbalance
in 'find_free_vf_and_create_qp_grp' - different lock
contexts for basic block
CC: Christian Benvenuti <benve@cisco.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Sat, 22 Apr 2017 14:28:50 +0000 (17:28 +0300)]
Ib/usnic: Explicitly include usnic headers
Sparse tool complains about undeclared symbols in usnic_ib_verbs.c
and usnic_ib_sysfs.c This is caused by lack of direct include of
appropriate usnic_ib_verbs.h and usnic_ib_sysfs.h, where all
these functions were declared.
Simple include eliminates 30 warnings similar to the below one:
drivers/infiniband/hw/usnic/usnic_ib_sysfs.c:304:6: warning: symbol
'usnic_ib_sysfs_unregister_usdev' was
not declared. Should it be static?
CC: Christian Benvenuti <benve@cisco.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Sat, 22 Apr 2017 14:28:49 +0000 (17:28 +0300)]
Ib/core: Mark local uverbs_std_types functions to be static
Functions declared in uverbs_std_types.c are local to that file, but
they lack static declarations. This produces a lot of sparse warnings,
like the one below:
drivers/infiniband/core/uverbs_std_types.c:41:5: warning: symbol
'uverbs_free_ah' was not declared.
Should it be static?
Pan Bian [Sun, 23 Apr 2017 09:09:11 +0000 (17:09 +0800)]
iw_cxgb4: check return value of alloc_skb
Function alloc_skb() will return a NULL pointer when there is no enough
memory. However, the return value of alloc_skb() is directly used
without validation in function send_fw_pass_open_req(). This patches
checks the return value of alloc_skb() against NULL.
Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Paolo Abeni [Fri, 28 Apr 2017 09:20:01 +0000 (11:20 +0200)]
infiniband: call ipv6 route lookup via the stub interface
The infiniband address handle can be triggered to resolve an ipv6
address in response to MAD packets, regardless of the ipv6
module being disabled via the kernel command line argument.
That will cause a call into the ipv6 routing code, which is not
initialized, and a conseguent oops.
This commit addresses the above issue replacing the direct lookup
call with an indirect one via the ipv6 stub, which is properly
initialized according to the ipv6 status (e.g. if ipv6 is
disabled, the routing lookup fails gracefully)
After checking the path upwards towards root complex, actualy check
root complex atomic_req capability, and not our own NIC.
Verify that the PCIe device control register's atomic egress block
is cleared in the path.
Verify that the PCIe version is at least 2.
To make page fault handling code more flexible
split pagefault_single_data_segment() function.
Keep MR resolution in pagefault_single_data_segment() and
move actual updates into pagefault_single_mr().
Currenlty ODP supports only regular MMU pages.
Add ODP support for regions consisting of physically contiguous chunks
of arbitrary order (huge pages for instance) to improve performance.
Currenlty ODP supports only regular MMU pages.
Add ODP support for regions consisting of physically contiguous chunks
of arbitrary order (huge pages for instance) to improve performance.
When implicit MR's leaf MKey becomes unused, i.e. when it's
last page being released my MMU invalidation it is marked as "dying"
and scheduled for release by garbage collector.
Currentle consequent page fault may remove "dying" flag.
Treat leaf MKey as non-existent once it was scheduled to removal
by GC.
Translation table updates of large UMR may require multiple post send
operations. The last operations can be in various lengths, but current
code set them to be the same length.
Fixes: 7d0cc6edcc70 ('IB/mlx5: Add MR cache for large UMR regions') Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
Since ocrdma driver is not going to be updated with any
new development activity, except for critical bug fixes
reported by partners or customers, changing the module status
to "Odd Fixes". Also, updating the web page info and the
maintainers email addresses.
Vlad Tsyrklevich [Fri, 24 Mar 2017 19:55:17 +0000 (15:55 -0400)]
infiniband/uverbs: Fix integer overflows
The 'num_sge' variable is verfied to be smaller than the 'sge_count'
variable; however, since both are user-controlled it's possible to cause
an integer overflow for the kmalloc multiply on 32-bit platforms
(num_sge and sge_count are both defined u32). By crafting an input that
causes a smaller-than-expected allocation it's possible to write
controlled data out-of-bounds.
Arnd Bergmann [Fri, 24 Mar 2017 22:02:48 +0000 (23:02 +0100)]
infiniband: hns: avoid gcc-7.0.1 warning for uninitialized data
hns_roce_v1_cq_set_ci() calls roce_set_bit() on an uninitialized field,
which will then change only a few of its bits, causing a warning with
the latest gcc:
infiniband/hw/hns/hns_roce_hw_v1.c: In function 'hns_roce_v1_cq_set_ci':
infiniband/hw/hns/hns_roce_hw_v1.c:1854:23: error: 'doorbell[1]' is used uninitialized in this function [-Werror=uninitialized]
roce_set_bit(doorbell[1], ROCEE_DB_OTHERS_H_ROCEE_DB_OTH_HW_SYNS_S, 1);
The code is actually correct since we always set all bits of the
port_vlan field, but gcc correctly points out that the first
access does contain uninitialized data.
This initializes the field to zero first before setting the
individual bits.
Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
Petr Mladek [Mon, 17 Oct 2016 15:39:32 +0000 (17:39 +0200)]
IB/fmr_pool: Convert the cleanup thread into kthread worker API
Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing,
awakening. In many cases it is unclear to say in which state
it is and sometimes it is done a wrong way.
The plan is to convert kthreads into kthread_worker or workqueues
API. It allows to split the functionality into separate operations.
It helps to make a better structure. Also it defines a clean state
where no locks are taken, IRQs blocked, the kthread might sleep
or even be safely migrated.
The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that it is
available when needed. Also it allows a better control, e.g.
define a scheduling priority.
This patch converts the frm_pool kthread into the kthread worker
API because I am not sure how busy the thread is. It is well
possible that it does not need a dedicated kthread and workqueues
would be perfectly fine. Well, the conversion between kthread
worker API and workqueues is pretty trivial.
The patch moves one iteration from the kthread into the work function.
It is queued only when there is a pending work. Therefore we do not
need to compare flush_ser and req_ser at the beginning. On the contrary,
the same work could be queued only once at a time. Therefore it has to
re-queue itself if some requests are pending.
Otherwise, wake_up_process() is replaced by queuing the work.
Important: The change is only compile tested. I did not find an easy
way how to check it in a real life.
Signed-off-by: Petr Mladek <pmladek@suse.com>
TO: Doug Ledford <dledford@redhat.com> CC: Sean Hefty <sean.hefty@intel.com> CC: Hal Rosenstock <hal.rosenstock@gmail.com> CC: linux-rdma@vger.kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
trivial fix to spelling mistake in iser_err error message
Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
RDMA/bnxt_re: Use IS_ERR_OR_NULL where appropriate
Constructs such as if (ptr && !IS_ERR(ptr)) can be shorted to
just !IS_ERR_OR_NULL(ptr) instead. Make substitutions in the bnxt_re
driver where appropriate.
Colin Ian King [Fri, 17 Feb 2017 15:35:22 +0000 (15:35 +0000)]
RDMA/bnxt_re: remove redundant initialization of rc to zero
rc is initialized to zero but is then updated by calls to
bnxt_qplib_free_fast_reg_page_list and/or bnxt_qpliob_free_mrw
so the initialization is redundant and can be removed.
Detected with CoverityScan, CID#1408448 ("Unused Value")
Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 20 Apr 2017 10:26:54 +0000 (13:26 +0300)]
IB/mlx5: Set correct SL in completion for RoCE
There is a difference when parsing a completion entry between Ethernet
and IB ports. When link layer is Ethernet the bits describe the type of
L3 header in the packet. In the case when link layer is Ethernet and VLAN
header is present the value of SL is equal to the 3 UP bits in the VLAN
header. If VLAN header is not present then the SL is undefined and consumer
of the completion should check if IB_WC_WITH_VLAN is set.
While that, this patch also fills the vlan_id field in the completion if
present.
Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Sun, 16 Apr 2017 04:31:34 +0000 (07:31 +0300)]
IB/cma: Send MRA for reply messages
Current implementation of RDMA_CM sends MRA (Message Receipt
Acknowledgment) only for request messages but not for response messages.
As a result, a slow active side of the connection may send a ready-to-use
message to the passive side in a delay that is too long for the passive
side to wait for.
This patch adds a call to ib_send_cm_mra() upon receiving a response
message and by this tells the other side to modify the service timeout
to a bigger value, 16 times than before. As in the request case, MRA
for reply will be sent only if a duplicate response has arrived.
Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Matan Barak <matan@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch adds support to query the congestion related hardware counters
through new command and links them with other hw counters being available
in hw_counters sysfs location.
In order to reuse existing infrastructure it renames related q_counter
data structures to more generic counters to reflect q_counters and
congestion counters and maybe some other counters in the future.
New hardware counters:
* rp_cnp_handled - CNP packets handled by the reaction point
* rp_cnp_ignored - CNP packets ignored by the reaction point
* np_cnp_sent - CNP packets sent by notification point to respond to
CE marked RoCE packets
* np_ecn_marked_roce_packets - CE marked RoCE packets received by
notification point
It also avoids returning ENOSYS which is specific for invalid
system call and produces the following checkpatch.pl warning.
WARNING: ENOSYS means 'invalid syscall nr' and nothing else
+ return -ENOSYS;
Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Sat, 15 Apr 2017 15:47:25 +0000 (18:47 +0300)]
IB/mthca: Check validity of output parameter pointer
The mthca driver didn't check supplied pointer to functions
mthca_cmd_poll() and mthca_cmd_wait(). This caused to the following
smatch errors:
drivers/infiniband/hw/mthca/mthca_cmd.c:371 mthca_cmd_poll() error: we previously assumed 'out_param' could be null (see line 353)
drivers/infiniband/hw/mthca/mthca_cmd.c:454 mthca_cmd_wait() error: we previously assumed 'out_param' could be null (see line 432)
In reality all callers of these functions are setting out_is_imm
flag are providing pointer too. However it is better to check
again to remove smatch errors to achieve warning free subsystem.
Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
A drop rule is described by an action drop and no destination.
If a user specified IB_FLOW_SPEC_ACTION_DROP then set the action
to MLX5_FLOW_CONTEXT_ACTION_DROP and clear the destination.
Signed-off-by: Slava Shwartsman <slavash@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
This flow steering specification identifies flow for drop by the HW.
If user create a flow only with the drop specification,
then all the packets that hit this flow will be dropped, otherwise the HW
will drop only the packets that match the other L2/L3/L4 specifications.
Signed-off-by: Slava Shwartsman <slavash@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/mlx5: Use IP version matching to classify IP traffic
This change adds the ability for flow steering to classify IPv4/6
packets with MPLS tag (Ethertype 0x8847 and 0x8848) as standard IP
packets and hit IPv4/6 classifed steering rules.
When user added a flow rule with IP classification, driver was
implicitly adding ethertype matching to the created rule in order
to distinguish between IPv4 and IPv6 protocols.
Since IP packets with MPLS tag header have MPLS ethertype, they missed
the rule and ended up hitting the default filters.
Such behavior prevented from MPLS packets to undergo inbound traffic
load balancing flows (if such were defined by configuring RSS) to
achieve higher throughput - the way that non-MPLS IP packets performed.
Since our device is able to look past the MPLS tag and identify the
next protocol we introduce this solution which replaces Ethertype
matching by the device's capability to perform IP version parsing
and matching in order to distinguish between IPv4 and IPv6.
Therefore, whenever a flow with IP spec is added and device support IP
version matching, driver will implicitly add IP version matching to the
rule (Based on the IP spec type) without Ethertype matching which will
cause relevant MPLS tagged packets to hit this rule as well.
Otherwise (device doesn't support IP version matching), we fall back to
setting Ethertype matching.
If the user's filters specify an L2 ethertype and an IP spec
the rule will then match both the ethertype and the IP version.
The device's support for IP version matching is reported by the
device via dedicated capability bit in query_device_cap and named
outer/inner_ip_version.
IB/mlx5: Add inner spec and IPv6 validation in user's flow attribute list
This change fixes an incomplete validation of the user's
flow attributes list.
Previous implementation validated only matching of IPv4 Ethertype
to IPv4 spec of outer headers (in case both Ethernet with specified
Ethertype and IP specs were present) and lacked the validation of:
1. Matching of IPv6 Ethertype in Ethernet spec (if such exists) to an
IPv6 protocol spec (if such exists).
2. Validation of Ethertype to IP protocol matching on inner headers specs.
Which could cause some combinations of unmatching Ethernet and IP
protocols to pass validation and apply on the device.
The fix adds validation of IPv6 Ethertype and IP spec as well as
performing the scan on both outer and inner attributes.
Bodong Wang [Wed, 29 Mar 2017 03:12:14 +0000 (06:12 +0300)]
IB/mlx5: Fix wrong use of kfree at bad flow in create_cq_user
The kfree was called to free cqb, while it should free *cqb.
Fixes: 1cbe6fc86ccf ("IB/mlx5: Add support for CQE compressing") Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
Jack Morgenstein [Tue, 21 Mar 2017 10:57:06 +0000 (12:57 +0200)]
IB/mlx4: Reduce SRIOV multicast cleanup warning message to debug level
A warning message during SRIOV multicast cleanup should have actually been
a debug level message. The condition generating the warning does no harm
and can fill the message log.
In some cases, during testing, some tests were so intense as to swamp the
message log with these warning messages, causing a stall in the console
message log output task. This stall caused an NMI to be sent to all CPUs
(so that they all dumped their stacks into the message log).
Aside from the message flood causing an NMI, the tests all passed.
Once the message flood which caused the NMI is removed (by reducing the
warning message to debug level), the NMI no longer occurs.
Sample message log (console log) output illustrating the flood and
resultant NMI (snippets with comments and modified with ... instead
of hex digits, to satisfy checkpatch.pl):
Jack Morgenstein [Tue, 21 Mar 2017 10:57:05 +0000 (12:57 +0200)]
IB/mlx4: Fix ib device initialization error flow
In mlx4_ib_add, procedure mlx4_ib_alloc_eqs is called to allocate EQs.
However, in the mlx4_ib_add error flow, procedure mlx4_ib_free_eqs is not
called to free the allocated EQs.
Fixes: e605b743f33d ("IB/mlx4: Increase the number of vectors (EQs) available for ULPs") Cc: <stable@vger.kernel.org> # v3.4+ Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
Majd Dibbiny [Sun, 19 Mar 2017 09:01:28 +0000 (11:01 +0200)]
IB/mlx4: Support RAW Ethernet when RoCE is disabled
On some environments, such as certain SR-IOV VF configurations, RoCE
isn't supported for mlx4 Ethernet ports. Currently the driver will
not open IB device on that port.
This is problematic since we do want user-space RAW Ethernet QPs functionality
to remain in place. For that end, enhance the relevant driver flows such that we
do create a device instance in that case.
Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
Jack Morgenstein [Sun, 19 Mar 2017 08:55:57 +0000 (10:55 +0200)]
IB/core: Fix sysfs registration error flow
The kernel commit cited below restructured ib device management
so that the device kobject is initialized in ib_alloc_device.
As part of the restructuring, the kobject is now initialized in
procedure ib_alloc_device, and is later added to the device hierarchy
in the ib_register_device call stack, in procedure
ib_device_register_sysfs (which calls device_add).
However, in the ib_device_register_sysfs error flow, if an error
occurs following the call to device_add, the cleanup procedure
device_unregister is called. This call results in the device object
being deleted -- which results in various use-after-free crashes.
The correct cleanup call is device_del -- which undoes device_add
without deleting the device object.
The device object will then (correctly) be deleted in the
ib_register_device caller's error cleanup flow, when the caller invokes
ib_dealloc_device.
Fixes: 55aeed06544f6 ("IB/core: Make ib_alloc_device init the kobject") Cc: <stable@vger.kernel.org> # v4.2+ Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
Parav Pandit [Sun, 19 Mar 2017 08:55:55 +0000 (10:55 +0200)]
IB/core: Fix kernel crash during fail to initialize device
This patch fixes the kernel crash that occurs during ib_dealloc_device()
called due to provider driver fails with an error after
ib_alloc_device() and before it can register using ib_register_device().
This crashed seen in tha lab as below which can occur with any IB device
which fails to perform its device initialization before invoking
ib_register_device().
This patch avoids touching cache and port immutable structures if device
is not yet initialized.
It also releases related memory when cache and port immutable data
structure initialization fails during register_device() state.