Xiubo Li [Tue, 2 May 2017 03:38:06 +0000 (11:38 +0800)]
tcmu: Add global data block pool support
For each target there will be one ring, when the target number
grows larger and larger, it could eventually runs out of the
system memories.
In this patch for each target ring, currently for the cmd area
the size will be fixed to 8MB and for the data area the size
will grow from 0 to max 256K * PAGE_SIZE(1G for 4K page size).
For all the targets' data areas, they will get empty blocks
from the "global data block pool", which has limited to 512K *
PAGE_SIZE(2G for 4K page size) for now.
When the "global data block pool" has been used up, then any
target could wake up the unmap thread routine to shrink other
targets' data area memories. And the unmap thread routine will
always try to truncate the ring vma from the last using block
offset.
When user space has touched the data blocks out of tcmu_cmd
iov[], the tcmu_page_fault() will try to return one zeroed blocks.
Here we move the timeout's tcmu_handle_completions() into unmap
thread routine, that's to say when the timeout fired, it will
only do the tcmu_check_expired_cmd() and then wake up the unmap
thread to do the completions() and then try to shrink its idle
memories. Then the cmdr_lock could be a mutex and could simplify
this patch because the unmap_mapping_range() or zap_* may go to
sleep.
Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Jianfei Hu <hujianfei@cmss.chinamobile.com> Acked-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Xiubo Li [Tue, 2 May 2017 03:38:05 +0000 (11:38 +0800)]
tcmu: Add dynamic growing data area feature support
Currently for the TCMU, the ring buffer size is fixed to 64K cmd
area + 1M data area, and this will be bottlenecks for high iops.
The struct tcmu_cmd_entry {} size is fixed about 112 bytes with
iovec[N] & N <= 4, and the size of struct iovec is about 16 bytes.
If N == 0, the ratio will be sizeof(cmd entry) : sizeof(datas) ==
112Bytes : (N * 4096)Bytes = 28 : 0, no data area is need.
If 0 < N <=4, the ratio will be sizeof(cmd entry) : sizeof(datas)
== 112Bytes : (N * 4096)Bytes = 28 : (N * 1024), so the max will
be 28 : 1024.
If N > 4, the sizeof(cmd entry) will be [(N - 4) *16 + 112] bytes,
and its corresponding data size will be [N * 4096], so the ratio
of sizeof(cmd entry) : sizeof(datas) == [(N - 4) * 16 + 112)Bytes
: (N * 4096)Bytes == 4/1024 - 12/(N * 1024), so the max is about
4 : 1024.
When N is bigger, the ratio will be smaller.
As the initial patch, we will set the cmd area size to 2M, and
the cmd area size to 32M. The TCMU will dynamically grows the data
area from 0 to max 32M size as needed.
The cmd area memory will be allocated through vmalloc(), and the
data area's blocks will be allocated individually later when needed.
The allocated data area block memory will be managed via radix tree.
For now the bitmap still be the most efficient way to search and
manage the block index, this could be update later.
Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Jianfei Hu <hujianfei@cmss.chinamobile.com> Acked-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Hannes Reinecke [Fri, 28 Apr 2017 08:04:04 +0000 (10:04 +0200)]
target: fixup error message in target_tg_pt_gp_tg_pt_gp_id_store()
When setting up an ALUA target port group with an invalid ID the
error message
kstrtoul() returned -22 for tg_pt_gp_id
is displayed, which is not really helpful.
Convert it to something sane.
And while we're at it, join the messages onto a single line.
Signed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bart van Assche <bart.vanassche@sandisk.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Hannes Reinecke [Fri, 28 Apr 2017 08:03:38 +0000 (10:03 +0200)]
target: fixup error message in target_tg_pt_gp_alua_access_type_store()
When setting up a target the error message:
Unable to do set ##_name ALUA state on non valid tg_pt_gp ID: 0
is displayed.
Apparently concatenation doesn't work in a string; one should be using
implicit string concatenation here.
Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bart van Assche <bart.vanassche@sandisk.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This adds initial PGR support for just TCMU, since tcmu doesn't
have the necessary IT_NEXUS info to process PGR in userspace,
so have those commands be processed in kernel.
HA support is not available yet, we will work on it if this patch
is acceptable.
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Markus Elfring [Sun, 9 Apr 2017 19:07:14 +0000 (21:07 +0200)]
target: Use kmalloc_array() in transport_kmap_data_sg()
A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kmalloc_array".
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Markus Elfring [Sun, 9 Apr 2017 18:25:11 +0000 (20:25 +0200)]
target: Use kmalloc_array() in compare_and_write_callback()
* A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kmalloc_array".
This issue was detected by using the Coccinelle software.
* Replace the specification of a data structure by a pointer dereference
to make the corresponding size determination a bit safer according to
the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Markus Elfring [Sun, 9 Apr 2017 18:15:12 +0000 (20:15 +0200)]
target: Improve size determinations in two functions
Replace the specification of two data structures by pointer dereferences
as the parameter for the operator "sizeof" to make the corresponding size
determinations a bit safer according to the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Markus Elfring [Sun, 9 Apr 2017 17:02:38 +0000 (19:02 +0200)]
target: Use kcalloc() in two functions
* Multiplications for the size determination of memory allocations
indicated that array data structures should be processed.
Thus use the corresponding function "kcalloc".
This issue was detected by using the Coccinelle software.
* Replace the specification of data structures by pointer dereferences
to make the corresponding size determination a bit safer according to
the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Markus Elfring [Sun, 9 Apr 2017 14:00:39 +0000 (16:00 +0200)]
iscsi-target: Improve size determinations in four functions
Replace the specification of four data structures by pointer dereferences
as the parameter for the operator "sizeof" to make the corresponding size
determinations a bit safer according to the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Markus Elfring [Sun, 9 Apr 2017 13:06:00 +0000 (15:06 +0200)]
iscsi-target: Use kcalloc() in iscsit_allocate_iovecs()
* A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kcalloc".
This issue was detected by using the Coccinelle software.
* Replace the specification of a data structure by a pointer dereference
to make the corresponding size determination a bit safer according to
the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Dmitry Monakhov [Fri, 31 Mar 2017 15:53:36 +0000 (19:53 +0400)]
tcm: make pi data verification configurable
Currently ramdisk and fileio always perform PI verification
before and after backend IO. This approach is not very flexible.
Because some one may want to postpone this work to other layers in
IO stack. For example if we want to test blk_integrity_profile
testcase:
https://github.com/dmonakhov/xfstests/commit/dee408c868861d6b6871dbb3381facee7effdbe4 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Dmitry Monakhov [Fri, 31 Mar 2017 15:53:35 +0000 (19:53 +0400)]
tcm_fileio: Prevent information leak for short reads
If we failed to read data from backing file (probably because some one
truncate file under us), we must zerofill cmd's data, otherwise it will
be returned as is. Most likely cmd's data are unitialized pages from
page cache. This result in information leak.
(Change BUG_ON into -EINVAL se_cmd failure - nab)
testcase: https://github.com/dmonakhov/xfstests/commit/e11a1b7b907ca67b1be51a1594025600767366d5 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Bart Van Assche [Thu, 30 Mar 2017 17:12:39 +0000 (10:12 -0700)]
target: Fix VERIFY and WRITE VERIFY command parsing
Use the value of the BYTCHK field to determine the size of the
Data-Out buffer. For VERIFY, honor the VRPROTECT, DPO and FUA
fields. This patch avoids that LIO complains about a mismatch
between the expected transfer length and the SCSI CDB length
if the value of the BYTCHK field is 0.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Max Lohrmann <post@wickenrode.com> Cc: <stable@vger.kernel.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Elena Reshetova [Mon, 6 Mar 2017 14:21:11 +0000 (16:21 +0200)]
target/iblock: convert iblock_req.pending from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Xiubo Li [Fri, 31 Mar 2017 02:35:25 +0000 (10:35 +0800)]
tcmu: Skip Data-Out blocks before gathering Data-In buffer for BIDI case
For the bidirectional case, the Data-Out buffer blocks will always at
the head of the tcmu_cmd's bitmap, and before gathering the Data-In
buffer, first of all it should skip the Data-Out ones, or the device
supporting BIDI commands won't work.
iscsi-target: Drop work-around for legacy GlobalSAN initiator
Once upon a time back in 2009, a work-around was added to support
the GlobalSAN iSCSI initiator v3.3 for MacOSX, which during login
did not propose nor respond to MaxBurstLength, FirstBurstLength,
DefaultTime2Wait and DefaultTime2Retain keys.
The work-around in iscsi_check_proposer_for_optional_reply()
allowed the missing keys to be proposed, but did not require
waiting for a response before moving to full feature phase
operation. This allowed GlobalSAN v3.3 to work out-of-the
box, and for many years we didn't run into login interopt
issues with any other initiators..
Until recently, when Martin tried a QLogic 57840S iSCSI Offload
HBA on Windows 2016 which completed login, but subsequently
failed with:
Got unknown iSCSI OpCode: 0x43
The issue was QLogic MSFT side did not propose DefaultTime2Wait +
DefaultTime2Retain, so LIO proposes them itself, and immediately
transitions to full feature phase because of the GlobalSAN hack.
However, the QLogic MSFT side still attempts to respond to
DefaultTime2Retain + DefaultTime2Wait, even though LIO has set
ISCSI_FLAG_LOGIN_NEXT_STAGE3 + ISCSI_FLAG_LOGIN_TRANSIT
in last login response.
So while the QLogic MSFT side should have been proposing these
two keys to start, it was doing the correct thing per RFC-3720
attempting to respond to proposed keys before transitioning to
full feature phase.
All that said, recent versions of GlobalSAN iSCSI (v5.3.0.541)
does correctly propose the four keys during login, making the
original work-around moot.
So in order to allow QLogic MSFT to run unmodified as-is, go
ahead and drop this long standing work-around.
Reported-by: Martin Svec <martin.svec@zoner.cz> Cc: Martin Svec <martin.svec@zoner.cz> Cc: Himanshu Madhani <Himanshu.Madhani@cavium.com> Cc: Arun Easi <arun.easi@cavium.com> Cc: stable@vger.kernel.org # 3.1+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Mike Christie [Wed, 29 Mar 2017 05:19:24 +0000 (00:19 -0500)]
target: Fix ALUA transition state race between multiple initiators
Multiple threads could be writing to alua_access_state at
the same time, or there could be multiple STPGs in flight
(different initiators sending them or one initiator sending
them to different ports), or a combo of both and the
core_alua_do_transition_tg_pt calls will race with each other.
Because from the last patches we no longer delay running
core_alua_do_transition_tg_pt_work, there does not seem to be
any point in running that in a workqueue. And, we always
wait for it to complete one way or another, so we can sleep
in this code path. So, this patch made over target-pending just adds a
mutex and does the work core_alua_do_transition_tg_pt_work was doing in
core_alua_do_transition_tg_pt.
There is also no need to use an atomic for the
tg_pt_gp_alua_access_state. In core_alua_do_transition_tg_pt we will
test and set it under the transition mutex. And, it is a int/32 bits
so in the other places where it is read, we will never see it partially
updated.
Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Sagi Grimberg [Wed, 22 Mar 2017 15:07:30 +0000 (17:07 +0200)]
iser-target: avoid posting a recv buffer twice
We pre-allocate our send-queues and might overflow them
in case we have multi work-request operations which tend
to occur for large RDMA transfers over devices with limited
allowed sg elements. When we get to a queue-full condition
we might retry again later, so track our receive buffers
so we don't repost them for a retry case.
This patch addresses two queue-full handling bugs in iser-target.
The first is propagating isert_rdma_rw_ctx_post() return back
to target-core via isert_put_datain() + isert_get_dataout()
callbacks, in order to trigger queue-full logic in target-core.
Note target-core expects -EAGAIN or -ENOMEM error to signal
RDMA WRITE/READ data-transfer callbacks should be retried,
after queue-full logic been invoked.
Other types of errors propagated up from RDMA RW API will result
in target-core generating internal CHECK_CONDITION status,
avoiding subsequent isert_put_datain() and isert_get_dataout()
iscsit_transport callback retry attempts.
The second is to use transport_generic_request_failure()
during T10-PI hw-offload errors in isert_rdma_write_done()
and isert_rdma_read_done(), so CHECK_CONDITION queue-full
is handled internally by target-core.
Also add isert_put_response() T10-PI failure case fixme in
isert_rdma_write_done(), which is currently not internally
retried or released until session reinstatement.
This patch changes iscsi-target to propagate iscsit_transport
->iscsit_queue_data_in() and ->iscsit_queue_status() callback
errors, back up into target-core.
This allows target-core to retry failed iscsit_transport
callbacks using internal queue-full logic.
This patch fixes a set of queue-full response handling
bugs, where outgoing responses are leaked when a fabric
driver is propagating non -EAGAIN or -ENOMEM errors
to target-core.
It introduces TRANSPORT_COMPLETE_QF_ERR state used to
signal when CHECK_CONDITION status should be generated,
when fabric driver ->write_pending(), ->queue_data_in(),
or ->queue_status() callbacks fail with non -EAGAIN or
-ENOMEM errors, and data-transfer should not be retried.
Note all fabric driver -EAGAIN and -ENOMEM errors are
still retried indefinately with associated data-transfer
callbacks, following existing queue-full logic.
Also fix two missing ->queue_status() queue-full cases
related to CMD_T_ABORTED w/ TAS status handling.
Xiubo Li [Mon, 27 Mar 2017 09:07:41 +0000 (17:07 +0800)]
tcmu: Fix wrongly calculating of the base_command_size
The t_data_nents and t_bidi_data_nents are the numbers of the
segments, but it couldn't be sure the block size equals to size
of the segment.
For the worst case, all the blocks are discontiguous and there
will need the same number of iovecs, that's to say: blocks == iovs.
So here just set the number of iovs to block count needed by tcmu
cmd.
Tested-by: Ilias Tsitsimpis <iliastsi@arrikto.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Cc: stable@vger.kernel.org # 3.18+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Xiubo Li [Mon, 27 Mar 2017 09:07:40 +0000 (17:07 +0800)]
tcmu: Fix possible overwrite of t_data_sg's last iov[]
If there has BIDI data, its first iov[] will overwrite the last
iov[] for se_cmd->t_data_sg.
To fix this, we can just increase the iov pointer, but this may
introuduce a new memory leakage bug: If the se_cmd->data_length
and se_cmd->t_bidi_data_sg->length are all not aligned up to the
DATA_BLOCK_SIZE, the actual length needed maybe larger than just
sum of them.
So, this could be avoided by rounding all the data lengthes up
to DATA_BLOCK_SIZE.
Reviewed-by: Mike Christie <mchristi@redhat.com> Tested-by: Ilias Tsitsimpis <iliastsi@arrikto.com> Reviewed-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Cc: stable@vger.kernel.org # 3.18+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
target: Avoid mappedlun symlink creation during lun shutdown
This patch closes a race between se_lun deletion during configfs
unlink in target_fabric_port_unlink() -> core_dev_del_lun()
-> core_tpg_remove_lun(), when transport_clear_lun_ref() blocks
waiting for percpu_ref RCU grace period to finish, but a new
NodeACL mappedlun is added before the RCU grace period has
completed.
This can happen in target_fabric_mappedlun_link() because it
only checks for se_lun->lun_se_dev, which is not cleared until
after transport_clear_lun_ref() percpu_ref RCU grace period
finishes.
This bug originally manifested as NULL pointer dereference
OOPsen in target_stat_scsi_att_intr_port_show_attr_dev() on
v4.1.y code, because it dereferences lun->lun_se_dev without
a explicit NULL pointer check.
In post v4.1 code with target-core RCU conversion, the code
in target_stat_scsi_att_intr_port_show_attr_dev() no longer
uses se_lun->lun_se_dev, but the same race still exists.
To address the bug, go ahead and set se_lun>lun_shutdown as
early as possible in core_tpg_remove_lun(), and ensure new
NodeACL mappedlun creation in target_fabric_mappedlun_link()
fails during se_lun shutdown.
Reported-by: James Shen <jcs@datera.io> Cc: James Shen <jcs@datera.io> Tested-by: James Shen <jcs@datera.io> Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
iscsi-target: Fix TMR reference leak during session shutdown
This patch fixes a iscsi-target specific TMR reference leak
during session shutdown, that could occur when a TMR was
quiesced before the hand-off back to iscsi-target code
via transport_cmd_check_stop_to_fabric().
The reference leak happens because iscsit_free_cmd() was
incorrectly skipping the final target_put_sess_cmd() for
TMRs when transport_generic_free_cmd() returned zero because
the se_cmd->cmd_kref did not reach zero, due to the missing
se_cmd assignment in original code.
The result was iscsi_cmd and it's associated se_cmd memory
would be freed once se_sess->sess_cmd_map where released,
but the associated se_tmr_req was leaked and remained part
of se_device->dev_tmr_list.
This bug would manfiest itself as kernel paging request
OOPsen in core_tmr_lun_reset(), when a left-over se_tmr_req
attempted to dereference it's se_cmd pointer that had
already been released during normal session shutdown.
To address this bug, go ahead and treat ISCSI_OP_SCSI_CMD
and ISCSI_OP_SCSI_TMFUNC the same when there is an extra
se_cmd->cmd_kref to drop in iscsit_free_cmd(), and use
op_scsi to signal __iscsit_free_cmd() when the former
needs to clear any further iscsi related I/O state.
Reported-by: Rob Millner <rlm@daterainc.com> Cc: Rob Millner <rlm@daterainc.com> Reported-by: Chu Yuan Lin <cyl@datera.io> Cc: Chu Yuan Lin <cyl@datera.io> Tested-by: Chu Yuan Lin <cyl@datera.io> Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Quinn Tran [Wed, 15 Mar 2017 16:48:55 +0000 (09:48 -0700)]
qla2xxx: Fix delayed response to command for loop mode/direct connect.
Current driver wait for FW to be in the ready state before
processing in-coming commands. For Arbitrated Loop or
Point-to- Point (not switch), FW Ready state can take a while.
FW will transition to ready state after all Nports have been
logged in. In the mean time, certain initiators have completed
the login and starts IO. Driver needs to start processing all
queues if FW is already started.
Quinn Tran [Wed, 15 Mar 2017 16:48:54 +0000 (09:48 -0700)]
qla2xxx: Change scsi host lookup method.
For target mode, when new scsi command arrive, driver first performs
a look up of the SCSI Host. The current look up method is based on
the ALPA portion of the NPort ID. For Cisco switch, the ALPA can
not be used as the index. Instead, the new search method is based
on the full value of the Nport_ID via btree lib.
Quinn Tran [Wed, 15 Mar 2017 16:48:52 +0000 (09:48 -0700)]
qla2xxx: Use IOCB interface to submit non-critical MBX.
The Mailbox interface is currently over subscribed. We like
to reserve the Mailbox interface for the chip managment and
link initialization. Any non essential Mailbox command will
be routed through the IOCB interface. The IOCB interface is
able to absorb more commands.
Following commands are being routed through IOCB interface
- Get ID List (007Ch)
- Get Port DB (0064h)
- Get Link Priv Stats (006Dh)
Quinn Tran [Wed, 15 Mar 2017 16:48:48 +0000 (09:48 -0700)]
qla2xxx: Allow relogin to proceed if remote login did not finish
If the remote port have started the login process, then the
PLOGI and PRLI should be back to back. Driver will allow
the remote port to complete the process. For the case where
the remote port decide to back off from sending PRLI, this
local port sets an expiration timer for the PRLI. Once the
expiration time passes, the relogin retry logic is allowed
to go through and perform login with the remote port.
Quinn Tran [Wed, 15 Mar 2017 16:48:47 +0000 (09:48 -0700)]
qla2xxx: Fix sess_lock & hardware_lock lock order problem.
The main lock that needs to be held for CMD or TMR submission
to upper layer is the sess_lock. The sess_lock is used to
serialize cmd submission and session deletion. The addition
of hardware_lock being held is not necessary. This patch removes
hardware_lock dependency from CMD/TMR submission.
Use hardware_lock only for error response in this case.
Quinn Tran [Wed, 15 Mar 2017 16:48:46 +0000 (09:48 -0700)]
qla2xxx: Fix inadequate lock protection for ABTS.
Normally, ABTS is sent to Target Core as Task MGMT command.
In the case of error, qla2xxx needs to send response, hardware_lock
is required to prevent request queue corruption.
Quinn Tran [Wed, 15 Mar 2017 16:48:45 +0000 (09:48 -0700)]
qla2xxx: Fix request queue corruption.
When FW notify driver or driver detects low FW resource,
driver tries to send out Busy SCSI Status to tell Initiator
side to back off. During the send process, the lock was not held.
tcmu: Convert cmd_time_out into backend device attribute
Instead of putting cmd_time_out under ../target/core/user_0/foo/control,
which has historically been used by parameters needed for initial
backend device configuration, go ahead and move cmd_time_out into
a backend device attribute.
In order to do this, tcmu_module_init() has been updated to create
a local struct configfs_attribute **tcmu_attrs, that is based upon
the existing passthrough_attrib_attrs along with the new cmd_time_out
attribute. Once **tcm_attrs has been setup, go ahead and point
it at tcmu_ops->tb_dev_attrib_attrs so it's picked up by target-core.
Also following MNC's previous change, ->cmd_time_out is stored in
milliseconds but exposed via configfs in seconds. Also, note this
patch restricts the modification of ->cmd_time_out to before +
after the TCMU device has been configured, but not while it has
active fabric exports.
Cc: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Mike Christie [Thu, 9 Mar 2017 08:42:09 +0000 (02:42 -0600)]
tcmu: make cmd timeout configurable
A single daemon could implement multiple types of devices
using multuple types of real devices that may not support
restarting from crashes and/or handling tcmu timeouts. This
makes the cmd timeout configurable, so handlers that do not
support it can turn if off for now.
Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Mike Christie [Thu, 9 Mar 2017 08:42:08 +0000 (02:42 -0600)]
tcmu: add helper to check if dev was configured
This adds a helper to check if the dev was configured. It
will be used in the next patch to prevent updates to some
config settings after the device has been setup.
Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Mike Christie [Thu, 2 Mar 2017 10:59:50 +0000 (04:59 -0600)]
target: fix race during implicit transition work flushes
This fixes the following races:
1. core_alua_do_transition_tg_pt could have read
tg_pt_gp_alua_access_state and gone into this if chunk:
if (!explicit &&
atomic_read(&tg_pt_gp->tg_pt_gp_alua_access_state) ==
ALUA_ACCESS_STATE_TRANSITION) {
and then core_alua_do_transition_tg_pt_work could update the
state. core_alua_do_transition_tg_pt would then only set
tg_pt_gp_alua_pending_state and the tg_pt_gp_alua_access_state would
not get updated with the second calls state.
2. core_alua_do_transition_tg_pt could be setting
tg_pt_gp_transition_complete while the tg_pt_gp_transition_work
is already completing. core_alua_do_transition_tg_pt then waits on the
completion that will never be called.
To handle these issues, we just call flush_work which will return when
core_alua_do_transition_tg_pt_work has completed so there is no need
to do the complete/wait. And, if core_alua_do_transition_tg_pt_work
was running, instead of trying to sneak in the state change, we just
schedule up another core_alua_do_transition_tg_pt_work call.
Note that this does not handle a possible race where there are multiple
threads call core_alua_do_transition_tg_pt at the same time. I think
we need a mutex in target_tg_pt_gp_alua_access_state_store.
Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Mike Christie [Thu, 2 Mar 2017 10:59:49 +0000 (04:59 -0600)]
target: allow userspace to set state to transitioning
Userspace target_core_user handlers like tcmu-runner may want to set the
ALUA state to transitioning while it does implicit transitions. This
patch allows that state when set from configfs.
Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Mike Christie [Thu, 2 Mar 2017 10:59:48 +0000 (04:59 -0600)]
target: fix ALUA transition timeout handling
The implicit transition time tells initiators the min time
to wait before timing out a transition. We currently schedule
the transition to occur in tg_pt_gp_implicit_trans_secs
seconds so there is no room for delays. If
core_alua_do_transition_tg_pt_work->core_alua_update_tpg_primary_metadata
needs to write out info to a remote file, then the initiator can
easily time out the operation.
Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Mike Christie [Thu, 2 Mar 2017 05:13:26 +0000 (23:13 -0600)]
target: Use system workqueue for ALUA transitions
If tcmu-runner is processing a STPG and needs to change the kernel's
ALUA state then we cannot use the same work queue for task management
requests and ALUA transitions, because we could deadlock. The problem
occurs when a STPG times out before tcmu-runner is able to
call into target_tg_pt_gp_alua_access_state_store->
core_alua_do_port_transition -> core_alua_do_transition_tg_pt ->
queue_work. In this case, the tmr is on the work queue waiting for
the STPG to complete, but the STPG transition is now queued behind
the waiting tmr.
Note:
This bug will also be fixed by this patch:
http://www.spinics.net/lists/target-devel/msg14560.html
which switches the tmr code to use the system workqueues.
For both, I am not sure if we need a dedicated workqueue since
it is not a performance path and I do not think we need WQ_MEM_RECLAIM
to make forward progress to free up memory like the block layer does.
Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Mike Christie [Thu, 2 Mar 2017 05:13:25 +0000 (23:13 -0600)]
target: fail ALUA transitions for pscsi
We do not setup the LU group for pscsi devices, so if you write
a state to alua_access_state that will cause a transition you will
get a NULL pointer dereference.
This patch will fail attempts to try and transition the path
for backend devices that set the TRANSPORT_FLAG_PASSTHROUGH_ALUA
flag.
Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Mike Christie [Thu, 2 Mar 2017 05:13:24 +0000 (23:13 -0600)]
target: allow ALUA setup for some passthrough backends
This patch allows passthrough backends to use the core/base LIO
ALUA setup and state checks, but still handle the execution of
commands.
This will allow the target_core_user module to execute STPG and RTPG
in userspace, and not have to duplicate the ALUA state checks, path
information (needed so we can check if command is executable on
specific paths) and setup (rtslib sets/updates the configfs ALUA
interface like it does for iblock or file).
For STPG, the target_core_user userspace daemon, tcmu-runner will
still execute the STPG, and to update the core/base LIO state it
will use the existing configfs interface. For RTPG, tcmu-runner
will loop over configfs and/or cache the state.
Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Mike Christie [Thu, 2 Mar 2017 05:14:39 +0000 (23:14 -0600)]
tcmu: allow hw_max_sectors greater than 128
tcmu hard codes the hw_max_sectors to 128 which is a litle small.
Userspace uses the max_sectors to report the optimal IO size and
some initiators perform better with larger IOs (open-iscsi seems
to do better with 256 to 512 depending on the test).
(Fix do not display hw max sectors twice - MNC)
Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
All in-tree fabric drivers provide a tfo->check_stop_free(),
so there is no need to do the extra check within existing
transport_cmd_check_stop_to_fabric() code.
Just to be sure, add a check in target_fabric_tf_ops_check()
to notify any out-of-tree drivers that might be missing it.
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Max Lohrmann [Wed, 8 Mar 2017 06:09:56 +0000 (22:09 -0800)]
target: Fix VERIFY_16 handling in sbc_parse_cdb
As reported by Max, the Windows 2008 R2 chkdsk utility expects
VERIFY_16 to be supported, and does not handle the returned
CHECK_CONDITION properly, resulting in an infinite loop.
The following fixes a divide by zero OOPs with TYPE_TAPE
due to pscsi_tape_read_blocksize() failing causing a zero
sd->sector_size being propigated up via dev_attrib.hw_block_size.
It also fixes another long-standing bug where TYPE_TAPE and
TYPE_MEDIMUM_CHANGER where using pscsi_create_type_other(),
which does not call scsi_device_get() to take the device
reference. Instead, rename pscsi_create_type_rom() to
pscsi_create_type_nondisk() and use it for all cases.
Finally, also drop a dump_stack() in pscsi_get_blocks() for
non TYPE_DISK, which in modern target-core can get invoked
via target_sense_desc_format() during CHECK_CONDITION.
Reported-by: Malcolm Haak <insanemal@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
4) Fix sendmsg deadlock in rxrpc, from David Howells.
5) Add missing RCU locking to transport hashtable scan, from Xin Long.
6) Fix potential packet loss in mlxsw driver, from Ido Schimmel.
7) Fix race in NAPI handling between poll handlers and busy polling,
from Eric Dumazet.
8) TX path in vxlan and geneve need proper RCU locking, from Jakub
Kicinski.
9) SYN processing in DCCP and TCP need to disable BH, from Eric
Dumazet.
10) Properly handle net_enable_timestamp() being invoked from IRQ
context, also from Eric Dumazet.
11) Fix crash on device-tree systems in xgene driver, from Alban Bedel.
12) Do not call sk_free() on a locked socket, from Arnaldo Carvalho de
Melo.
13) Fix use-after-free in netvsc driver, from Dexuan Cui.
14) Fix max MTU setting in bonding driver, from WANG Cong.
15) xen-netback hash table can be allocated from softirq context, so use
GFP_ATOMIC. From Anoob Soman.
16) Fix MAC address change bug in bgmac driver, from Hari Vyas.
17) strparser needs to destroy strp_wq on module exit, from WANG Cong.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits)
strparser: destroy workqueue on module exit
sfc: fix IPID endianness in TSOv2
sfc: avoid max() in array size
rds: remove unnecessary returned value check
rxrpc: Fix potential NULL-pointer exception
nfp: correct DMA direction in XDP DMA sync
nfp: don't tell FW about the reserved buffer space
net: ethernet: bgmac: mac address change bug
net: ethernet: bgmac: init sequence bug
xen-netback: don't vfree() queues under spinlock
xen-netback: keep a local pointer for vif in backend_disconnect()
netfilter: nf_tables: don't call nfnetlink_set_err() if nfnetlink_send() fails
netfilter: nft_set_rbtree: incorrect assumption on lower interval lookups
netfilter: nf_conntrack_sip: fix wrong memory initialisation
can: flexcan: fix typo in comment
can: usb_8dev: Fix memory leak of priv->cmd_msg_buffer
can: gs_usb: fix coding style
can: gs_usb: Don't use stack memory for USB transfers
ixgbe: Limit use of 2K buffers on architectures with 256B or larger cache lines
ixgbe: update the rss key on h/w, when ethtool ask for it
...
Linus Torvalds [Sat, 4 Mar 2017 19:36:19 +0000 (11:36 -0800)]
Merge tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Radim Krčmář:
"Second batch of KVM changes for the 4.11 merge window:
PPC:
- correct assumption about ASDR on POWER9
- fix MMIO emulation on POWER9
x86:
- add a simple test for ioperm
- cleanup TSS (going through KVM tree as the whole undertaking was
caused by VMX's use of TSS)
- fix nVMX interrupt delivery
- fix some performance counters in the guest
... and two cleanup patches"
* tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: nVMX: Fix pending events injection
x86/kvm/vmx: remove unused variable in segment_base()
selftests/x86: Add a basic selftest for ioperm
x86/asm: Tidy up TSS limit code
kvm: convert kvm.users_count from atomic_t to refcount_t
KVM: x86: never specify a sample period for virtualized in_tx_cp counters
KVM: PPC: Book3S HV: Don't use ASDR for real-mode HPT faults on POWER9
KVM: PPC: Book3S HV: Fix software walk of guest process page tables
Linus Torvalds [Sat, 4 Mar 2017 19:26:18 +0000 (11:26 -0800)]
Merge tag 'staging-4.11-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO driver fixes from Greg KH:
"Here are a few small staging and IIO driver fixes for issues that
showed up after the big set if changes you merged last week.
Nothing major, just small bugs resolved in some IIO drivers, a lustre
allocation fix, and some RaspberryPi driver fixes for reported
problems, as well as a MAINTAINERS entry update.
All of these have been in linux-next for a week with no reported
issues"
* tag 'staging-4.11-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: fsl-mc: fix warning in DT ranges parser
MAINTAINERS: Remove Noralf Trønnes as fbtft maintainer
staging: vchiq_2835_arm: Make cache-line-size a required DT property
staging: bcm2835/mmal-vchiq: unlock on error in buffer_from_host()
staging/lustre/lnet: Fix allocation size for sv_cpt_data
iio: adc: xilinx: Fix error handling
iio: 104-quad-8: Fix off-by-one error when addressing flag register
iio: adc: handle unknow of_device_id data
Linus Torvalds [Sat, 4 Mar 2017 18:42:53 +0000 (10:42 -0800)]
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- vmalloc stack regression in CCM
- Build problem in CRC32 on ARM
- Memory leak in cavium
- Missing Kconfig dependencies in atmel and mediatek
- XTS Regression on some platforms (s390 and ppc)
- Memory overrun in CCM test vector
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: vmx - Use skcipher for xts fallback
crypto: vmx - Use skcipher for cbc fallback
crypto: testmgr - Pad aes_ccm_enc_tv_template vector
crypto: arm/crc32 - add build time test for CRC instruction support
crypto: arm/crc32 - fix build error with outdated binutils
crypto: ccm - move cbcmac input off the stack
crypto: xts - Propagate NEED_FALLBACK bit
crypto: api - Add crypto_requires_off helper
crypto: atmel - CRYPTO_DEV_MEDIATEK should depend on HAS_DMA
crypto: atmel - CRYPTO_DEV_ATMEL_TDES and CRYPTO_DEV_ATMEL_SHA should depend on HAS_DMA
crypto: cavium - fix leak on curr if curr->head fails to be allocated
crypto: cavium - Fix couple of static checker errors
Linus Torvalds [Sat, 4 Mar 2017 05:36:56 +0000 (21:36 -0800)]
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"This is the set of stuff that didn't quite make the initial pull and a
set of fixes for stuff which did.
The new stuff is basically lpfc (nvme), qedi and aacraid. The fixes
cover a lot of previously submitted stuff, the most important of which
probably covers some of the failing irq vectors allocation and other
fallout from having the SCSI command allocated as part of the block
allocation functions"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (59 commits)
scsi: qedi: Fix memory leak in tmf response processing.
scsi: aacraid: remove redundant zero check on ret
scsi: lpfc: use proper format string for dma_addr_t
scsi: lpfc: use div_u64 for 64-bit division
scsi: mac_scsi: Fix MAC_SCSI=m option when SCSI=m
scsi: cciss: correct check map error.
scsi: qla2xxx: fix spelling mistake: "seperator" -> "separator"
scsi: aacraid: Fixed expander hotplug for SMART family
scsi: mpt3sas: switch to pci_alloc_irq_vectors
scsi: qedf: fixup compilation warning about atomic_t usage
scsi: remove scsi_execute_req_flags
scsi: merge __scsi_execute into scsi_execute
scsi: simplify scsi_execute_req_flags
scsi: make the sense header argument to scsi_test_unit_ready mandatory
scsi: sd: improve TUR handling in sd_check_events
scsi: always zero sshdr in scsi_normalize_sense
scsi: scsi_dh_emc: return success in clariion_std_inquiry()
scsi: fix memory leak of sdpk on when gd fails to allocate
scsi: sd: make sd_devt_release() static
scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.
...
WANG Cong [Fri, 3 Mar 2017 20:21:14 +0000 (12:21 -0800)]
strparser: destroy workqueue on module exit
Fixes: 43a0c6751a32 ("strparser: Stream parser for messages") Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Missing check for full sock in ip_route_me_harder(), from
Florian Westphal.
2) Incorrect sip helper structure initilization that breaks it when
several ports are used, from Christophe Leroy.
3) Fix incorrect assumption when looking up for matching with adjacent
intervals in the nft_set_rbtree.
4) Fix broken netlink event error reporting in nf_tables that results
in misleading ESRCH errors propagated to userspace listeners.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 4 Mar 2017 00:48:48 +0000 (16:48 -0800)]
Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"A fix and regression test case for nvdimm namespace label
compatibility.
Details:
- An "nvdimm namespace label" is metadata on an nvdimm that
provisions dimm capacity into a "namespace" that can host a block
device / dax-filesytem, or a device-dax character device.
A namespace is an object that other operating environment and
platform firmware needs to comprehend for capabilities like booting
from an nvdimm.
The label metadata contains a checksum that Linux was not
calculating correctly leading to other environments rejecting the
Linux label.
These have received a build success notification from the kbuild
robot, and a positive test result from Nick who reported the problem"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nfit, libnvdimm: fix interleave set cookie calculation
tools/testing/nvdimm: make iset cookie predictable
Linus Torvalds [Sat, 4 Mar 2017 00:44:21 +0000 (16:44 -0800)]
Merge tag 'pci-v4.11-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- fix NULL pointer dereferences in many DesignWare-based drivers due to
refactoring error
- fix Altera config write breakage due to my refactoring error
* tag 'pci-v4.11-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: altera: Fix TLP_CFG_DW0 for TLP write
PCI: dwc: Fix crashes seen due to missing assignments
Linus Torvalds [Sat, 4 Mar 2017 00:20:06 +0000 (16:20 -0800)]
Merge branch 'parisc-4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes and cleanups from Helge Deller:
"Nothing really important in this patchset: fix resource leaks in error
paths, coding style cleanups and code removal"
* 'parisc-4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Remove flush_user_dcache_range and flush_user_icache_range
parisc: fix a printk
parisc: ccio-dma: Handle return NULL error from ioremap_nocache
parisc: Define access_ok() as macro
parisc: eisa: Fix resource leaks in error paths
parisc: eisa: Remove coding style errors
Linus Torvalds [Sat, 4 Mar 2017 00:17:55 +0000 (16:17 -0800)]
Merge tag 'xtensa-20170303' of git://github.com/jcmvbkbc/linux-xtensa
Pull Xtensa updates from Max Filippov:
- clean up bootable image build targets: provide separate 'Image',
'zImage' and 'uImage' make targets that only build corresponding
image type. Make 'all' build all images appropriate for a platform
- allow merging vectors code into .text section as a preparation step
for XIP support
- fix handling external FDT when the kernel is built without
BLK_DEV_INITRD support
* tag 'xtensa-20170303' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: allow merging vectors into .text section
xtensa: clean up bootable image build targets
xtensa: move parse_tag_fdt out of #ifdef CONFIG_BLK_DEV_INITRD
Linus Torvalds [Sat, 4 Mar 2017 00:15:48 +0000 (16:15 -0800)]
Merge tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC late DT updates from Arnd Bergmann:
"These updates have been kept in a separate branch mostly because they
rely on updates to the respective clk drivers to keep the shared
header files in sync.
This includes two branches for arm64 dt updates, both following up on
earlier changes for the same platforms that are already merged:
Samsung:
- add USB3 support in Exynos7
- minor PM related updates
Amlogic:
- new machines: WeTek Set-top-boxes
- various devices added to DT
There are also a couple of bugfixes that trickled in since the start
of the merge window:
- The moxart_defconfig was not building the intended platform
- CPU-hotplug was broken on ux500
- Coresight was broken on Juno (never worked)"
* tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (26 commits)
ARM: deconfig: fix the moxart defconfig
ARM: ux500: resume the second core properly
arm64: dts: juno: update definition for programmable replicator
arm64: dts: exynos: Add regulators for Vbus and Vbus-Boost
arm64: dts: exynos: Add USB 3.0 controller node for Exynos7
arm64: dts: exynos: Use macros for pinctrl configuration on Exynos7
pinctrl: dt-bindings: samsung: Add Exynos7 specific pinctrl macro definitions
arm64: dts: exynos: Add initial configuration for DISP clocks for TM2/TM2e
ARM64: dts: meson-gxbb-p200: add ADC laddered keys
ARM64: dts: meson: meson-gx: add the SAR ADC
ARM64: dts: meson-gxl: add the pwm_ao_b pin
ARM64: dts: meson-gx: add the missing pwm_AO_ab node
clk: gxbb: fix CLKID_ETH defined twice
ARM64: dts: meson-gxl: rename Nexbox A95x for consistency
clk: gxbb: add the SAR ADC clocks and expose them
dt-bindings: amlogic: Add WeTek boards
ARM64: dts: meson-gxbb: Add support for WeTek Hub and Play
dt-bindings: vendor-prefix: Add wetek vendor prefix
ARM64: dts: meson-gxm: Rename q200 and q201 DT files for consistency
ARM64: dts: meson-gx: Add HDMI HPD/DDC pinctrl nodes
...
Linus Torvalds [Sat, 4 Mar 2017 00:00:59 +0000 (16:00 -0800)]
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull SMB3 fixes from Steve French:
"Some small bug fixes as well as SMB2.1/SMB3 enablement for DFS (global
namespace) which previously was only enabled for CIFS"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
smb2: Enforce sec= mount option
CIFS: Fix sparse warnings
CIFS: implement get_dfs_refer for SMB2+
CIFS: use DFS pathnames in SMB2+ Create requests
CIFS: set signing flag in SMB2+ TreeConnect if needed
CIFS: let ses->ipc_tid hold smb2 TreeIds
CIFS: add use_ipc flag to SMB2_ioctl()
CIFS: add build_path_from_dentry_optional_prefix()
CIFS: move DFS response parsing out of SMB1 code
CIFS: Fix possible use after free in demultiplex thread
Martyn Welch [Fri, 3 Mar 2017 22:43:30 +0000 (22:43 +0000)]
docs: Fix htmldocs build failure
Build of HTML docs failing due to conversion of deviceiobook.tmpl in 8a8a602f and regulator.tmpl in 028f2533 to RST without removing from
DOCBOOKS in Makefile, resulting (in the case of deviceiobook) the
following error:
make[1]: *** No rule to make target 'Documentation/DocBook/deviceiobook.xml', needed by 'Documentation/DocBook/deviceiobook.aux.xml'. Stop.
Makefile:1452: recipe for target 'htmldocs' failed
make: *** [htmldocs] Error 2
Update DOCBOOKS to reflect available books.
Signed-off-by: Martyn Welch <martyn.welch@collabora.co.uk> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Linus Torvalds [Fri, 3 Mar 2017 19:55:57 +0000 (11:55 -0800)]
Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
"Because copy up can take a long time, serialized copy ups could be a
big performance bottleneck. This update allows concurrent copy up of
regular files eliminating this potential problem.
There are also minor fixes"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: drop CAP_SYS_RESOURCE from saved mounter's credentials
ovl: properly implement sync_filesystem()
ovl: concurrent copy up of regular files
ovl: introduce copy up waitqueue
ovl: copy up regular file using O_TMPFILE
ovl: rearrange code in ovl_copy_up_locked()
ovl: check if upperdir fs supports O_TMPFILE
Linus Torvalds [Fri, 3 Mar 2017 19:38:56 +0000 (11:38 -0800)]
Merge branch 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs 'statx()' update from Al Viro.
This adds the new extended stat() interface that internally subsumes our
previous stat interfaces, and allows user mode to specify in more detail
what kind of information it wants.
It also allows for some explicit synchronization information to be
passed to the filesystem, which can be relevant for network filesystems:
is the cached value ok, or do you need open/close consistency, or what?
From David Howells.
Andreas Dilger points out that the first version of the extended statx
interface was posted June 29, 2010:
Linus Torvalds [Fri, 3 Mar 2017 18:53:35 +0000 (10:53 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
"A collection of fixes for this merge window, either fixes for existing
issues, or parts that were waiting for acks to come in. This pull
request contains:
- Allocation of nvme queues on the right node from Shaohua.
This was ready long before the merge window, but waiting on an ack
from Bjorn on the PCI bit. Now that we have that, the three patches
can go in.
- Two fixes for blk-mq-sched with nvmeof, which uses hctx specific
request allocations. This caused an oops. One part from Sagi, one
part from Omar.
- A loop partition scan deadlock fix from Omar, fixing a regression
in this merge window.
- A three-patch series from Keith, closing up a hole on clearing out
requests on shutdown/resume.
- A stable fix for nbd from Josef, fixing a leak of sockets.
- Two fixes for a regression in this window from Jan, fixing a
problem with one of his earlier patches dealing with queue vs bdi
life times.
- A fix for a regression with virtio-blk, causing an IO stall if
scheduling is used. From me.
- A fix for an io context lock ordering problem. From me"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: Move bdi_unregister() to del_gendisk()
blk-mq: ensure that bd->last is always set correctly
block: don't call ioc_exit_icq() with the queue lock held for blk-mq
block: Initialize bd_bdi on inode initialization
loop: fix LO_FLAGS_PARTSCAN hang
nvme: Complete all stuck requests
blk-mq: Provide freeze queue timeout
blk-mq: Export blk_mq_freeze_queue_wait
nbd: stop leaking sockets
blk-mq: move update of tags->rqs to __blk_mq_alloc_request()
blk-mq: kill blk_mq_set_alloc_data()
blk-mq: make blk_mq_alloc_request_hctx() allocate a scheduler request
blk-mq-sched: Allocate sched reserved tags as specified in the original queue tagset
nvme: allocate nvme_queue in correct node
PCI: add an API to get node from vector
blk-mq: allocate blk_mq_tags and requests in correct node
Linus Torvalds [Fri, 3 Mar 2017 18:16:38 +0000 (10:16 -0800)]
Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull sched.h split-up from Ingo Molnar:
"The point of these changes is to significantly reduce the
<linux/sched.h> header footprint, to speed up the kernel build and to
have a cleaner header structure.
After these changes the new <linux/sched.h>'s typical preprocessed
size goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K
lines), which is around 40% faster to build on typical configs.
Not much changed from the last version (-v2) posted three weeks ago: I
eliminated quirks, backmerged fixes plus I rebased it to an upstream
SHA1 from yesterday that includes most changes queued up in -next plus
all sched.h changes that were pending from Andrew.
I've re-tested the series both on x86 and on cross-arch defconfigs,
and did a bisectability test at a number of random points.
I tried to test as many build configurations as possible, but some
build breakage is probably still left - but it should be mostly
limited to architectures that have no cross-compiler binaries
available on kernel.org, and non-default configurations"
* 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits)
sched/headers: Clean up <linux/sched.h>
sched/headers: Remove #ifdefs from <linux/sched.h>
sched/headers: Remove the <linux/topology.h> include from <linux/sched.h>
sched/headers, hrtimer: Remove the <linux/wait.h> include from <linux/hrtimer.h>
sched/headers, x86/apic: Remove the <linux/pm.h> header inclusion from <asm/apic.h>
sched/headers, timers: Remove the <linux/sysctl.h> include from <linux/timer.h>
sched/headers: Remove <linux/magic.h> from <linux/sched/task_stack.h>
sched/headers: Remove <linux/sched.h> from <linux/sched/init.h>
sched/core: Remove unused prefetch_stack()
sched/headers: Remove <linux/rculist.h> from <linux/sched.h>
sched/headers: Remove the 'init_pid_ns' prototype from <linux/sched.h>
sched/headers: Remove <linux/signal.h> from <linux/sched.h>
sched/headers: Remove <linux/rwsem.h> from <linux/sched.h>
sched/headers: Remove the runqueue_is_locked() prototype
sched/headers: Remove <linux/sched.h> from <linux/sched/hotplug.h>
sched/headers: Remove <linux/sched.h> from <linux/sched/debug.h>
sched/headers: Remove <linux/sched.h> from <linux/sched/nohz.h>
sched/headers: Remove <linux/sched.h> from <linux/sched/stat.h>
sched/headers: Remove the <linux/gfp.h> include from <linux/sched.h>
sched/headers: Remove <linux/rtmutex.h> from <linux/sched.h>
...
Linus Torvalds [Fri, 3 Mar 2017 18:13:12 +0000 (10:13 -0800)]
Merge tag 'linux-kselftest-4.11-rc1-urgent_fix' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fix from Shuah Khan:
"This update consists of an urgent fix for individual test build
failures introduced in the 4.11-rc1 update"
* tag 'linux-kselftest-4.11-rc1-urgent_fix' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: lib.mk Fix individual test builds
David S. Miller [Fri, 3 Mar 2017 18:06:39 +0000 (10:06 -0800)]
Merge branch 'sfx-fixes'
Edward Cree says:
====================
sfc: couple of fixes
First patch addresses a construct that causes sparse to error out.
With that fixed, sparse makes some warnings on ef10.c, second patch
fixes one of them.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Fri, 3 Mar 2017 15:22:27 +0000 (15:22 +0000)]
sfc: fix IPID endianness in TSOv2
The value we read from the header is in network byte order, whereas
EFX_POPULATE_QWORD_* takes values in host byte order (which it then
converts to little-endian, as MCDI is little-endian).
Fixes: e9117e5099ea ("sfc: Firmware-Assisted TSO version 2") Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Fri, 3 Mar 2017 15:22:09 +0000 (15:22 +0000)]
sfc: avoid max() in array size
It confuses sparse, which thinks the size isn't constant. Let's achieve
the same thing with a BUILD_BUG_ON, since we know which one should be
bigger and don't expect them ever to change.
Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Zhu Yanjun [Fri, 3 Mar 2017 05:44:26 +0000 (00:44 -0500)]
rds: remove unnecessary returned value check
The function rds_trans_register always returns 0. As such, it is not
necessary to check the returned value.
Cc: Joe Jin <joe.jin@oracle.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Thu, 2 Mar 2017 23:48:52 +0000 (23:48 +0000)]
rxrpc: Fix potential NULL-pointer exception
Fix a potential NULL-pointer exception in rxrpc_do_sendmsg(). The call
state check that I added should have gone into the else-body of the
if-statement where we actually have a call to check.
Found by CoverityScan CID#1414316 ("Dereference after null check").
Fixes: 540b1c48c37a ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg") Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 3 Mar 2017 17:46:54 +0000 (09:46 -0800)]
Merge branch 'nfp-fixes'
Jakub Kicinski says:
====================
nfp: RX and XDP buffer fixes
Two trivial fixes for code introduced with XDP support. First
one corrects the buffer size we populate a register with. The
register is designed to be used for scatter transfers which
the driver (and most FWs) don't support so it's not critical.
The other one for DMA direction is mostly cosmetic, DMA API
doesn't seem to care today about the precise direction in sync
calls.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 2 Mar 2017 23:26:21 +0000 (15:26 -0800)]
nfp: correct DMA direction in XDP DMA sync
dma_sync_single_for_*() takes the direction in which the buffer
was mapped, not the direction of the sync. We should sync XDP
buffers bidirectionally.
Fixes: ecd63a0217d5 ("nfp: add XDP support in the driver") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 2 Mar 2017 23:26:20 +0000 (15:26 -0800)]
nfp: don't tell FW about the reserved buffer space
Since commit c0f031bc8866 ("nfp_net: use alloc_frag() and build_skb()")
we are allocating buffers which have to hold both the data and skb to
be created in place by build_skb().
FW should only be told about the buffer space it can DMA to, that
is without the build_skb() headroom and tailroom. Note: firmware
applications should validate the buffers against both MTU and
free list buffer size so oversized packets would not pass through
the NIC anyway.
Fixes: c0f031bc8866 ("nfp: use alloc_frag() and build_skb()") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Changes in v5:
* Rebased to the latest code and fixed up a compile error due to the
mac_addr struct going away (found by David Miller)
Changes in v4:
* Added the udelays from the previous code (per David Miller)
Changes in v3:
* Reworked the init sequence patch to only remove the device reset if
the device is actually in reset. Given that this code doesn't bear
much resemblance to the original code, I'm changing the author of the
patch. This was tested on NS2 SVK.
Changes in v2:
* Reworked the first match to make it more obvious what portions of the
register were being preserved (Per Rafal Mileki)
* Style change to reorder the function variables in patch 2 (per Sergei
Shtylyov)
Bug fixes for bgmac driver
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hari Vyas [Thu, 2 Mar 2017 22:59:57 +0000 (17:59 -0500)]
net: ethernet: bgmac: mac address change bug
ndo_set_mac_address() passes struct sockaddr * as 2nd parameter to
bgmac_set_mac_address() but code assumed u8 *. This caused two bytes
chopping and the wrong mac address was configured.
Signed-off-by: Hari Vyas <hariv@broadcom.com> Signed-off-by: Jon Mason <jon.mason@broadcom.com> Fixes: 4e209001b86 ("bgmac: write mac address to hardware in ndo_set_mac_address") Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Thu, 2 Mar 2017 22:59:56 +0000 (17:59 -0500)]
net: ethernet: bgmac: init sequence bug
Fix a bug in the 'bgmac' driver init sequence that blind writes for init
sequence where it should preserve most bits other than the ones it is
deliberately manipulating.
The code now checks to see if the adapter needs to be brought out of
reset (where as before it was doing an IDM write to bring it out of
reset regardless of whether it was in reset or not). Also, removed
unnecessary usleeps (as there is already a read present to flush the
IDM writes).
Signed-off-by: Zac Schroff <zschroff@broadcom.com> Signed-off-by: Jon Mason <jon.mason@broadcom.com> Fixes: f6a95a24957 ("net: ethernet: bgmac: Add platform device support") Signed-off-by: David S. Miller <davem@davemloft.net>