git.karo-electronics.de Git - karo-tx-linux.git/log

ibmvfc: fix little endian issues

Added big endian annotations to relevant data structure fields, and necessary
byte swappings to support little endian builds.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

qla2xxx: Use dma_zalloc_coherent

Use the zeroing function instead of dma_alloc_coherent & memset(,0,)

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Saurav Kashyap <saurav.kashyap@qlogic.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

bfa: use ARRAY_SIZE instead of sizeof/sizeof[0]

Use macro definition

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Anil Gurumurthy <anil.gurumurthy@qlogic.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

bfa: remove useless return variables

This patch remove variables that are initialized with a constant,
are never updated, and are only used as parameter of return.
Return the constant instead of using a variable.

Verified by compilation only.

The coccinelle script that find and fixes this issue is:
// <smpl>
@@
type T;
constant C;
identifier ret;
@@
- T ret = C;
... when != ret
when strict
return
- ret
+ C
;
// </smpl>

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Sudarsana Kalluru <Sudarsana.Kalluru@qlogic.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

bfa: Fix undefined bit shift on big-endian architectures with 32-bit DMA address

bfa_swap_words() shifts its argument (assumed to be 64-bit) by 32 bits
each way.  In two places the argument type is dma_addr_t, which may be
32-bit, in which case the effect of the bit shift is undefined:

drivers/scsi/bfa/bfa_fcpim.c: In function 'bfa_ioim_send_ioreq':
drivers/scsi/bfa/bfa_fcpim.c:2497:4: warning: left shift count >= width of type [enabled by default]
    addr = bfa_sgaddr_le(sg_dma_address(sg));
    ^
drivers/scsi/bfa/bfa_fcpim.c:2497:4: warning: right shift count >= width of type [enabled by default]
drivers/scsi/bfa/bfa_fcpim.c:2509:4: warning: left shift count >= width of type [enabled by default]
    addr = bfa_sgaddr_le(sg_dma_address(sg));
    ^
drivers/scsi/bfa/bfa_fcpim.c:2509:4: warning: right shift count >= width of type [enabled by default]

Avoid this by adding casts to u64 in bfa_swap_words().

Compile-tested only.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Anil Gurumurthy <anil.gurumurthy@qlogic.com>
Cc: stable@vger.kernel.org
Fixes: f16a17507b09 ('[SCSI] bfa: remove all OS wrappers')
Signed-off-by: Christoph Hellwig <hch@lst.de>

bfa: Use dma_zalloc_coherent

Use the zeroing function instead of dma_alloc_coherent & memset(,0,)

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Anil Gurumurthy <anil.gurumurthy@qlogic.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

aic7xxx: Use kstrdup

Use kstrdup when the goal of an allocation is copy a string into the
allocated region.

The Coccinelle semantic patch that makes this change is as follows:

// <smpl>
@@
expression from,to;
expression flag,E1,E2;
statement S;
@@

-  to = kmalloc(strlen(from) + 1,flag);
+  to = kstrdup(from, flag);
   ... when != $from = E1 \| to = E1 $
   if (to==NULL || ...) S
   ... when != $from = E2 \| to = E2 $
-  strcpy(to, from);
// </smpl>

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Christoph Hellwig <hch@lst.de>

scsi: add defines for new FC port speeds.

These speeds are to support the next generation of FCoE port speeds.

Signed-off-by: Dick Kennedy <Dick.Kennedy@Emulex.Com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

bnx2i, be2iscsi: fix custom stats length

The custom stats is an array with custom_length indicating the length
of the array. This patch fixes bnx2i and be2iscsi's setting of the
custom stats length. They both just have the one, eh_abort_cnt, so that should
be in the first entry of the custom array and custom_length should then
be one.

Reported-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Acked-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

iscsi: kill redundant casts

Remove two redundant casts from char * to char *.

Signed-off-by: Nick Black <nlb@google.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Christoph Hellwig <hch@lst.de>

tgt: remove SCSI_TGT and SCSI_FC_TGT_ATTRS

The Kconfig symbols SCSI_TGT and SCSI_FC_TGT_ATTRS are unused since
"tgt: removal". Setting them has no effect. Remove these symbols.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Christoph Hellwig <hch@lst.de>

sd: fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O timeout

Commit ID: 7e660100d85af860e7ad763202fff717adcdaacd added code to derive the
FLUSH_TIMEOUT from the basic I/O timeout. However, this patch did not use the
basic I/O timeout of the device. Fix this bug.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

scsi: add a blacklist flag which enables VPD page inquiries

Despite supporting modern SCSI features some storage devices continue to
claim conformance to an older version of the SPC spec. This is done for
compatibility with legacy operating systems.

Linux by default will not attempt to read VPD pages on devices that
claim SPC-2 or older. Introduce a blacklist flag that can be used to
trigger VPD page inquiries on devices that are known to support them.

Reported-by: KY Srinivasan <kys@microsoft.com>
Tested-by: KY Srinivasan <kys@microsoft.com>
Reviewed-by: KY Srinivasan <kys@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>

scsi: move the writeable field from struct scsi_device to struct scsi_cd

We currently set the field in common code based on the device type,
but then only use it in the cdrom driver which also overrides the
value previously set in the generic code.

Just leave this entirely to the CDROM driver to make everyones life
simpler.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: add a symbolic name for the ZBC device type

Make sure we have a symbolic name for the ZBC type available,
so that e.g. patch for a SATA to translate ZAC commands can
make use of it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: update scsi_device_types

Add two new device types, most importantly the zoned block device
one.

Split from an earlier patch by Hannes Reinecke.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

fnic: reject device resets without assigned tags for the blk-mq case

Current the midlayer fakes up a struct request for the explicit reset
ioctls, and those don't have a tag allocated to them. The fnic driver pokes
into midlayer structures to paper over this design issue, but that won't
work for the blk-mq case.

Either someone who can actually test the hardware will have to come up with
a similar hack for the blk-mq case, or we'll have to bite the bullet and fix
the way the EH ioctls work for real, but until that happens we fail these
explicit requests here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
Cc: Hiral Patel <hiralpat@cisco.com>
Cc: Suma Ramars <sramars@cisco.com>
Cc: Brian Uchino <buchino@cisco.com>

scsi: add support for a blk-mq based I/O path.

This patch adds support for an alternate I/O path in the scsi midlayer
which uses the blk-mq infrastructure instead of the legacy request code.

Use of blk-mq is fully transparent to drivers, although for now a host
template field is provided to opt out of blk-mq usage in case any unforseen
incompatibilities arise.

In general replacing the legacy request code with blk-mq is a simple and
mostly mechanical transformation.  The biggest exception is the new code
that deals with the fact the I/O submissions in blk-mq must happen from
process context, which slightly complicates the I/O completion handler.
The second biggest differences is that blk-mq is build around the concept
of preallocated requests that also include driver specific data, which
in SCSI context means the scsi_cmnd structure.  This completely avoids
dynamic memory allocations for the fast path through I/O submission.

Due the preallocated requests the MQ code path exclusively uses the
host-wide shared tag allocator instead of a per-LUN one.  This only
affects drivers actually using the block layer provided tag allocator
instead of their own.  Unlike the old path blk-mq always provides a tag,
although drivers don't have to use it.

For now the blk-mq path is disable by defauly and must be enabled using
the "use_blk_mq" module parameter.  Once the remaining work in the block
layer to make blk-mq more suitable for slow devices is complete I hope
to make it the default and eventually even remove the old code path.

Based on the earlier scsi-mq prototype by Nicholas Bellinger.

Thanks to Bart Van Assche and Robert Elliot for testing, benchmarking and
various sugestions and code contributions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>

scatterlist: allow chaining to preallocated chunks

Blk-mq drivers usually preallocate their S/G list as part of the request,
but if we want to support the very large S/G lists currently supported by
the SCSI code that would tie up a lot of memory in the preallocated request
pool. Add support to the scatterlist code so that it can initialize a
S/G list that uses a preallocated first chunks and dynamically allocated
additional chunks. That way the scsi-mq code can preallocate a first
page worth of S/G entries as part of the request, and dynamically extend
the S/G list when needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>

scsi: unwind blk_end_request_all and blk_end_request_err calls

Replace the calls to the various blk_end_request variants with opencode
equivalents. Blk-mq is using a model that gives the driver control
between the bio updates and the actual completion, and making the old
code follow that same model allows us to keep the code more similar for
both paths.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>

scsi: only maintain target_blocked if the driver has a target queue limit

This saves us an atomic operation for each I/O submission and completion
for the usual case where the driver doesn't set a per-target can_queue
value. Only a few iscsi hardware offload drivers set the per-target
can_queue value at the moment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>

scsi: fix the {host,target,device}_blocked counter mess

Seems like these counters are missing any sort of synchronization for
updates, as a over 10 year old comment from me noted. Fix this by
using atomic counters, and while we're at it also make sure they are
in the same cacheline as the _busy counters and not needlessly stored
to in every I/O completion.

With the new model the _busy counters can temporarily go negative,
so all the readers are updated to check for > 0 values. Longer
term every successful I/O completion will reset the counters to zero,
so the temporarily negative values will not cause any harm.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>

scsi: convert device_busy to atomic_t

Avoid taking the queue_lock to check the per-device queue limit. Instead
we do an atomic_inc_return early on to grab our slot in the queue,
and if necessary decrement it after finishing all checks.

Unlike the host and target busy counters this doesn't allow us to avoid the
queue_lock in the request_fn due to the way the interface works, but it'll
allow us to prepare for using the blk-mq code, which doesn't use the
queue_lock at all, and it at least avoids a queue_lock round trip in
scsi_device_unbusy, which is still important given how busy the queue_lock
is.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>

scsi: convert host_busy to atomic_t

Avoid taking the host-wide host_lock to check the per-host queue limit.
Instead we do an atomic_inc_return early on to grab our slot in the queue,
and if necessary decrement it after finishing all checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>

scsi: convert target_busy to an atomic_t

Avoid taking the host-wide host_lock to check the per-target queue limit.
Instead we do an atomic_inc_return early on to grab our slot in the queue,
and if necessary decrement it after finishing all checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>

scsi: push host_lock down into scsi_{host,target}_queue_ready

Prepare for not taking a host-wide lock in the dispatch path by pushing
the lock down into the places that actually need it. Note that this
patch is just a preparation step, as it will actually increase lock
roundtrips and thus decrease performance on its own.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>

scsi: set ->scsi_done before calling scsi_dispatch_cmd

The blk-mq code path will set this to a different function, so make the
code simpler by setting it up in a legacy-request specific place.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>

scsi: centralize command re-queueing in scsi_dispatch_fn

Make sure we only have the logic for requeing commands in one place.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>

scsi: split __scsi_queue_insert

Factor out a helper to set the _blocked values, which we'll reuse for the
blk-mq code path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>

scsi: add scsi_setup_cmnd helper

Factor out command setup code that will be shared with the blk-mq code path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Webb Scales <webbnh@hp.com>

scsi: mark scsi_setup_blk_pc_cmnd static

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

sd: split sd_init_command

Factor out a function to initialize regular read/write commands and leave
sd_init_command as a simple dispatcher to the different prepare routines.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

sd: retry discard commands

Currently cmd->allowed is initialized from rq->retries for discard
commands, but retries is always 0 for non-BLOCK_PC requests. Set it
to the standard number of retries instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

sd: retry write same commands

Currently cmd->allowed is initialized from rq->retries for write same
commands, but retries is always 0 for non-BLOCK_PC requests. Set it
to the standard number of retries instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

sd: don't use scsi_setup_blk_pc_cmnd for discard requests

Simplify handling of discard requests by setting up the command directly
instead of initializing request fields and then calling
scsi_setup_blk_pc_cmnd to propagate the information into the command.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

sd: don't use scsi_setup_blk_pc_cmnd for write same requests

Simplify handling of write same requests by setting up the command directly
instead of initializing request fields and then calling
scsi_setup_blk_pc_cmnd to propagate the information into the command.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Webb Scales <webbnh@hp.com>

sd: don't use scsi_setup_blk_pc_cmnd for flush requests

Simplify handling of flush requests by setting up the command directly
instead of initializing request fields and then calling
scsi_setup_blk_pc_cmnd to propagate the information into the command.

Also rename scsi_setup_flush_cmnd to sd_setup_flush_cmnd for consistency.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

scsi: set sc_data_direction in common code

The data direction fiel in the SCSI command is derived only from the block
request structure. Move setting it up into common code instead of
duplicating it in the ULDs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

scsi: restructure command initialization for TYPE_FS requests

We should call the device handler prep_fn for all TYPE_FS requests,
not just simple read/write calls that are handled by the disk driver.

Restructure the common I/O code to call the prep_fn handler and zero
out the CDB, and just leave the call to scsi_init_io to the ULDs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

scsi: move the nr_phys_segments assert into scsi_init_io

scsi_init_io should only be called for requests that transfer data,
so move the assert that a request has segments from the callers into
scsi_init_io.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

scsi_lib: remove the description string in scsi_io_completion()

During IO with fabric faults, one generally sees several "Unhandled error
code" messages in the syslog as shown below:

sd 4:0:6:2: [sdbw] Unhandled error code
sd 4:0:6:2: [sdbw] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
sd 4:0:6:2: [sdbw] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
end_request: I/O error, dev sdbw, sector 0

This comes from scsi_io_completion (in scsi_lib.c) while handling error
codes other than DID_RESET or not deferred sense keys i.e. this is
actually handled by the SCSI mid layer. But what gets displayed here is
"Unhandled error code" which is quite misleading as it indicates
something that is not addressed by the mid layer.

The description string is based on the sense key and sometimes on the
additional sense code;
since the ACTION_FAIL case always prints the sense key and the
additional sense code, this patch removes the description string
completely because it does not add useful information.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

scsi: cleanup switch in scsi_adjust_queue_depth

While checking what scsi_adjust_queue_depth() did I thought its switch
statement could be clearer:

- remove redundant assignment (to sdev->queue_depth)
- re-order cases (thus removing the fall-through)

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Robert Elliott <elliott@hp.com>
Tested-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

scsi: remove various exports that were only used by scsi_tgt

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

tgt: defconfig cleanup

Because of the removal of the scsi_tgt kernel module, the kbuild variables
CONFIG_SCSI_TGT, CONFIG_SCSI_SRP_TGT_ATTRS and CONFIG_SCSI_FC_TGT_ATTRS
are obsolete. This patch removes these variables. This patch is the result
of the following command:

find -name '*defconfig' | while read f; do grep -vwE 'CONFIG_SCSI_TGT|CONFIG_SCSI_SRP_TGT_ATTRS|CONFIG_SCSI_FC_TGT_ATTRS|CONFIG_SRP' $f >/tmp/t && mv /tmp/t $f; done

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

tgt: removal

Now that the ibmvstgt driver as the only user of scsi_tgt is gone, the
scsi_tgt kernel module, the CONFIG_SCSI_TGT, CONFIG_SCSI_SRP_TGT_ATTRS and
CONFIG_SCSI_FC_TGT_ATTRS kbuild variable, the scsi_host_template
transfer_response method are no longer needed.

[hch: minor updates to the current tree, changelog update]

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

libsrp: removal

Remove the libsrp module which was only used by the now removed ibmvstgt
driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

ibmvstgt: remove

The IBM virtual SCSI protocol has been obsoleted by ibmvfc, and there
are no reported of the driver left.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

scsi: use dev_printk variants where possible

Using dev_printk variants prefixes the logging message with
the originating device, which makes debugging easier.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

scsi: use dev_printk() variants for ioctl

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

scsi: Implement st_printk()

Update the st driver to use dev_printk() variants instead of
plain printk(); this will prefix logging messages with the
appropriate device.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

scsi: Implement ch_printk()

Update the ch driver to use dev_printk() variants instead of
plain printk(); this will prefix logging messages with the
appropriate device.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

scsi: Implement sg_printk()

Update the sg driver to use dev_printk() variants instead of
plain printk(); this will prefix logging messages with the
appropriate device.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Acked-by: Doug Gilbert <dgilbert@interlog.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

scsi: Implement sr_printk()

Update the sr driver to use dev_printk() variants instead of
plain printk(); this will prefix logging messages with the
appropriate device.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

scsi_scan: Fixup scsilun_to_int()

scsilun_to_int() has an error which prevents it from generating
correct LUN numbers for 64bit values.
Also we should remove the misleading comment about portions of
the LUN being ignored; the initiator should treat the LUN as
an opaque value.
And, finally, the example given should use the correct
prefix (here: extended flat space addressing scheme).

This patch includes the modifications suggested by
Bart van Assche.

Cc: Bart van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: James Bottomley <jbottomley@parallels.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

scsi: use 64-bit value for 'max_luns'

Now that we're using 64-bit LUNs internally we need to increase
the size of max_luns to 64 bits, too.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

Add module param type 'ullong'

Some driver might want to pass in an 64-bit value, so introduce
a module param type 'ullong'.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Christoph Hellwig <hch@lst.de>

scsi: use 64-bit LUNs

The SCSI standard defines 64-bit values for LUNs, and large arrays
employing large or hierarchical LUN numbers become more and more
common.

So update the linux SCSI stack to use 64-bit LUN numbers.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

qla2xxx: Restrict max_lun to 16-bit for older HBAs

Older HBAs are only capable of supporting 16-bit LUNs,
so we need to make sure to adjust max_lun accordingly.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Acked-by: Chad Dupuis <chad.dupuis@qlogic.com>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

scsi_scan: Restrict sequential scan to 256 LUNs

Sequential scan for more than 256 LUNs is very fragile as
LUNs might not be numbered sequentially after that point.

SAM revisions later than SCSI-3 impose a structure on
LUNs larger than 256, making LUN numbers between 256
and 16384 illegal.
SCSI-3, however allows for plain 64-bit numbers with
no internal structure.

So restrict sequential LUN scan to 256 LUNs and add a
new blacklist flag 'BLIST_SCSI3LUN' to scan up to
max_lun devices.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

scsi: Remove CONFIG_SCSI_MULTI_LUN

Obsolete; either use 'max_lun' if the host supports only a
limited number of LUNs or BLIST_NOLUN if the target has
problems addressing more than one LUN.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

sg: O_EXCL and other lock handling

This addresses a problem reported by Vaughan Cao concerning
the correctness of the O_EXCL logic in the sg driver. POSIX
doesn't defined O_EXCL semantics on devices but "allow only
one open file descriptor at a time per sg device" is a rough
definition. The sg driver's semantics have been to wait
on an open() when O_NONBLOCK is not given and there are
O_EXCL headwinds. Nasty things can happen during that wait
such as the device being detached (removed). So multiple
locks are reworked in this patch making it large and hard
to break down into digestible bits.

This patch is against Linus's current git repository which
doesn't include any sg patches sent in the last few weeks.
Hence this patch touches as little as possible that it
doesn't need to and strips out most SCSI_LOG_TIMEOUT()
changes in v3 because Hannes said he was going to rework all
that stuff.

The sg3_utils package has several test programs written to
test this patch. See examples/sg_tst_excl*.cpp .

Not all the locks and flags in sg have been re-worked in
this patch, notably sg_request::done . That can wait for
a follow-up patch if this one meets with approval.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

sg: add SG_FLAG_Q_AT_TAIL flag

When the SG_IO ioctl was copied into the block layer and
later into the bsg driver, subtle differences emerged.

One difference is the way injected commands are queued through
the block layer (i.e. this is not SCSI device queueing nor SATA
NCQ). Summarizing:
   - SG_IO in the block layer: blk_exec*(at_head=false)
   - sg SG_IO: at_head=true
   - bsg SG_IO: at_head=true

Some time ago Boaz Harrosh introduced a sg v4 flag called
BSG_FLAG_Q_AT_TAIL to override the bsg driver default.
This patch does the equivalent for the sg driver.

ChangeLog:
     Introduce SG_FLAG_Q_AT_TAIL flag to cause commands
     to be injected into the block layer with
     at_head=false.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

sg: relax 16 byte cdb restriction

- remove the 16 byte CDB (SCSI command) length limit from the sg driver
   by handling longer CDBs the same way as the bsg driver. Remove comment
   from sg.h public interface about the cmd_len field being limited to 16
   bytes.
- remove some dead code caused by this change
- cleanup comment block at the top of sg.h, fix urls

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

sd: Limit transfer length

Until now the per-command transfer length has exclusively been gated by
the max_sectors parameter in the scsi_host template. Given that the size
of this parameter has been bumped to an unsigned int we have to be
careful not to exceed the target device's capabilities.

If the if the device specifies a Maximum Transfer Length in the Block
Limits VPD we'll use that value. Otherwise we'll use 0xffffffff for
devices that have use_16_for_rw set and 0xffff for the rest. We then
combine the chosen disk limit with max_sectors in the host template. The
smaller of the two will be used to set the max_hw_sectors queue limit.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

sd: bad return code of init_sd

In init_sd function, if kmem_cache_create or mempool_create_slab_pools
calls fail, the error will not be correclty reported because
class_register previously set the value of err to 0.

Signed-off-by: Clément Calmels <clement.calmels@free.fr>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

sd: notify block layer when using temporary change to cache_type

This is a fix for commit 39c60a0948cc06139e2fbfe084f83cb7e7deae3b

  "sd: fix array cache flushing bug causing performance problems"

We must notify the block layer via q->flush_flags after a temporary change
of the cache_type to write through.  Without this, a SYNCHRONIZE CACHE
command will still be generated.  This patch factors out a helper that
can be called from sd_revalidate_disk and cache_type_store.

Signed-off-by: Vaughan Cao <vaughan.cao@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

scsi_debug: allow huge transfer length for read/write commands

This change enables to test read/write commands with huge transfer
length such as 1GB.  For example:

# modprobe scsi_debug dev_size_mb=1024 clustering=1 opts=1
# cat /sys/block/$DEV/queue/max_hw_sectors_kb > \
/sys/block/$DEV/queue/max_sectors_kb
# fio --name=test --rw=write --bs=1g --size=1g --filename=/dev/$DEV \
--mem=mmaphuge  --direct=1

The data type of max_sectors in scsi_host_template has been extended
to unsigned int by the previous change.  So we can increase it from
0xffff to 0xffffffff to allow such huge transfer length.

Also, this increases sg_tablesize and max_segment_size, otherwise the
maximum transfer length is limited to 64MB.
(sg_tablesize * max_segment_size = 256 * 256KB)

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

scsi: increase upper limit for max_sectors

max_sectors in struct Scsi_Host specifies maximum number of sectors
allowed in a single SCSI command. The data type of max_sectors is
unsigned short, so the maximum transfer length per SCSI command is
limited to less than 256MB in 4096-bytes sector size. (0xffff * 4096)

This commit increases the SCSI mid level's limitation for max_sectors
upto the block layer's limitation for max_hw_sectors by extending the
data type of max_sectors in struct Scsi_Host and scsi_host_template,
so that SCSI lower level drivers can specify more than 0xffff.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

sd: use READ_16 or WRITE_16 when transfer length is greater than 0xffff

This change makes the scsi disk driver handle the requests whose
transfer length is greater than 0xffff with READ_16 or WRITE_16.

However, this is a preparation for extending the data type of
max_sectors in struct Scsi_Host and scsi_host_template. So, it is
impossible to happen this condition for now, because SCSI low-level
drivers can not specify max_sectors greater than 0xffff due to the
data type limitation.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

sg: prevent integer overflow when converting from sectors to bytes

This prevents integer overflow when converting the request queue's
max_sectors from sectors to bytes. However, this is a preparation for
extending the data type of max_sectors in struct Scsi_Host and
scsi_host_template. So, it is impossible to happen this integer
overflow for now, because SCSI low-level drivers can not specify
max_sectors greater than 0xffff due to the data type limitation.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

scsi: remove two cancel_delayed_work() calls from the mid-layer

scsi_put_command() is either invoked before blk_start_request() or
after block layer processing has completed.  scsi_cmnd.abort_work
is scheduled from inside the SCSI timeout handler.  The block layer
guarantees that either the regular completion handler
(softirq_done_fn()) or the timeout handler (rq_timed_out_fn()) is
invoked but not both. This means that scsi_put_command() is never
invoked while abort_work is scheduled.  Hence remove the
cancel_delayed_work() call from scsi_put_command().

Similarly, scsi_abort_command() is only invoked from the SCSI
timeout handler. If scsi_abort_command() is invoked for a SCSI
command with the SCSI_EH_ABORT_SCHEDULED flag set this means that
scmd_eh_abort_handler() has already invoked scsi_queue_insert() and
hence that scsi_cmnd.abort_work is no longer pending. Hence also
remove the cancel_delayed_work() call from scsi_abort_command().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

scsi: handle flush errors properly

Flush commands don't transfer data and thus need to be special cased
in the I/O completion handler so that we can propagate errors to
the block layer and filesystem.

Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Reported-by: Steven Haber <steven@qumulo.com>
Tested-by: Steven Haber <steven@qumulo.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>

Merge tag 'stable/for-linus-3.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen fixes from Konrad Rzeszutek Wilk:
"Two fixes found during migration of PV guests.  David would be the one
  doing this pull but he is on vacation.

  Fixes:
   - fix console deadlock when resuming PV guests
   - fix regression hit when ballooning and resuming PV guests"

* tag 'stable/for-linus-3.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/balloon: set ballooned out pages as invalid in p2m
  xen/manage: fix potential deadlock when resuming the console

Merge tag 'trace-fixes-v3.16-rc5-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
"A few more fixes for ftrace infrastructure.

  I was cleaning out my INBOX and found two fixes from zhangwei from a
  year ago that were lost in my mail.  These fix an inconsistency
  between trace_puts() and the way trace_printk() works.  The reason
  this is important to fix is because when trace_printk() doesn't have
  any arguments, it turns into a trace_puts().  Not being able to enable
  a stack trace against trace_printk() because it does not have any
  arguments is quite confusing.  Also, the fix is rather trivial and low
  risk.

  While porting some changes to PowerPC I discovered that it still has
  the function graph tracer filter bug that if you also enable stack
  tracing the function graph tracer filter is ignored.  I fixed that up.

  Finally, Martin Lau, fixed a bug that would cause readers of the
  ftrace ring buffer to block forever even though it was suppose to be
  NONBLOCK"

This also includes the fix from an earlier pull request:

"Oleg Nesterov fixed a memory leak that happens if a user creates a
  tracing instance, sets up a filter in an event, and then removes that
  instance.  The filter allocates memory that is never freed when the
  instance is destroyed"

* tag 'trace-fixes-v3.16-rc5-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ring-buffer: Fix polling on trace_pipe
  tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputs
  tracing: Fix graph tracer with stack tracer on other archs
  tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs
  tracing: instance_rmdir() leaks ftrace_event_file->filter

Merge tag 'for-linus-20140716' of git://git.infradead.org/linux-mtd

Pull MTD fixes from Brian Norris:

- Fix ELM suspend/resume

- Reduce warnings if NAND ECC is too weak

- Add CFI support for Sharp LH28F640BF NOR

   The last fix is coming in because other commits in the 3.16 cycle
   depended on this support.

* tag 'for-linus-20140716' of git://git.infradead.org/linux-mtd:
  mtd: cfi_cmdset_0001.c: add support for Sharp LH28F640BF NOR
  mtd: nand: reduce the warning noise when the ECC is too weak
  mtd: devices: elm: fix elm_context_save() and elm_context_restore() functions

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
"A cpufreq lockup fix and a compiler warning fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix compiler warnings
x86, tsc: Fix cpufreq lockup

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Tooling fixes and an Intel PMU driver fixlet"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Do not allow optimized switch for non-cloned events
  perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
  perf symbols: Get kernel start address by symbol name
  perf tools: Fix segfault in cumulative.callchain report

Merge tag 'sound-3.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Things seem to calm down so far, just a small few HD-audio fixes
  (regression fixes and a new codec ID addition) popping up"

* tag 'sound-3.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Fix broken PM due to incomplete i915 initialization
  ALSA: hda - Revert stream assignment order for Intel controllers
  ALSA: hda - Add new GPU codec ID 0x10de0070 to snd-hda
  ALSA: hda: Fix build warning

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull quota fix from Jan Kara:
"Fix locking of dquot shrinker"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: missing lock in dqcache_shrink_scan()

Merge tag 'gpio-v3.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fix from Linus Walleij:
"Fix up some merge confusion from the merge window"

* tag 'gpio-v3.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: mcp23s08: Eliminates redundant checking.

ring-buffer: Fix polling on trace_pipe

ring_buffer_poll_wait() should always put the poll_table to its wait_queue
even there is immediate data available.  Otherwise, the following epoll and
read sequence will eventually hang forever:

1. Put some data to make the trace_pipe ring_buffer read ready first
2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee)
3. epoll_wait()
4. read(trace_pipe_fd) till EAGAIN
5. Add some more data to the trace_pipe ring_buffer
6. epoll_wait() -> this epoll_wait() will block forever

~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2,
  ring_buffer_poll_wait() returns immediately without adding poll_table,
  which has poll_table->_qproc pointing to ep_poll_callback(), to its
  wait_queue.
~ During the epoll_wait() call in step 3 and step 6,
  ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue
  because the poll_table->_qproc is NULL and it is how epoll works.
~ When there is new data available in step 6, ring_buffer does not know
  it has to call ep_poll_callback() because it is not in its wait queue.
  Hence, block forever.

Other poll implementation seems to call poll_wait() unconditionally as the very
first thing to do.  For example, tcp_poll() in tcp.c.

Link: http://lkml.kernel.org/p/20140610060637.GA14045@devbig242.prn2.facebook.com
Cc: stable@vger.kernel.org # 2.6.27
Fixes: 2a2cc8f7c4d0 "ftrace: allow the event pipe to be polled"
Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: Martin Lau <kafai@fb.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

quota: missing lock in dqcache_shrink_scan()

Commit 1ab6c4997e04 (fs: convert fs shrinkers to new scan/count API)
accidentally removed locking from quota shrinker. Fix it -
dqcache_shrink_scan() should use dq_list_lock to protect the
scan on free_dquots list.

CC: stable@vger.kernel.org
Fixes: 1ab6c4997e04a00c50c6d786c2f046adc0d1f5de
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>

tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputs

The TRACE_ITER_PRINTK check in __trace_puts/__trace_bputs is missing,
so add it, to be consistent with __trace_printk/__trace_bprintk.
Those functions are all called by the same function: trace_printk().

Link: http://lkml.kernel.org/p/51E7A7D6.8090900@huawei.com
Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:
"This contains miscellaneous fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: replace count*size kzalloc by kcalloc
  fuse: release temporary page if fuse_writepage_locked() failed
  fuse: restructure ->rename2()
  fuse: avoid scheduling while atomic
  fuse: handle large user and group ID
  fuse: inode: drop cast
  fuse: ignore entry-timeout on LOOKUP_REVAL
  fuse: timeout comparison fix

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Bluetooth pairing fixes from Johan Hedberg.

2) ieee80211_send_auth() doesn't allocate enough tail room for the SKB,
    from Max Stepanov.

3) New iwlwifi chip IDs, from Oren Givon.

4) bnx2x driver reads wrong PCI config space MSI register, from Yijing
    Wang.

5) IPV6 MLD Query validation isn't strong enough, from Hangbin Liu.

6) Fix double SKB free in openvswitch, from Andy Zhou.

7) Fix sk_dst_set() being racey with UDP sockets, leading to strange
    crashes, from Eric Dumazet.

8) Interpret the NAPI budget correctly in the new systemport driver,
    from Florian Fainelli.

9) VLAN code frees percpu stats in the wrong place, leading to crashes
    in the get stats handler.  From Eric Dumazet.

10) TCP sockets doing a repair can crash with a divide by zero, because
    we invoke tcp_push() with an MSS value of zero.  Just skip that part
    of the sendmsg paths in repair mode.  From Christoph Paasch.

11) IRQ affinity bug fixes in mlx4 driver from Amir Vadai.

12) Don't ignore path MTU icmp messages with a zero mtu, machines out
    there still spit them out, and all of our per-protocol handlers for
    PMTU can cope with it just fine.  From Edward Allcutt.

13) Some NETDEV_CHANGE notifier invocations were not passing in the
    correct kind of cookie as the argument, from Loic Prylli.

14) Fix crashes in long multicast/broadcast reassembly, from Jon Paul
    Maloy.

15) ip_tunnel_lookup() doesn't interpret wildcard keys correctly, fix
    from Dmitry Popov.

16) Fix skb->sk assigned without taking a reference to 'sk' in
    appletalk, from Andrey Utkin.

17) Fix some info leaks in ULP event signalling to userspace in SCTP,
    from Daniel Borkmann.

18) Fix deadlocks in HSO driver, from Olivier Sobrie.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (93 commits)
  hso: fix deadlock when receiving bursts of data
  hso: remove unused workqueue
  net: ppp: don't call sk_chk_filter twice
  mlx4: mark napi id for gro_skb
  bonding: fix ad_select module param check
  net: pppoe: use correct channel MTU when using Multilink PPP
  neigh: sysctl - simplify address calculation of gc_* variables
  net: sctp: fix information leaks in ulpevent layer
  MAINTAINERS: update r8169 maintainer
  net: bcmgenet: fix RGMII_MODE_EN bit
  tipc: clear 'next'-pointer of message fragments before reassembly
  r8152: fix r8152_csum_workaround function
  be2net: set EQ DB clear-intr bit in be_open()
  GRE: enable offloads for GRE
  farsync: fix invalid memory accesses in fst_add_one() and fst_init_card()
  igb: do a reset on SR-IOV re-init if device is down
  igb: Workaround for i210 Errata 25: Slow System Clock
  usbnet: smsc95xx: add reset_resume function with reset operation
  dp83640: Always decode received status frames
  r8169: disable L23
  ...

tracing: Fix graph tracer with stack tracer on other archs

Running my ftrace tests on PowerPC, it failed the test that checks
if function_graph tracer is affected by the stack tracer. It was.
Looking into this, I found that the update_function_graph_func()
must be called even if the trampoline function is not changed.
This is because archs like PowerPC do not support ftrace_ops being
passed by assembly and instead uses a helper function (what the
trampoline function points to). Since this function is not changed
even when multiple ftrace_ops are added to the code, the test that
falls out before calling update_function_graph_func() will miss that
the update must still be done.

Call update_function_graph_function() for all calls to
update_ftrace_function()

Cc: stable@vger.kernel.org # 3.3+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs

Currently trace option stacktrace is not applicable for
trace_printk with constant string argument, the reason is
in __trace_puts/__trace_bputs ftrace_trace_stack is missing.

In contrast, when using trace_printk with non constant string
argument(will call into __trace_printk/__trace_bprintk), then
trace option stacktrace is workable, this inconstant result
will confuses users a lot.

Link: http://lkml.kernel.org/p/51E7A7C9.9040401@huawei.com
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ALSA: hda - Fix broken PM due to incomplete i915 initialization

When the initialization of Intel HDMI controller fails due to missing
i915 kernel symbols (e.g. HD-audio is built in while i915 is module),
the driver discontinues the probe. However, since the probe was done
asynchronously, the driver object still remains, thus the relevant PM
ops are still called at suspend/resume. This results in the bad access
to the incomplete audio card object, eventually leads to Oops or stall
at PM.

This patch adds the missing checks of chip->init_failed flag at each
PM callback in order to fix the problem above.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79561
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

hso: fix deadlock when receiving bursts of data

When the module sends bursts of data, sometimes a deadlock happens in
the hso driver when the tty buffer doesn't get the chance to be flushed
quickly enough.

Remove the endless while loop in function put_rxbuf_data() which is
called by the urb completion handler.
If there isn't enough room in the tty buffer, discards all the data
received in the URB.

Cc: David Miller <davem@davemloft.net>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Dan Williams <dcbw@redhat.com>
Cc: Jan Dumon <j.dumon@option.com>
Signed-off-by: Olivier Sobrie <olivier@sobrie.be>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hso: remove unused workqueue

The workqueue "retry_unthrottle_workqueue" is not scheduled anywhere
in the code. So, remove it.

Signed-off-by: Olivier Sobrie <olivier@sobrie.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'firewire-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire fix from Stefan Richter:
"The 1394 drivers cannot and are not supposed to be built on platforms
  which don't provide the DMA mapping API (regression since v3.16-rc1
  with CONFIG_COMPILE_TEST=y on some architectures)"

* tag 'firewire-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: IEEE 1394 (FireWire) support should depend on HAS_DMA

Merge git://git.kvack.org/~bcrl/aio-fixes

Pull another aio fix from Ben LaHaise:
"put_reqs_available() can now be called from within irq context, which
  means that it (and its sibling function get_reqs_available()) now need
  to be irq-safe, not just preempt-safe"

* git://git.kvack.org/~bcrl/aio-fixes:
  aio: protect reqs_available updates from changes in interrupt handlers

net/l2tp: don't fall back on UDP [get|set]sockopt

The l2tp [get|set]sockopt() code has fallen back to the UDP functions
for socket option levels != SOL_PPPOL2TP since day one, but that has
never actually worked, since the l2tp socket isn't an inet socket.

As David Miller points out:

"If we wanted this to work, it'd have to look up the tunnel and then
use tunnel->sk, but I wonder how useful that would be"

Since this can never have worked so nobody could possibly have depended
on that functionality, just remove the broken code and return -EINVAL.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: James Chapman <jchapman@katalix.com>
Acked-by: David Miller <davem@davemloft.net>
Cc: Phil Turnbull <phil.turnbull@oracle.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

net: ppp: don't call sk_chk_filter twice

Commit 568f194e8bd16c353ad50f9ab95d98b20578a39d ("net: ppp: use
sk_unattached_filter api") causes sk_chk_filter() to be called twice when
setting a PPP pass or active filter. This applies to both the generic PPP
subsystem implemented by drivers/net/ppp/ppp_generic.c and the ISDN PPP
subsystem implemented by drivers/isdn/i4l/isdn_ppp.c. The first call is from
within get_filter(). The second one is through the call chain

  ppp_ioctl() or isdn_ppp_ioctl()
  --> sk_unattached_filter_create()
      --> __sk_prepare_filter()
          --> sk_chk_filter()

The first call from within get_filter() should be deleted as get_filter() is
called just before calling sk_unattached_filter_create() later on, which
eventually calls sk_chk_filter() anyway.

For 3.15.x, this proposed change is a bugfix rather than a pure optimization as
in that branch, sk_chk_filter() may replace filter codes by other codes which
are not recognized when executing sk_chk_filter() a second time. So with
3.15.x, if sk_chk_filter() is called twice, the second invocation may yield
EINVAL (this depends on the filter codes found in the filter to be set, but
because the replacement is done for frequently used codes, this is almost
always the case). The net effect is that setting pass and/or active PPP filters
does not work anymore, since sk_unattached_filter_create() always returns
EINVAL due to the second call to sk_chk_filter(), regardless whether the filter
was originally sane or not.

Signed-off-by: Christoph Schulz <develop@kristov.de>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4: mark napi id for gro_skb

Napi id was not marked for gro_skb, this will lead rx busy loop won't
work correctly since they stack never try to call low latency receive
method because of a zero socket napi id. Fix this by marking napi id
for gro_skb.

The transaction rate of 1 byte netperf tcp_rr gets about 50% increased
(from 20531.68 to 30610.88).

Cc: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: fix ad_select module param check

Obvious copy/paste error when I converted the ad_select to the new
option API. "lacp_rate" there should be "ad_select" so we can get the
proper value.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: David S. Miller <davem@davemloft.net>
Fixes: 9e5f5eebe765 ("bonding: convert ad_select to use the new option
API")
Reported-by: Karim Scheik <karim.scheik@prisma-solutions.at>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: pppoe: use correct channel MTU when using Multilink PPP

The PPP channel MTU is used with Multilink PPP when ppp_mp_explode() (see
ppp_generic module) tries to determine how big a fragment might be. According
to RFC 1661, the MTU excludes the 2-byte PPP protocol field, see the
corresponding comment and code in ppp_mp_explode():

/*
* hdrlen includes the 2-byte PPP protocol field, but the
* MTU counts only the payload excluding the protocol field.
* (RFC1661 Section 2)
*/
mtu = pch->chan->mtu - (hdrlen - 2);

However, the pppoe module *does* include the PPP protocol field in the channel
MTU, which is wrong as it causes the PPP payload to be 1-2 bytes too big under
certain circumstances (one byte if PPP protocol compression is used, two
otherwise), causing the generated Ethernet packets to be dropped. So the pppoe
module has to subtract two bytes from the channel MTU. This error only
manifests itself when using Multilink PPP, as otherwise the channel MTU is not
used anywhere.

In the following, I will describe how to reproduce this bug. We configure two
pppd instances for multilink PPP over two PPPoE links, say eth2 and eth3, with
a MTU of 1492 bytes for each link and a MRRU of 2976 bytes. (This MRRU is
computed by adding the two link MTUs and subtracting the MP header twice, which
is 4 bytes long.) The necessary pppd statements on both sides are "multilink
mtu 1492 mru 1492 mrru 2976". On the client side, we additionally need "plugin
rp-pppoe.so eth2" and "plugin rp-pppoe.so eth3", respectively; on the server
side, we additionally need to start two pppoe-server instances to be able to
establish two PPPoE sessions, one over eth2 and one over eth3. We set the MTU
of the PPP network interface to the MRRU (2976) on both sides of the connection
in order to make use of the higher bandwidth. (If we didn't do that, IP
fragmentation would kick in, which we want to avoid.)

Now we send a ICMPv4 echo request with a payload of 2948 bytes from client to
server over the PPP link. This results in the following network packet:

   2948 (echo payload)
+    8 (ICMPv4 header)
+   20 (IPv4 header)
---------------------
   2976 (PPP payload)

These 2976 bytes do not exceed the MTU of the PPP network interface, so the
IP packet is not fragmented. Now the multilink PPP code in ppp_mp_explode()
prepends one protocol byte (0x21 for IPv4), making the packet one byte bigger
than the negotiated MRRU. So this packet would have to be divided in three
fragments. But this does not happen as each link MTU is assumed to be two bytes
larger. So this packet is diveded into two fragments only, one of size 1489 and
one of size 1488. Now we have for that bigger fragment:

   1489 (PPP payload)
+    4 (MP header)
+    2 (PPP protocol field for the MP payload (0x3d))
+    6 (PPPoE header)
--------------------------
   1501 (Ethernet payload)

This packet exceeds the link MTU and is discarded.

If one configures the link MTU on the client side to 1501, one can see the
discarded Ethernet frames with tcpdump running on the client. A

ping -s 2948 -c 1 192.168.15.254

leads to the smaller fragment that is correctly received on the server side:

(tcpdump -vvvne -i eth3 pppoes and ppp proto 0x3d)
52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
  length 1514: PPPoE  [ses 0x3] MLPPP (0x003d), length 1494: seq 0x000,
  Flags [end], length 1492

and to the bigger fragment that is not received on the server side:

(tcpdump -vvvne -i eth2 pppoes and ppp proto 0x3d)
52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864),
  length 1515: PPPoE  [ses 0x5] MLPPP (0x003d), length 1495: seq 0x000,
  Flags [begin], length 1493

With the patch below, we correctly obtain three fragments:

52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
  length 1514: PPPoE  [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000,
  Flags [begin], length 1492
52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864),
  length 1514: PPPoE  [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000,
  Flags [none], length 1492
52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
  length 27: PPPoE  [ses 0x1] MLPPP (0x003d), length 7: seq 0x000,
  Flags [end], length 5

And the ICMPv4 echo request is successfully received at the server side:

IP (tos 0x0, ttl 64, id 21925, offset 0, flags [DF], proto ICMP (1),
  length 2976)
    192.168.222.2 > 192.168.15.254: ICMP echo request, id 30530, seq 0,
      length 2956

The bug was introduced in commit c9aa6895371b2a257401f59d3393c9f7ac5a8698
("[PPPOE]: Advertise PPPoE MTU") from the very beginning. This patch applies
to 3.10 upwards but the fix can be applied (with minor modifications) to
kernels as old as 2.6.32.

Signed-off-by: Christoph Schulz <develop@kristov.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

neigh: sysctl - simplify address calculation of gc_* variables

The code in neigh_sysctl_register() relies on a specific layout of
struct neigh_table, namely that the 'gc_*' variables are directly
following the 'parms' member in a specific order. The code, though,
expresses this in the most ugly way.

Get rid of the ugly casts and use the 'tbl' pointer to get a handle to
the table. This way we can refer to the 'gc_*' variables directly.

Similarly seen in the grsecurity patch, written by Brad Spengler.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: fix information leaks in ulpevent layer

While working on some other SCTP code, I noticed that some
structures shared with user space are leaking uninitialized
stack or heap buffer. In particular, struct sctp_sndrcvinfo
has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that
remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when
putting this into cmsg. But also struct sctp_remote_error
contains a 2 bytes hole that we don't fill but place into a skb
through skb_copy_expand() via sctp_ulpevent_make_remote_error().

Both structures are defined by the IETF in RFC6458:

* Section 5.3.2. SCTP Header Information Structure:

  The sctp_sndrcvinfo structure is defined below:

  struct sctp_sndrcvinfo {
    uint16_t sinfo_stream;
    uint16_t sinfo_ssn;
    uint16_t sinfo_flags;
    <-- 2 bytes hole  -->
    uint32_t sinfo_ppid;
    uint32_t sinfo_context;
    uint32_t sinfo_timetolive;
    uint32_t sinfo_tsn;
    uint32_t sinfo_cumtsn;
    sctp_assoc_t sinfo_assoc_id;
  };

* 6.1.3. SCTP_REMOTE_ERROR:

  A remote peer may send an Operation Error message to its peer.
  This message indicates a variety of error conditions on an
  association. The entire ERROR chunk as it appears on the wire
  is included in an SCTP_REMOTE_ERROR event. Please refer to the
  SCTP specification [RFC4960] and any extensions for a list of
  possible error formats. An SCTP error notification has the
  following format:

  struct sctp_remote_error {
    uint16_t sre_type;
    uint16_t sre_flags;
    uint32_t sre_length;
    uint16_t sre_error;
    <-- 2 bytes hole  -->
    sctp_assoc_t sre_assoc_id;
    uint8_t  sre_data[];
  };

Fix this by setting both to 0 before filling them out. We also
have other structures shared between user and kernel space in
SCTP that contains holes (e.g. struct sctp_paddrthlds), but we
copy that buffer over from user space first and thus don't need
to care about it in that cases.

While at it, we can also remove lengthy comments copied from
the draft, instead, we update the comment with the correct RFC
number where one can look it up.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tracing: instance_rmdir() leaks ftrace_event_file->filter

instance_rmdir() path destroys the event files but forgets to free
file->filter. Change remove_event_file_dir() to free_event_filter().

Link: http://lkml.kernel.org/p/20140711190638.GA19517@redhat.com
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: "zhangwei(Jovi)" <jovi.zhangwei@huawei.com>
Cc: stable@vger.kernel.org # 3.11+
Fixes: f6a84bdc75b5 "tracing: Introduce remove_event_file_dir()"
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>