git.karo-electronics.de Git - karo-tx-linux.git/log

crypto: caam - fix DMA direction mismatch in ahash_done_ctx_dst

caam_jr ffe301000.jr: DMA-API: device driver frees DMA memory with different direction [device address=0x00000000062ad1ac] [size=28 bytes] [mapped with DMA_FROM_DEVICE] [unmapped with DMA_TO_DEVICE]
------------[ cut here ]------------
WARNING: at lib/dma-debug.c:1131
Modules linked in: caamhash(+) [last unloaded: caamhash]
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W     3.16.0-rc1 #23
task: c0789380 ti: effd2000 task.ti: c07d6000
NIP: c02885cc LR: c02885cc CTR: c02d7020
REGS: effd3d50 TRAP: 0700   Tainted: G        W      (3.16.0-rc1)
MSR: 00021002 <CE,ME>  CR: 44048082  XER: 00000000

GPR00: c02885cc effd3e00 c0789380 000000c6 c1de3478 c1de382c 00000000 00021002
GPR08: 00000007 00000000 01660000 0000012f 84048082 00000000 00000018 c07db080
GPR16: 00000006 00000100 0000002c eee567e0 c07e1e10 c0da1180 00029002 c0d96708
GPR24: c07a0000 c07a4acc effd3e58 ee29b140 0000001c ee130210 00000000 c0d96700
NIP [c02885cc] check_unmap+0x47c/0xab0
LR [c02885cc] check_unmap+0x47c/0xab0
Call Trace:
[effd3e00] [c02885cc] check_unmap+0x47c/0xab0 (unreliable)
[effd3e50] [c0288c78] debug_dma_unmap_page+0x78/0x90
[effd3ed0] [f9350974] ahash_done_ctx_dst+0xa4/0x200 [caamhash]
[effd3f00] [c0429640] caam_jr_dequeue+0x1c0/0x280
[effd3f50] [c002c94c] tasklet_action+0xcc/0x1a0
[effd3f80] [c002cb30] __do_softirq+0x110/0x220
[effd3fe0] [c002cf34] irq_exit+0xa4/0xe0
[effd3ff0] [c000d834] call_do_irq+0x24/0x3c
[c07d7d50] [c000489c] do_IRQ+0x8c/0x110
[c07d7d70] [c000f86c] ret_from_except+0x0/0x18
--- Exception: 501 at _raw_spin_unlock_irq+0x30/0x50
    LR = _raw_spin_unlock_irq+0x2c/0x50
[c07d7e40] [c0053084] finish_task_switch+0x74/0x130
[c07d7e60] [c058f278] __schedule+0x238/0x620
[c07d7f70] [c058fb50] schedule_preempt_disabled+0x10/0x20
[c07d7f80] [c00686a0] cpu_startup_entry+0x100/0x1b0
[c07d7fb0] [c074793c] start_kernel+0x338/0x34c
[c07d7ff0] [c00003d8] set_ivor+0x140/0x17c
Instruction dump:
7d495214 7d294214 806a0010 80c90010 811a001c 813a0020 815a0024 90610008
3c60c06d 90c1000c 3863f764 4830d131 <0fe00000> 3c60c06d 3863f0f4 4830d121
---[ end trace db1fae088c75c270 ]---
Mapped at:
[<f9352454>] ahash_update_first+0x5b4/0xba0 [caamhash]
[<c022ff28>] __test_hash+0x288/0x6c0
[<c0230388>] test_hash+0x28/0xb0
[<c02304a4>] alg_test_hash+0x94/0xc0
[<c022fa94>] alg_test+0x114/0x2e0

Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
Acked-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: caam - fix DMA unmapping error in hash_digest_key

Key being hashed is unmapped using the digest size instead of
initial length:

caam_jr ffe301000.jr: DMA-API: device driver frees DMA memory with different size [device address=0x000000002eeedac0] [map size=80 bytes] [unmap size=20 bytes]
------------[ cut here ]------------
WARNING: at lib/dma-debug.c:1090
Modules linked in: caamhash(+)
CPU: 0 PID: 1327 Comm: cryptomgr_test Not tainted 3.16.0-rc1 #23
task: eebda5d0 ti: ee26a000 task.ti: ee26a000
NIP: c0288790 LR: c0288790 CTR: c02d7020
REGS: ee26ba30 TRAP: 0700 Not tainted (3.16.0-rc1)
MSR: 00021002 <CE,ME> CR: 44022082 XER: 00000000

GPR00: c0288790 ee26bae0 eebda5d0 0000009f c1de3478 c1de382c 00000000 00021002
GPR08: 00000007 00000000 01660000 0000012f 82022082 00000000 c07a1900 eeda29c0
GPR16: 00000000 c61deea0 000c49a0 00000260 c07e1e10 c0da1180 00029002 c0d9ef08
GPR24: c07a0000 c07a4acc ee26bb38 ee2765c0 00000014 ee130210 00000000 00000014
NIP [c0288790] check_unmap+0x640/0xab0
LR [c0288790] check_unmap+0x640/0xab0
Call Trace:
[ee26bae0] [c0288790] check_unmap+0x640/0xab0 (unreliable)
[ee26bb30] [c0288c78] debug_dma_unmap_page+0x78/0x90
[ee26bbb0] [f929c3d4] ahash_setkey+0x374/0x720 [caamhash]
[ee26bc30] [c022fec8] __test_hash+0x228/0x6c0
[ee26bde0] [c0230388] test_hash+0x28/0xb0
[ee26be00] [c0230458] alg_test_hash+0x48/0xc0
[ee26be20] [c022fa94] alg_test+0x114/0x2e0
[ee26bea0] [c022cd1c] cryptomgr_test+0x4c/0x60
[ee26beb0] [c00497a4] kthread+0xc4/0xe0
[ee26bf40] [c000f2fc] ret_from_kernel_thread+0x5c/0x64
Instruction dump:
41de03e8 83da0020 3c60c06d 83fa0024 3863f520 813b0020 815b0024 80fa0018
811a001c 93c10008 93e1000c 4830cf6d <0fe00000> 3c60c06d 3863f0f4 4830cf5d
---[ end trace db1fae088c75c26c ]---
Mapped at:
[<f929c15c>] ahash_setkey+0xfc/0x720 [caamhash]
[<c022fec8>] __test_hash+0x228/0x6c0
[<c0230388>] test_hash+0x28/0xb0
[<c0230458>] alg_test_hash+0x48/0xc0
[<c022fa94>] alg_test+0x114/0x2e0

Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
Acked-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: caam - fix "failed to check map error" DMA warnings

Use dma_mapping_error for every dma_map_single / dma_map_page.

Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
Acked-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: caam - fix typo in dma_mapping_error

dma_mapping_error checks for an incorrect DMA address:
s/ctx->sh_desc_enc_dma/ctx->sh_desc_dec_dma

Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
Acked-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: caam - set coherent_dma_mask

Replace dma_set_mask with dma_set_mask_and_coherent, since both
streaming and coherent DMA mappings are being used.

Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
Acked-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: testmgr - avoid DMA mapping from text, rodata, stack

With DMA_API_DEBUG set, following warnings are emitted
(tested on CAAM accelerator):
DMA-API: device driver maps memory from kernel text or rodata
DMA-API: device driver maps memory from stack
and the culprits are:
-key in __test_aead and __test_hash
-result in __test_hash

MAX_KEYLEN is changed to accommodate maximum key length from
existing test vectors in crypto/testmgr.h (131 bytes) and rounded.

Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
Acked-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: ccp - Base AXI DMA cache settings on device tree

The default cache operations for ARM64 were changed during 3.15.
To use coherent operations a "dma-coherent" device tree property
is required. If that property is not present in the device tree
node then the non-coherent operations are assigned for the device.

Add support to the ccp driver to assign the AXI DMA cache settings
based on whether the "dma-coherent" property is present in the device
node. If present, use settings that work with the caches. If not
present, use settings that do not look at the caches.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - drbg_exit() can be static

CC: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - remove an unneeded cast

The cast to (unsigned int *) doesn't hurt anything but it is pointless.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: caam - Check for CAAM block presence before registering with crypto layer

The layer which registers with the crypto API should check for the presence of
the CAAM device it is going to use. If the platform's device tree doesn't have
the required CAAM node, the layer should return an error and not register the
algorithms with crypto API layer.

Signed-off-by: Ruchika Gupta <ruchika.gupta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - HMAC-SHA1 DRBG has crypto strength of 128 bits

The patch corrects the security strength of the HMAC-SHA1 DRBG to 128
bits. This strength defines the size of the seed required for the DRBG.
Thus, the patch lowers the seeding requirement from 256 bits to 128 bits
for HMAC-SHA1.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Mix a time stamp into DRBG state

The current locking approach of the DRBG tries to keep the protected
code paths very minimal. It is therefore possible that two threads query
one DRBG instance at the same time. When thread A requests random
numbers, a shadow copy of the DRBG state is created upon which the
request for A is processed. After finishing the state for A's request is
merged back into the DRBG state. If now thread B requests random numbers
from the same DRBG after the request for thread A is received, but
before A's shadow state is merged back, the random numbers for B will be
identical to the ones for A. Please note that the time window is very
small for this scenario.

To prevent that there is even a theoretical chance for thread A and B
having the same DRBG state, the current time stamp is provided as
additional information string for each new request.

The addition of the time stamp as additional information string implies
that now all generate functions must be capable to process a linked
list with additional information strings instead of a scalar.

CC: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Select correct DRBG core for stdrng

When the DRBG is initialized, the core is looked up using the DRBG name.
The name that can be used for the lookup is registered in
cra_driver_name. The cra_name value contains stdrng.

Thus, the lookup code must use crypto_tfm_alg_driver_name to obtain the
precise DRBG name and select the correct DRBG.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Call CTR DRBG DF function only once

The CTR DRBG requires the update function to be called twice when
generating a random number. In both cases, update function must process
the additional information string by using the DF function. As the DF
produces the same result in both cases, we can save one invocation of
the DF function when the first DF function result is reused.

The result of the DF function is stored in the scratchpad storage. The
patch ensures that the scratchpad is not cleared when we want to reuse
the DF result. For achieving this, the CTR DRBG update function must
know by whom and in which scenario it is called. This information is
provided with the reseed parameter to the update function.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Fix format string for debugging statements

The initial format strings caused warnings on several architectures. The
updated format strings now match the variable types.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
CC: Joe Perches <joe@perches.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - cleanup of preprocessor macros

The structure used to construct the module description line was marked
problematic by the sparse code analysis tool. The module line
description now does not contain any ifdefs to prevent error reports
from sparse.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qce - add dependancy to Kconfig

Make qce crypto driver depend on ARCH_QCOM and make
possible to test driver compilation.

Signed-off-by: Stanimir Varbanov <svarbanov@mm-sol.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qce - fix sparse warnings

Fix few sparse warnings of type:
- sparse: incorrect type in argument
- sparse: incorrect type in initializer

Signed-off-by: Stanimir Varbanov <svarbanov@mm-sol.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: caam - Enabling multiple caam debug support for C29x platform

In the current setup debug file system enables us to debug the operational
details for only one CAAM. This patch adds the support for debugging multiple
CAAM's.

Signed-off-by: Nitesh Narayan Lal <b44382@freescale.com>
Signed-off-by: Vakul Garg <b16394@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: dts - Addition of missing SEC compatibile property in c29x device tree

The driver is compatible with SEC version 4.0, which was missing from
device tree resulting that the caam driver doesn't gets probed. Since
SEC is backward compatible with older versions, so this patch adds those
missing versions in c29x device tree.

Signed-off-by: Nitesh Narayan Lal <b44382@freescale.com>
Signed-off-by: Vakul Garg <b16394@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Use Kconfig to ensure at least one RNG option is set

This patch removes the build-time test that ensures at least one RNG
is set. Instead we will simply not build drbg if no options are set
through Kconfig.

This also fixes a typo in the name of the Kconfig option CRYTPO_DRBG
(should be CRYPTO_DRBG).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - use of kernel linked list

The DRBG-style linked list to manage input data that is fed into the
cipher invocations is replaced with the kernel linked list
implementation.

The change is transparent to users of the interfaces offered by the
DRBG. Therefore, no changes to the testmgr code is needed.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - fix memory corruption for AES192

For the CTR DRBG, the drbg_state->scratchpad temp buffer (i.e. the
memory location immediately before the drbg_state->tfm variable
is the buffer that the BCC function operates on. BCC operates
blockwise. Making the temp buffer drbg_statelen(drbg) in size is
sufficient when the DRBG state length is a multiple of the block
size. For AES192 this is not the case and the length for temp is
insufficient (yes, that also means for such ciphers, the final
output of all BCC rounds are truncated before used to update the
state of the DRBG!!).

The patch enlarges the temp buffer from drbg_statelen to
drbg_statelen + drbg_blocklen to have sufficient space.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: ux500 - make interrupt mode plausible

The interrupt handler in the ux500 crypto driver has an obviously
incorrect way to access the data buffer, which for a while has
caused this build warning:

../ux500/cryp/cryp_core.c: In function 'cryp_interrupt_handler':
../ux500/cryp/cryp_core.c:234:5: warning: passing argument 1 of '__fswab32' makes integer from pointer without a cast [enabled by default]
     writel_relaxed(ctx->indata,
     ^
In file included from ../include/linux/swab.h:4:0,
                 from ../include/uapi/linux/byteorder/big_endian.h:12,
                 from ../include/linux/byteorder/big_endian.h:4,
                 from ../arch/arm/include/uapi/asm/byteorder.h:19,
                 from ../include/asm-generic/bitops/le.h:5,
                 from ../arch/arm/include/asm/bitops.h:340,
                 from ../include/linux/bitops.h:33,
                 from ../include/linux/kernel.h:10,
                 from ../include/linux/clk.h:16,
                 from ../drivers/crypto/ux500/cryp/cryp_core.c:12:
../include/uapi/linux/swab.h:57:119: note: expected '__u32' but argument is of type 'const u8 *'
static inline __attribute_const__ __u32 __fswab32(__u32 val)

There are at least two, possibly three problems here:
a) when writing into the FIFO, we copy the pointer rather than the
   actual data we want to give to the hardware
b) the data pointer is an array of 8-bit values, while the FIFO
   is 32-bit wide, so both the read and write access fail to do
   a proper type conversion
c) This seems incorrect for big-endian kernels, on which we need to
   byte-swap any register access, but not normally FIFO accesses,
   at least the DMA case doesn't do it either.

This converts the bogus loop to use the same readsl/writesl pair
that we use for the two other modes (DMA and polling). This is
more efficient and consistent, and probably correct for endianess.

The bug has existed since the driver was first merged, and was
probably never detected because nobody tried to use interrupt mode.
It might make sense to backport this fix to stable kernels, depending
on how the crypto maintainers feel about that.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: linux-crypto@vger.kernel.org
Cc: Fabio Baltieri <fabio.baltieri@linaro.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: tcrypt - print cra driver name in tcrypt tests output

Print the driver name that is being tested. The driver name can be
inferred parsing /proc/crypto but having it in the output is
clearer

Signed-off-by: Luca Clementi <luca.clementi@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

ARM: DT: qcom: Add Qualcomm crypto driver binding document

Here is Qualcomm crypto driver device tree binding documentation
to used as a reference example.

Signed-off-by: Stanimir Varbanov <svarbanov@mm-sol.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qce - Build Qualcomm crypto driver

Modify crypto Kconfig and Makefile in order to build the qce
driver and adds qce Makefile as well.

Signed-off-by: Stanimir Varbanov <svarbanov@mm-sol.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qce - Qualcomm crypto engine driver

The driver is separated by functional parts. The core part
implements a platform driver probe and remove callbaks.
The probe enables clocks, checks crypto version, initialize
and request dma channels, create done tasklet and init
crypto queue and finally register the algorithms into crypto
core subsystem.

- DMA and SG helper functions
implement dmaengine and sg-list helper functions used by
other parts of the crypto driver.

- ablkcipher algorithms
implementation of AES, DES and 3DES crypto API callbacks,
the crypto register alg function, the async request handler
and its dma done callback function.

- SHA and HMAC transforms
implementation and registration of ahash crypto type.
It includes sha1, sha256, hmac(sha1) and hmac(sha256).

- infrastructure to setup the crypto hw
contains functions used to setup/prepare hardware registers for
all algorithms supported by the crypto block. It also exports
few helper functions needed by algorithms:
- to check hardware status
- to start crypto hardware
- to translate data stream to big endian form

Adds register addresses and bit/masks used by the driver
as well.

Signed-off-by: Stanimir Varbanov <svarbanov@mm-sol.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: fips - only panic on bad/missing crypto mod signatures

Per further discussion with NIST, the requirements for FIPS state that
we only need to panic the system on failed kernel module signature checks
for crypto subsystem modules. This moves the fips-mode-only module
signature check out of the generic module loading code, into the crypto
subsystem, at points where we can catch both algorithm module loads and
mode module loads. At the same time, make CONFIG_CRYPTO_FIPS dependent on
CONFIG_MODULE_SIG, as this is entirely necessary for FIPS mode.

v2: remove extraneous blank line, perform checks in static inline
function, drop no longer necessary fips.h include.

CC: "David S. Miller" <davem@davemloft.net>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Stephan Mueller <stephan.mueller@atsec.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - Fix error path crash when no firmware is present

Firmware loader crashes when no firmware file is present.

Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - Fixed new checkpatch warnings

After updates to checkpatch new warnings pops up this patch fixes them.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Acked-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - Updated Firmware Info Metadata

Updated Firmware Info Metadata

Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - Fix random config build warnings

Fix random config build warnings:

Implicit-function-declaration ‘__raw_writel’
Cast to pointer from integer of different size [-Wint-to-pointer-cast]

Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - simplify ordering of linked list in drbg_ctr_df

As reported by a static code analyzer, the code for the ordering of
the linked list can be simplified.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: lzo - use kvfree() helper

kvfree() helper is now available, use it instead of open code it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: des3_ede-x86_64 - fix parse warning

Patch fixes following sparse warning:

CHECK arch/x86/crypto/des3_ede_glue.c
arch/x86/crypto/des3_ede_glue.c:308:52: warning: restricted __be64 degrades to integer
arch/x86/crypto/des3_ede_glue.c:309:52: warning: restricted __be64 degrades to integer
arch/x86/crypto/des3_ede_glue.c:310:52: warning: restricted __be64 degrades to integer
arch/x86/crypto/des3_ede_glue.c:326:44: warning: restricted __be64 degrades to integer

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: caam - Correct the dma mapping for sg table

At few places in caamhash and caamalg, after allocating a dmable
buffer for sg table , the buffer was being modified. As per
definition of DMA_FROM_DEVICE ,afer allocation the memory should
be treated as read-only by the driver. This patch shifts the
allocation of dmable buffer for sg table after it is populated
by the driver, making it read-only as per the DMA API's requirement.

Signed-off-by: Ruchika Gupta <ruchika.gupta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: caam - Add definition of rd/wr_reg64 for little endian platform

CAAM IP has certain 64 bit registers . 32 bit architectures cannot force
atomic-64 operations. This patch adds definition of these atomic-64
operations for little endian platforms. The definitions which existed
previously were for big endian platforms.

Signed-off-by: Ruchika Gupta <ruchika.gupta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: caam - Configuration for platforms with virtualization enabled in CAAM

For platforms with virtualization enabled

    1. The job ring registers can be written to only is the job ring has been
       started i.e STARTR bit in JRSTART register is 1

    2. For DECO's under direct software control, with virtualization enabled
       PL, BMT, ICID and SDID values need to be provided. These are provided by
       selecting a Job ring in start mode whose parameters would be used for the
       DECO access programming.

Signed-off-by: Ruchika Gupta <ruchika.gupta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: caam - Correct definition of registers in memory map

Some registers like SECVID, CHAVID, CHA Revision Number,
CTPR were defined as 64 bit resgisters. The IP provides
a DWT bit(Double word Transpose) to transpose the two words when
a double word register is accessed. However setting this bit
would also affect the operation of job descriptors as well as
other registers which are truly double word in nature.
So, for the IP to work correctly on big-endian as well as
little-endian SoC's, change is required to access all 32 bit
registers as 32 bit quantities.

Signed-off-by: Ruchika Gupta <ruchika.gupta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - Fix build problem with O=

qat adds -I to the ccflags. Unfortunately it uses CURDIR which
breaks when make is invoked with O=. This patch replaces CURDIR
with $(src) which should work with/without O=.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: testmgr - add 4 more test vectors for GHASH

This adds 4 test vectors for GHASH (of which one for chunked mode), making
a total of 5.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: aes - AES CTR x86_64 "by8" AVX optimization

This patch introduces "by8" AES CTR mode AVX optimization inspired by
Intel Optimized IPSEC Cryptograhpic library. For additional information,
please see:
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=22972

The functions aes_ctr_enc_128_avx_by8(), aes_ctr_enc_192_avx_by8() and
aes_ctr_enc_256_avx_by8() are adapted from
Intel Optimized IPSEC Cryptographic library. When both AES and AVX features
are enabled in a platform, the glue code in AESNI module overrieds the
existing "by4" CTR mode en/decryption with the "by8"
AES CTR mode en/decryption.

On a Haswell desktop, with turbo disabled and all cpus running
at maximum frequency, the "by8" CTR mode optimization
shows better performance results across data & key sizes
as measured by tcrypt.

The average performance improvement of the "by8" version over the "by4"
version is as follows:

For 128 bit key and data sizes >= 256 bytes, there is a 10-16% improvement.
For 192 bit key and data sizes >= 256 bytes, there is a 20-22% improvement.
For 256 bit key and data sizes >= 256 bytes, there is a 20-25% improvement.

A typical run of tcrypt with AES CTR mode encryption of the "by4" and "by8"
optimization shows the following results:

tcrypt with "by4" AES CTR mode encryption optimization on a Haswell Desktop:
---------------------------------------------------------------------------

testing speed of __ctr-aes-aesni encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 343 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 336 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 491 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1130 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 7309 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 346 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 361 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 543 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1321 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 9649 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 369 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 366 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 595 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1531 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 10522 cycles (8192 bytes)

testing speed of __ctr-aes-aesni decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 336 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 350 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 487 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1129 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 7287 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 350 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 359 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 635 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1324 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 9595 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 364 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 377 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 604 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1527 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 10549 cycles (8192 bytes)

tcrypt with "by8" AES CTR mode encryption optimization on a Haswell Desktop:
---------------------------------------------------------------------------

testing speed of __ctr-aes-aesni encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 340 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 330 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 450 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1043 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 6597 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 339 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 352 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 539 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1153 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 8458 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 353 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 360 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 512 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1277 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 8745 cycles (8192 bytes)

testing speed of __ctr-aes-aesni decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 348 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 335 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 451 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1030 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 6611 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 354 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 346 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 488 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1154 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 8390 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 357 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 362 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 515 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1284 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 8681 cycles (8192 bytes)

crypto: Incorporate feed back to AES CTR mode optimization patch

Specifically, the following:
a) alignment around main loop in aes_ctrby8_avx_x86_64.S
b) .rodata around data constants used in the assembely code.
c) the use of CONFIG_AVX in the glue code.
d) fix up white space.
e) informational message for "by8" AES CTR mode optimization
f) "by8" AES CTR mode optimization can be simply enabled
if the platform supports both AES and AVX features. The
optimization works superbly on Sandybridge as well.

Testing on Haswell shows no performance change since the last.

Testing on Sandybridge shows that the "by8" AES CTR mode optimization
greatly improves performance.

tcrypt log with "by4" AES CTR mode optimization on Sandybridge
--------------------------------------------------------------

testing speed of __ctr-aes-aesni encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 383 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 408 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 707 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1864 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 12813 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 395 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 432 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 780 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 2132 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 15765 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 416 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 438 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 842 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 2383 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 16945 cycles (8192 bytes)

testing speed of __ctr-aes-aesni decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 389 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 409 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 704 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1865 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 12783 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 409 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 434 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 792 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 2151 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 15804 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 421 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 444 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 840 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 2394 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 16928 cycles (8192 bytes)

tcrypt log with "by8" AES CTR mode optimization on Sandybridge
--------------------------------------------------------------

testing speed of __ctr-aes-aesni encryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 383 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 401 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 522 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1136 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 7046 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 394 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 418 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 559 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1263 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 9072 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 408 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 428 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 595 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1385 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 9224 cycles (8192 bytes)

testing speed of __ctr-aes-aesni decryption
test 0 (128 bit key, 16 byte blocks): 1 operation in 390 cycles (16 bytes)
test 1 (128 bit key, 64 byte blocks): 1 operation in 402 cycles (64 bytes)
test 2 (128 bit key, 256 byte blocks): 1 operation in 530 cycles (256 bytes)
test 3 (128 bit key, 1024 byte blocks): 1 operation in 1135 cycles (1024 bytes)
test 4 (128 bit key, 8192 byte blocks): 1 operation in 7079 cycles (8192 bytes)
test 5 (192 bit key, 16 byte blocks): 1 operation in 414 cycles (16 bytes)
test 6 (192 bit key, 64 byte blocks): 1 operation in 417 cycles (64 bytes)
test 7 (192 bit key, 256 byte blocks): 1 operation in 572 cycles (256 bytes)
test 8 (192 bit key, 1024 byte blocks): 1 operation in 1312 cycles (1024 bytes)
test 9 (192 bit key, 8192 byte blocks): 1 operation in 9073 cycles (8192 bytes)
test 10 (256 bit key, 16 byte blocks): 1 operation in 415 cycles (16 bytes)
test 11 (256 bit key, 64 byte blocks): 1 operation in 454 cycles (64 bytes)
test 12 (256 bit key, 256 byte blocks): 1 operation in 598 cycles (256 bytes)
test 13 (256 bit key, 1024 byte blocks): 1 operation in 1407 cycles (1024 bytes)
test 14 (256 bit key, 8192 byte blocks): 1 operation in 9288 cycles (8192 bytes)

crypto: Fix redundant checks

a) Fix the redundant check for cpu_has_aes
b) Fix the key length check when invoking the CTR mode "by8"
encryptor/decryptor.

crypto: fix typo in AES ctr mode transform

Signed-off-by: Chandramouli Narayanan <mouli@linux.intel.com>
Reviewed-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: des_3des - add x86-64 assembly implementation

Patch adds x86_64 assembly implementation of Triple DES EDE cipher algorithm.
Two assembly implementations are provided. First is regular 'one-block at
time' encrypt/decrypt function. Second is 'three-blocks at time' function that
gains performance increase on out-of-order CPUs.

tcrypt test results:

Intel Core i5-4570:

des3_ede-asm vs des3_ede-generic:
size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
16B     1.21x   1.22x   1.27x   1.36x   1.25x   1.25x
64B     1.98x   1.96x   1.23x   2.04x   2.01x   2.00x
256B    2.34x   2.37x   1.21x   2.40x   2.38x   2.39x
1024B   2.50x   2.47x   1.22x   2.51x   2.52x   2.51x
8192B   2.51x   2.53x   1.21x   2.56x   2.54x   2.55x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: tcrypt - add ctr(des3_ede) sync speed test

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: caam - remove duplicate FIFOST_CONT_MASK define

The FIFOST_CONT_MASK define is cut and pasted twice so we can delete the
second instance.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Kim Phillips <kim.phillips@freescale.com>
Acked-by: Marek Vasut <marex@denx.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: crc32c-pclmul - Shrink K_table to 32-bit words

There's no need for the K_table to be made of 64-bit words.  For some
reason, the original authors didn't fully reduce the values modulo the
CRC32C polynomial, and so had some 33-bit values in there.  They can
all be reduced to 32 bits.

Doing that cuts the table size in half.  Since the code depends on both
pclmulq and crc32, SSE 4.1 is obviously present, so we can use pmovzxdq
to fetch it in the correct format.

This adds (measured on Ivy Bridge) 1 cycle per main loop iteration
(CRC of up to 3K bytes), less than 0.2%.  The hope is that the reduced
D-cache footprint will make up the loss in other code.

Two other related fixes:
* K_table is read-only, so belongs in .rodata, and
* There's no need for more than 8-byte alignment

Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: George Spelvin <linux@horizon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - Update to makefiles

Update to makefiles etc.
Don't update the firmware/Makefile yet since there is no FW binary in
the crypto repo yet. This will be added later.

v3 - removed change to ./firmware/Makefile

Reviewed-by: Bruce W. Allan <bruce.w.allan@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - Intel(R) QAT DH895xcc accelerator

This patch adds DH895xCC hardware specific code.
It hooks to the common infrastructure and provides acceleration for crypto
algorithms.

Acked-by: John Griffin <john.griffin@intel.com>
Reviewed-by: Bruce W. Allan <bruce.w.allan@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - Intel(R) QAT accelengine part of fw loader

This patch adds acceleration engine handler part the firmware loader.

Acked-by: Bo Cui <bo.cui@intel.com>
Reviewed-by: Bruce W. Allan <bruce.w.allan@intel.com>
Signed-off-by: Karen Xiang <karen.xiang@intel.com>
Signed-off-by: Pingchaox Yang <pingchaox.yang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - Intel(R) QAT ucode part of fw loader

This patch adds microcode part of the firmware loader.

v4 - splits FW loader part into two smaller patches.

Acked-by: Bo Cui <bo.cui@intel.com>
Reviewed-by: Bruce W. Allan <bruce.w.allan@intel.com>
Signed-off-by: Karen Xiang <karen.xiang@intel.com>
Signed-off-by: Pingchaox Yang <pingchaox.yang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - Intel(R) QAT crypto interface

This patch adds qat crypto interface.

Acked-by: John Griffin <john.griffin@intel.com>
Reviewed-by: Bruce W. Allan <bruce.w.allan@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - Intel(R) QAT FW interface

This patch adds FW interface structure definitions.

Acked-by: John Griffin <john.griffin@intel.com>
Reviewed-by: Bruce W. Allan <bruce.w.allan@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - Intel(R) QAT transport code

This patch adds a code that implements communication channel between the
driver and the firmware.

Acked-by: John Griffin <john.griffin@intel.com>
Reviewed-by: Bruce W. Allan <bruce.w.allan@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: qat - Intel(R) QAT driver framework

This patch adds a common infractructure that will be used by all Intel(R)
QuickAssist Technology (QAT) devices.

v2 - added ./drivers/crypto/qat/Kconfig and ./drivers/crypto/qat/Makefile
v4 - splits common part into more, smaller patches

Acked-by: John Griffin <john.griffin@intel.com>
Reviewed-by: Bruce W. Allan <bruce.w.allan@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: ccp - Add platform device support for arm64

Add support for the CCP on arm64 as a platform device.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: ccp - CCP device bindings documentation

This patch provides the documentation of the device bindings
for the AMD Cryptographic Coprocessor driver.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: ccp - Modify PCI support in prep for arm64 support

Modify the PCI device support in prep for supporting the
CCP as a platform device for arm64.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - Add DRBG test code to testmgr

The DRBG test code implements the CAVS test approach.

As discussed for the test vectors, all DRBG types are covered with
testing. However, not every backend cipher is covered with testing. To
prevent the testmgr from logging missing testing, the NULL test is
registered for all backend ciphers not covered with specific test cases.

All currently implemented DRBG types and backend ciphers are defined
in SP800-90A. Therefore, the fips_allowed flag is set for all.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - DRBG testmgr test vectors

All types of the DRBG (CTR, HMAC, Hash) are covered with test vectors.
In addition, all permutations of use cases of the DRBG are covered:

        * with and without predition resistance
        * with and without additional information string
        * with and without personalization string

As the DRBG implementation is agnositc of the specific backend cipher,
only test vectors for one specific backend cipher is used. For example:
the Hash DRBG uses the same code paths irrespectively of using SHA-256
or SHA-512. Thus, the test vectors for SHA-256 cover the testing of all
DRBG code paths of SHA-512.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - compile the DRBG code

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - DRBG kernel configuration options

The different DRBG types of CTR, Hash, HMAC can be enabled or disabled
at compile time. At least one DRBG type shall be selected.

The default is the HMAC DRBG as its code base is smallest.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - header file for DRBG

The header file includes the definition of:

* DRBG data structures with
        - struct drbg_state as main structure
        - struct drbg_core referencing the backend ciphers
        - struct drbg_state_ops callbach handlers for specific code
          supporting the Hash, HMAC, CTR DRBG implementations
        - struct drbg_conc defining a linked list for input data
        - struct drbg_test_data holding the test "entropy" data for CAVS
          testing and testmgr.c
        - struct drbg_gen allowing test data, additional information
          string and personalization string data to be funneled through
          the kernel crypto API -- the DRBG requires additional
          parameters when invoking the reset and random number
          generation requests than intended by the kernel crypto API

* wrapper function to the kernel crypto API functions using struct
  drbg_gen to pass through all data needed for DRBG

* wrapper functions to kernel crypto API functions usable for testing
  code to inject test_data into the DRBG as needed by CAVS testing and
  testmgr.c.

* DRBG flags required for the operation of the DRBG and for selecting
  the particular DRBG type and backend cipher

* getter functions for data from struct drbg_core

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drbg - SP800-90A Deterministic Random Bit Generator

This is a clean-room implementation of the DRBG defined in SP800-90A.
All three viable DRBGs defined in the standard are implemented:

* HMAC: This is the leanest DRBG and compiled per default
* Hash: The more complex DRBG can be enabled at compile time
* CTR: The most complex DRBG can also be enabled at compile time

The DRBG implementation offers the following:

* All three DRBG types are implemented with a derivation function.
* All DRBG types are available with and without prediction resistance.
* All SHA types of SHA-1, SHA-256, SHA-384, SHA-512 are available for
the HMAC and Hash DRBGs.
* All AES types of AES-128, AES-192 and AES-256 are available for the
CTR DRBG.
* A self test is implemented with drbg_healthcheck().
* The FIPS 140-2 continuous self test is implemented.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: drivers - Add 2 missing __exit_p

References to __exit functions must be wrapped with __exit_p.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Cc: Marcelo Henrique Cerri <mhcerri@linux.vnet.ibm.com>
Cc: Fionnuala Gunter <fin@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: caam - Introduce the use of the managed version of kzalloc

This patch moves data allocated using kzalloc to managed data allocated
using devm_kzalloc and cleans now unnecessary kfrees in probe and remove
functions.  Also, linux/device.h is added to make sure the devm_*()
routine declarations are unambiguously available. Earlier, in the probe
function ctrlpriv was leaked on the failure of ctrl = of_iomap(nprop, 0);
as well as on the failure of ctrlpriv->jrpdev = kzalloc(...); . These
two bugs have been fixed by the patch.

The following Coccinelle semantic patch was used for making the change:

identifier p, probefn, removefn;
@@
struct platform_driver p = {
  .probe = probefn,
  .remove = removefn,
};

@prb@
identifier platform.probefn, pdev;
expression e, e1, e2;
@@
probefn(struct platform_device *pdev, ...) {
  <+...
- e = kzalloc(e1, e2)
+ e = devm_kzalloc(&pdev->dev, e1, e2)
  ...
?-kfree(e);
  ...+>
}

@rem depends on prb@
identifier platform.removefn;
expression e;
@@
removefn(...) {
  <...
- kfree(e);
  ...>
}

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: lzo - try kmalloc() before vmalloc()

zswap allocates one LZO context per online cpu.

Using vmalloc() for small (16KB) memory areas has drawback of slowing
down /proc/vmallocinfo and /proc/meminfo reads, TLB pressure and poor
NUMA locality, as default NUMA policy at boot time is to interleave
pages :

edumazet:~# grep lzo /proc/vmallocinfo | head -4
0xffffc90006062000-0xffffc90006067000   20480 lzo_init+0x1b/0x30 pages=4 vmalloc N0=2 N1=2
0xffffc90006067000-0xffffc9000606c000   20480 lzo_init+0x1b/0x30 pages=4 vmalloc N0=2 N1=2
0xffffc9000606c000-0xffffc90006071000   20480 lzo_init+0x1b/0x30 pages=4 vmalloc N0=2 N1=2
0xffffc90006071000-0xffffc90006076000   20480 lzo_init+0x1b/0x30 pages=4 vmalloc N0=2 N1=2

This patch tries a regular kmalloc() and fallback to vmalloc in case
memory is too fragmented.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: skcipher - Don't use __crypto_dequeue_request()

Use skcipher_givcrypt_cast(crypto_dequeue_request(queue)) instead, which
does the same thing in much cleaner way. The skcipher_givcrypt_cast()
actually uses container_of() instead of messing around with offsetof()
too.

Signed-off-by: Marek Vasut <marex@denx.de>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Cc: Pantelis Antoniou <panto@antoniou-consulting.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: api - Move crypto_yield() to algapi.h

It makes no sense for crypto_yield() to be defined in scatterwalk.h ,
move it into algapi.h as it's an internal function to crypto API.

Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Linux 3.16-rc1

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix checksumming regressions, from Tom Herbert.

2) Undo unintentional permissions changes for SCTP rto_alpha and
    rto_beta sysfs knobs, from Denial Borkmann.

3) VXLAN, like other IP tunnels, should advertize it's encapsulation
    size using dev->needed_headroom instead of dev->hard_header_len.
    From Cong Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: sctp: fix permissions for rto_alpha and rto_beta knobs
  vxlan: Checksum fixes
  net: add skb_pop_rcv_encapsulation
  udp: call __skb_checksum_complete when doing full checksum
  net: Fix save software checksum complete
  net: Fix GSO constants to match NETIF flags
  udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup
  vxlan: use dev->needed_headroom instead of dev->hard_header_len
  MAINTAINERS: update cxgb4 maintainer

Merge tag 'clk-for-linus-3.16-part2' of git://git.linaro.org/people/mike.turquette/linux

Pull more clock framework updates from Mike Turquette:
"This contains the second half the of the clk changes for 3.16.

  They are simply fixes and code refactoring for the OMAP clock drivers.
  The sunxi clock driver changes include splitting out the one
  mega-driver into several smaller pieces and adding support for the A31
  SoC clocks"

* tag 'clk-for-linus-3.16-part2' of git://git.linaro.org/people/mike.turquette/linux: (25 commits)
  clk: sunxi: document PRCM clock compatible strings
  clk: sunxi: add PRCM (Power/Reset/Clock Management) clks support
  clk: sun6i: Protect SDRAM gating bit
  clk: sun6i: Protect CPU clock
  clk: sunxi: Rework clock protection code
  clk: sunxi: Move the GMAC clock to a file of its own
  clk: sunxi: Move the 24M oscillator to a file of its own
  clk: sunxi: Remove calls to clk_put
  clk: sunxi: document new A31 USB clock compatible
  clk: sunxi: Implement A31 USB clock
  ARM: dts: OMAP5/DRA7: use omap5-mpu-dpll-clock capable of dealing with higher frequencies
  CLK: TI: dpll: support OMAP5 MPU DPLL that need special handling for higher frequencies
  ARM: OMAP5+: dpll: support Duty Cycle Correction(DCC)
  CLK: TI: clk-54xx: Set the rate for dpll_abe_m2x2_ck
  CLK: TI: Driver for DRA7 ATL (Audio Tracking Logic)
  dt:/bindings: DRA7 ATL (Audio Tracking Logic) clock bindings
  ARM: dts: dra7xx-clocks: Correct name for atl clkin3 clock
  CLK: TI: gate: add composite interface clock to OMAP2 only build
  ARM: OMAP2: clock: add DT boot support for cpufreq_ck
  CLK: TI: OMAP2: add clock init support
  ...

Merge git://git.infradead.org/users/willy/linux-nvme

Pull NVMe update from Matthew Wilcox:
"Mostly bugfixes again for the NVMe driver.  I'd like to call out the
  exported tracepoint in the block layer; I believe Keith has cleared
  this with Jens.

  We've had a few reports from people who're really pounding on NVMe
  devices at scale, hence the timeout changes (and new module
  parameters), hotplug cpu deadlock, tracepoints, and minor performance
  tweaks"

[ Jens hadn't seen that tracepoint thing, but is ok with it - it will
  end up going away when mq conversion happens ]

* git://git.infradead.org/users/willy/linux-nvme: (22 commits)
  NVMe: Fix START_STOP_UNIT Scsi->NVMe translation.
  NVMe: Use Log Page constants in SCSI emulation
  NVMe: Define Log Page constants
  NVMe: Fix hot cpu notification dead lock
  NVMe: Rename io_timeout to nvme_io_timeout
  NVMe: Use last bytes of f/w rev SCSI Inquiry
  NVMe: Adhere to request queue block accounting enable/disable
  NVMe: Fix nvme get/put queue semantics
  NVMe: Delete NVME_GET_FEAT_TEMP_THRESH
  NVMe: Make admin timeout a module parameter
  NVMe: Make iod bio timeout a parameter
  NVMe: Prevent possible NULL pointer dereference
  NVMe: Fix the buffer size passed in GetLogPage(CDW10.NUMD)
  NVMe: Update data structures for NVMe 1.2
  NVMe: Enable BUILD_BUG_ON checks
  NVMe: Update namespace and controller identify structures to the 1.1a spec
  NVMe: Flush with data support
  NVMe: Configure support for block flush
  NVMe: Add tracepoints
  NVMe: Protect against badly formatted CQEs
  ...

net: sctp: fix permissions for rto_alpha and rto_beta knobs

Commit 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs
to jiffies conversions.") has silently changed permissions for
rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of
this was to discourage users from tweaking rto_alpha and
rto_beta knobs in production environments since they are key
to correctly compute rtt/srtt.

RFC4960 under section 6.3.1. RTO Calculation says regarding
rto_alpha and rto_beta under rule C3 and C4:

  [...]
  C3)  When a new RTT measurement R' is made, set

       RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'|

       and

       SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R'

       Note: The value of SRTT used in the update to RTTVAR
       is its value before updating SRTT itself using the
       second assignment. After the computation, update
       RTO <- SRTT + 4 * RTTVAR.

  C4)  When data is in flight and when allowed by rule C5
       below, a new RTT measurement MUST be made each round
       trip. Furthermore, new RTT measurements SHOULD be
       made no more than once per round trip for a given
       destination transport address. There are two reasons
       for this recommendation: First, it appears that
       measuring more frequently often does not in practice
       yield any significant benefit [ALLMAN99]; second,
       if measurements are made more often, then the values
       of RTO.Alpha and RTO.Beta in rule C3 above should be
       adjusted so that SRTT and RTTVAR still adjust to
       changes at roughly the same rate (in terms of how many
       round trips it takes them to reflect new values) as
       they would if making only one measurement per
       round-trip and using RTO.Alpha and RTO.Beta as given
       in rule C3. However, the exact nature of these
       adjustments remains a research issue.
  [...]

While it is discouraged to adjust rto_alpha and rto_beta
and not further specified how to adjust them, the RFC also
doesn't explicitly forbid it, but rather gives a RECOMMENDED
default value (rto_alpha=3, rto_beta=2). We have a couple
of users relying on the old permissions before they got
changed. That said, if someone really has the urge to adjust
them, we could allow it with a warning in the log.

Fixes: 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'csum_fixes'

Tom Herbert says:

====================
Fixes related to some recent checksum modifications.

- Fix GSO constants to match NETIF flags
- Fix logic in saving checksum complete in __skb_checksum_complete
- Call __skb_checksum_complete from UDP if we are checksumming over
whole packet in order to save checksum.
- Fixes to VXLAN to work correctly with checksum complete
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: Checksum fixes

Call skb_pop_rcv_encapsulation and postpull_rcsum for the Ethernet
header to work properly with checksum complete.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add skb_pop_rcv_encapsulation

This function is used by UDP encapsulation protocols in RX when
crossing encapsulation boundary. If ip_summed is set to
CHECKSUM_UNNECESSARY and encapsulation is not set, change to
CHECKSUM_NONE since the checksum has not been validated within the
encapsulation. Clears csum_valid by the same rationale.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: call __skb_checksum_complete when doing full checksum

In __udp_lib_checksum_complete check if checksum is being done over all
the data (len is equal to skb->len) and if it is call
__skb_checksum_complete instead of __skb_checksum_complete_head. This
allows checksum to be saved in checksum complete.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix save software checksum complete

Geert reported issues regarding checksum complete and UDP.
The logic introduced in commit 7e3cead5172927732f51fde
("net: Save software checksum complete") is not correct.

This patch:
1) Restores code in __skb_checksum_complete_header except for setting
   CHECKSUM_UNNECESSARY. This function may be calculating checksum on
   something less than skb->len.
2) Adds saving checksum to __skb_checksum_complete. The full packet
   checksum 0..skb->len is calculated without adding in pseudo header.
   This value is saved in skb->csum and then the pseudo header is added
   to that to derive the checksum for validation.
3) In both __skb_checksum_complete_header and __skb_checksum_complete,
   set skb->csum_valid to whether checksum of zero was computed. This
   allows skb_csum_unnecessary to return true without changing to
   CHECKSUM_UNNECESSARY which was done previously.
4) Copy new csum related bits in __copy_skb_header.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix GSO constants to match NETIF flags

Joseph Gasparakis reported that VXLAN GSO offload stopped working with
i40e device after recent UDP changes. The problem is that the
SKB_GSO_* bits are out of sync with the corresponding NETIF flags. This
patch fixes that. Also, we add BUILD_BUG_ONs in net_gso_ok for several
GSO constants that were missing to avoid the problem in the future.

Reported-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull more SCSI updates from James Bottomley:
"This is just a couple of drivers (hpsa and lpfc) that got left out for
  further testing in linux-next.  We also have one fix to a prior
  submission (qla2xxx sparse)"

* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (36 commits)
  qla2xxx: fix sparse warnings introduced by previous target mode t10-dif patch
  lpfc: Update lpfc version to driver version 10.2.8001.0
  lpfc: Fix ExpressLane priority setup
  lpfc: mark old devices as obsolete
  lpfc: Fix for initializing RRQ bitmap
  lpfc: Fix for cleaning up stale ring flag and sp_queue_event entries
  lpfc: Update lpfc version to driver version 10.2.8000.0
  lpfc: Update Copyright on changed files from 8.3.45 patches
  lpfc: Update Copyright on changed files
  lpfc: Fixed locking for scsi task management commands
  lpfc: Convert runtime references to old xlane cfg param to fof cfg param
  lpfc: Fix FW dump using sysfs
  lpfc: Fix SLI4 s abort loop to process all FCP rings and under ring_lock
  lpfc: Fixed kernel panic in lpfc_abort_handler
  lpfc: Fix locking for postbufq when freeing
  lpfc: Fix locking for lpfc_hba_down_post
  lpfc: Fix dynamic transitions of FirstBurst from on to off
  hpsa: fix handling of hpsa_volume_offline return value
  hpsa: return -ENOMEM not -1 on kzalloc failure in hpsa_get_device_id
  hpsa: remove messages about volume status VPD inquiry page not supported
  ...

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull more btrfs updates from Chris Mason:
"This has a few fixes since our last pull and a new ioctl for doing
  btree searches from userland.  It's very similar to the existing
  ioctl, but lets us return larger items back down to the app"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: fix error handling in create_pending_snapshot
  btrfs: fix use of uninit "ret" in end_extent_writepage()
  btrfs: free ulist in qgroup_shared_accounting() error path
  Btrfs: fix qgroups sanity test crash or hang
  btrfs: prevent RCU warning when dereferencing radix tree slot
  Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting
  btrfs: new ioctl TREE_SEARCH_V2
  btrfs: tree_search, search_ioctl: direct copy to userspace
  btrfs: new function read_extent_buffer_to_user
  btrfs: tree_search, copy_to_sk: return needed size on EOVERFLOW
  btrfs: tree_search, copy_to_sk: return EOVERFLOW for too small buffer
  btrfs: tree_search, search_ioctl: accept varying buffer
  btrfs: tree_search: eliminate redundant nr_items check

Merge git://git.kvack.org/~bcrl/aio-next

Pull aio fix and cleanups from Ben LaHaise:
"This consists of a couple of code cleanups plus a minor bug fix"

* git://git.kvack.org/~bcrl/aio-next:
  aio: cleanup: flatten kill_ioctx()
  aio: report error from io_destroy() when threads race in io_destroy()
  fs/aio.c: Remove ctx parameter in kiocb_cancel

fix __swap_writepage() compile failure on old gcc versions

Tetsuo Handa wrote:
"Commit 62a8067a7f35 ("bio_vec-backed iov_iter") introduced an unnamed
  union inside a struct which gcc-4.4.7 cannot handle.  Name the unnamed
   union as u in order to fix build failure"

Let's do this instead: there is only one place in the entire tree that
steps into this breakage.  Anon structs and unions work in older gcc
versions; as the matter of fact, we have those in the tree - see e.g.
struct ieee80211_tx_info in include/net/mac80211.h

What doesn't work is handling their initializers:

struct {
int a;
union {
int b;
char c;
};
} x[2] = {{.a = 1, .c = 'a'}, {.a = 0, .b = 1}};

is the obvious syntax for initializer, perfectly fine for C11 and
handled correctly by gcc-4.7 or later.

Earlier versions, though, break on it - declaration is fine and so's
access to fields (i.e.  x[0].c = 'a'; would produce the right code), but
members of the anon structs and unions are not inserted into the right
namespace.  Tellingly, those older versions will not barf on struct {int
a; struct {int a;};}; - looks like they just have it hacked up somewhere
around the handling of .  and -> instead of doing the right thing.

The easiest way to deal with that crap is to turn initialization of
those fields (in the only place where we have such initializer of
iov_iter) into plain assignment.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'hsi-for-3.16-fixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi

Pull HSI build fixes from Sebastian Reichel:
- tighten dependency between ssi-protocol and omap-ssi to fix build
   failures with randconfig.
- use normal module refcounting in omap driver to fix build with
   disabled module support

* tag 'hsi-for-3.16-fixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi:
  hsi: omap_ssi_port: use normal module refcounting
  HSI: fix omap ssi driver dependency

Merge tag 'gpio-v3.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fix from Linus Walleij:
"A first GPIO fix for the v3.16 series, this was serious since it
  blocks the OMAP boot.

  Sending you this vital fix before leaving for a short vacation so it
  does not sit collecting dust in my tree for no good reason.

  Apart from this, our v3.16 cycle looks like a good start"

* tag 'gpio-v3.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: of: Fix handling for deferred probe for -gpio suffix

Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 vdso fixes from Peter Anvin:
"Fixes for x86/vdso.

  One is a simple build fix for bigendian hosts, one is to make "make
  vdso_install" work again, and the rest is about working around a bug
  in Google's Go language -- two are documentation patches that improves
  the sample code that the Go coders took, modified, and broke; the
  other two implements a workaround that keeps existing Go binaries from
  segfaulting at least"

* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso: Fix vdso_install
  x86/vdso: Hack to keep 64-bit Go programs working
  x86/vdso: Add PUT_LE to store little-endian values
  x86/vdso/doc: Make vDSO examples more portable
  x86/vdso/doc: Rename vdso_test.c to vdso_standalone_test_x86.c
  x86, vdso: Remove one final use of htole16()

Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon updates from Guenter Roeck:
- new driver for Sensirion SHTC1 humidity / temperature sensor
- convert ltc4151 and vexpress drivers to use devm functions
- drop generic chip detection from lm85 driver
- avoid forward declarations in atxp1 driver
- fix sign extensions in ina2xx driver

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: vexpress: Use devm helper for hwmon device registration
  hwmon: (atxp1) Avoid forward declaration
  hwmon: add support for Sensirion SHTC1 sensor
  hwmon: (ltc4151) Convert to devm_hwmon_device_register_with_groups
  hwmon: (lm85) Drop generic detection
  hwmon: (ina2xx) Cast to s16 on shunt and current regs

udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup

Its too easy to add thousand of UDP sockets on a particular bucket,
and slow down an innocent multicast receiver.

Early demux is supposed to be an optimization, we should avoid spending
too much time in it.

It is interesting to note __udp4_lib_demux_lookup() only tries to
match first socket in the chain.

10 is the threshold we already have in __udp4_lib_lookup() to switch
to secondary hash.

Fixes: 421b3885bf6d5 ("udp: ipv4: Add udp early demux")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: David Held <drheld@google.com>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: use dev->needed_headroom instead of dev->hard_header_len

When we mirror packets from a vxlan tunnel to other device,
the mirror device should see the same packets (that is, without
outer header). Because vxlan tunnel sets dev->hard_header_len,
tcf_mirred() resets mac header back to outer mac, the mirror device
actually sees packets with outer headers

Vxlan tunnel should set dev->needed_headroom instead of
dev->hard_header_len, like what other ip tunnels do. This fixes
the above problem.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: stephen hemminger <stephen@networkplumber.org>
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: update cxgb4 maintainer

Hari's been doing the patch submissions for a while now and he'll be
taking over as maintainer.

Signed-off-by: Dimitris Michailidis <dm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

x86/vdso: Fix vdso_install

"make vdso_install" installs unstripped versions of the vdso objects
for the benefit of the debugger. This was broken by checkin:

6f121e548f83 x86, vdso: Reimplement vdso.so preparation in build-time C

The filenames are different now, so update the Makefile to cope.

This still installs the 64-bit vdso as vdso64.so. We believe this
will be okay, as the only known user is a patched gdb which is known
to use build-ids, but if it turns out to be a problem we may have to
add a link.

Inspired by a patch from Sam Ravnborg.

Acked-by: Sam Ravnborg <sam@ravnborg.org>
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Tested-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/b10299edd8ba98d17e07dafcd895b8ecf4d99eff.1402586707.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

NVMe: Fix START_STOP_UNIT Scsi->NVMe translation.

This patch contains several fixes for Scsi START_STOP_UNIT. The previous
code did not account for signed vs. unsigned arithmetic which resulted
in an invalid lowest power state caculation when the device only supports
1 power state.

The code for Power Condition == 2 (Idle) was not following the spec. The
spec calls for setting the device to specific power states, depending
upon Power Condition Modifier, without accounting for the number of
power states supported by the device.

The code for Power Condition == 3 (Standby) was using a hard-coded '0'
which is replaced with the macro POWER_STATE_0.

Signed-off-by: Dan McLeran <daniel.mcleran@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>

btrfs: fix error handling in create_pending_snapshot

fcebe456 cut and pasted some code to a later point
in create_pending_snapshot(), but didn't switch
to the appropriate error handling for this stage
of the function.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: fix use of uninit "ret" in end_extent_writepage()

If this condition in end_extent_writepage() is false:

if (tree->ops && tree->ops->writepage_end_io_hook)

we will then test an uninitialized "ret" at:

ret = ret < 0 ? ret : -EIO;

The test for ret is for the case where ->writepage_end_io_hook
failed, and we'd choose that ret as the error; but if
there is no ->writepage_end_io_hook, nothing sets ret.

Initializing ret to 0 should be sufficient; if
writepage_end_io_hook wasn't set, (!uptodate) means
non-zero err was passed in, so we choose -EIO in that case.

Signed-of-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: free ulist in qgroup_shared_accounting() error path

If tmp = ulist_alloc(GFP_NOFS) fails, we return without
freeing the previously allocated qgroups = ulist_alloc(GFP_NOFS)
and cause a memory leak.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: fix qgroups sanity test crash or hang

Often when running the qgroups sanity test, a crash or a hang happened.
This is because the extent buffer the test uses for the root node doesn't
have an header level explicitly set, making it have a random level value.
This is a problem when it's not zero for the btrfs_search_slot() calls
the test ends up doing, resulting in crashes or hangs such as the following:

[ 6454.127192] Btrfs loaded, debug=on, assert=on, integrity-checker=on
(...)
[ 6454.127760] BTRFS: selftest: Running qgroup tests
[ 6454.127964] BTRFS: selftest: Running test_test_no_shared_qgroup
[ 6454.127966] BTRFS: selftest: Qgroup basic add
[ 6480.152005] BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:5383]
[ 6480.152005] Modules linked in: btrfs(+) xor raid6_pq binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc i2c_piix4 i2c_core pcspkr evbug psmouse serio_raw e1000 [last unloaded: btrfs]
[ 6480.152005] irq event stamp: 188448
[ 6480.152005] hardirqs last  enabled at (188447): [<ffffffff8168ef5c>] restore_args+0x0/0x30
[ 6480.152005] hardirqs last disabled at (188448): [<ffffffff81698e6a>] apic_timer_interrupt+0x6a/0x80
[ 6480.152005] softirqs last  enabled at (188446): [<ffffffff810516cf>] __do_softirq+0x1cf/0x450
[ 6480.152005] softirqs last disabled at (188441): [<ffffffff81051c25>] irq_exit+0xb5/0xc0
[ 6480.152005] CPU: 0 PID: 5383 Comm: modprobe Not tainted 3.15.0-rc8-fdm-btrfs-next-33+ #4
[ 6480.152005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 6480.152005] task: ffff8802146125a0 ti: ffff8800d0d00000 task.ti: ffff8800d0d00000
[ 6480.152005] RIP: 0010:[<ffffffff81349a63>]  [<ffffffff81349a63>] __write_lock_failed+0x13/0x20
[ 6480.152005] RSP: 0018:ffff8800d0d038e8  EFLAGS: 00000287
[ 6480.152005] RAX: 0000000000000000 RBX: ffffffff8168ef5c RCX: 000005deb8525852
[ 6480.152005] RDX: 0000000000000000 RSI: 0000000000001d45 RDI: ffff8802105000b8
[ 6480.152005] RBP: ffff8800d0d038e8 R08: fffffe12710f63db R09: ffffffffa03196fb
[ 6480.152005] R10: ffff8802146125a0 R11: ffff880214612e28 R12: ffff8800d0d03858
[ 6480.152005] R13: 0000000000000000 R14: ffff8800d0d00000 R15: ffff8802146125a0
[ 6480.152005] FS:  00007f14ff804700(0000) GS:ffff880215e00000(0000) knlGS:0000000000000000
[ 6480.152005] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 6480.152005] CR2: 00007fff4df0dac8 CR3: 00000000d1796000 CR4: 00000000000006f0
[ 6480.152005] Stack:
[ 6480.152005]  ffff8800d0d03908 ffffffff810ae967 0000000000000001 ffff8802105000b8
[ 6480.152005]  ffff8800d0d03938 ffffffff8168e57e ffffffffa0319c16 0000000000000007
[ 6480.152005]  ffff880210500000 ffff880210500100 ffff8800d0d039b8 ffffffffa0319c16
[ 6480.152005] Call Trace:
[ 6480.152005]  [<ffffffff810ae967>] do_raw_write_lock+0x47/0xa0
[ 6480.152005]  [<ffffffff8168e57e>] _raw_write_lock+0x5e/0x80
[ 6480.152005]  [<ffffffffa0319c16>] ? btrfs_tree_lock+0x116/0x270 [btrfs]
[ 6480.152005]  [<ffffffffa0319c16>] btrfs_tree_lock+0x116/0x270 [btrfs]
[ 6480.152005]  [<ffffffffa02b2acb>] btrfs_lock_root_node+0x3b/0x50 [btrfs]
[ 6480.152005]  [<ffffffffa02b81a6>] btrfs_search_slot+0x916/0xa20 [btrfs]
[ 6480.152005]  [<ffffffff811a727f>] ? create_object+0x23f/0x300
[ 6480.152005]  [<ffffffffa02b9958>] btrfs_insert_empty_items+0x78/0xd0 [btrfs]
[ 6480.152005]  [<ffffffffa036041a>] insert_normal_tree_ref.constprop.4+0xa2/0x19a [btrfs]
[ 6480.152005]  [<ffffffffa03605c3>] test_no_shared_qgroup+0xb1/0x1ca [btrfs]
[ 6480.152005]  [<ffffffff8108cad6>] ? local_clock+0x16/0x30
[ 6480.152005]  [<ffffffffa035ef8e>] btrfs_test_qgroups+0x1ae/0x1d7 [btrfs]
[ 6480.152005]  [<ffffffffa03a69d2>] ? ftrace_define_fields_btrfs_space_reservation+0xfd/0xfd [btrfs]
[ 6480.152005]  [<ffffffffa03a6a86>] init_btrfs_fs+0xb4/0x153 [btrfs]
[ 6480.152005]  [<ffffffff81000352>] do_one_initcall+0x102/0x150
[ 6480.152005]  [<ffffffff8103d223>] ? set_memory_nx+0x43/0x50
[ 6480.152005]  [<ffffffff81682668>] ? set_section_ro_nx+0x6d/0x74
[ 6480.152005]  [<ffffffff810d91cc>] load_module+0x1cdc/0x2630
(...)

Therefore initialize the extent buffer as an empty leaf (level 0).

Issue easy to reproduce when btrfs is built as a module via:

    $ for ((i = 1; i <= 1000000; i++)); do rmmod btrfs; modprobe btrfs; done

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: prevent RCU warning when dereferencing radix tree slot

Mark the dereference as protected by lock. Not doing so triggers
an RCU warning since the radix tree assumed that RCU is in use.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting

Steps to reproduce:

# mkfs.btrfs -f /dev/sd[b-f] -m raid5 -d raid5
# mkfs.ext4 /dev/sdc --->corrupt one of btrfs device
# mount /dev/sdb /mnt -o degraded
# btrfs scrub start -BRd /mnt

This is because readahead would skip missing device, this is not true
for RAID5/6, because REQ_GET_READ_MIRRORS return 1 for RAID5/6 block
mapping. If expected data locates in missing device, readahead thread
would not call __readahead_hook() which makes event @rc->elems=0
wait forever.

Fix this problem by checking return value of btrfs_map_block(),we
can only skip missing device safely if there are several mirrors.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: new ioctl TREE_SEARCH_V2

This new ioctl call allows the user to supply a buffer of varying size in which
a tree search can store its results. This is much more flexible if you want to
receive items which are larger than the current fixed buffer of 3992 bytes or
if you want to fetch more items at once. Items larger than this buffer are for
example some of the type EXTENT_CSUM.

Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: David Sterba <dsterba@suse.cz>