tpm_tis: Introduce intermediate layer for TPM access
This splits tpm_tis in a high-level protocol part and a low-level interface
for the actual TPM communication. The low-level interface can then be
implemented by additional drivers to provide access to TPMs using other
mechanisms, for example native I2C or SPI transfers, while still reusing
the same TIS protocol implementation.
Though the ioread/iowrite calls cannot fail, other implementations of this
interface might want to return error codes if their communication fails.
This follows the usual pattern of negative values representing errors and
zero representing success. Positive values are not used (yet).
Errors are passed back to the caller if possible. If the interface of a
function does not allow that, it tries to do the most sensible thing it
can, but this might also mean ignoring the error in this instance.
This commit is based on the initial work by Peter Huewe.
Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com> Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Split priv_data structure in common and phy specific structures. This will
allow in future patches to reuse the same tis logic on top of new phy such
as spi and i2c. Ultimately, other drivers may reuse this tis logic.
(e.g: st33zp24...)
iobase field is specific to TPM addressed on 0xFED4xxxx on LPC/SPI bus.
This commit is based on the initial work by Peter Huewe.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Peter Huewe <peter.huewe@infineon.com> Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Stefan Berger [Wed, 11 May 2016 15:28:27 +0000 (11:28 -0400)]
tpm: Fix suspend regression
Fix the suspend regression due to the wrong way of retrieving the
chip structure. The suspend functions are attached to the hardware
device, not the chip and thus must rely on drvdata.
Fixes: e89f8b1ade9cc1a ("tpm: Remove all uses of drvdata from the TPM Core") Reported-by: Jeremiah Mahler <jmmahler@gmail.com> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Tested-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Jeremiah Mahler <jmmahler@gmail.com> Acked-by: Jarkko Sakkinen <jarkko.sakkine@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkine@linux.intel.com>
Stephen Rothwell [Thu, 28 Apr 2016 05:27:17 +0000 (15:27 +1000)]
tpm: fix for typo in tpm/tpm_ibmvtpm.c
Fixes: 28157164b056 ("tpm: Remove useless priv field in struct tpm_vendor_specific") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Jason Gunthorpe [Wed, 27 Apr 2016 16:58:46 +0000 (10:58 -0600)]
tpm: Fix IRQ unwind ordering in TIS
The devm for the IRQ was placed on the chip, not the pdev. This can
cause the irq to be still callable after the pdev has been cleaned up
(eg priv kfree'd).
Found by CONFIG_DEBUG_SHIRQ=y
Reported-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Fixes: 233a065e0cd0 ("tpm: Get rid of chip->pdev") Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Stefan Berger [Mon, 18 Apr 2016 17:26:15 +0000 (13:26 -0400)]
tpm: Proxy driver for supporting multiple emulated TPMs
This patch implements a proxy driver for supporting multiple emulated TPMs
in a system.
The driver implements a device /dev/vtpmx that is used to created
a client device pair /dev/tpmX (e.g., /dev/tpm10) and a server side that
is accessed using a file descriptor returned by an ioctl.
The device /dev/tpmX is the usual TPM device created by the core TPM
driver. Applications or kernel subsystems can send TPM commands to it
and the corresponding server-side file descriptor receives these
commands and delivers them to an emulated TPM.
The driver retrievs the TPM 1.2 durations and timeouts. Since this requires
the startup of the TPM, we send a startup for TPM 1.2 as well as TPM 2.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> CC: linux-kernel@vger.kernel.org CC: linux-doc@vger.kernel.org CC: linux-api@vger.kernel.org Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Stefan Berger [Mon, 18 Apr 2016 17:26:14 +0000 (13:26 -0400)]
tpm: Introduce TPM_CHIP_FLAG_VIRTUAL
Introduce TPM_CHIP_FLAG_VIRTUAL to be used when the chip device has no
parent device.
Prevent sysfs entries requiring a parent device from being created.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Jason Gunthorpe [Mon, 18 Apr 2016 17:26:13 +0000 (13:26 -0400)]
tpm: Remove all uses of drvdata from the TPM Core
The final thing preventing this was the way the sysfs files were
attached to the pdev. Follow the approach developed for ppi and move
the sysfs files to the chip->dev with symlinks from the pdev
for compatibility. Everything in the core now sanely uses container_of
to get the chip.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
tpm: Remove useless priv field in struct tpm_vendor_specific
Remove useless priv field in struct tpm_vendor_specific and take benefit
of chip->dev.driver_data. As priv is the latest field available in
struct tpm_vendor_specific, remove any reference to that structure.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
tpm: Move tpm_vendor_specific data related with PTP specification to tpm_chip
Move tpm_vendor_specific data related to TCG PTP specification to tpm_chip.
Move all fields directly linked with well known TCG concepts and used in
TPM drivers (tpm_i2c_atmel, tpm_i2c_infineon, tpm_i2c_nuvoton, tpm_tis
and xen-tpmfront) as well as in TPM core files (tpm-sysfs, tpm-interface
and tpm2-cmd) in tpm_chip.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Jarkko Sakkinen [Wed, 23 Mar 2016 05:31:56 +0000 (07:31 +0200)]
tpm: drop manufacturer_id from struct tpm_vendor_specific
Dropped manufacturer_id from struct tpm_vendor_specific and redeclared
it in the private struct priv_data that tpm_tis uses because the field
is only used tpm_tis.
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Jarkko Sakkinen [Wed, 23 Mar 2016 05:10:22 +0000 (07:10 +0200)]
tpm: drop tpm_atmel specific fields from tpm_vendor_specific
Introduced a private struct tpm_atmel_priv that contains the variables
have_region and region_size that were previously located in struct
tpm_vendor_specific. These fields were only used by tpm_atmel.
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Jarkko Sakkinen [Tue, 22 Mar 2016 04:20:09 +0000 (06:20 +0200)]
tpm: drop int_queue from tpm_vendor_specific
Drop field int_queue from tpm_vendor_specific as it is used only by
tpm_tis. Probably all of the fields should be eventually dropped and
moved to the private structures of different drivers but it is better to
do this one step at a time in order not to break anything.
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Jarkko Sakkinen [Mon, 25 Apr 2016 09:20:07 +0000 (12:20 +0300)]
tpm: check for TPM_CHIP_FLAG_TPM2 before calling tpm2_shutdown()
Fixes: 20e0152393b41 ("tpm: fix crash in tpm_tis deinitialization") Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Reported-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Tested-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Jarkko Sakkinen [Tue, 19 Apr 2016 09:54:18 +0000 (12:54 +0300)]
tpm_crb: fix mapping of the buffers
On my Lenovo x250 the following situation occurs:
[18697.813871] tpm_crb MSFT0101:00: can't request region for resource
[mem 0xacdff080-0xacdfffff]
The mapping of the control area overlaps the mapping of the command
buffer. The control area is mapped over page, which is not right. It
should mapped over sizeof(struct crb_control_area).
Fixing this issue unmasks another issue. Command and response buffers
can overlap and they do interleave on this machine. According to the PTP
specification the overlapping means that they are mapped to the same
buffer.
The commit has been also on a Haswell NUC where things worked before
applying this fix so that the both code paths for response buffer
initialization are tested.
Cc: stable@vger.kernel.org Fixes: 1bd047be37d9 ("tpm_crb: Use devm_ioremap_resource") Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
tpm/st33zp24/spi: Drop two useless checks in ACPI probe path
When st33zp24_spi_acpi_request_resources() gets called we
already know that the entries in ->acpi_match_table have matched ACPI ID
of the device.
In addition spi_device pointer cannot be NULL in any case (otherwise I2C
core would not call ->probe() for the driver in the first place).
Drop the two useless checks from the driver.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
tpm/st33zp24/i2c: Drop two useless checks in ACPI probe path
When st33zp24_i2c_acpi_request_resources() gets called we
already know that the entries in ->acpi_match_table have matched ACPI ID
of the device.
In addition I2C client pointer cannot be NULL in any case (otherwise I2C
core would not call ->probe() for the driver in the first place).
Drop the two useless checks from the driver.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Jarkko Sakkinen [Tue, 15 Mar 2016 19:41:40 +0000 (21:41 +0200)]
tpm_crb: drop struct resource res from struct crb_priv
The iomem resource is needed only temporarily so it is better to pass
it on instead of storing it permanently. Named the variable as io_res
so that the code better documents itself.
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Jarkko Sakkinen [Mon, 11 Apr 2016 11:20:57 +0000 (14:20 +0300)]
tpm: fix crash in tpm_tis deinitialization
rmmod crashes the driver because tpm_chip_unregister() already sets ops
to NULL. This commit fixes the issue by moving tpm2_shutdown() to
tpm_chip_unregister(). This commit is also cleanup because it removes
duplicate code from tpm_crb and tpm_tis to the core.
Fixes: 4d3eac5e156a ("tpm: Provide strong locking for device removal") Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Jarkko Sakkinen [Thu, 31 Mar 2016 10:05:36 +0000 (13:05 +0300)]
tpm: cleanup tpm_tis_remove()
Created a local variable pointing to the INT_ENABLE_x register. The
expression clearing INT_ENABLE_x.globalIntEnable is unreadable and
hard to modify without surpassing the 80 char boundary.
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Christophe Ricard <christophe-h.ricard@st.com>
Arnd Bergmann [Wed, 16 Mar 2016 08:19:48 +0000 (09:19 +0100)]
tpm: fix tpm_bios_log_setup stub prototype
A cleanup patch changed the prototype of the regular tpm_bios_log_setup
function, but not that of the stub that is used when the TPM is disabled,
causing a harmless build warning:
drivers/char/tpm/tpm-chip.c: In function 'tpm1_chip_register':
drivers/char/tpm/tpm-chip.c:287:38: error: passing argument 1 of 'tpm_bios_log_setup' discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers]
chip->bios_dir = tpm_bios_log_setup(dev_name(&chip->dev));
In file included from ../drivers/char/tpm/tpm-chip.c:30:0:
../drivers/char/tpm/tpm_eventlog.h:83:31: note: expected 'char *' but argument is of type 'const char *'
static inline struct dentry **tpm_bios_log_setup(char *name)
This changes the stub function to match the normal prototype,
avoiding that warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: aca8db8088c3 ("tpm: Get rid of devname") Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Stefan Berger [Mon, 29 Feb 2016 13:53:02 +0000 (08:53 -0500)]
tpm: Replace device number bitmap with IDR
Replace the device number bitmap with IDR. Extend the number of devices we
can create to 64k.
Since an IDR allows us to associate a pointer with an ID, we use this now
to rewrite tpm_chip_find_get() to simply look up the chip pointer by the
given device ID.
Protect the IDR calls with a mutex.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Jason Gunthorpe [Thu, 11 Feb 2016 19:45:48 +0000 (12:45 -0700)]
tpm: Split out the devm stuff from tpmm_chip_alloc
tpm_chip_alloc becomes a typical subsystem allocate call.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Tested-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Stefan Berger [Mon, 29 Feb 2016 13:53:01 +0000 (08:53 -0500)]
tpm: Get rid of module locking
Now that the tpm core has strong locking around 'ops' it is possible
to remove a TPM driver, module and all, even while user space still
has things like /dev/tpmX open. For consistency and simplicity, drop
the module locking entirely.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Jason Gunthorpe [Sat, 13 Feb 2016 03:29:53 +0000 (20:29 -0700)]
tpm: Provide strong locking for device removal
Add a read/write semaphore around the ops function pointers so
ops can be set to null when the driver un-registers.
Previously the tpm core expected module locking to be enough to
ensure that tpm_unregister could not be called during certain times,
however that hasn't been sufficient for a long time.
Introduce a read/write semaphore around 'ops' so the core can set
it to null when unregistering. This provides a strong fence around
the driver callbacks, guaranteeing to the driver that no callbacks
are running or will run again.
For now the ops_lock is placed very high in the call stack, it could
be pushed down and made more granular in future if necessary.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Jason Gunthorpe [Mon, 29 Feb 2016 17:29:48 +0000 (12:29 -0500)]
tpm: Get rid of devname
Now that we have a proper struct device just use dev_name() to
access this value instead of keeping two copies.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Jason Gunthorpe [Mon, 29 Feb 2016 17:29:47 +0000 (12:29 -0500)]
tpm: Get rid of chip->pdev
This is a hold over from before the struct device conversion.
- All prints should be using &chip->dev, which is the Linux
standard. This changes prints to use tpm0 as the device name,
not the PnP/etc ID.
- The few places involving sysfs/modules that really do need the
parent just use chip->dev.parent instead
- We no longer need to get_device(pdev) in any places since it is no
longer used by any of the code. The kref on the parent is held
by the device core during device_add and dropped in device_del
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Tested-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Simplify st33zp24_spi_acpi_request_resources, st33zp24_spi_of_request_resources
and st33zp24_spi_request_resources to have the same prototype and using
spi_get_drvdata.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Simplify st33zp24_i2c_acpi_request_resources, st33zp24_i2c_of_request_resources
and st33zp24_i2c_request_resources to have the same prototype and using
i2c_get_clientdata.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Simplify st33zp24_spi_acpi_request_resources, st33zp24_spi_of_request_resources
and st33zp24_spi_request_resources to have the same prototype and using
spi_get_drvdata.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Simplify st33zp24_i2c_acpi_request_resources, st33zp24_i2c_of_request_resources
and st33zp24_i2c_request_resources to have the same prototype and using
i2c_get_clientdata.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
The core st33zp24 module is useless without either the I2C or the
SPI access module. So hide NFC_ST_NCI and select it automatically
if either TCG_TIS_ST33ZP24_I2C or TCG_TIS_ST33ZP24_SPI is selected.
This avoids presenting TCG_TIS_ST33ZP24 when neither TCG_TIS_ST33ZP24_I2C
nor TCG_TIS_ST33ZP24_SPI can be selected.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Kees Cook [Fri, 3 Jun 2016 02:59:42 +0000 (19:59 -0700)]
um/ptrace: run seccomp after ptrace
Close the hole where ptrace can change a syscall out from under seccomp.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: user-mode-linux-devel@lists.sourceforge.net
Kees Cook [Thu, 2 Jun 2016 02:29:15 +0000 (19:29 -0700)]
seccomp: recheck the syscall after RET_TRACE
When RET_TRACE triggers, a tracer may change a syscall into something that
should be filtered by seccomp. This re-runs seccomp after a trace event
to make sure things continue to pass.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@kernel.org>
Andy Lutomirski [Fri, 27 May 2016 20:08:59 +0000 (13:08 -0700)]
x86/entry: Get rid of two-phase syscall entry work
I added two-phase syscall entry work back when the entry slow path
was very slow. Nowadays, the entry slow path is fast and two-phase
entry work serves no purpose. Remove it.
Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
Andy Lutomirski [Fri, 27 May 2016 19:57:02 +0000 (12:57 -0700)]
seccomp: Add a seccomp_data parameter secure_computing()
Currently, if arch code wants to supply seccomp_data directly to
seccomp (which is generally much faster than having seccomp do it
using the syscall_get_xyz() API), it has to use the two-phase
seccomp hooks. Add it to the easy hooks, too.
Cc: linux-arch@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
Kees Cook [Thu, 26 May 2016 18:47:01 +0000 (11:47 -0700)]
seccomp: add tests for ptrace hole
One problem with seccomp was that ptrace could be used to change a
syscall after seccomp filtering had completed. This was a well documented
limitation, and it was recommended to block ptrace when defining a filter
to avoid this problem. This can be quite a limitation for containers or
other places where ptrace is desired even under seccomp filters.
This adds tests for both SECCOMP_RET_TRACE and PTRACE_SYSCALL manipulations.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@kernel.org>
Tyler Hicks [Fri, 3 Jun 2016 04:43:22 +0000 (23:43 -0500)]
net: Use ns_capable_noaudit() when determining net sysctl permissions
The capability check should not be audited since it is only being used
to determine the inode permissions. A failed check does not indicate a
violation of security policy but, when an LSM is enabled, a denial audit
message was being generated.
The denial audit message caused confusion for some application authors
because root-running Go applications always triggered the denial. To
prevent this confusion, the capability check in net_ctl_permissions() is
switched to the noaudit variant.
BugLink: https://launchpad.net/bugs/1465724 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
Tyler Hicks [Fri, 3 Jun 2016 04:43:21 +0000 (23:43 -0500)]
kernel: Add noaudit variant of ns_capable()
When checking the current cred for a capability in a specific user
namespace, it isn't always desirable to have the LSMs audit the check.
This patch adds a noaudit variant of ns_capable() for when those
situations arise.
The common logic between ns_capable() and the new ns_capable_noaudit()
is moved into a single, shared function to keep duplicated code to a
minimum and ease maintainability.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
Casey Schaufler [Wed, 1 Jun 2016 00:24:15 +0000 (17:24 -0700)]
LSM: Fix for security_inode_getsecurity and -EOPNOTSUPP
Serge Hallyn pointed out that the current implementation of
security_inode_getsecurity() works if there is only one hook
provided for it, but will fail if there is more than one and
the attribute requested isn't supplied by the first module.
This isn't a problem today, since only SELinux and Smack
provide this hook and there is (currently) no way to enable
both of those modules at the same time. Serge, however, wants
to introduce a capability attribute and an inode_getsecurity
hook in the capability security module to handle it. This
addresses that upcoming problem, will be required for "extreme
stacking" and is just a better implementation.
Linus Torvalds [Sun, 5 Jun 2016 18:15:33 +0000 (11:15 -0700)]
Merge branch 'parisc-4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
- Fix printk time stamps on SMP systems which got wrong due to a patch
which was added during the merge window
- Fix two bugs in the stack backtrace code: Races in module unloading
and possible invalid accesses to memory due to wrong instruction
decoding (Mikulas Patocka)
- Fix userspace crash when syscalls access invalid unaligned userspace
addresses. Those syscalls will now return EFAULT as expected.
(tagged for stable kernel series)
* 'parisc-4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Move die_if_kernel() prototype into traps.h header
parisc: Fix pagefault crash in unaligned __get_user() call
parisc: Fix printk time during boot
parisc: Fix backtrace on PA-RISC
Linus Torvalds [Sun, 5 Jun 2016 18:02:00 +0000 (11:02 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull key handling update from James Morris:
"This alters a new keyctl function added in the current merge window to
allow for a future extension planned for the next merge window"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
KEYS: Add placeholder for KDF usage with DH
devpts: Make each mount of devpts an independent filesystem.
The /dev/ptmx device node is changed to lookup the directory entry "pts"
in the same directory as the /dev/ptmx device node was opened in. If
there is a "pts" entry and that entry is a devpts filesystem /dev/ptmx
uses that filesystem. Otherwise the open of /dev/ptmx fails.
The DEVPTS_MULTIPLE_INSTANCES configuration option is removed, so that
userspace can now safely depend on each mount of devpts creating a new
instance of the filesystem.
Each mount of devpts is now a separate and equal filesystem.
Reserved ttys are now available to all instances of devpts where the
mounter is in the initial mount namespace.
A new vfs helper path_pts is introduced that finds a directory entry
named "pts" in the directory of the passed in path, and changes the
passed in path to point to it. The helper path_pts uses a function
path_parent_directory that was factored out of follow_dotdot.
In the implementation of devpts:
- devpts_mnt is killed as it is no longer meaningful if all mounts of
devpts are equal.
- pts_sb_from_inode is replaced by just inode->i_sb as all cached
inodes in the tty layer are now from the devpts filesystem.
- devpts_add_ref is rolled into the new function devpts_ptmx. And the
unnecessary inode hold is removed.
- devpts_del_ref is renamed devpts_release and reduced to just a
deacrivate_super.
- The newinstance mount option continues to be accepted but is now
ignored.
In devpts_fs.h definitions for when !CONFIG_UNIX98_PTYS are removed as
they are never used.
Documentation/filesystems/devices.txt is updated to describe the current
situation.
This has been verified to work properly on openwrt-15.05, centos5,
centos6, centos7, debian-6.0.2, debian-7.9, debian-8.2, ubuntu-14.04.3,
ubuntu-15.10, fedora23, magia-5, mint-17.3, opensuse-42.1,
slackware-14.1, gentoo-20151225 (13.0?), archlinux-2015-12-01. With the
caveat that on centos6 and on slackware-14.1 that there wind up being
two instances of the devpts filesystem mounted on /dev/pts, the lower
copy does not end up getting used.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Greg KH <greg@kroah.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Jann Horn <jann@thejh.net> Cc: Jiri Slaby <jslaby@suse.com> Cc: Florian Weimer <fw@deneb.enyo.de> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This means the userspace program clock_adjtime called the clock_adjtime()
syscall and then crashed inside the compat_get_timex() function.
Syscalls should never crash programs, but instead return EFAULT.
The IIR register contains the executed instruction, which disassebles
into "ldw 0(sr3,r5),r9".
This load-word instruction is part of __get_user() which tried to read the word
at %r5/IOR (0xfa6f7fff). This means the unaligned handler jumped in. The
unaligned handler is able to emulate all ldw instructions, but it fails if it
fails to read the source e.g. because of page fault.
int main(void) {
/* allocate 8k */
char *ptr = mmap(NULL, 2*4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
/* free second half (upper 4k) and make it invalid. */
munmap(ptr+4096, 4096);
/* syscall where first int is unaligned and clobbers into invalid memory region */
/* syscall should return EFAULT */
return syscall(__NR_clock_adjtime, 0, ptr+4095);
}
To fix this issue we simply need to check if the faulting instruction address
is in the exception fixup table when the unaligned handler failed. If it
is, call the fixup routine instead of crashing.
While looking at the unaligned handler I found another issue as well: The
target register should not be modified if the handler was unsuccessful.
Mikulas Patocka [Tue, 28 Jun 2011 22:48:19 +0000 (00:48 +0200)]
parisc: Fix backtrace on PA-RISC
This patch fixes backtrace on PA-RISC
There were several problems:
1) The code that decodes instructions handles instructions that subtract
from the stack pointer incorrectly. If the instruction subtracts the
number X from the stack pointer the code increases the frame size by
(0x100000000-X). This results in invalid accesses to memory and
recursive page faults.
2) Because gcc reorders blocks, handling instructions that subtract from
the frame pointer is incorrect. For example, this function
int f(int a)
{
if (__builtin_expect(a, 1))
return a;
g();
return a;
}
is compiled in such a way, that the code that decreases the stack
pointer for the first "return a" is placed before the code for "g" call.
If we recognize this decrement, we mistakenly believe that the frame
size for the "g" call is zero.
To fix problems 1) and 2), the patch doesn't recognize instructions that
decrease the stack pointer at all. To further safeguard the unwind code
against nonsense values, we don't allow frame size larger than
Total_frame_size.
3) The backtrace is not locked. If stack dump races with module unload,
invalid table can be accessed.
This patch adds a spinlock when processing module tables.
Note, that for correct backtrace, you need recent binutils.
Binutils 2.18 from Debian 5 produce garbage unwind tables.
Binutils 2.21 work better (it sometimes forgets function frames, but at
least it doesn't generate garbage).
Linus Torvalds [Sat, 4 Jun 2016 19:30:36 +0000 (12:30 -0700)]
Merge tag 'drm-fixes-for-v4.7-rc2' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"A bunch of ARM drivers got into the fixes vibe this time around, so
this contains a bunch of fixes for imx, atmel hlcdc, arm hdlcd (only
so many combos of hlcd), mediatek and omap drm.
Other than that there is one mgag200 fix and a few core drm regression
fixes"
* tag 'drm-fixes-for-v4.7-rc2' of git://people.freedesktop.org/~airlied/linux: (34 commits)
drm/omap: fix unused variable warning.
drm: hdlcd: Add information about the underlying framebuffers in debugfs
drm: hdlcd: Cleanup the atomic plane operations
drm/hdlcd: Fix up crtc_state->event handling
drm: hdlcd: Revamp runtime power management
drm/mediatek: mtk_dsi: Remove spurious drm_connector_unregister
drm/mediatek: mtk_dpi: remove invalid error message
drm: atmel-hlcdc: fix a NULL check
drm: atmel-hlcdc: fix atmel_hlcdc_crtc_reset() implementation
drm/mgag200: Black screen fix for G200e rev 4
drm: Wrap direct calls to driver->gem_free_object from CMA
drm: fix fb refcount issue with atomic modesetting
drm: make drm_atomic_set_mode_prop_for_crtc() more reliable
drm/sti: remove extra mode fixup
drm: add missing drm_mode_set_crtcinfo call
drm/omap: include gpio/consumer.h where needed
drm/omap: include linux/seq_file.h where needed
Revert "drm/omap: no need to select OMAP2_DSS"
drm/omap: Remove regulator API abuse
OMAPDSS: HDMI5: Change DDC timings
...
Linus Torvalds [Sat, 4 Jun 2016 19:25:36 +0000 (12:25 -0700)]
Merge tag 'vfio-v4.7-rc2' of git://github.com/awilliam/linux-vfio
Pull VFIO fixes from Alex Williamson:
"Fix irqfd shutdown ordering, build warning, and VPD short read"
* tag 'vfio-v4.7-rc2' of git://github.com/awilliam/linux-vfio:
vfio/pci: Allow VPD short read
vfio/type1: Fix build warning
vfio/pci: Fix ordering of eventfd vs virqfd shutdown
Linus Torvalds [Sat, 4 Jun 2016 18:56:28 +0000 (11:56 -0700)]
Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"The important part of this pull is Filipe's set of fixes for btrfs
device replacement. Filipe fixed a few issues seen on the list and a
number he found on his own"
* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: deal with duplciates during extent_map insertion in btrfs_get_extent
Btrfs: fix race between device replace and read repair
Btrfs: fix race between device replace and discard
Btrfs: fix race between device replace and chunk allocation
Btrfs: fix race setting block group back to RW mode during device replace
Btrfs: fix unprotected assignment of the left cursor for device replace
Btrfs: fix race setting block group readonly during device replace
Btrfs: fix race between device replace and block group removal
Btrfs: fix race between readahead and device replace/removal
Linus Torvalds [Sat, 4 Jun 2016 18:37:53 +0000 (11:37 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
"We have a few follow-up fixes for the libceph refactor from Ilya, and
then some cephfs + fscache fixes from Zheng.
The first two FS-Cache patches are acked by David Howells and deemed
trivial enough to go through our tree. The rest fix some issues with
the ceph fscache handling (disable cache for inodes opened for write,
and simplify the revalidation logic accordingly, dropping the
now-unnecessary work queue)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: use i_version to check validity of fscache
ceph: improve fscache revalidation
ceph: disable fscache when inode is opened for write
ceph: avoid unnecessary fscache invalidation/revlidation
ceph: call __fscache_uncache_page() if readpages fails
FS-Cache: make check_consistency callback return int
FS-Cache: wake write waiter after invalidating writes
libceph: use %s instead of %pE in dout()s
libceph: put request only if it's done in handle_reply()
libceph: change ceph_osdmap_flag() to take osdc
Linus Torvalds [Sat, 4 Jun 2016 18:26:49 +0000 (11:26 -0700)]
Merge tag 'acpi-4.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"Two fixes for problems introduced recently (ACPICA and the ACPI
backlight driver) and one fix for an older issue that prevents at
least one system from booting.
Specifics:
- Fix an incorrect check introduced by recent ACPICA changes which
causes problems with booting KVM guests to happen, among other
things (Lv Zheng).
- Fix a backlight issue introduced by recent changes to the ACPI
video driver (Aaron Lu).
- Fix the ACPI processor initialization which attempts to register an
IO region without checking if that really is necessary and
sometimes prevents drivers loaded subsequently from registering
their resources which leads to boot issues (Rafael Wysocki)"
* tag 'acpi-4.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / processor: Avoid reserving IO regions too early
ACPICA / Hardware: Fix old register check in acpi_hw_get_access_bit_width()
ACPI / Thermal / video: fix max_level incorrect value
Linus Torvalds [Sat, 4 Jun 2016 18:07:57 +0000 (11:07 -0700)]
Merge tag 'pm-4.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Two fixes for problems introduced recently in the cpufreq core and the
intel_pstate driver.
Specifics:
- Fix a silly mistake related to the clamp_val() usage in a function
added by a recent commit (Rafael Wysocki).
- Reduce the log level of an annoying message added to intel_pstate
during the recent merge window (Srinivas Pandruvada)"
* tag 'pm-4.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Fix clamp_val() usage in cpufreq_driver_fast_switch()
cpufreq: intel_pstate: Downgrade print level for _PPC
Linus Torvalds [Sat, 4 Jun 2016 17:51:29 +0000 (10:51 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge various fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, page_alloc: recalculate the preferred zoneref if the context can ignore memory policies
mm, page_alloc: reset zonelist iterator after resetting fair zone allocation policy
mm, oom_reaper: do not use siglock in try_oom_reaper()
mm, page_alloc: prevent infinite loop in buffered_rmqueue()
checkpatch: reduce git commit description style false positives
mm/z3fold.c: avoid modifying HEADLESS page and minor cleanup
memcg: add RCU locking around css_for_each_descendant_pre() in memcg_offline_kmem()
mm: check the return value of lookup_page_ext for all call sites
kdump: fix dmesg gdbmacro to work with record based printk
mm: fix overflow in vm_map_ram()
Linus Torvalds [Fri, 3 Jun 2016 23:12:35 +0000 (16:12 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
- a few simple fixes for fallout from the recent gic-v3 changes
- a workaround for a Cavium thunderX erratum
- a bugfix for the pic32 irqchip to make external interrupts work proper
- a missing return value in the generic IPI management code
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/irq-pic32-evic: Fix bug with external interrupts.
irqchip/gicv3-its: numa: Enable workaround for Cavium thunderx erratum 23144
irqchip/gic-v3: Fix quiescence check in gic_enable_redist
irqchip/gic-v3: Fix copy+paste mistakes in defines
irqchip/gic-v3: Fix ICC_SGI1R_EL1.INTID decoding mask
genirq: Fix missing return value in irq_destroy_ipi()
Mel Gorman [Fri, 3 Jun 2016 21:56:01 +0000 (14:56 -0700)]
mm, page_alloc: recalculate the preferred zoneref if the context can ignore memory policies
The optimistic fast path may use cpuset_current_mems_allowed instead of
of a NULL nodemask supplied by the caller for cpuset allocations. The
preferred zone is calculated on this basis for statistic purposes and as
a starting point in the zonelist iterator.
However, if the context can ignore memory policies due to being atomic
or being able to ignore watermarks then the starting point in the
zonelist iterator is no longer correct. This patch resets the zonelist
iterator in the allocator slowpath if the context can ignore memory
policies. This will alter the zone used for statistics but only after
it is known that it makes sense for that context. Resetting it before
entering the slowpath would potentially allow an ALLOC_CPUSET allocation
to be accounted for against the wrong zone. Note that while nodemask is
not explicitly set to the original nodemask, it would only have been
overwritten if cpuset_enabled() and it was reset before the slowpath was
entered.
Link: http://lkml.kernel.org/r/20160602103936.GU2527@techsingularity.net Fixes: c33d6c06f60f710 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Fri, 3 Jun 2016 21:55:58 +0000 (14:55 -0700)]
mm, page_alloc: reset zonelist iterator after resetting fair zone allocation policy
Geert Uytterhoeven reported the following problem that bisected to
commit c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone
in a zonelist twice") on m68k/ARAnyM
The relationship is not obvious but it's due to a failure to rescan the
full zonelist after the fair zone allocation policy exhausts the batch
count. While this is a functional problem, it's also a performance
issue. A page allocator microbenchmark showed the following
4.7.0-rc1 4.7.0-rc1
vanilla reset-v1r2
Min alloc-odr0-1 327.00 ( 0.00%) 326.00 ( 0.31%)
Min alloc-odr0-2 235.00 ( 0.00%) 235.00 ( 0.00%)
Min alloc-odr0-4 198.00 ( 0.00%) 198.00 ( 0.00%)
Min alloc-odr0-8 170.00 ( 0.00%) 170.00 ( 0.00%)
Min alloc-odr0-16 156.00 ( 0.00%) 156.00 ( 0.00%)
Min alloc-odr0-32 150.00 ( 0.00%) 150.00 ( 0.00%)
Min alloc-odr0-64 146.00 ( 0.00%) 146.00 ( 0.00%)
Min alloc-odr0-128 145.00 ( 0.00%) 145.00 ( 0.00%)
Min alloc-odr0-256 155.00 ( 0.00%) 155.00 ( 0.00%)
Min alloc-odr0-512 168.00 ( 0.00%) 165.00 ( 1.79%)
Min alloc-odr0-1024 175.00 ( 0.00%) 174.00 ( 0.57%)
Min alloc-odr0-2048 180.00 ( 0.00%) 180.00 ( 0.00%)
Min alloc-odr0-4096 187.00 ( 0.00%) 186.00 ( 0.53%)
Min alloc-odr0-8192 190.00 ( 0.00%) 190.00 ( 0.00%)
Min alloc-odr0-16384 191.00 ( 0.00%) 191.00 ( 0.00%)
Min alloc-odr1-1 736.00 ( 0.00%) 445.00 ( 39.54%)
Min alloc-odr1-2 343.00 ( 0.00%) 335.00 ( 2.33%)
Min alloc-odr1-4 277.00 ( 0.00%) 270.00 ( 2.53%)
Min alloc-odr1-8 238.00 ( 0.00%) 233.00 ( 2.10%)
Min alloc-odr1-16 224.00 ( 0.00%) 218.00 ( 2.68%)
Min alloc-odr1-32 210.00 ( 0.00%) 208.00 ( 0.95%)
Min alloc-odr1-64 207.00 ( 0.00%) 203.00 ( 1.93%)
Min alloc-odr1-128 276.00 ( 0.00%) 202.00 ( 26.81%)
Min alloc-odr1-256 206.00 ( 0.00%) 202.00 ( 1.94%)
Min alloc-odr1-512 207.00 ( 0.00%) 202.00 ( 2.42%)
Min alloc-odr1-1024 208.00 ( 0.00%) 205.00 ( 1.44%)
Min alloc-odr1-2048 213.00 ( 0.00%) 212.00 ( 0.47%)
Min alloc-odr1-4096 218.00 ( 0.00%) 216.00 ( 0.92%)
Min alloc-odr1-8192 341.00 ( 0.00%) 219.00 ( 35.78%)
Note that order-0 allocations are unaffected but higher orders get a
small boost from this patch and a large reduction in system CPU usage
overall as can be seen here:
4.7.0-rc1 4.7.0-rc1
vanilla reset-v1r2
User 85.32 86.31
System 2221.39 2053.36
Elapsed 2368.89 2202.47
Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") Link: http://lkml.kernel.org/r/20160531100848.GR2527@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 3 Jun 2016 21:55:55 +0000 (14:55 -0700)]
mm, oom_reaper: do not use siglock in try_oom_reaper()
Oleg has noted that siglock usage in try_oom_reaper is both pointless
and dangerous. signal_group_exit can be checked lockless. The problem
is that sighand becomes NULL in __exit_signal so we can crash.
Fixes: 3ef22dfff239 ("oom, oom_reaper: try to reap tasks which skip regular OOM killer path") Link: http://lkml.kernel.org/r/1464679423-30218-1-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>