Sascha Hauer [Thu, 20 Jun 2013 15:34:31 +0000 (17:34 +0200)]
ARM: i.MX6: call ksz9021 phy fixup for all i.MX6 boards
In current U-Boot the sabrelite, nitrogen6x and titanium all need
the same fixup for the ksz9021 phy. Instead of limiting the fixup
to a single board apply them for all.
This part of bias settings are essential for WM8962 to power up. Without it
"wm8962 0-001a: DC servo timed out" might be prompted due to power-up failure
that happens to FLL if being used.
The driver's also bringing the bias down in the suspend path so it needs to be
powered up in the resume path for symmetry.
According to dapm_pre_sequence_async(), DAPM would call pm_runtime_get_sync()
to let driver finish the bias settings in pm_runtime_resume() before the bias
level being set to STANDBY. So no need to worry about disordered settings for
VMID of WM8962.
Signed-off-by: Nicolin Chen <b42378@freescale.com> Signed-off-by: Mark Brown <broonie@linaro.org> Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
ARM: dts: imx: remove old DMA binding data from gpmi node
After mxs-dma driver adopts generic DMA device tree binding, gpmi
channel interrupt number is defined in DMA controller node, and
channel ID is listed in "dmas" property. So the DMA channel interrupt
number in gpmi node "interrupts" property and fsl,gpmi-dma-channel which
are used by old customized DMA binding can be removed now.
ASoC: fsl: move fsl_ssi to generic DMA DT bindings
It removes the temporary custom DMA bindings from fsl_ssi driver and use
the generic one instead. It leaves imx-ssi drive unchanged regarding
those pcm flags by updating imx_pcm_dma_init() to let SSI driver decide
the flags.
With imx-pcm-dma moving to generic dmaengine pcm driver and the removal
of imx-pcm-audio/imx-fiq-pcm-audio platform device use, now imx-pcm
driver contains a few functions that are only used by imx-pcm-fiq.c.
Move these functions into imx-pcm-fiq.c and remove imx-pcm.c completely.
Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
Rather than instantiating imx-fiq-pcm-audio to call imx_pcm_fiq_init(),
imx-ssi can just directly call it to save the use of imx-fiq-pcm-audio.
With this change, imx-ssi becomes not only a cpu DAI but also a platform
device, so updates platform device setup in eukrea-tlv320, phycore-ac97
and wm1133-ev1 accordingly.
Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
Rather than instantiating imx-pcm-audio to call imx_pcm_dma_init(),
imx-ssi can just directly call it to save the use of imx-pcm-audio.
With this change, imx-ssi becomes not only a cpu DAI but also a
platform device, so updates platform device setup in imx-mc13783 and
mx27vis-aic32x4 accordingly.
Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
Rather than instantiating imx-pcm-audio to call imx_pcm_dma_init(),
fsl_ssi can just directly call it to save the use of imx-pcm-audio.
With this change, fsl_ssi becomes not only a cpu DAI but also a platform
device, so updates platform device setup in imx-sgtl5000 accordingly.
Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
Update imx-sdma driver to adopt generic DMA device tree bindings. It
calls of_dma_controller_register() with imx-sdma specific of_dma_xlate
to get the generic DMA device tree helper support. The #dma-cells for
imx-sdma must be 3, which includes request ID, peripheral type and
priority.
The existing way of requesting channel, clients directly call
dma_request_channel(), still work there, and will be removed after
all imx-sdma clients get converted to generic DMA device tree helper.
ENGR00269945: ARM: dts: add uart3 support for imx6qdl-sabreauto
On imx6qdl-sabreauto board, the pin function UART3_CTS is steered by the
GPIO4 of MAX7310 Expander B, while UART3_RXD and UART3_TXD are steered
by GPIO3 of MAX7310 Expander C. And both GPIOs need to be pulled high
to assert UART3 on the board.
ENGR00269945: ARM: dts: add max7310 support for imx6qdl-sabreauto
On imx6qdl-sabreauto board, there are three IO expanders implemented by
max7310, which are all controlled by I2C3. And GPIO5_4 is steering the
I2C3_SDA availability, while GPIO1_15 is used to reset max7310.
ENGR00269945: pinctrl: support pinctrl setting assertion via gpios
It's pretty common that on some reference design or validation boards,
one pin could be used by two devices on board, and the pin route is
controlled by a GPIO. So to assert the pin for given device, not only
the pinmux controller in SoC needs to be set up properly but also the
GPIO needs to be pulled up/down.
The patch adds support of a device tree property "pinctrl-assert-gpios"
under client device node. It plays pretty much like a board level pin
multiplexer, and steers the pin route by controlling the GPIOs. When
client device has the property represent in its node, pinctrl device
tree mapping function will firstly pull up/down the GPIOs to assert the
pins for the device at board level.
The pca953x type of devices, e.g. max7310, may have a reset which needs
to be handled to get the device start working. Add a device_reset()
call for that, and defer the probe if the reset controller for that is
not ready yet.
ENGR00269945: select ARCH_HAS_RESET_CONTROLLER for IMX
Move ARCH_HAS_RESET_CONTROLLER from HAVE_IMX_SRC to ARCH_MXC to have it
selected for the whole IMX family instead of SRC (System Reset
Controller), since GPIO could be another reset controller in many cases.
ENGR00269945: reset: add dummy device_reset() for !CONFIG_RESET_CONTROLLER build
Add dummy device_reset() function for !CONFIG_RESET_CONTROLLER build,
so that we do not have to add #ifdef CONFIG_RESET_CONTROLLER in every
single client device drivers that call the function.
ENGR00269945: reset: register gpio-reset driver in arch_initcall
It's a little bit late to register gpio-reset driver at module_init
time, because gpio-reset provides reset control via gpio for other
devices which are mostly probed at module_init time too. And it
becomes even worse, when the gpio comes from IO expander on I2C bus,
e.g. pca953x. In that case, gpio-reset needs to be ready before I2C
bus driver which is generally ready at subsys_initcall time. Let's
register gpio-reset driver in arch_initcall() to have it ready early
enough.
The defer probe mechanism is not used here, because a reset controller
driver should be reasonably registered early than other devices. More
importantly, defer probe doe not help in some nasty cases, e.g. the
gpio-pca953x device itself needs a reset from gpio-reset driver start
working.
ENGR00269945: reset: handle cansleep case in gpio-reset
Some gpio reset may be backed by a gpio that can sleep, e.g. pca953x
gpio output. For such gpio, gpio_set_value_cansleep() should be
called. Otherwise, the WARN_ON(chip->can_sleep) in gpiod_set_value()
will be hit. Add a gpio_cansleep() check to handle cansleep gpio.
Philipp Zabel [Thu, 30 May 2013 09:09:00 +0000 (11:09 +0200)]
reset: Add driver for gpio-controlled reset pins
This driver implements a reset controller device that toggle a gpio
connected to a reset pin of a peripheral IC. The delay between assertion
and de-assertion of the reset signal can be configured via device tree.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Stephen Warren <swarren@nvidia.com> Reviewed-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
ARM: dts: imx: share pad macro names between imx6q and imx6dl
The imx6q and imx6dl are two pin-to-pin compatible SoCs. The same board
design can work with either chip plugged into the socket, e.g. sabresd
and sabreauto boards.
We currently define pin groups in imx6q.dtsi and imx6dl.dtsi
respectively because the pad macro names are different between two
chips. This brings a maintenance burden on having the same label point
to the same pin group defined in two places.
The patch replaces prefix MX6Q_ and MX6DL_ with MX6QDL_ for both SoCs
pad macro names. Then the pin groups becomes completely common between
imx6q and imx6dl and can just be moved into imx6qdl.dtsi, so that the
long term maintenance of imx6q/dt pin settings becomes easier.
Unfortunately, the change brings some dramatic diff stat, but it's all
about DTS file, and the ultimate net diff stat is good.
The following error randomly appears on an imx6q board where gpio is
used to implement card-detection when mounting EXT4 rootfs during boot.
mmc1: Card removed during transfer!
mmc1: Resetting controller.
mmcblk0: unknown error -123 sending read/write command, card status 0x900
end_request: I/O error, dev mmcblk0, sector 106744
EXT4-fs error (device mmcblk0p2): ext4_find_entry:1312: inode #5011: comm swapper/0: reading directory lblock 0
It turns out that the error message comes from the card removal check
in function sdhci_card_event(). While we have a well implemented
function sdhci_do_get_cd() handling all the possible cases of
CD, the current code only checks controller internal CD case. That
causes problem for other CD cases like gpio on above imx6q board.
Improve the check by using sdhci_do_get_cd() to cover all possible CD
cases, so that above error on the imx6q board gets fixed.
arch/arm/mach-imx/system.c:101:123: error: static declaration of
‘imx_init_l2cache’ follows non-static declaration
In file included from arch/arm/mach-imx/system.c:32:0:
arch/arm/mach-imx/common.h:165:13: note: previous declaration of
‘imx_init_l2cache’ was here
arch/arm/mach-imx/system.c:101:123: warning: ‘imx_init_l2cache’ defined but
not used [-Wunused-function]
The fec/enet driver calculates MDC rate with the formula below.
ref_freq / ((MII_SPEED + 1) x 2)
The ref_freq here is the fec internal module clock, which is missing
from clk-vf610 clock driver right now. And clk-vf610 driver mistakenly
supplies RMII clock (50 MHz) as the source to fec. This results in the
situation that fec driver gets ref_freq as 50 MHz, while physically it
runs at 66 MHz (fec module clock physically sources from ipg which runs
at 66 MHz). That's why software expects MDC runs at 2.5 MHz, while the
measurement tells it runs at 3.3 MHz. And this causes the PHY KSZ8041
keeps swithing between Full and Half mode as below.
libphy: 400d0000.etherne:00 - Link is Up - 100/Full
libphy: 400d0000.etherne:00 - Link is Up - 100/Half
libphy: 400d0000.etherne:00 - Link is Up - 100/Full
libphy: 400d0000.etherne:00 - Link is Up - 100/Half
libphy: 400d0000.etherne:00 - Link is Up - 100/Full
libphy: 400d0000.etherne:00 - Link is Up - 100/Half
Add the missing module clock for ENET0 and ENET1, and correct the clock
supplying in device tree to fix above issue.
Thanks to Alison Wang <b18965@freescale.com> for debugging the issue.
Liu Ying [Thu, 4 Jul 2013 09:57:17 +0000 (17:57 +0800)]
ARM: imx6: change some clocks to fixup clocks
All the clocks controlled by the register 'CCM Serial Clock
Multiplexer Register 1' should be fixup clocks. This patch
changes those clocks from basic multiplexer or divider clocks
to fixup clocks.
Signed-off-by: Liu Ying <Ying.Liu@freescale.com> Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
Liu Ying [Thu, 4 Jul 2013 09:35:46 +0000 (17:35 +0800)]
ARM: imx: add common clock support for fixup mux
One register may have several fields to control some clocks. It
is possible that the read/write values of some fields may map to
different real functional values, so writing to the other fields
in the same register may break a working clock tree. A real case
is the aclk_podf field in the register 'CCM Serial Clock Multiplexer
Register 1' of i.MX6Q/SDL SoC. This patch introduces a fixup hook
for multiplexer clock which is called before writing a value to
clock registers to support this kind of multiplexer clocks.
Signed-off-by: Liu Ying <Ying.Liu@freescale.com> Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
Liu Ying [Thu, 4 Jul 2013 09:22:26 +0000 (17:22 +0800)]
ARM: imx: add common clock support for fixup div
One register may have several fields to control some clocks. It
is possible that the read/write values of some fields may map to
different real functional values, so writing to the other fields
in the same register may break a working clock tree. A real case
is the aclk_podf field in the register 'CCM Serial Clock Multiplexer
Register 1' of i.MX6Q/SDL SoC. This patch introduces a fixup hook
for divider clock which is called before writing a value to clock
registers to support this kind of divider clocks.
Signed-off-by: Liu Ying <Ying.Liu@freescale.com> Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
ARM: imx: let L2 initialization be a common function
Move imx6q L2 initialization function imx6q_init_l2cache() into
system.c, and rename it imx_init_l2cache(), so that other platforms
other than imx6q can also use the function.
The MLB PLL clock's operation doesn't fit for clock framework and
it should be handled internally in MLB driver.
Remove initialization of pll8_mlb clock device but leave its
declaration in mx6q_clks to avoid affecting imx6q clock numbering.
[ shawn.guo: The MLB PLL is currently implemented as an imx pllv3
clock. But it does not really make too much sense, because the PLL
does not have ENABLE, POWERDOWN and DIV_SELECT bits.
Also commit 0e57446 (ARM i.MX6: correct MLB clock configuration)
already removes the incorrect parenting on MLB PLL, now it's safe
and reasonable to remove the PLL completely from clock framework,
and let MLB driver handle the PLL per its particular need. ]
The CCM_CBCMR register (address 0x02C4018) has different meaning
between the i.MX6 Quad/Dual and the i.MX6 Solo/DualLite.
Compared to the i.MX6 Quad/Dual, the CCM_CBCMR register in the
i.MX6 Solo/DualLite doesn't have a gpu3d_shader configuration and
moves the gpu2_core configuration at that place.
Handle these i.MX6 Quad/Dual vs. i.MX6 Solo/DualLite clock differences
by using cpu_is_mx6dl().
serial: fsl_lpuart: restore UARTCR2 after watermark setup is done
Function lpuart_setup_watermark() clears some bits in register UARTCR2
before writing FIFO configuration registers as required by hardware.
But it should restore UARTCR2 after that. Otherwise, we end up changing
UARTCR2 register when setting up watermark, and that is not really
desirable. At least, when low-level debug and earlyprint is enabled,
serial console is broken due to it.
Fix the problem by restoring UARTCR2 register at the end of function
lpuart_setup_watermark().
Add clock support for Vybrid VF610. It uses dtc macro support to
define all clock IDs in vf610-clock.h to keep clock IDs coherence
between kernel and DT.
On some platforms such as VF610, offset of mux and pad ctrl register
may be zero, and the mux_mode and config_val are in one 32-bit register.
This patch adds support to imx core pinctrl framework to handle these
cases.
mmc: sdhci: request irq after sdhci_init() is called
Generally request_irq() should be called after hardware has been
initialized into a sane state. However, sdhci driver currently calls
request_irq() before sdhci_init(). At least, the following kernel panic
seen on i.MX6 is caused by that. The sdhci controller on i.MX6 may have
noisy glitch on DAT1 line, which will trigger SDIO interrupt handling
once request_irq() is called. But at this point, the SDIO interrupt
handler host->sdio_irq_thread has not been registered yet. Thus, we
see the NULL pointer access with wake_up_process(host->sdio_irq_thread)
in mmc_signal_sdio_irq().
sdhci-pltfm: SDHCI platform and OF driver helper
mmc0: no vqmmc regulator found
mmc0: no vmmc regulator found
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = 80004000
[00000000] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0+ #3
task: 9f860000 ti: 9f862000 task.ti: 9f862000
PC is at wake_up_process+0xc/0x44
LR is at sdhci_irq+0x378/0x93c
...
Backtrace:
[<8004f75c>] (wake_up_process+0x0/0x44) from [<803fb698>]
(sdhci_irq+0x378/0x93c)
r4:9fa68000 r3:00000001
[<803fb320>] (sdhci_irq+0x0/0x93c) from [<80075154>]
(handle_irq_event_percpu+0x54/0x19c)
[<80075100>] (handle_irq_event_percpu+0x0/0x19c) from [<800752ec>]
(handle_irq_event+0x50/0x70)
[<8007529c>] (handle_irq_event+0x0/0x70) from [<80078324>]
(handle_fasteoi_irq+0x9c/0x170)
r5:00000001 r4:9f807900
[<80078288>] (handle_fasteoi_irq+0x0/0x170) from [<80074ac0>]
(generic_handle_irq+0x28/0x38)
r5:8071fd64 r4:00000036
[<80074a98>] (generic_handle_irq+0x0/0x38) from [<8000ee34>]
(handle_IRQ+0x54/0xb4)
r4:8072ab78 r3:00000140
[<8000ede0>] (handle_IRQ+0x0/0xb4) from [<80008600>]
(gic_handle_irq+0x30/0x64)
r8:00000036 r7:a080e100 r6:9f863cd0 r5:8072acbc r4:a080e10c
r3:00000000
[<800085d0>] (gic_handle_irq+0x0/0x64) from [<8000e0c0>]
(__irq_svc+0x40/0x54)
...
---[ end trace e9af3588936b63f0 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Fix the panic by simply reverse the calling sequence between
request_irq() and sdhci_init().
Instead of explicitly calling clock initialization functions, we can
declare the functions with CLK_OF_DECLARE() and then call common
of_clk_init() to have them invoked properly.
Add clock support for i.MX6 SoloLite. It uses the dtc marco support to
define all clock IDs in imx6sl-clock.h, which will be included by both
clock driver and device tree sources, so that the data will stay sync
all the time between kernel and DT.
Currently clock providers defined in the DT are not registered
on i.MX5 platforms since of_clk_init() is not called.
This is not a problem for the SOC's own clocks, which are registered
in code, but prevents the DT being used to define clocks for external
hardware.
Fix this by calling of_clk_init() and actually using the DT to obtain
the 4 SOC fixed clocks.
These are already defined in the DT but were previously just used to
manually obtain the rate.
Fall back to the old scheme for non DT platforms.
Since the same method may be useful for other i.MX platforms
implement the imx_obtain_fixed_clock() function in common code.
Actually changing other i.MX platforms to use this should be done
later by someone with access to the appropriate hardware.
The mxc_arch_reset_init() uses static mapping and calls clk_get_sys() to
get clock. It's suitable for non-DT boot but not for DT boot where
dynamic mapping and of_clk_get() should be used instead. Create
mxc_arch_reset_init_dt() as the DT variant of mxc_arch_reset_init(),
and change DT platforms to use it.
It's inappropriate to call clk_prepare() in mxc_restart(), because the
restart routine could be called in atomic context. Move clk_get() and
clk_prepare() into mxc_arch_reset_init() and only have the atomic part
clk_enable() be called in mxc_restart().
As a result, mxc_arch_reset_init() needs to be called after clk gets
initialized.
While there, it also changes printk(KERN_ERR ...) to pr_err() and adds
__init annotation for mxc_arch_reset_init().
If the current rate of parent clock is sufficient to provide child a
requested rate with a proper divider setting, the rate change request
should not be propagated. Instead, changing the divider setting is good
enough to get child clock run at the requested rate.
On an imx6q clock configuration illustrated below,
ahb --> ipg --> ipg_per
132M 66M 66M
calling clk_set_rate(ipg_per, 22M) with the current
clk_divider_bestdiv() implementation will result in the rate change up
to ahb level like the following, because of the unnecessary/incorrect
rate change propagation.
ahb --> ipg --> ipg_per
66M 22M 22M
Fix the problem by trying to see if the requested rate can be achieved
by simply changing the divider value, and in that case return the
divider immediately from function clk_divider_bestdiv() as the best
one, so that all those unnecessary rate change propagation can be saved.
Type SETUP_PCI, added by setup_efi_pci(), may advertise a ROM size
larger than early_memremap() is able to handle, which is currently
limited to 256kB. If this occurs it leads to a NULL dereference in
parse_setup_data().
To avoid this, remap the setup_data header and allow parsing functions
for individual types to handle their own data remapping.
Signed-off-by: Linn Crosetto <linn@hp.com> Link: http://lkml.kernel.org/r/1376430401-67445-1-git-send-email-linn@hp.com Acked-by: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes a race in both msgrcv() and msgsnd() between finding the msg
and actually dealing with the queue, as another thread can delete shmid
underneath us if we are preempted before acquiring the
kern_ipc_perm.lock.
Manfred illustrates this nicely:
Assume a preemptible kernel that is preempted just after
msq = msq_obtain_object_check(ns, msqid)
in do_msgrcv(). The only lock that is held is rcu_read_lock().
Now the other thread processes IPC_RMID. When the first task is
resumed, then it will happily wait for messages on a deleted queue.
Fix this by checking for if the queue has been deleted after taking the
lock.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Reported-by: Manfred Spraul <manfred@colorfullife.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit 0a2b9d4c7967 ("ipc/sem.c: move wake_up_process out of the
spinlock section"), the update of semaphore's sem_otime(last semop time)
was moved to one central position (do_smart_update).
But since do_smart_update() is only called for operations that modify
the array, this means that wait-for-zero semops do not update sem_otime
anymore.
The fix is simple:
Non-alter operations must update sem_otime.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Reported-by: Jia He <jiakernel@gmail.com> Tested-by: Jia He <jiakernel@gmail.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The proc interface is not aware of sem_lock(), it instead calls
ipc_lock_object() directly. This means that simple semop() operations
can run in parallel with the proc interface. Right now, this is
uncritical, because the implementation doesn't do anything that requires
a proper synchronization.
But it is dangerous and therefore should be fixed.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Operations that need access to the whole array must guarantee that there
are no simple operations ongoing. Right now this is achieved by
spin_unlock_wait(sem->lock) on all semaphores.
If complex_count is nonzero, then this spin_unlock_wait() is not
necessary, because it was already performed in the past by the thread
that increased complex_count and even though sem_perm.lock was dropped
inbetween, no simple operation could have started, because simple
operations cannot start when complex_count is non-zero.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Mike Galbraith <bitbucket@online.de> Cc: Rik van Riel <riel@redhat.com> Reviewed-by: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The exclusion of complex operations in sem_lock() is insufficient: after
acquiring the per-semaphore lock, a simple op must first check that
sem_perm.lock is not locked and only after that test check
complex_count. The current code does it the other way around - and that
creates a race. Details are below.
The patch is a complete rewrite of sem_lock(), based in part on the code
from Mike Galbraith. It removes all gotos and all loops and thus the
risk of livelocks.
I have tested the patch (together with the next one) on my i3 laptop and
it didn't cause any problems.
The bug is probably also present in 3.10 and 3.11, but for these kernels
it might be simpler just to move the test of sma->complex_count after
the spin_is_locked() test.
Details of the bug:
Assume:
- sma->complex_count = 0.
- Thread 1: semtimedop(complex op that must sleep)
- Thread 2: semtimedop(simple op).
Pseudo-Trace:
Thread 1: sem_lock(): acquire sem_perm.lock
Thread 1: sem_lock(): check for ongoing simple ops
Nothing ongoing, thread 2 is still before sem_lock().
Thread 1: try_atomic_semop()
<<< preempted.
Thread 2: sem_lock():
static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
int nsops)
{
int locknum;
again:
if (nsops == 1 && !sma->complex_count) {
struct sem *sem = sma->sem_base + sops->sem_num;
/* Lock just the semaphore we are interested in. */
spin_lock(&sem->lock);
/*
* If sma->complex_count was set while we were spinning,
* we may need to look at things we did not lock here.
*/
if (unlikely(sma->complex_count)) {
spin_unlock(&sem->lock);
goto lock_array;
}
<<<<<<<<<
<<< complex_count is still 0.
<<<
<<< Here it is preempted
<<<<<<<<<
Thread 1: try_atomic_semop() returns, notices that it must sleep.
Thread 1: increases sma->complex_count.
Thread 1: drops sem_perm.lock
Thread 2:
/*
* Another process is holding the global lock on the
* sem_array; we cannot enter our critical section,
* but have to wait for the global lock to be released.
*/
if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
spin_unlock(&sem->lock);
spin_unlock_wait(&sma->sem_perm.lock);
goto again;
}
<<< sem_perm.lock already dropped, thus no "goto again;"
locknum = sops->sem_num;
Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Mike Galbraith <bitbucket@online.de> Cc: Rik van Riel <riel@redhat.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, IPC mechanisms do security and auditing related checks under
RCU. However, since security modules can free the security structure,
for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
race if the structure is freed before other tasks are done with it,
creating a use-after-free condition. Manfred illustrates this nicely,
for instance with shared mem and selinux:
-> do_shmat calls rcu_read_lock()
-> do_shmat calls shm_object_check().
Checks that the object is still valid - but doesn't acquire any locks.
Then it returns.
-> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
-> selinux_shm_shmat calls ipc_has_perm()
-> ipc_has_perm accesses ipc_perms->security
This patch delays the freeing of the security structures after all RCU
readers are done. Furthermore it aligns the security life cycle with
that of the rest of IPC - freeing them based on the reference counter.
For situations where we need not free security, the current behavior is
kept. Linus states:
"... the old behavior was suspect for another reason too: having the
security blob go away from under a user sounds like it could cause
various other problems anyway, so I think the old code was at least
_prone_ to bugs even if it didn't have catastrophic behavior."
I have tested this patch with IPC testcases from LTP on both my
quad-core laptop and on a 64 core NUMA server. In both cases selinux is
enabled, and tests pass for both voluntary and forced preemption models.
While the mentioned races are theoretical (at least no one as reported
them), I wanted to make sure that this new logic doesn't break anything
we weren't aware of.
After previous cleanups and optimizations, this function is no longer
heavily used and we don't have a good reason to keep it. Update the few
remaining callers and get rid of it.