The following error randomly appears on an imx6q board, where a gpio is
used to implement card detection, when mounting the EXT4 rootfs during boot.
mmc1: Card removed during transfer!
mmc1: Resetting controller.
mmcblk0: unknown error -123 sending read/write command, card status 0x900
end_request: I/O error, dev mmcblk0, sector 106744
EXT4-fs error (device mmcblk0p2): ext4_find_entry:1312: inode #5011: comm swapper/0: reading directory lblock 0
It turns out that the error message comes from the card removal check
in function sdhci_card_event(). While we have a well-implemented
function sdhci_do_get_cd() that handles all the possible CD cases, the
current code only checks the controller-internal CD case. That causes
problems for other CD cases, such as the gpio CD used on the imx6q
board above.
Improve the check by using sdhci_do_get_cd() to cover all possible CD
cases, so that the above error on the imx6q board gets fixed.
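For illustration, the shape of the improved check (a sketch based on the
sdhci driver; the surrounding code is abbreviated):

	present = sdhci_do_get_cd(host);

	spin_lock_irqsave(&host->lock, flags);

	/* sdhci_do_get_cd() covers gpio CD, controller-internal CD and
	 * the nonremovable case, instead of only reading the
	 * SDHCI_PRESENT_STATE register */
	if (host->mrq && !present) {
		pr_err("%s: Card removed during transfer!\n",
		       mmc_hostname(host->mmc));
		/* ... reset controller and error out the request ... */
	}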
Vincent Stehlé [Wed, 10 Jul 2013 09:45:46 +0000 (11:45 +0200)]
ARM: imx: fix imx_init_l2cache storage class
This fixes the following compilation error:
arch/arm/mach-imx/system.c:101:123: error: static declaration of
‘imx_init_l2cache’ follows non-static declaration
In file included from arch/arm/mach-imx/system.c:32:0:
arch/arm/mach-imx/common.h:165:13: note: previous declaration of
‘imx_init_l2cache’ was here
arch/arm/mach-imx/system.c:101:123: warning: ‘imx_init_l2cache’ defined but
not used [-Wunused-function]
Signed-off-by: Vincent Stehlé <vincent.stehle@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
The fec/enet driver calculates the MDC rate with the formula below.
ref_freq / ((MII_SPEED + 1) x 2)
The ref_freq here is the fec internal module clock, which is missing
from the clk-vf610 clock driver right now, and the clk-vf610 driver
mistakenly supplies the RMII clock (50 MHz) as the source to fec. This
results in the situation that the fec driver gets ref_freq as 50 MHz,
while physically it runs at 66 MHz (the fec module clock physically
sources from ipg, which runs at 66 MHz). That's why software expects
MDC to run at 2.5 MHz while measurement shows it running at 3.3 MHz.
This causes the PHY KSZ8041 to keep switching between Full and Half
mode, as below.
libphy: 400d0000.etherne:00 - Link is Up - 100/Full
libphy: 400d0000.etherne:00 - Link is Up - 100/Half
libphy: 400d0000.etherne:00 - Link is Up - 100/Full
libphy: 400d0000.etherne:00 - Link is Up - 100/Half
libphy: 400d0000.etherne:00 - Link is Up - 100/Full
libphy: 400d0000.etherne:00 - Link is Up - 100/Half
Add the missing module clocks for ENET0 and ENET1, and correct the
clock supplied in the device tree, to fix the above issue.
Thanks to Alison Wang <b18965@freescale.com> for debugging the issue.
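For reference, the arithmetic behind the mismatch (assuming, purely for
illustration, that the driver programmed MII_SPEED = 9 for the believed
50 MHz clock):

	expected: 50 MHz / ((9 + 1) x 2) = 2.5 MHz
	actual:   66 MHz / ((9 + 1) x 2) = 3.3 MHz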
Liu Ying [Thu, 4 Jul 2013 09:57:17 +0000 (17:57 +0800)]
ARM: imx6: change some clocks to fixup clocks
All the clocks controlled by the register 'CCM Serial Clock
Multiplexer Register 1' should be fixup clocks. This patch
changes those clocks from basic multiplexer or divider clocks
to fixup clocks.
Signed-off-by: Liu Ying <Ying.Liu@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
Liu Ying [Thu, 4 Jul 2013 09:35:46 +0000 (17:35 +0800)]
ARM: imx: add common clock support for fixup mux
One register may have several fields to control some clocks. It
is possible that the read/write values of some fields may map to
different real functional values, so writing to the other fields
in the same register may break a working clock tree. A real case
is the aclk_podf field in the register 'CCM Serial Clock Multiplexer
Register 1' of the i.MX6Q/SDL SoC. This patch introduces a fixup hook
for the multiplexer clock, called before a value is written to the
clock registers, to support this kind of multiplexer clock.
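A minimal sketch of the idea: reuse the generic clk_mux, but apply a
fix-up callback to the register value right before the write (helper
names illustrative; locking against the CCM elided):

	struct clk_fixup_mux {
		struct clk_mux mux;
		const struct clk_ops *ops;	/* the stock clk_mux ops */
		void (*fixup)(u32 *val);	/* adjusts val before write */
	};

	static inline struct clk_fixup_mux *to_clk_fixup_mux(struct clk_hw *hw)
	{
		struct clk_mux *mux = container_of(hw, struct clk_mux, hw);

		return container_of(mux, struct clk_fixup_mux, mux);
	}

	static int clk_fixup_mux_set_parent(struct clk_hw *hw, u8 index)
	{
		struct clk_fixup_mux *fixup_mux = to_clk_fixup_mux(hw);
		struct clk_mux *mux = &fixup_mux->mux;
		u32 val;

		val = readl(mux->reg);
		val &= ~(mux->mask << mux->shift);
		val |= index << mux->shift;
		fixup_mux->fixup(&val);		/* keep sibling fields sane */
		writel(val, mux->reg);

		return 0;
	}

get_parent/recalc operations simply delegate to the stock clk_mux ops.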
Signed-off-by: Liu Ying <Ying.Liu@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
Liu Ying [Thu, 4 Jul 2013 09:22:26 +0000 (17:22 +0800)]
ARM: imx: add common clock support for fixup div
One register may have several fields to control some clocks. It
is possible that the read/write values of some fields may map to
different real functional values, so writing to the other fields
in the same register may break a working clock tree. A real case
is the aclk_podf field in the register 'CCM Serial Clock Multiplexer
Register 1' of the i.MX6Q/SDL SoC. This patch introduces a fixup hook
for the divider clock, called before a value is written to the clock
registers, to support this kind of divider clock.
Signed-off-by: Liu Ying <Ying.Liu@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
ARM: imx: let L2 initialization be a common function
Move the imx6q L2 initialization function imx6q_init_l2cache() into
system.c and rename it imx_init_l2cache(), so that platforms other
than imx6q can also use it.
The MLB PLL clock's operation doesn't fit the clock framework, and it
should be handled internally in the MLB driver.
Remove initialization of pll8_mlb clock device but leave its
declaration in mx6q_clks to avoid affecting imx6q clock numbering.
[ shawn.guo: The MLB PLL is currently implemented as an imx pllv3
clock. But it does not really make too much sense, because the PLL
does not have ENABLE, POWERDOWN and DIV_SELECT bits.
Also commit 0e57446 (ARM i.MX6: correct MLB clock configuration)
already removes the incorrect parenting on MLB PLL, now it's safe
and reasonable to remove the PLL completely from clock framework,
and let MLB driver handle the PLL per its particular need. ]
The CCM_CBCMR register (address 0x020C4018) has a different meaning
between the i.MX6 Quad/Dual and the i.MX6 Solo/DualLite.
Compared to the i.MX6 Quad/Dual, the CCM_CBCMR register in the
i.MX6 Solo/DualLite doesn't have a gpu3d_shader configuration and
moves the gpu2d_core configuration to that place.
Handle these i.MX6 Quad/Dual vs. i.MX6 Solo/DualLite clock differences
by using cpu_is_mx6dl().
serial: fsl_lpuart: restore UARTCR2 after watermark setup is done
Function lpuart_setup_watermark() clears some bits in register UARTCR2
before writing the FIFO configuration registers, as required by the
hardware. But it should restore UARTCR2 afterwards. Otherwise, we end
up changing the UARTCR2 register when setting up the watermark, which
is not really desirable. At least, when low-level debug and earlyprintk
are enabled, the serial console is broken because of it.
Fix the problem by restoring the UARTCR2 register at the end of
lpuart_setup_watermark().
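The shape of the fix (a sketch; register and bit names as in the
driver, FIFO programming abbreviated):

	static void lpuart_setup_watermark(struct lpuart_port *sport)
	{
		unsigned char cr2, cr2_saved;

		cr2 = readb(sport->port.membase + UARTCR2);
		cr2_saved = cr2;	/* remember caller's configuration */

		/* transmitter/receiver must be off while the FIFO
		 * registers are written, as required by the hardware */
		cr2 &= ~(UARTCR2_TIE | UARTCR2_TCIE | UARTCR2_TE |
			 UARTCR2_RIE | UARTCR2_RE);
		writeb(cr2, sport->port.membase + UARTCR2);

		/* ... program FIFO/watermark registers here ... */

		/* restore UARTCR2 so earlier settings (e.g. from
		 * low-level debug) survive the watermark setup */
		writeb(cr2_saved, sport->port.membase + UARTCR2);
	}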
Add clock support for Vybrid VF610. It uses the dtc macro support to
define all clock IDs in vf610-clock.h, keeping the clock IDs coherent
between the kernel and DT.
On some platforms, such as VF610, the offset of the mux or pad ctrl
register may be zero, and the mux_mode and config_val live in one
32-bit register. This patch adds support to the imx core pinctrl
framework to handle these cases.
mmc: sdhci: request irq after sdhci_init() is called
Generally request_irq() should be called after hardware has been
initialized into a sane state. However, sdhci driver currently calls
request_irq() before sdhci_init(). At least, the following kernel panic
seen on i.MX6 is caused by that. The sdhci controller on i.MX6 may have
noisy glitch on DAT1 line, which will trigger SDIO interrupt handling
once request_irq() is called. But at this point, the SDIO interrupt
handler host->sdio_irq_thread has not been registered yet. Thus, we
see the NULL pointer access with wake_up_process(host->sdio_irq_thread)
in mmc_signal_sdio_irq().
sdhci-pltfm: SDHCI platform and OF driver helper
mmc0: no vqmmc regulator found
mmc0: no vmmc regulator found
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = 80004000
[00000000] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0+ #3
task: 9f860000 ti: 9f862000 task.ti: 9f862000
PC is at wake_up_process+0xc/0x44
LR is at sdhci_irq+0x378/0x93c
...
Backtrace:
[<8004f75c>] (wake_up_process+0x0/0x44) from [<803fb698>]
(sdhci_irq+0x378/0x93c)
r4:9fa68000 r3:00000001
[<803fb320>] (sdhci_irq+0x0/0x93c) from [<80075154>]
(handle_irq_event_percpu+0x54/0x19c)
[<80075100>] (handle_irq_event_percpu+0x0/0x19c) from [<800752ec>]
(handle_irq_event+0x50/0x70)
[<8007529c>] (handle_irq_event+0x0/0x70) from [<80078324>]
(handle_fasteoi_irq+0x9c/0x170)
r5:00000001 r4:9f807900
[<80078288>] (handle_fasteoi_irq+0x0/0x170) from [<80074ac0>]
(generic_handle_irq+0x28/0x38)
r5:8071fd64 r4:00000036
[<80074a98>] (generic_handle_irq+0x0/0x38) from [<8000ee34>]
(handle_IRQ+0x54/0xb4)
r4:8072ab78 r3:00000140
[<8000ede0>] (handle_IRQ+0x0/0xb4) from [<80008600>]
(gic_handle_irq+0x30/0x64)
r8:00000036 r7:a080e100 r6:9f863cd0 r5:8072acbc r4:a080e10c
r3:00000000
[<800085d0>] (gic_handle_irq+0x0/0x64) from [<8000e0c0>]
(__irq_svc+0x40/0x54)
...
---[ end trace e9af3588936b63f0 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Fix the panic by simply reversing the calling sequence of
request_irq() and sdhci_init().
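In other words (a sketch of the ordering in sdhci_add_host(); error
handling omitted):

	/* before (racy): the handler can fire via a DAT1 glitch while
	 * host->sdio_irq_thread is still NULL */
	ret = request_irq(host->irq, sdhci_irq, IRQF_SHARED,
			  mmc_hostname(mmc), host);
	sdhci_init(host, 0);

	/* after: bring the controller into a sane state first */
	sdhci_init(host, 0);
	ret = request_irq(host->irq, sdhci_irq, IRQF_SHARED,
			  mmc_hostname(mmc), host);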
Instead of explicitly calling clock initialization functions, we can
declare the functions with CLK_OF_DECLARE() and then call the common
of_clk_init() to have them invoked properly.
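For example (a sketch of the pattern):

	static void __init imx6q_clocks_init(struct device_node *ccm_node)
	{
		/* register this SoC's clocks */
	}
	CLK_OF_DECLARE(imx6q, "fsl,imx6q-ccm", imx6q_clocks_init);

of_clk_init(NULL) then walks the device tree and invokes the matching
initialization function for each declared compatible.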
Add clock support for i.MX6 SoloLite. It uses the dtc macro support to
define all clock IDs in imx6sl-clock.h, which will be included by both
the clock driver and the device tree sources, so that the data stays
in sync between the kernel and DT all the time.
Currently clock providers defined in the DT are not registered
on i.MX5 platforms since of_clk_init() is not called.
This is not a problem for the SOC's own clocks, which are registered
in code, but prevents the DT being used to define clocks for external
hardware.
Fix this by calling of_clk_init() and actually using the DT to obtain
the 4 SOC fixed clocks.
These are already defined in the DT but were previously just used to
manually obtain the rate.
Fall back to the old scheme for non-DT platforms.
Since the same method may be useful for other i.MX platforms
implement the imx_obtain_fixed_clock() function in common code.
Actually changing other i.MX platforms to use this should be done
later by someone with access to the appropriate hardware.
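A sketch of the helper (the _from_dt lookup is a private helper in the
imx clk code):

	struct clk * __init imx_obtain_fixed_clock(const char *name,
						   unsigned long rate)
	{
		struct clk *clk;

		/* prefer a fixed clock described in the device tree ... */
		clk = imx_obtain_fixed_clock_from_dt(name);
		/* ... and fall back to registering one in code, which
		 * keeps non-DT boot working */
		if (IS_ERR(clk))
			clk = imx_clk_fixed(name, rate);
		return clk;
	}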
mxc_arch_reset_init() uses a static mapping and calls clk_get_sys() to
get the clock. That is suitable for non-DT boot, but not for DT boot,
where dynamic mapping and of_clk_get() should be used instead. Create
mxc_arch_reset_init_dt() as the DT variant of mxc_arch_reset_init(),
and change DT platforms to use it.
It's inappropriate to call clk_prepare() in mxc_restart(), because the
restart routine could be called in atomic context. Move clk_get() and
clk_prepare() into mxc_arch_reset_init() and only have the atomic part
clk_enable() be called in mxc_restart().
As a result, mxc_arch_reset_init() needs to be called after clk gets
initialized.
While there, it also changes printk(KERN_ERR ...) to pr_err() and adds
__init annotation for mxc_arch_reset_init().
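A sketch of the resulting split (abbreviated):

	void __init mxc_arch_reset_init(void __iomem *base)
	{
		wdog_base = base;

		wdog_clk = clk_get_sys("imx2-wdt.0", NULL);
		if (IS_ERR(wdog_clk))
			pr_warn("%s: failed to get wdog clock\n", __func__);
		else
			clk_prepare(wdog_clk);	/* may sleep: only safe here */
	}

	void mxc_restart(char mode, const char *cmd)
	{
		if (!IS_ERR(wdog_clk))
			clk_enable(wdog_clk);	/* atomic-safe part only */

		/* ... assert the watchdog to trigger the reset ... */
	}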
If the current rate of parent clock is sufficient to provide child a
requested rate with a proper divider setting, the rate change request
should not be propagated. Instead, changing the divider setting is good
enough to get child clock run at the requested rate.
On an imx6q clock configuration illustrated below,
ahb --> ipg --> ipg_per
132M 66M 66M
calling clk_set_rate(ipg_per, 22M) with the current
clk_divider_bestdiv() implementation will, because of the
unnecessary/incorrect rate change propagation, push the rate change
all the way up to the ahb level, as follows.
ahb --> ipg --> ipg_per
66M 22M 22M
Fix the problem by first checking whether the requested rate can be
achieved by simply changing the divider value, and in that case
returning the divider immediately from clk_divider_bestdiv() as the
best one, so that all the unnecessary rate change propagation is
avoided.
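A sketch of the early return inside clk_divider_bestdiv() (a fragment;
maxdiv and the rest of the search are the existing driver code):

	parent_rate_saved = __clk_get_rate(__clk_get_parent(hw->clk));

	for (i = 1; i <= maxdiv; i++) {
		if (rate * i == parent_rate_saved) {
			/* The requested rate divides down exactly from
			 * the current parent rate: pick this divider and
			 * skip rate change propagation to the parent. */
			*best_parent_rate = parent_rate_saved;
			return i;
		}
		/* ... otherwise fall through to the existing bestdiv
		 * search, which may propagate a rate change ... */
	}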
Type SETUP_PCI, added by setup_efi_pci(), may advertise a ROM size
larger than early_memremap() is able to handle, which is currently
limited to 256kB. If this occurs it leads to a NULL dereference in
parse_setup_data().
To avoid this, remap the setup_data header and allow parsing functions
for individual types to handle their own data remapping.
Signed-off-by: Linn Crosetto <linn@hp.com>
Link: http://lkml.kernel.org/r/1376430401-67445-1-git-send-email-linn@hp.com
Acked-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes a race in both msgrcv() and msgsnd() between finding the msg
and actually dealing with the queue, as another thread can delete the
queue underneath us if we are preempted before acquiring the
kern_ipc_perm.lock.
Manfred illustrates this nicely:
Assume a preemptible kernel that is preempted just after
msq = msq_obtain_object_check(ns, msqid)
in do_msgrcv(). The only lock that is held is rcu_read_lock().
Now the other thread processes IPC_RMID. When the first task is
resumed, then it will happily wait for messages on a deleted queue.
Fix this by checking whether the queue has been deleted after taking
the lock.
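The shape of the check (a sketch, as done in both do_msgrcv() and
do_msgsnd(); error path abbreviated):

	ipc_lock_object(&msq->q_perm);

	/* raced with RMID? the queue may have been removed while we
	 * only held rcu_read_lock() */
	if (msq->q_perm.deleted) {
		err = -EIDRM;
		goto out_unlock0;
	}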
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reported-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit 0a2b9d4c7967 ("ipc/sem.c: move wake_up_process out of the
spinlock section"), the update of the semaphore's sem_otime (last semop
time) was moved to one central position (do_smart_update).
But since do_smart_update() is only called for operations that modify
the array, this means that wait-for-zero semops do not update sem_otime
anymore.
The fix is simple:
Non-alter operations must update sem_otime.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Jia He <jiakernel@gmail.com>
Tested-by: Jia He <jiakernel@gmail.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The proc interface is not aware of sem_lock(); it instead calls
ipc_lock_object() directly. This means that simple semop() operations
can run in parallel with the proc interface. Right now this is
harmless, because the implementation doesn't do anything that requires
proper synchronization.
But it is dangerous and therefore should be fixed.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Operations that need access to the whole array must guarantee that there
are no simple operations ongoing. Right now this is achieved by
spin_unlock_wait(sem->lock) on all semaphores.
If complex_count is nonzero, then this spin_unlock_wait() is not
necessary, because it was already performed in the past by the thread
that increased complex_count, and even though sem_perm.lock was dropped
in between, no simple operation could have started, because simple
operations cannot start while complex_count is nonzero.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Rik van Riel <riel@redhat.com>
Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The exclusion of complex operations in sem_lock() is insufficient: after
acquiring the per-semaphore lock, a simple op must first check that
sem_perm.lock is not locked and only after that test check
complex_count. The current code does it the other way around - and that
creates a race. Details are below.
The patch is a complete rewrite of sem_lock(), based in part on the code
from Mike Galbraith. It removes all gotos and all loops and thus the
risk of livelocks.
I have tested the patch (together with the next one) on my i3 laptop and
it didn't cause any problems.
The bug is probably also present in 3.10 and 3.11, but for these kernels
it might be simpler just to move the test of sma->complex_count after
the spin_is_locked() test.
Details of the bug:
Assume:
- sma->complex_count = 0.
- Thread 1: semtimedop(complex op that must sleep)
- Thread 2: semtimedop(simple op).
Pseudo-Trace:
Thread 1: sem_lock(): acquire sem_perm.lock
Thread 1: sem_lock(): check for ongoing simple ops
Nothing ongoing, thread 2 is still before sem_lock().
Thread 1: try_atomic_semop()
<<< preempted.
Thread 2: sem_lock():
static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
int nsops)
{
int locknum;
again:
if (nsops == 1 && !sma->complex_count) {
struct sem *sem = sma->sem_base + sops->sem_num;
/* Lock just the semaphore we are interested in. */
spin_lock(&sem->lock);
/*
* If sma->complex_count was set while we were spinning,
* we may need to look at things we did not lock here.
*/
if (unlikely(sma->complex_count)) {
spin_unlock(&sem->lock);
goto lock_array;
}
<<<<<<<<<
<<< complex_count is still 0.
<<<
<<< Here it is preempted
<<<<<<<<<
Thread 1: try_atomic_semop() returns, notices that it must sleep.
Thread 1: increases sma->complex_count.
Thread 1: drops sem_perm.lock
Thread 2:
/*
* Another process is holding the global lock on the
* sem_array; we cannot enter our critical section,
* but have to wait for the global lock to be released.
*/
if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
spin_unlock(&sem->lock);
spin_unlock_wait(&sma->sem_perm.lock);
goto again;
}
<<< sem_perm.lock already dropped, thus no "goto again;"
locknum = sops->sem_num;
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, IPC mechanisms do security and auditing related checks under
RCU. However, since security modules can free the security structure,
for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
race if the structure is freed before other tasks are done with it,
creating a use-after-free condition. Manfred illustrates this nicely,
for instance with shared mem and selinux:
-> do_shmat calls rcu_read_lock()
-> do_shmat calls shm_object_check().
Checks that the object is still valid - but doesn't acquire any locks.
Then it returns.
-> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
-> selinux_shm_shmat calls ipc_has_perm()
-> ipc_has_perm accesses ipc_perms->security
This patch delays the freeing of the security structures until after
all RCU readers are done. Furthermore, it aligns the security life
cycle with that of the rest of IPC - freeing them based on the
reference counter.
For situations where we need not free security, the current behavior is
kept. Linus states:
"... the old behavior was suspect for another reason too: having the
security blob go away from under a user sounds like it could cause
various other problems anyway, so I think the old code was at least
_prone_ to bugs even if it didn't have catastrophic behavior."
I have tested this patch with IPC testcases from LTP on both my
quad-core laptop and on a 64 core NUMA server. In both cases selinux is
enabled, and tests pass for both voluntary and forced preemption models.
While the mentioned races are theoretical (at least no one has reported
them), I wanted to make sure that this new logic doesn't break anything
we weren't aware of.
After previous cleanups and optimizations, this function is no longer
heavily used and we don't have a good reason to keep it. Update the few
remaining callers and get rid of it.
As suggested by Andrew, add a generic initial locking scheme used
throughout all sysv ipc mechanisms, documenting the ids rwsem, how RCU
can be enough to do the initial checks, and when to actually acquire
the kern_ipc_perm.lock spinlock.
I found that adding it to util.c was generic enough.
Clean up some of the messy do_shmat() spaghetti code, getting rid of
out_free and out_put_dentry labels. This makes shortening the critical
region of this function in the next patch a little easier to do and read.
With the *_INFO, *_STAT, IPC_RMID and IPC_SET commands already optimized,
deal with the remaining SHM_LOCK and SHM_UNLOCK commands. Take the
shm_perm lock after doing the initial auditing and security checks. The
rest of the logic remains unchanged.
While the INFO cmd doesn't take the ipc lock, the STAT commands do acquire
it unnecessarily. We can do the permissions and security checks only
holding the rcu lock.
Similar to semctl and msgctl, when calling shmctl, the *_INFO and
*_STAT commands can be performed without acquiring the ipc object.
Add a shmctl_nolock() function and move the logic of *_INFO and *_STAT
out of shmctl(). Since we are just moving functionality, this change
still takes the lock; it will be properly lockless in the next patch.
This is the third and final patchset that deals with reducing the amount
of contention we impose on the ipc lock (kern_ipc_perm.lock). These
changes mostly deal with shared memory; previous work has already been
done for semaphores and message queues.
With these patches applied, a custom shm microbenchmark stressing shmctl
doing IPC_STAT with 4 threads a million times, reduces the execution
time by 50%. A similar run, this time with IPC_SET, reduces the
execution time from 3 mins and 35 secs to 27 seconds.
Patches 1-8: replaces blindly taking the ipc lock for a smarter
combination of rcu and ipc_obtain_object, only acquiring the spinlock
when updating.
Patch 9: renames the ids rw_mutex to rwsem, which is what it already was.
Patch 10: is a trivial mqueue leftover cleanup
Patch 11: adds a brief lock scheme description, requested by Andrew.
This patch:
Add shm_obtain_object() and shm_obtain_object_check(), which will allow us
to get the ipc object without acquiring the lock. Just as with other
forms of ipc, these functions are basically wrappers around
ipc_obtain_object*().
The check whether the queue is full and the addition of current to the
wait queue of pending msgsnd() operations (ss_add()) must be atomic.
Otherwise:
- the thread that performs msgsnd() finds a full queue and decides to
sleep.
- the thread that performs msgrcv() first reads all messages from the
queue and then sleeps, because the queue is empty.
- the msgrcv() calls do not perform any wakeups, because the msgsnd()
task has not yet called ss_add().
- then the msgsnd()-thread first calls ss_add() and then sleeps.
Net result: msgsnd() and msgrcv() both sleep forever.
Observed with msgctl08 from ltp with a preemptible kernel.
Fix: Call ipc_lock_object() before performing the check.
The patch also moves security_msg_queue_msgsnd() under ipc_lock_object:
- msgctl(IPC_SET) explicitly mentions that it tries to expunge any
pending operations that are not allowed anymore with the new
permissions. If security_msg_queue_msgsnd() is called without locks,
then there might be races.
- it makes the patch much simpler.
sem_otime contains the time of the last semaphore operation that
completed successfully. Every operation updates this value, thus access
from multiple cpus can cause thrashing.
Therefore the patch replaces the variable with a per-semaphore variable.
The per-array sem_otime is only calculated when required.
No performance improvement on a single-socket i3 - only important for
larger systems.
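A sketch of the change (abbreviated from ipc/sem.c):

	struct sem {
		int		semval;		/* current value */
		int		sempid;		/* pid of last operation */
		spinlock_t	lock;		/* per-semaphore lock */
		time_t		sem_otime;	/* now per semaphore */
	} ____cacheline_aligned_in_smp;

	/* the per-array value is only computed on demand, e.g. for
	 * semctl(), as the maximum over all semaphores */
	static time_t get_semotime(struct sem_array *sma)
	{
		int i;
		time_t res = sma->sem_base[0].sem_otime;

		for (i = 1; i < sma->sem_nsems; i++) {
			time_t to = sma->sem_base[i].sem_otime;

			if (to > res)
				res = to;
		}
		return res;
	}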
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are two places that can contain alter operations:
- the global queue: sma->pending_alter
- the per-semaphore queues: sma->sem_base[].pending_alter.
Since one of the queues must be processed first, this causes an odd
prioritization of the wakeups: complex operations have priority over
simple ops.
The patch restores the behavior of linux <=3.0.9: The longest waiting
operation has the highest priority.
This is done by using only one queue:
- if there are complex ops, then sma->pending_alter is used.
- otherwise, the per-semaphore queues are used.
As a side effect, do_smart_update_queue() becomes much simpler: no more
goto logic.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Introduce separate queues for operations that do not modify the
semaphore values. Advantages:
- Simpler logic in check_restart().
- Faster update_queue(): Right now, all wait-for-zero operations are
always tested, even if the semaphore value is not 0.
- wait-for-zero gets again priority, as in linux <=3.0.9
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
i3, with 2 cores and with hyperthreading enabled. Interleave 2 in order
to use the full cores first. HT partially hides the delay from
cacheline thrashing, thus the improvement is "only" 8.7% if 4 threads
are running.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Enforce that ipc_rcu_alloc returns a cacheline aligned pointer on SMP.
Rationale:
The SysV sem code tries to move the main spinlock into a separate
cacheline (____cacheline_aligned_in_smp). This works only if
ipc_rcu_alloc returns cacheline aligned pointers. vmalloc and kmalloc
return cacheline aligned pointers; the implementation of ipc_rcu_alloc
breaks that.
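A sketch of the fix: pad the RCU bookkeeping header to a full
cacheline, so the payload pointer handed back stays cacheline aligned
(abbreviated):

	struct ipc_rcu {
		struct rcu_head rcu;
		atomic_t refcount;
	} ____cacheline_aligned_in_smp;	/* header fills whole cachelines */

	void *ipc_rcu_alloc(int size)
	{
		/* prepend the allocation with the rcu struct */
		struct ipc_rcu *out = ipc_alloc(sizeof(struct ipc_rcu) + size);

		if (unlikely(!out))
			return NULL;
		atomic_set(&out->refcount, 1);
		/* out + 1 advances by a multiple of the cacheline size */
		return out + 1;
	}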
While the INFO cmd doesn't take the ipc lock, the STAT commands do
acquire it unnecessarily. We can do the permissions and security checks
only holding the rcu lock.
This function now mimics semctl_nolock().
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add msq_obtain_object() and msq_obtain_object_check(), which will allow
us to get the ipc object without acquiring the lock. Just as with
semaphores, these functions are basically wrappers around
ipc_obtain_object*().
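The wrappers look like this (a sketch; the _check variant is identical
but calls ipc_obtain_object_check(), which additionally validates the
sequence number):

	static inline struct msg_queue *msq_obtain_object(struct ipc_namespace *ns,
							  int id)
	{
		struct kern_ipc_perm *ipcp = ipc_obtain_object(&msg_ids(ns), id);

		if (IS_ERR(ipcp))
			return ERR_CAST(ipcp);

		return container_of(ipcp, struct msg_queue, q_perm);
	}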
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Similar to semctl, when calling msgctl, the *_INFO and *_STAT commands
can be performed without acquiring the ipc object.
Add a msgctl_nolock() function and move the logic of *_INFO and *_STAT
out of msgctl(). This change still takes the lock; it will be properly
lockless in the next patch.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This function currently acquires both the rw_mutex and the rcu lock on
successful lookups, leaving the callers to explicitly unlock them,
creating another two-level locking situation.
Make the callers (including those that still use ipcctl_pre_down())
explicitly lock and unlock the rwsem and rcu lock.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patchset continues the work that began in the sysv ipc semaphore
scaling series, see
https://lkml.org/lkml/2013/3/20/546
Just like semaphores used to be, sysv shared memory and msg queues also
abuse the ipc lock, unnecessarily holding it for operations such as
permission and security checks.
This patchset mostly deals with mqueues, and while shared mem can be
done in a very similar way, I want to get these patches out in the open
first. It also does some pending cleanups, mostly focused on the
two-level locking we have in the ipc code, taking care of ipc_addid()
and ipcctl_pre_down_nolock() - yes, there are still functions that need
to be updated as well.
This patch:
Make all callers explicitly take and release the RCU read lock.
This addresses the two level locking seen in newary(), newseg() and
newqueue(). For the last two, explicitly unlock the ipc object and the
rcu lock, instead of calling the custom shm_unlock and msg_unlock
functions. The next patch will deal with the open coded locking for
->perm.lock
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current test for an attached enabled encoder fails if we have
multiple connectors aliased to the same encoder - both connectors
believe they own the enabled encoder, and so we attempt to both enable
and disable DPMS on the encoder, leading to hilarity and an oops:
Fixes: igt/kms_flip/dpms-off-confusion
Reported-and-tested-by: Wakko Warner <wakko@animx.eu.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68030
Link: http://lkml.kernel.org/r/20130928185023.GA21672@animx.eu.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add regression citation, mention the igt testcase this fixes
and slap a cc: stable on the patch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dynamic_dname() is both too much and too little for those - the output
may be well in excess of the 64 bytes dynamic_dname() assumes to be
enough (thanks to ashmem feeding really long names to
shmem_file_setup()), and vsnprintf() is overkill for those guys.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Colin Cross <ccross@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fengguang Wu, Oleg Nesterov and Peter Zijlstra tracked down
a kernel crash to a GCC bug: GCC miscompiles certain 'asm goto'
constructs, as outlined here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
Implement a workaround suggested by Jakub Jelinek.
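The workaround wraps every 'asm goto' in a macro that appends an empty
asm statement, which stops GCC from miscompiling the construct (as
added to include/linux/compiler-gcc.h):

	#define asm_volatile_goto(x...)	do { asm goto(x); asm (""); } while (0)

Call sites then use asm_volatile_goto(...) instead of a bare asm goto.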
Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Suggested-by: Jakub Jelinek <jakub@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20131015062351.GA4666@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ARCompact TRAP_S insn, used for breakpoints, commits before the
exception is taken (updating the architectural PC), so ptregs->ret
contains the next PC and not the breakpoint PC itself. This is
different from other restartable exceptions, such as TLB Miss, where
ptregs->ret has the exact faulting PC.
gdb needs to know the exact PC, hence the ARC ptrace GETREGSET provides
@stop_pc, which returns either ptregs->ret or the EFA depending on the
situation.
However, writing stop_pc (SETREGSET request), which updates ptregs->ret,
doesn't make sense, since stop_pc doesn't always correspond to that reg
as described above.
This was not an issue so far, since user_regs->ret / user_regs->stop_pc
had the same value, and writing either to ptregs->ret was OK - needless,
but NOT broken, hence not observed.
With gdb "jump", they diverge, and user_regs->ret updating ptregs is
overwritten immediately with stop_pc, which this patch fixes.
Previously, when a signal was registered with SA_SIGINFO, parameters 2
and 3 of the signal handler were written to registers r1 and r2 before
the register set was saved. This led to corruption of these two
registers after returning from the signal handler (the wrong values were
restored).
With this patch, registers are now saved before any parameters are
passed, thus maintaining the processor state from before signal entry.
Some ARC SMP systems lack native atomic R-M-W (LLOCK/SCOND) insns and
can only use atomic EX insn (reg with mem) to build higher level R-M-W
primitives. This includes a SystemC based SMP simulation model.
So rwlocks need to use a protecting spinlock for atomic cmp-n-exchange
operation to update reader(s)/writer count.
The spinlock operation itself looks as follows:
mov reg, 1 ; 1=locked, 0=unlocked
retry:
EX reg, [lock] ; load existing, store 1, atomically
BREQ reg, 1, retry ; if already locked, retry
In single-threaded simulation, SystemC alternates between the 2 cores,
scheduling "N" insns on each in turn. Additionally, for an insn with a
global side effect, such as EX writing to shared mem, a core switch is
enforced too.
Given that, with 2 cores doing a repeated EX on the same location,
Linux often got into a livelock, e.g. when both cores were fiddling
with the tasklist lock (gdbserver / hackbench) for read/write
respectively, as the sequence diagram below shows:
| LTP tests syscalls/process_vm_readv01 and process_vm_writev01 fail
| similarly in one testcase test_iov_invalid -> lvec->iov_base.
| Testcase expects errno EFAULT and return code -1,
| but it gets return code 1 and ERRNO is 0 what means success.
Essentially the test case was passing a pointer of -1, which
access_ok() was not catching. It was doing [@addr + @sz <= TASK_SIZE],
which would pass for @addr == -1.
Fixed that by rewriting it as [@addr <= TASK_SIZE - @sz].
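To illustrate, with 32-bit unsigned arithmetic (the TASK_SIZE value
below is assumed purely for the example):

	addr = 0xFFFFFFFF (i.e. -1), sz = 4, TASK_SIZE = 0x60000000

	old check: addr + sz <= TASK_SIZE
	           0xFFFFFFFF + 4 wraps to 0x00000003  -> bogus pass
	new check: addr <= TASK_SIZE - sz
	           0xFFFFFFFF > 0x5FFFFFFC             -> correctly rejected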