Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Robin Holt <holt@sgi.com> Cc: Russ Anderson <rja@sgi.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Robin Holt [Thu, 27 Jun 2013 23:54:18 +0000 (09:54 +1000)]
reboot: arm: change reboot_mode to use enum reboot_mode
Preparing to move the parsing of reboot= to generic kernel code forces the
change in reboot_mode handling to use the enum.
Signed-off-by: Robin Holt <holt@sgi.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Russ Anderson <rja@sgi.com> Cc: Robin Holt <holt@sgi.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
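To illustrate the direction of this series, here is a minimal userspace sketch of an enum-based reboot mode and a parser for the first character of a reboot= argument. The constant names, the letter mapping and the parse_reboot_mode() helper are assumptions made for illustration; they are not quoted from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative enum; the constant names are assumptions. */
enum reboot_mode {
	REBOOT_COLD = 0,
	REBOOT_WARM,
	REBOOT_HARD,
	REBOOT_SOFT,
	REBOOT_GPIO,
};

/*
 * Map the first character of a reboot= argument to a mode.
 * Unknown or empty input returns the caller's default.
 */
enum reboot_mode parse_reboot_mode(const char *arg, enum reboot_mode dflt)
{
	assert(arg != NULL);

	switch (arg[0]) {
	case 'c': return REBOOT_COLD;
	case 'w': return REBOOT_WARM;
	case 'h': return REBOOT_HARD;
	case 's': return REBOOT_SOFT;
	case 'g': return REBOOT_GPIO;
	default:  return dflt;
	}
}
```

Using an enum instead of a raw character lets architecture code compare against named constants while a single generic parser handles the command line.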
Robin Holt [Thu, 27 Jun 2013 23:54:18 +0000 (09:54 +1000)]
reboot: arm: prepare reboot_mode for moving to generic kernel code
Prepare for moving the parsing of reboot= to the generic kernel code
by making reboot_mode into a more generic form.
Signed-off-by: Robin Holt <holt@sgi.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Russ Anderson <rja@sgi.com> Cc: Robin Holt <holt@sgi.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Robin Holt [Thu, 27 Jun 2013 23:54:18 +0000 (09:54 +1000)]
reboot: arm: remove unused restart_mode fields from some arm subarchs
These restart_mode fields are not used at all. Remove them to make moving
the reboot= cmdline options to the general kernel easier.
Signed-off-by: Robin Holt <holt@sgi.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Russ Anderson <rja@sgi.com> Cc: Robin Holt <holt@sgi.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Robin Holt [Thu, 27 Jun 2013 23:54:17 +0000 (09:54 +1000)]
reboot: unicore32: prepare reboot_mode for moving to generic kernel code
Prepare for moving the parsing of reboot= to the generic kernel code
by making reboot_mode into a more generic form.
Signed-off-by: Robin Holt <holt@sgi.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Russ Anderson <rja@sgi.com> Cc: Robin Holt <holt@sgi.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: H. Peter Anvin <hpa@zytor.com> Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Robin Holt [Thu, 27 Jun 2013 23:54:17 +0000 (09:54 +1000)]
reboot: x86: prepare reboot_mode for moving to generic kernel code
Prepare for moving the parsing of reboot= to the generic kernel code
by making reboot_mode into a more generic form.
Signed-off-by: Robin Holt <holt@sgi.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Miguel Boton <mboton.lkml@gmail.com> Cc: Russ Anderson <rja@sgi.com> Cc: Robin Holt <holt@sgi.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Robin Holt [Thu, 27 Jun 2013 23:54:17 +0000 (09:54 +1000)]
reboot: checkpatch.pl the new kernel/reboot.c file
Get the new file to pass scripts/checkpatch.pl
Signed-off-by: Robin Holt <holt@sgi.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Russ Anderson <rja@sgi.com> Cc: Robin Holt <holt@sgi.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Robin Holt [Thu, 27 Jun 2013 23:54:16 +0000 (09:54 +1000)]
reboot: move shutdown/reboot related functions to kernel/reboot.c
This patch is preparatory. It moves the reboot-related syscall and other
functions from kernel/sys.c to kernel/reboot.c.
Signed-off-by: Robin Holt <holt@sgi.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Russ Anderson <rja@sgi.com> Cc: Robin Holt <holt@sgi.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Remove the prior patch's #define for easier backporting to the stable
releases.
Signed-off-by: Robin Holt <holt@sgi.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Russ Anderson <rja@sgi.com> Cc: Robin Holt <holt@sgi.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kevin Hao [Thu, 27 Jun 2013 23:54:16 +0000 (09:54 +1000)]
kernel/resource.c: remove the unneeded assignment in function __find_resource
This line was introduced by fcb11918 ("resources: add arch hook for
preventing allocation in reserved areas"). But the struct tmp was already
assigned to *new in the above line, so this seems superfluous. Just
remove it.
Signed-off-by: Kevin Hao <haokexin@gmail.com> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ingo Molnar [Thu, 27 Jun 2013 23:54:15 +0000 (09:54 +1000)]
relay: fix timer madness
When I use a ktap script to trace all event tracepoints without this
patch, the system hangs within a few seconds. The patch does fix the
problem, as its changelog says.
This patch is old; the original discussion dates from 2007:
http://marc.info/?l=linux-kernel&m=118544794717162&w=2 (In that mail
thread the patch didn't fix the reported problem, but it fixes the problem
I am encountering now.)
Ingo's original changelog:
Remove timer calls (!!!) from deep within the tracing infrastructure.
This was totally bogus code that can cause lockups and worse.
Poll the buffer every 2 jiffies for now.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: "zhangwei(Jovi)" <jovi.zhangwei@huawei.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
Power-up timing
The DS2408 is sensitive to the power-on slew rate and can inadvertently
power up with a test mode feature enabled. When this occurs, the P0 port
does not respond to the Channel Access Write command. For most reliable
operation, it is recommended to disable the test mode after every power-on
reset using the Disable Test Mode sequence shown below. The 64-bit ROM
code must be transmitted in the same bit sequence as with the Match ROM
command, i.e., least significant bit first. This precaution is
recommended in parasite power mode (VCC pin connected to GND) as well as
with VCC power.
Disable Test Mode:
RST,PD,96h,<64-bit DS2408 ROM Code>,3Ch,RST,PD
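The byte stream sent between the two RST,PD pairs can be sketched as follows. This is a minimal sketch, assuming the ROM code is held in a uint64_t whose least significant byte goes on the wire first; ds2408_build_disable_test_mode() is a hypothetical helper, not a function from the driver, and the reset/presence-detect pulses are left to the bus master.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DS2408_TEST_MODE_LEN 10	/* 0x96 + 8 ROM bytes + 0x3C */

/*
 * Fill buf with the bytes sent between the two RST,PD pairs.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t ds2408_build_disable_test_mode(uint64_t rom_code, uint8_t *buf,
				      size_t len)
{
	assert(buf != NULL);

	if (len < DS2408_TEST_MODE_LEN)
		return 0;

	buf[0] = 0x96;
	/* Least significant byte first, as with Match ROM. */
	for (size_t i = 0; i < 8; i++)
		buf[1 + i] = (uint8_t)(rom_code >> (8 * i));
	buf[9] = 0x3C;

	return DS2408_TEST_MODE_LEN;
}
```

Each byte would then be written least significant bit first by the 1-Wire write routine, which is the usual bit order on that bus.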
Jan Luebbe [Thu, 27 Jun 2013 23:54:13 +0000 (09:54 +1000)]
pps-gpio: add device-tree binding and support
Instead of allocating a struct pps_gpio_platform_data in the DT case,
store the necessary information in struct pps_gpio_device_data itself.
This avoids an additional allocation and the ifdef. It also gets rid of
some indirection.
Also use dev_err instead of pr_err in the changed code.
Signed-off-by: Jan Luebbe <jlu@pengutronix.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Rodolfo Giometti <giometti@enneenne.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Paul Clements [Thu, 27 Jun 2013 23:54:11 +0000 (09:54 +1000)]
nbd: correct disconnect behavior
Currently, when a disconnect is requested by the user (via NBD_DISCONNECT
ioctl) the return from NBD_DO_IT is undefined (it is usually one of
several error codes). This means that nbd-client does not know if a
manual disconnect was performed or whether a network error occurred.
Because of this, nbd-client's persist mode (which tries to reconnect after
error, but not after manual disconnect) does not always work correctly.
This change fixes this by causing NBD_DO_IT to always return 0 if a user
requests a disconnect. This means that nbd-client can correctly either
persist the connection (if an error occurred) or disconnect (if the user
requested it).
Signed-off-by: Paul Clements <paul.clements@steeleye.com> Acked-by: Rob Landley <rob@landley.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
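The contract described above can be modelled in a few lines of userspace C. This is only a sketch of the decision logic: nbd_do_it_result() and client_should_reconnect() are hypothetical names, and the real ioctl and nbd-client code are more involved.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of what the ioctl reports once the patch is applied:
 * 0 for a user-requested disconnect, the negative error otherwise.
 */
int nbd_do_it_result(bool disconnect_requested, int transport_error)
{
	assert(transport_error <= 0);

	return disconnect_requested ? 0 : transport_error;
}

/* A persist-mode client reconnects only after an error. */
bool client_should_reconnect(int do_it_result)
{
	return do_it_result != 0;
}
```

With a defined return value of 0 for manual disconnects, the client no longer has to guess which error codes mean "the user asked for this".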
Michal Belczyk [Thu, 27 Jun 2013 23:54:11 +0000 (09:54 +1000)]
nbd: remove bogus BUG_ON in NBD_CLEAR_QUE
The NBD_CLEAR_QUE ioctl has been deprecated for quite some time (its job
is now done by two other ioctls). We should stop trying to make bogus
assertions in it. Also, user-level code should remove calls to
NBD_CLEAR_QUE, ASAP.
Signed-off-by: Michal Belczyk <belczyk@bsd.krakow.pl> Signed-off-by: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wu Fengguang [Thu, 27 Jun 2013 23:54:10 +0000 (09:54 +1000)]
drivers/rapidio/rio-scan.c: make functions static
sparse warnings:
drivers/rapidio/rio-scan.c:1143:5: sparse: symbol 'rio_enum_mport' was not declared. Should it be static?
drivers/rapidio/rio-scan.c:1246:5: sparse: symbol 'rio_disc_mport' was not declared. Should it be static?
Remove the driver for Tsi500 Parallel RapidIO switch because this device
has not been available for several years. Since the first introduction of
Tsi500, the parallel RapidIO interface was replaced by the serial RapidIO
(sRIO) and therefore there is no value in keeping this driver.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
partitions/msdos: enumerate also AIX LVM partitions
Graft AIX partitions enumeration into partitions/msdos.c
There is already AIX disk detection logic in msdos.c. When an AIX disk
has been found, and if configured to, call the AIX partition recognizer.
This avoids removal of AIX disks protection from msdos.c, avoids code
duplication, and ensures that AIX partitions enumeration is called before
plain msdos partitions enumeration.
Signed-off-by: Philippe De Muyter <phdm@macqel.be> Cc: Karel Zak <kzak@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
partitions-add-aix-lvm-partition-support-files: add the AIX_PARTITION entry
This is the final patch enabling a user to select AIX LVM partition
detection.
Signed-off-by: Philippe De Muyter <phdm@macqel.be> Cc: Karel Zak <kzak@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
partitions-add-aix-lvm-partition-support-files: compile aix.c if configured
Signed-off-by: Philippe De Muyter <phdm@macqel.be> Cc: Karel Zak <kzak@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ERROR: spaces required around that '+=' (ctx:WxV)
#137: FILE: block/partitions/aix.c:113:
+ totalreadcount +=copied;
^
ERROR: do not use assignment in if condition
#235: FILE: block/partitions/aix.c:211:
+ if (vgda_sector && (d = read_part_sector(state, vgda_sector, &sect))) {
ERROR: do not use assignment in if condition
#244: FILE: block/partitions/aix.c:220:
+ if (numlvs && (d = read_part_sector(state, vgda_sector + 1, &sect))) {
WARNING: line over 80 characters
#252: FILE: block/partitions/aix.c:228:
+ for (i = 0; foundlvs < numlvs && i < state->limit; i += 1) {
WARNING: line over 80 characters
#294: FILE: block/partitions/aix.c:270:
+ (i + 1 - lp_ix) * pp_blocks_size + psn_part1,
WARNING: line over 80 characters
#295: FILE: block/partitions/aix.c:271:
+ lvip[lv_ix].pps_per_lv * pp_blocks_size);
WARNING: line over 80 characters
#296: FILE: block/partitions/aix.c:272:
+ snprintf(tmp, sizeof(tmp), " <%s>\n", n[lv_ix].name);
WARNING: printk() should include KERN_ facility level
#306: FILE: block/partitions/aix.c:282:
+ printk("partition %s (%u pp's found) is not contiguous\n",
WARNING: kfree(NULL) is safe this check is probably not required
#311: FILE: block/partitions/aix.c:287:
+ if (n)
+ kfree(n);
total: 5 errors, 9 warnings, 291 lines checked
NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
scripts/cleanfile
./patches/partitions-add-aix-lvm-partition-support-files.patch has style problems, please review.
If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.
Please run checkpatch prior to sending patches
Cc: Philippe De Muyter <phdm@macqel.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fix a problem in the discovery of small (1 pp) partitions in the presence
of discontiguous partitions.
Signed-off-by: Philippe De Muyter <phdm@macqel.be> Cc: Karel Zak <kzak@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
AIX LVM allows "logical volumes" to be made of multiple slices of
multiple disks. The new code only allows access to the "logical
volumes" which are made of one slice on the probed disk, a slice being a
contiguous disk area. The code also detects "logical volumes" made of
multiple slices on the probed disk, but cannot describe them to the
partition layer, because the partition layer generic code does not support
that. When such non-contiguous "logical volumes" are detected, a
diagnostic message is printed.
Signed-off-by: Philippe De Muyter <phdm@macqel.be> Cc: Karel Zak <kzak@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
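The contiguity rule can be illustrated with a small model. This is a sketch under stated assumptions, not the code from aix.c: struct pp_entry and lv_is_contiguous() are hypothetical, and the model assumes each physical partition records its owning logical volume and a 1-based logical partition number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pp_entry {
	int lv;	/* owning logical volume, 0 if the physical partition is free */
	int lp;	/* 1-based logical partition number inside that volume */
};

/*
 * Return true if volume lv occupies exactly one contiguous run of
 * physical partitions, in logical order. On success, *first and *count
 * describe the run, which could be reported as a single partition.
 */
bool lv_is_contiguous(const struct pp_entry *pps, size_t n, int lv,
		      size_t *first, size_t *count)
{
	size_t found = 0, start = 0;

	assert(pps != NULL && first != NULL && count != NULL);

	for (size_t i = 0; i < n; i++) {
		if (pps[i].lv != lv)
			continue;
		if (found == 0)
			start = i;
		/*
		 * Each further slice must sit right after the previous
		 * one and carry the next logical partition number.
		 */
		if (i != start + found || pps[i].lp != (int)found + 1)
			return false;
		found++;
	}

	*first = start;
	*count = found;
	return found > 0;
}
```

A volume that fails this check corresponds to the case where the commit prints a diagnostic instead of registering a partition.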
partitions/msdos.c: end-of-line whitespace and semicolon cleanup
Signed-off-by: Philippe De Muyter <phdm@macqel.be> Cc: Karel Zak <kzak@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dan Carpenter [Thu, 27 Jun 2013 23:54:07 +0000 (09:54 +1000)]
mwave: fix info leak in mwave_ioctl()
Smatch complains that on 64 bit systems, there is a hole in the
MW_ABILITIES struct between ->component_count and ->component_list[]. It
leaks stack information from the mwave_ioctl() function.
I've added a memset() to initialize the struct to zero.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Greg KH <greg@kroah.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
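The class of bug fixed here can be shown with a stand-in structure. This is a minimal sketch: struct abilities and fill_abilities() are hypothetical and much smaller than the real MW_ABILITIES, but on typical 64-bit ABIs they have the same kind of padding hole.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the real structure: on typical 64-bit ABIs
 * the compiler pads between component_count and component_list[].
 */
struct abilities {
	short component_count;
	unsigned long component_list[7];
};

void fill_abilities(struct abilities *out)
{
	assert(out != NULL);

	/*
	 * Zero everything first, padding included, so that no stale
	 * stack bytes survive in the hole when the struct is copied out.
	 */
	memset(out, 0, sizeof(*out));

	out->component_count = 7;
	for (size_t i = 0; i < 7; i++)
		out->component_list[i] = i + 1;
}
```

Assigning the members one by one is not enough, because member stores never touch the padding bytes that a later whole-struct copy to user space would expose.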
Manfred Spraul [Thu, 27 Jun 2013 23:54:07 +0000 (09:54 +1000)]
ipc/sem.c: replace shared sem_otime with per-semaphore value
sem_otime contains the time of the last semaphore operation that completed
successfully. Every operation updates this value, thus access from
multiple cpus can cause thrashing.
Therefore the patch replaces the variable with a per-semaphore variable.
The per-array sem_otime is only calculated when required.
No performance improvement on a single-socket i3 - only important
for larger systems.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Rik van Riel <riel@redhat.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
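Computing the per-array value on demand can be sketched as taking the latest of the per-semaphore timestamps. The structure layout and the helper below are simplified assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Simplified per-semaphore state; each semaphore keeps its own timestamp. */
struct sem {
	int semval;
	time_t sem_otime;
};

/*
 * Derive the array-wide value only when someone asks for it:
 * the most recent of the per-semaphore timestamps.
 */
time_t get_semotime(const struct sem *sems, size_t nsems)
{
	time_t latest = 0;

	assert(sems != NULL || nsems == 0);

	for (size_t i = 0; i < nsems; i++)
		if (sems[i].sem_otime > latest)
			latest = sems[i].sem_otime;

	return latest;
}
```

The trade-off is that readers pay a scan over the array, while writers on different CPUs only touch the timestamp of the semaphore they operated on.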
Manfred Spraul [Thu, 27 Jun 2013 23:54:06 +0000 (09:54 +1000)]
ipc/sem.c: always use only one queue for alter operations
There are two places that can contain alter operations:
- the global queue: sma->pending_alter
- the per-semaphore queues: sma->sem_base[].pending_alter.
Since one of the queues must be processed first, this causes an odd
prioritization of the wakeups:
Right now, complex operations have priority over simple ops.
The patch restores the behavior of linux <=3.0.9: The longest
waiting operation has the highest priority.
This is done by using only one queue:
- if there are complex ops, then sma->pending_alter is used.
- otherwise, the per-semaphore queues are used.
As a side effect, do_smart_update_queue() becomes much simpler:
No more goto logic.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Rik van Riel <riel@redhat.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Manfred Spraul [Thu, 27 Jun 2013 23:54:06 +0000 (09:54 +1000)]
ipc/sem: separate wait-for-zero and alter tasks into separate queues
Introduce separate queues for operations that do not modify the semaphore
values. Advantages:
- Simpler logic in check_restart().
- Faster update_queue(): Right now, all wait-for-zero operations
are always tested, even if the semaphore value is not 0.
- wait-for-zero gets priority again, as in linux <=3.0.9
Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Rik van Riel <riel@redhat.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
i3, with 2 cores and with hyperthreading enabled. Interleave 2 in order
to use the full cores first. HT partially hides the delay from cacheline
thrashing, thus the improvement is "only" 8.7% if 4 threads are running.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Rik van Riel <riel@redhat.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Enforce that ipc_rcu_alloc returns a cacheline aligned pointer on SMP.
Rationale:
The SysV sem code tries to move the main spinlock into a separate cacheline
(____cacheline_aligned_in_smp). This works only if ipc_rcu_alloc returns
cacheline aligned pointers.
vmalloc and kmalloc return cacheline aligned pointers, the implementation
of ipc_rcu_alloc breaks that.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Rik van Riel <riel@redhat.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
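The alignment problem arises when an allocator places a small header in front of the object it returns. A minimal userspace sketch of one way to keep the returned pointer aligned, assuming a 64-byte cache line and a hypothetical header type (hdr_alloc() and hdr_free() are illustrative names, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHELINE 64u	/* assumed line size; real hardware varies */

/* Hypothetical bookkeeping header stored in front of every object. */
struct alloc_hdr {
	int refcount;
};

/* Header space rounded up to a whole cache line. */
#define HDR_SPACE \
	((sizeof(struct alloc_hdr) + CACHELINE - 1) / CACHELINE * CACHELINE)

void *hdr_alloc(size_t size)
{
	/* aligned_alloc() wants a size that is a multiple of the alignment. */
	size_t total = (HDR_SPACE + size + CACHELINE - 1) / CACHELINE * CACHELINE;
	void *base = aligned_alloc(CACHELINE, total);

	if (!base)
		return NULL;
	assert((uintptr_t)base % CACHELINE == 0);

	((struct alloc_hdr *)base)->refcount = 1;
	/* Skipping a whole number of lines keeps the object aligned too. */
	return (char *)base + HDR_SPACE;
}

void hdr_free(void *obj)
{
	if (obj)
		free((char *)obj - HDR_SPACE);
}
```

If the header occupied only sizeof(struct alloc_hdr) bytes, the object would start a few bytes into a line and any cacheline-aligned member inside it would silently lose its alignment.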
Davidlohr Bueso [Thu, 27 Jun 2013 23:54:03 +0000 (09:54 +1000)]
ipc,msg: make msgctl_nolock lockless
While the INFO cmd doesn't take the ipc lock, the STAT commands do acquire
it unnecessarily. We can do the permissions and security checks only
holding the rcu lock.
This function now mimics semctl_nolock().
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Davidlohr Bueso [Thu, 27 Jun 2013 23:54:03 +0000 (09:54 +1000)]
ipc,msg: introduce lockless functions to obtain the ipc object
Add msq_obtain_object() and msq_obtain_object_check(), which will allow us
to get the ipc object without acquiring the lock. Just as with
semaphores, these functions are basically wrappers around
ipc_obtain_object*().
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Davidlohr Bueso [Thu, 27 Jun 2013 23:54:02 +0000 (09:54 +1000)]
ipc,msg: introduce msgctl_nolock
Similar to semctl, when calling msgctl, the *_INFO and *_STAT commands can
be performed without acquiring the ipc object.
Add a msgctl_nolock() function and move the logic of *_INFO and *_STAT out
of msgctl(). This change still takes the lock and it will be properly
lockless in the next patch.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Davidlohr Bueso [Thu, 27 Jun 2013 23:54:02 +0000 (09:54 +1000)]
ipc: move locking out of ipcctl_pre_down_nolock
This function currently acquires both the rw_mutex and the rcu lock on
successful lookups, leaving the callers to explicitly unlock them,
creating another two level locking situation.
Make the callers (including those that still use ipcctl_pre_down())
explicitly lock and unlock the rwsem and rcu lock.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The issue was caused because we were allocating memory in GFP_KERNEL
context after calling rcu_read_lock. This patch restores the
rcu_read_lock call into ipc_addid() and thus maintains the original
behavior.
Davidlohr Bueso [Thu, 27 Jun 2013 23:54:01 +0000 (09:54 +1000)]
ipc: move rcu lock out of ipc_addid
This patchset continues the work that began in the sysv ipc semaphore
scaling series: https://lkml.org/lkml/2013/3/20/546
Just like semaphores used to be, sysv shared memory and msg queues also
abuse the ipc lock, unnecessarily holding it for operations such as
permission and security checks. This patchset mostly deals with mqueues,
and while shared mem can be done in a very similar way, I want to get
these patches out in the open first. It also does some pending cleanups,
mostly focused on the two level locking we have in ipc code, taking care
of ipc_addid() and ipcctl_pre_down_nolock() - yes there are still
functions that need to be updated as well.
This patch:
Make all callers explicitly take and release the RCU read lock.
This addresses the two level locking seen in newary(), newseg() and
newqueue(). For the last two, explicitly unlock the ipc object and the
rcu lock, instead of calling the custom shm_unlock and msg_unlock
functions. The next patch will deal with the open coded locking for
->perm.lock
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jean Delvare [Thu, 27 Jun 2013 23:54:00 +0000 (09:54 +1000)]
idr: print a stack dump after ida_remove warning
We print a stack dump after the idr_remove warning. This is useful to find
the faulty piece of code. Let's do the same for ida_remove, as it would
be equally useful there.
Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Tejun Heo <tj@kernel.org> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zhang Yanfei [Thu, 27 Jun 2013 23:53:59 +0000 (09:53 +1000)]
s390: remove setting for saved_max_pfn
The only user of saved_max_pfn in s390 is read_oldmem interface but we
have removed that interface, so saved_max_pfn is now unneeded in s390, and
we needn't set it anymore.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Hansen <dave@sr71.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zhang Yanfei [Thu, 27 Jun 2013 23:53:59 +0000 (09:53 +1000)]
ia64: remove setting for saved_max_pfn
The only user of saved_max_pfn in ia64 is read_oldmem interface but we
have removed that interface, so saved_max_pfn is now unneeded in ia64, and
we needn't set it anymore.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Hansen <dave@sr71.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zhang Yanfei [Thu, 27 Jun 2013 23:53:59 +0000 (09:53 +1000)]
powerpc: Remove savemaxmem parameter setup
saved_max_pfn is used to know the amount of memory that the previous
kernel used. And for powerpc, we set saved_max_pfn by passing the kernel
commandline parameter "savemaxmem=".
The only user of saved_max_pfn in powerpc is read_oldmem interface. Since
we have removed read_oldmem, we don't need this parameter anymore.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Dave Hansen <dave@sr71.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zhang Yanfei [Thu, 27 Jun 2013 23:53:58 +0000 (09:53 +1000)]
mips: remove savemaxmem parameter setup
saved_max_pfn is used to know the amount of memory that the previous
kernel used. And for mips, we set saved_max_pfn by passing the kernel
commandline parameter "savemaxmem=".
The only user of saved_max_pfn in mips is read_oldmem interface. Since we
have removed read_oldmem, we don't need this parameter anymore.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Hansen <dave@sr71.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zhang Yanfei [Thu, 27 Jun 2013 23:53:57 +0000 (09:53 +1000)]
/dev/oldmem: Remove the interface
/dev/oldmem provides the interface for us to access the "old memory" in
the dump-capture kernel. Unfortunately, no one actually uses this
interface.
And this interface could actually cause some real problems if used on ia64
where the cached/uncached accesses are mixed. See the discussion from the link: https://lkml.org/lkml/2013/4/12/386.
So Eric suggested that we should remove /dev/oldmem as an unused piece of
code.
Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Hansen <dave@sr71.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 27 Jun 2013 23:53:57 +0000 (09:53 +1000)]
wait: introduce prepare_to_wait_event()
Add the new helper, prepare_to_wait_event() which should only be used by
wait_event_common/etc.
prepare_to_wait_event() returns -ERESTARTSYS if signal_pending_state() is
true, otherwise it calls prepare_to_wait(). This allows us to uninline the
signal-pending checks in wait_event_*.
Also, it can initialize wait->private/func. We do not care if they were
already initialized, the values are the same. This also shaves a couple
of insns from the inlined code.
Unlike the previous change, this patch "reliably" shrinks the size of
generated code for every wait_event*() call.
1. 4c663cfc "fix false timeouts when using wait_event_timeout()"
is not enough, wait(wq, true, 0) still returns zero.
__wait_event_timeout() was already fixed but we need the same
logic in wait_event_timeout() if the fast-path check succeeds.
2. wait_event_timeout/__wait_event_timeout interface do not match
wait_event(), you can't use __wait_event_timeout() instead of
wait_event_timeout() if you do not need the fast-path check.
Same for wait_event_interruptible/__wait_event_interruptible,
so this patch cleans up rtlx.c, ip_vs_sync.c, and af_irda.c:
- __wait_event_interruptible(wq, cond, ret);
+ ret = __wait_event_interruptible(wq, cond);
3. wait_event_* macros duplicate the same code.
This patch adds a single helper wait_event_common() which hopefully
does everything right. Compiler optimizes out the "dead" code when
we do not need signal_pending/schedule_timeout.
In particular, wait_event_timeout(true, non_const_timeout) should
generate more code in the non-void context because the patch adds
the additional code to fix the 1st problem.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Imre Deak <imre.deak@intel.com> Cc: Lukas Czerner <lczerner@redhat.com> Cc: Samuel Ortiz <samuel@sortiz.org> Cc: Wensong Zhang <wensong@linux-vs.org> Cc: Simon Horman <horms@verge.net.au> Cc: Julian Anastasov <ja@ssi.bg> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
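The first problem in the list above, a true condition with no time left being reported as a timeout, can be modelled in userspace. This is a simplified sketch, not the kernel macro: wait_event_timeout_model() is a hypothetical function in which one loop iteration stands in for one tick spent asleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Trivial conditions used to exercise the model. */
bool always_true(void *unused)  { (void)unused; return true; }
bool always_false(void *unused) { (void)unused; return false; }

/*
 * Returns 0 on timeout with the condition still false, otherwise a
 * positive value (the remaining time, or at least 1).
 */
long wait_event_timeout_model(bool (*cond)(void *), void *arg, long timeout)
{
	long ret = timeout;

	assert(cond != NULL && timeout >= 0);

	if (!cond(arg)) {
		/* Slow path: one iteration stands in for one tick asleep. */
		while (ret > 0) {
			ret--;
			if (cond(arg))
				break;
		}
	}

	/*
	 * The condition may hold even though no time is left;
	 * report that as success rather than as a timeout.
	 */
	if (ret == 0 && cond(arg))
		ret = 1;

	return ret;
}
```

Without the final check, a caller passing a zero timeout and an already-true condition would see 0 and wrongly conclude that the wait timed out.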
Oleg Nesterov [Thu, 27 Jun 2013 23:53:56 +0000 (09:53 +1000)]
fs/exec.c:de_thread: mt-exec should update ->real_start_time
924b42d5 ("Use boot based time for process start time and boot time in
/proc") updated copy_process/do_task_stat but forgot about de_thread().
This breaks "ps axOT" if a sub-thread execs.
Note: I think that task->start_time should die.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Tomas Janousek <tjanouse@redhat.com> Cc: Tomas Smetana <tsmetana@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 27 Jun 2013 23:53:56 +0000 (09:53 +1000)]
kernel/fork.c:copy_process(): consolidate the lockless CLONE_THREAD checks
copy_process() does a lot of "chaotic" initializations and checks
CLONE_THREAD twice before it takes tasklist. In particular it sets
"p->group_leader = p" and then changes it again under tasklist if
!thread_group_leader(p).
This looks a bit confusing, let's create a single "if (CLONE_THREAD)" block
which initializes ->exit_signal, ->group_leader, and ->tgid.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Sergey Dyasly <dserrg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
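The consolidated block can be pictured with a toy task structure. This is a simplified model, not the kernel code: struct task and init_thread_group_fields() are hypothetical, the MODEL_ constants are local stand-ins for the clone flag bits, and details such as CLONE_PARENT handling are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_CSIGNAL      0x000000ffUL	/* signal mask in the clone flags */
#define MODEL_CLONE_THREAD 0x00010000UL	/* child joins the parent's thread group */

struct task {
	int pid;
	int tgid;
	int exit_signal;
	struct task *group_leader;
};

/* One branch decides all three fields, instead of scattered checks. */
void init_thread_group_fields(struct task *child, const struct task *parent,
			      unsigned long clone_flags)
{
	assert(child != NULL && parent != NULL);

	if (clone_flags & MODEL_CLONE_THREAD) {
		/* A new thread: no exit signal, shares leader and tgid. */
		child->exit_signal = -1;
		child->group_leader = parent->group_leader;
		child->tgid = parent->tgid;
	} else {
		/* A new process: leads its own group. */
		child->exit_signal = (int)(clone_flags & MODEL_CSIGNAL);
		child->group_leader = child;
		child->tgid = child->pid;
	}
}
```

Keeping the three assignments together avoids the intermediate state in which group_leader is set to one value and then overwritten later.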
Oleg Nesterov [Thu, 27 Jun 2013 23:53:56 +0000 (09:53 +1000)]
kernel/fork.c:copy_process(): don't add the uninitialized child to thread/task/pid lists
copy_process() adds the new child to thread_group/init_task.tasks list and
then does attach_pid(child, PIDTYPE_PID). This means that the lockless
next_thread() or next_task() can see this thread with the wrong pid. Say,
"ls /proc/pid/task" can list the same inode twice.
We could move attach_pid(child, PIDTYPE_PID) up, but in this case
find_task_by_vpid() can find the new thread before it was fully
initialized.
And this is already true for PIDTYPE_PGID/PIDTYPE_SID. With this patch
copy_process() initializes child->pids[*].pid first, then calls
attach_pid() to insert the task into the pid->tasks list.
attach_pid() no longer needs the "struct pid*" argument, it is always
called after pid_link->pid was already set.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Sergey Dyasly <dserrg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Move the "if (clone_flags & CLONE_THREAD)" code down under "if
(likely(p->pid))" and turn it into the "else" branch. This makes the
process/thread initialization more symmetrical and removes one check.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Sergey Dyasly <dserrg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Eric Paris [Thu, 27 Jun 2013 23:53:55 +0000 (09:53 +1000)]
fork: reorder permissions when violating number of processes limits
When a task is attempting to violate the RLIMIT_NPROC limit we have a
check to see if the task is sufficiently privileged. The check first
looks at CAP_SYS_ADMIN, then CAP_SYS_RESOURCE, then if the task is uid=0.
A result is that tasks which are allowed by the uid=0 check are first
checked against the security subsystem. This results in the security
subsystem auditing a denial for sys_admin and sys_resource and then the
task passing the uid=0 check.
This patch rearranges the code to first check uid=0, since if we pass that
we shouldn't hit the security system at all. We then check sys_resource,
since it is the smallest capability which will solve the problem. Lastly
we check the catch-all fallback, CAP_SYS_ADMIN. We don't want to grant this
capability in many places since it is so powerful.
This will eliminate many of the false positive/needless denial messages we
get when a root task tries to violate the nproc limit. (note that
kthreads count against root, so on a sufficiently large machine we can
actually get past the default limits before any userspace tasks are
launched.)
Signed-off-by: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
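The effect of the reordering can be illustrated with a small user-space mock. This is only a sketch, not the kernel code: struct task, mock_capable() and both may_exceed_nproc_*() helpers are hypothetical names, and the "audit" is reduced to a counter that is bumped on every failed capability query.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for CAP_SYS_RESOURCE and CAP_SYS_ADMIN. */
enum mock_cap { MOCK_CAP_RESOURCE, MOCK_CAP_ADMIN };

struct task {
	bool is_root;		/* stands in for the uid=0 test */
	bool caps[2];		/* capabilities the task holds */
	int audited_denials;	/* denials logged by the mock security layer */
};

/* Mock capability check: every failed query counts as an audited denial. */
bool mock_capable(struct task *t, enum mock_cap c)
{
	assert(t);
	if (t->caps[c])
		return true;
	t->audited_denials++;
	return false;
}

/* Old order: sys_admin, then sys_resource, then uid=0. */
bool may_exceed_nproc_old(struct task *t)
{
	return mock_capable(t, MOCK_CAP_ADMIN) ||
	       mock_capable(t, MOCK_CAP_RESOURCE) ||
	       t->is_root;
}

/* New order: uid=0 first, then sys_resource, then sys_admin. */
bool may_exceed_nproc_new(struct task *t)
{
	return t->is_root ||
	       mock_capable(t, MOCK_CAP_RESOURCE) ||
	       mock_capable(t, MOCK_CAP_ADMIN);
}
```

For a root task that holds neither capability, the old order records two denials before the uid=0 test lets it through, while the new order never queries the mock security layer.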
Oleg Nesterov [Thu, 27 Jun 2013 23:53:54 +0000 (09:53 +1000)]
fs/proc/uptime.c:uptime_proc_show(): use get_monotonic_boottime()
Change uptime_proc_show() to use get_monotonic_boottime() instead of
do_posix_clock_monotonic_gettime() + monotonic_to_bootbased().
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Tomas Janousek <tjanouse@redhat.com> Cc: Tomas Smetana <tsmetana@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 27 Jun 2013 23:53:53 +0000 (09:53 +1000)]
fs/exec.c:de_thread(): use change_pid() rather than detach_pid/attach_pid
de_thread() can use change_pid() instead of detach + attach. This looks
better and this ensures that, say, next_thread() can never see a task with
->pid == NULL.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Sergey Dyasly <dserrg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 27 Jun 2013 23:53:52 +0000 (09:53 +1000)]
coredump: kill call_count, add core_name_size
Imho, "atomic_t call_count" is ugly and should die. It buys nothing and
in fact it can grow more than necessary; expand doesn't check if it was
already incremented by another task.
Kill it, and introduce "static int core_name_size" updated by
expand_corename(). This is obviously racy too but harmless, and
core_name_size never grows for no reason.
We do not bother to calculate the "right" new size, we simply do
kmalloc(size_we_need) and use ksize() to rely on kmalloc_index's decision.
Finally change format_corename() to use expand_corename(), krealloc(NULL)
is fine.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Colin Walters <walters@verbum.org> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Lucas De Marchi <lucas.de.marchi@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The usage of cn_escape() looks really annoying, imho this sequence needs a
wrapper. And it is buggy. If cn_printf() does expand_corename()
cn_escape() writes to the freed memory.
Introduce cn_esc_printf() which hopefully does this all right. It records
the index before cn_vprintf(), not "char *" which is no longer valid (in
general) after krealloc().
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Colin Walters <walters@verbum.org> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Lucas De Marchi <lucas.de.marchi@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
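The index-versus-pointer point can be shown with a user-space sketch. It is an illustration under assumptions, not the kernel implementation: struct name_buf, nb_append() and nb_esc_append() are hypothetical, realloc() stands in for krealloc(), and replacing '/' with '!' is assumed to be what the escaping does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable name buffer; buf is NUL-terminated once non-NULL. */
struct name_buf {
	char *buf;
	size_t used;	/* bytes in use, not counting the trailing NUL */
};

/* Append s. Reallocates on every call, so buf may move each time. */
int nb_append(struct name_buf *nb, const char *s)
{
	size_t len = strlen(s);
	char *p = realloc(nb->buf, nb->used + len + 1);

	if (!p)
		return -1;
	memcpy(p + nb->used, s, len + 1);
	nb->buf = p;
	nb->used += len;
	return 0;
}

/* Append s, then escape only the newly appended part. */
int nb_esc_append(struct name_buf *nb, const char *s)
{
	size_t start;

	assert(nb && s);
	/* Remember an index: a "char *" into buf could dangle after realloc(). */
	start = nb->used;
	if (nb_append(nb, s))
		return -1;
	for (char *c = nb->buf + start; *c; c++)
		if (*c == '/')
			*c = '!';
	return 0;
}
```

Because the start position is recomputed from nb->buf after the append, the escaping loop always walks the current allocation, and text appended earlier without escaping is left untouched.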
Oleg Nesterov [Thu, 27 Jun 2013 23:53:52 +0000 (09:53 +1000)]
coredump: cn_vprintf() has no reason to call vsnprintf() twice
cn_vprintf() looks really overcomplicated and sub-optimal. We do not need
vsnprintf(NULL) to calculate the size we need, we can simply try to print
into the current buffer and expand/retry only if necessary.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Colin Walters <walters@verbum.org> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Lucas De Marchi <lucas.de.marchi@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
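A user-space sketch of the "print first, expand and retry only if needed" idea follows. The names (struct core_name_sketch, expand_sketch(), cn_vprintf_sketch(), cn_printf_sketch()) are hypothetical, realloc() stands in for the kernel allocator, and the growth policy is an arbitrary choice for the example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; starts out as all zeroes. */
struct core_name_sketch {
	char *buf;
	size_t used;	/* bytes in use, not counting the trailing NUL */
	size_t size;	/* bytes allocated */
};

/* Grow the buffer to `need` bytes; on failure leave it unchanged. */
int expand_sketch(struct core_name_sketch *cn, size_t need)
{
	char *p = realloc(cn->buf, need);

	if (!p)
		return -1;
	cn->buf = p;
	cn->size = need;
	return 0;
}

int cn_vprintf_sketch(struct core_name_sketch *cn, const char *fmt, va_list arg)
{
	assert(cn && fmt);
	for (;;) {
		size_t free_bytes = cn->size - cn->used;
		va_list copy;
		int need;

		/* A va_list cannot be reused after a call, so print from a copy. */
		va_copy(copy, arg);
		need = vsnprintf(cn->buf ? cn->buf + cn->used : NULL,
				 free_bytes, fmt, copy);
		va_end(copy);

		if (need < 0)
			return -1;
		if ((size_t)need < free_bytes) {
			/* The output fit, including the trailing NUL. */
			cn->used += (size_t)need;
			return 0;
		}
		/* Too small: grow by what vsnprintf() asked for and retry. */
		if (expand_sketch(cn, cn->size + (size_t)need + 1))
			return -1;
	}
}

int cn_printf_sketch(struct core_name_sketch *cn, const char *fmt, ...)
{
	va_list arg;
	int ret;

	va_start(arg, fmt);
	ret = cn_vprintf_sketch(cn, fmt, arg);
	va_end(arg);
	return ret;
}
```

When the output already fits, vsnprintf() runs exactly once; the second call happens only after an expansion, which is the saving described above.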
Oleg Nesterov [Thu, 27 Jun 2013 23:53:51 +0000 (09:53 +1000)]
coredump: format_corename() can leak cn->corename
do_coredump() assumes that format_corename() can only fail if
expand_corename() fails and frees cn->corename. This is not true, for
example cn_print_exe_file() can fail and in this case nobody frees
cn->corename.
Change do_coredump() to always do kfree(cn->corename) after it calls
format_corename() (NULL is fine), change expand_corename() to do nothing
if kmalloc() fails.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Colin Walters <walters@verbum.org> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Lucas De Marchi <lucas.de.marchi@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 27 Jun 2013 23:53:51 +0000 (09:53 +1000)]
usermodehelper: kill the sub_info->path[0] check
call_usermodehelper_exec() does nothing but returns success if path[0] ==
0. The only user which needs this strange feature is request_module(), it
can check modprobe_path[0] itself like other users do if they want to
detect the "disabled by admin" case.
Kill it. Not only does it look strange, it can confuse other callers. And
this allows us to revert 264b83c0 ("usermodehelper: check
subprocess_info->path != NULL"), do_execve(NULL) is safe.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Lucas De Marchi <lucas.de.marchi@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 27 Jun 2013 23:53:50 +0000 (09:53 +1000)]
signals: eventpoll: set ->saved_sigmask at the start
task_struct->saved_sigmask has no meaning unless we return with
set_restore_sigmask() and nobody except current can use it.
This means that sys_epoll_pwait() doesn't need to save ->blocked
into the local var and then memcopy it into ->saved_sigmask, we
can simply set ->saved_sigmask right before set_current_blocked().
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eric Wong <normalperson@yhbt.net> Cc: Jason Baron <jbaron@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 27 Jun 2013 23:53:50 +0000 (09:53 +1000)]
signals: eventpoll: do not use sigprocmask()
sigprocmask() should die. None of the current callers actually
need this strange interface.
Change fs/eventpoll.c to use set_current_blocked(). This also
means we should not worry about SIGKILL/SIGSTOP.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eric Wong <normalperson@yhbt.net> Cc: Jason Baron <jbaron@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrey Vagin [Thu, 27 Jun 2013 23:53:49 +0000 (09:53 +1000)]
ptrace: add ability to get/set signal-blocked mask
crtools uses a parasite code for dumping processes. The parasite code is
injected into a process with the help of PTRACE_SEIZE.
Currently crtools blocks signals from a parasite code. If a process has
pending signals, crtools waits while the process handles these signals.
This method is not suitable for stopped tasks. A stopped task can have a
few pending signals, when we will try to execute a parasite code, we will
need to drop SIGSTOP, but all other signals must remain pending, because a
state of processes must not be changed during checkpointing.
This patch adds two ptrace commands to set/get signal-blocked mask.
I think gdb can use these commands too.
Signed-off-by: Andrey Vagin <avagin@openvz.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 27 Jun 2013 23:53:49 +0000 (09:53 +1000)]
ptrace/x86: flush_ptrace_hw_breakpoint() should clear the virtual debug registers
flush_ptrace_hw_breakpoint() destroys the counters set by ptrace, but
"leaks" ->debugreg6 and ->ptrace_dr7.
The problem is minor, but still it doesn't look right and flush_thread()
did this until 66cb5917 ("hw-breakpoints: use the new wrapper routines to
access debug registers in process/thread code"). Now that PTRACE_DETACH
does flush_ too this makes even more sense.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Russell King <linux@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 27 Jun 2013 23:53:49 +0000 (09:53 +1000)]
ptrace: PTRACE_DETACH should do flush_ptrace_hw_breakpoint(child)
Change ptrace_detach() to call flush_ptrace_hw_breakpoint(child). This
frees the slots for non-ptrace PERF_TYPE_BREAKPOINT users, and this
ensures that the tracee won't be killed by SIGTRAP triggered by the active
breakpoints.
Test-case:
unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
{
unsigned long dr7;
was perfectly valid, now PTRACE_POKEUSER(DR7) fails if DR0 was not
previously initialized by PTRACE_POKEUSER(DR0).
Change ptrace_write_dr7() to do ptrace_register_breakpoint(addr => 0) if
!bp && !disabled. This fixes watchpoint-zeroaddr from ptrace-tests, see
https://bugzilla.redhat.com/show_bug.cgi?id=660204.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michael Neuling <mikey@neuling.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Russell King <linux@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 27 Jun 2013 23:53:47 +0000 (09:53 +1000)]
ptrace/x86: don't delay "disable" till second pass in ptrace_write_dr7()
ptrace_write_dr7() skips ptrace_modify_breakpoint(disabled => true) unless
second_pass, this buys nothing but complicates the code and means that we
always do the main loop twice even if "disabled" was never true.
The comment says:
Don't unregister the breakpoints right-away,
unless all register_user_hw_breakpoint()
requests have succeeded.
Firstly, we do not do register_user_hw_breakpoint(), it was removed by
24f1e32c ("hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf
events").
We are going to restore register_user_hw_breakpoint() (see the next patch)
but this doesn't matter, after 44234adc "hw-breakpoints: Modify
breakpoints without unregistering them" perf_event_disable() can not hurt,
hw_breakpoint_del() does not free the slot.
Remove the "second_pass" check from the main loop and simplify the code.
Since we have to check "bp != NULL" anyway, the patch also removes the
same check in ptrace_modify_breakpoint() and moves the comment into
ptrace_write_dr7().
With this patch the second pass is only needed to restore the saved
old_dr7. This should never fail, so the patch adds WARN_ON() to catch the
potential problems as Frederic suggested.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Russell King <linux@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 27 Jun 2013 23:53:47 +0000 (09:53 +1000)]
ptrace/x86: simplify the "disable" logic in ptrace_write_dr7()
ptrace_write_dr7() looks unnecessarily overcomplicated. We can factor out
ptrace_modify_breakpoint() and do not do "continue" twice; we just need to
pass the proper "disabled" argument to ptrace_modify_breakpoint().
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Russell King <linux@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 27 Jun 2013 23:53:47 +0000 (09:53 +1000)]
ptrace: revert "Prepare to fix racy accesses on task breakpoints"
This reverts commit bf26c018490c2fce ("Prepare to fix racy accesses on
task breakpoints").
The patch was fine but we can no longer race with SIGKILL after 9899d11f
("ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL"),
the __TASK_TRACED tracee can't be woken up and ->ptrace_bps[] can't go
away.
Now that ptrace_get_breakpoints/ptrace_put_breakpoints have no callers, we
can kill them and remove task->ptrace_bp_refcnt.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Michael Neuling <mikey@neuling.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Russell King <linux@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>