Oleg Nesterov [Fri, 3 Jan 2014 03:10:31 +0000 (14:10 +1100)]
exec:check_unsafe_exec: kill the dead -EAGAIN and clear_in_exec logic
fs_struct->in_exec == T means that this ->fs is used by a single process
(thread group), and one of the threads does do_execve().
To avoid the mt-exec races this code has the following complications:
1. check_unsafe_exec() returns -EAGAIN if ->in_exec was
already set by another thread.
2. do_execve_common() records "clear_in_exec" to ensure
that the error path can only clear ->in_exec if it was
set by current.
However, after 9b1bf12d5d51 "signals: move cred_guard_mutex from
task_struct to signal_struct" we do not need these complications:
1. We can't race with our sub-thread, this is called under
per-process ->cred_guard_mutex. And we can't race with
another CLONE_FS task, we already checked that this fs
is not shared.
We can remove the dead -EAGAIN logic.
2. "out_unmark:" in do_execve_common() is either called
under ->cred_guard_mutex, or after de_thread() which
kills other threads, so we can't race with sub-thread
which could set ->in_exec. And if ->fs is shared with
another process ->in_exec should be false anyway.
We can clear in_exec unconditionally.
This also means that check_unsafe_exec() can be void.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Fri, 3 Jan 2014 03:10:30 +0000 (14:10 +1100)]
exec:check_unsafe_exec: use while_each_thread() rather than next_thread()
next_thread() should be avoided, change check_unsafe_exec() to use
while_each_thread().
Nobody except signal->curr_target actually needs next_thread-like code,
and we need to change (fix) this interface. This particular code is fine,
p == current. But in general the code like this can loop forever if p
exits and next_thread(t) can't reach the unhashed thread.
This also saves 32 bytes.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Daeseok Youn [Fri, 3 Jan 2014 03:10:30 +0000 (14:10 +1100)]
kernel/fork.c: remove redundant NULL check in dup_mm()
current->mm doesn't need a NULL check in dup_mm(), because dup_mm() is
used only in copy_mm(), and copy_mm() checks whether current->mm is NULL
before calling dup_mm().
Oleg Nesterov [Fri, 3 Jan 2014 03:10:29 +0000 (14:10 +1100)]
proc: fix ->f_pos overflows in first_tid()
1. proc_task_readdir()->first_tid() path truncates f_pos to int, this
is wrong even on 64bit.
We could check that f_pos < PID_MAX or even INT_MAX in
proc_task_readdir(), but this patch simply checks the potential
overflow in first_tid(), this check is nop on 64bit. We do not care if
it was negative and the new unsigned value is huge, all we need to
ensure is that we never wrongly return !NULL.
2. Remove the 2nd "nr != 0" check before get_nr_threads(),
nr_threads == 0 is not distinguishable from !pid_task() above.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Sameer Nanda <snanda@chromium.org> Cc: Sergey Dyasly <dserrg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Fri, 3 Jan 2014 03:10:29 +0000 (14:10 +1100)]
proc: don't (ab)use ->group_leader in proc_task_readdir() paths
proc_task_readdir() does not really need "leader", first_tid() has to
revalidate it anyway. Just pass proc_pid(inode) to first_tid() instead,
it can do pid_task(PIDTYPE_PID) itself and read ->group_leader only if
necessary.
The patch also extracts the "inode is dead" code from
pid_delete_dentry(dentry) into the new trivial helper,
proc_inode_is_dead(inode), proc_task_readdir() uses it to return -ENOENT
if this dir was removed.
This is a bit racy, but the race is very unlikely and the getdents() after
opendir() can see the empty "." + ".." dir only once.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Sameer Nanda <snanda@chromium.org> Cc: Sergey Dyasly <dserrg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Fri, 3 Jan 2014 03:10:29 +0000 (14:10 +1100)]
proc: change first_tid() to use while_each_thread() rather than next_thread()
Rewrite the main loop to use while_each_thread() instead of
next_thread(). We are going to fix or replace while_each_thread(),
next_thread() should be avoided whenever possible.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Sameer Nanda <snanda@chromium.org> Cc: Sergey Dyasly <dserrg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Fri, 3 Jan 2014 03:10:28 +0000 (14:10 +1100)]
proc: fix the potential use-after-free in first_tid()
proc_task_readdir() verifies that the result of get_proc_task() is
pid_alive() and thus its ->group_leader is fine too. However this is not
necessarily true after rcu_read_unlock(), we need to recheck this again
after first_tid() does rcu_read_lock(). Otherwise
leader->thread_group.next (used by next_thread()) can be invalid if the
rcu grace period expires in between.
The race is subtle and unlikely, but it is still possible afaics. To
simplify, let's ignore the "likely" case when tid != 0; f_version can be
cleared by proc_task_operations->llseek().
Suppose we have a main thread M and its subthread T. Suppose that f_pos
== 3, iow first_tid() should return T. Now suppose that the following
happens between rcu_read_unlock() and rcu_read_lock():
1. T execs and becomes the new leader. This removes M from
->thread_group but next_thread(M) is still T.
2. T creates another thread X which does exec as well, T
goes away.
3. X creates another subthread, this increments nr_threads.
4. first_tid() does next_thread(M) and returns the already
dead T.
Note also that we need 2. and 3. only because of get_nr_threads() check,
and this check was supposed to be optimization only.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Sameer Nanda <snanda@chromium.org> Cc: Sergey Dyasly <dserrg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
get_task_state() and task_state_array[] look confusing and suboptimal, it
is not clear what it can actually report to user-space and
task_state_array[] blows .data for no reason.
1. state = (tsk->state & TASK_REPORT) | tsk->exit_state is not
clear. TASK_REPORT is self-documenting but it is not clear
what ->exit_state can add.
Move the potential exit_state's (EXIT_ZOMBIE and EXIT_DEAD)
into TASK_REPORT and use it to calculate the final result.
2. With the change above it is obvious that task_state_array[]
has the unused entries just to make BUILD_BUG_ON() happy.
Change this BUILD_BUG_ON() to use TASK_REPORT rather than
TASK_STATE_MAX and shrink task_state_array[].
3. Turn the "while (state)" loop into fls(state).
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
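Illustration (not part of the patch): step 3, replacing a shift loop with
an fls()-style index into a name table, can be sketched in plain C. All
demo_* names, the table contents and the 3-bit mask are hypothetical; the
kernel's own fls() is replaced by a portable stand-in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: index 0 means "no bit set", index n means bit n-1. */
static const char *const demo_state_names[] = {
	"running",	/* state == 0 */
	"sleeping",	/* bit 0 */
	"disk sleep",	/* bit 1 */
	"stopped",	/* bit 2 */
};

#define DEMO_REPORT_BITS 3
#define DEMO_REPORT_MASK ((1u << DEMO_REPORT_BITS) - 1)

/* The table needs exactly one entry per reportable bit, plus one for 0. */
_Static_assert(sizeof(demo_state_names) / sizeof(demo_state_names[0]) ==
	       DEMO_REPORT_BITS + 1, "table size must match the mask");

/* Portable find-last-set: 0 for x == 0, else 1 + index of the highest set bit. */
static int demo_fls(unsigned int x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

static const char *demo_state_name(unsigned int state)
{
	size_t idx = (size_t)demo_fls(state & DEMO_REPORT_MASK);

	assert(idx < sizeof(demo_state_names) / sizeof(demo_state_names[0]));
	return demo_state_names[idx];
}
```

Masking before the lookup is what lets the table shrink to the reportable
states only.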
Oleg Nesterov [Fri, 3 Jan 2014 03:10:28 +0000 (14:10 +1100)]
coredump: set_dumpable: fix the theoretical race with itself
set_dumpable() updates MMF_DUMPABLE_MASK in a non-trivial way to ensure
that get_dumpable() can't observe the intermediate state, but this all
can't help if multiple threads call set_dumpable() at the same time.
And in theory commit_creds()->set_dumpable(SUID_DUMP_ROOT) racing with
sys_prctl()->set_dumpable(SUID_DUMP_DISABLE) can result in SUID_DUMP_USER.
Change this code to update both bits atomically via cmpxchg().
Note: this assumes that it is safe to mix bitops and cmpxchg. IOW, if,
say, an architecture implements cmpxchg() using the locking (like
arch/parisc/lib/bitops.c does), then it should use the same locks for
set_bit/etc.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Alex Kelly <alex.page.kelly@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Vasily Kulikov <segoon@openwall.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
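Illustration (not part of the patch): updating a multi-bit field in one
atomic step with a compare-and-exchange loop can be sketched with C11
atomics. This is a user-space sketch, not the kernel's cmpxchg(); the
demo_* names and the two-bit DEMO_DUMPABLE_MASK are hypothetical
stand-ins.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical two-bit field inside a larger flags word. */
#define DEMO_DUMPABLE_MASK 0x3UL

/* Replace the two-bit field in *flags with value, leaving other bits alone. */
static void demo_set_dumpable(atomic_ulong *flags, unsigned long value)
{
	unsigned long old, next;

	assert((value & ~DEMO_DUMPABLE_MASK) == 0);

	old = atomic_load(flags);
	do {
		next = (old & ~DEMO_DUMPABLE_MASK) | value;
		/* On failure, old is refreshed with the current contents. */
	} while (!atomic_compare_exchange_weak(flags, &old, next));
}

static unsigned long demo_get_dumpable(atomic_ulong *flags)
{
	return atomic_load(flags) & DEMO_DUMPABLE_MASK;
}
```

Because both bits are written by a single successful exchange, two
concurrent writers can only produce one of the two requested values,
never a mix of them.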
kmod: run usermodehelpers only on cpus allowed for kthreadd V2
usermodehelper() threads can currently run on all processors. This is an
issue for low latency cores. Spawning a new thread causes cpu holdoffs in
the range of hundreds of microseconds to a few milliseconds. Not good for
cores on which processes run that need to react as fast as possible.
kthreadd threads can be restricted using taskset to a limited set of
processors. Then the kernel thread pool will not fork processes on those
anymore thereby protecting those processors from additional latencies.
Make usermodehelper() threads obey the limitations that kthreadd is
restricted to. Kthreadd is not the parent of usermodehelper threads so we
need to explicitly get the allowed processors for kthreadd.
Before this patch there is no way to limit the cpus that usermodehelper
can run on since the affinity is set when the thread is spawned to all
processors.
Signed-off-by: Christoph Lameter <cl@linux.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <bitbucket@online.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Gilad Ben-Yossef <gilad@benyossef.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-ramdisk_blocksize doesn't exist anymore
-Module parameters added to documentation
Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Namjae Jeon [Fri, 3 Jan 2014 03:10:25 +0000 (14:10 +1100)]
fat: fallback to buffered write in case of fallocated region on direct IO
For a normal direct IO write, seeking to a location beyond the file size
makes the write fall back to buffered write to fill that region.
Similarly, make a write into a fallocated region fall back to
buffered write.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Namjae Jeon [Fri, 3 Jan 2014 03:10:25 +0000 (14:10 +1100)]
fat: zero out seek range on _fat_get_block
For normal buffered write operations, a write to an offset greater than
the file size does a cont_expand_zero up to that offset.
In fallocated regions the blocks are already allocated, so zero out the
buffers for those blocks up to the seek'ed offset.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Namjae Jeon [Fri, 3 Jan 2014 03:10:25 +0000 (14:10 +1100)]
fat: add fat_fallocate operation
Implement preallocation via the fallocate syscall on VFAT partitions.
This patch is based on an earlier patch of the same name which had some
issues detailed below and did not get accepted. Refer
https://lkml.org/lkml/2007/12/22/130.
a) The preallocated space was not persistent when the
FALLOC_FL_KEEP_SIZE flag was set. It will deallocate cluster at evict time.
b) There was no need to zero out the clusters when the flag was set.
Instead of doing an expanding truncate, just allocate clusters and add
them to the fat chain. This reduces preallocation time.
Compatibility with windows:
There are no issues when FALLOC_FL_KEEP_SIZE is not set
because it just does an expanding truncate. Thus reading from the
preallocated area on windows returns null until data is written to it.
When a file with preallocated area using the FALLOC_FL_KEEP_SIZE was
written to on windows, the windows driver freed-up the preallocated
clusters and allocated new clusters for the new data. The freed up
clusters get reflected in the free space available for the partition
which can be seen from the Volume properties.
The windows chkdsk tool also does not report any errors on a
disk containing files with preallocated space.
And there is also no issue using linux fat fsck, because it discards
preallocated clusters at repair time.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sougata Santra [Fri, 3 Jan 2014 03:10:25 +0000 (14:10 +1100)]
HFS+ resource fork lookup breaks the opendir() library function, because
opendir() first calls open() with the O_DIRECTORY flag set. O_DIRECTORY
means "refuse to open if not a directory". The open system call in the
kernel checks inode->i_op->lookup and returns -ENOTDIR if it is not set.
So if hfsplus_file_lookup is set, opendir() is allowed for plain files.
Also, resource fork lookup in HFS+ does not work: it is never reached
past VFS permission checking, so the open always fails with
-EACCES.
When we call opendir() on a file, it does not return NULL. opendir()
library call is based on open with O_DIRECTORY flag passed and then
layered on top of getdents() system call. O_DIRECTORY means "refuse to
open if not a directory".
The open() system call in the kernel does a check for: do_sys_open()
-->..--> can_lookup() i.e. it only checks inode->i_op->lookup and returns
ENOTDIR if this function pointer is not set.
In OSX, we can open "file/rsrc" to get the resource fork of "file". This
behavior is emulated inside hfsplus on Linux, which means that to some
degree every file acts like a directory. That is the reason lookup()
inode operations is supported for files, and it is possible to do a lookup
on this specific name. As a result, open succeeds without
returning ENOTDIR for HFS+.
Please see the LKML discussion thread on this issue:
http://marc.info/?l=linux-fsdevel&m=122823343730412&w=2
I tried to test file/rsrc lookup in HFS+ driver and the feature does not
work. From OSX:
$ touch test
$ echo "1234" > test/..namedfork/rsrc
$ ls -l test/..namedfork/rsrc
-rw-r--r-- 1 tuxera staff 5 10 dec 12:59 test/..namedfork/rsrc
[sougata@ultrabook tmp]$ id
uid=1000(sougata) gid=1000(sougata) groups=1000(sougata),5(tty),18(dialout),1001(vboxusers)
[sougata@ultrabook tmp]$ mount
/dev/sdb1 on /mnt/tmp type hfsplus (rw,relatime,umask=0,uid=1000,gid=1000,nls=utf8)
I guess that permission checking now happens in the VFS, in
generic_permission(). So it turns out that even though the lookup()
inode_operation exists for HFS+ files, it cannot really get invoked. So we
can disable this feature to make opendir() work for HFS+.
Signed-off-by: Sougata Santra <sougata@tuxera.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dmitry Monakhov [Fri, 3 Jan 2014 03:10:24 +0000 (14:10 +1100)]
fs/pipe.c: skip file_update_time on frozen fs
A pipe has no data associated with the fs, so it is not a good idea to
block pipe_write() if the FS is frozen, but we cannot update the file's
time on such a filesystem. Let's use the same idea as we use in
touch_atime().
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If hpet_register_irq_handler() fails, cmos_do_probe() will incorrectly
return 0.
Reported-by: Julia Lawall <julia.lawall@lip6.fr> Cc: John Stultz <john.stultz@linaro.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: Rob Herring <robh+dt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
v2: Explicitly check for both dev->of_node and dev->parent->of_node.
This covers the MFD case, without the MFD core having to set
child MFD devices' of_node pointer to the same node as the top-
level MFD device, which causes problems such as:
http://www.spinics.net/lists/arm-kernel/msg295854.html
Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stephen Warren [Fri, 3 Jan 2014 03:10:24 +0000 (14:10 +1100)]
rtc: honor device tree /alias entries when assigning IDs
Assign RTC device IDs based on device tree /aliases entries if present,
falling back to the existing numbering scheme if there is no /aliases
entry (which includes when the system isn't booted using DT), or there is
a numbering conflict.
This is useful in systems with multiple RTC devices, to ensure that the
best RTC device is selected as /dev/rtc0, which provides the overall
system time.
For example, Tegra has an on-SoC RTC that is not battery backed, typically
coupled with an off-SoC RTC that is battery backed. Only the latter is
useful for populating the system time, yet the former is useful e.g. for
wakeup timing, since the time is not lost when the system sleeps.
Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
drivers/rtc/Kconfig: disable RTC_DRV_CMOS on Atari
On ARAnyM (emulating an Atari Falcon, which doesn't have an RTC IRQ, as
the Second Multi Function Peripheral MFP 68901 is available on Atari TT
only), rtc-cmos doesn't work well:
- The date is off by 32 years (2045 instead of 2013):
rtc_cmos rtc_cmos: setting system clock to 2045-12-02 10:56:17 UTC
(2395824977)
- The hwclock utility doesn't work:
hwclock: ioctl() to /dev/rtc to turn on update interrupts failed
unexpectedly, errno=5: Input/output error.
As rtc-generic works fine for the RTC part, and nvram works for the NVRAM
part, we'll continue on using that.
Heiko Stuebner [Fri, 3 Jan 2014 03:10:23 +0000 (14:10 +1100)]
rtc: hym8563: include clkout code only if COMMON_CLK active
The contents of clk-provider.h, struct clk_hw etc, are only available if
CONFIG_COMMON_CLK is selected. Therefore IS_ENABLED(COMMON_CLK) is not
sufficient and real preprocessor conditions are necessary to keep the code
in question from being compiled on non-COMMON_CLK systems.
Signed-off-by: Heiko Stuebner <heiko@sntech.de> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
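Illustration (not part of the patch): why a run-time style condition is
not enough can be sketched in plain C. DEMO_COMMON_CLK is a hypothetical
stand-in for the real config symbol, and struct demo_clk_hw stands in for
a type that only exists when that symbol is defined.

```c
#include <assert.h>

#ifdef DEMO_COMMON_CLK
/* Only declared when the (hypothetical) clock framework is enabled. */
struct demo_clk_hw {
	unsigned long rate_hz;
};

static int demo_register_clkout(void)
{
	struct demo_clk_hw hw = { .rate_hz = 32768 };

	return hw.rate_hz ? 1 : 0;
}
#else
/* Without the framework the type does not exist, so provide a stub. */
static int demo_register_clkout(void)
{
	return 0;
}
#endif

static int demo_probe(void)
{
	/*
	 * An "if (enabled) { struct demo_clk_hw hw; ... }" block here would
	 * fail to build when the type is undeclared: the compiler still
	 * parses a branch even if it later discards it as dead code.
	 */
	return demo_register_clkout();
}
```

The preprocessor removes the clock code before the compiler ever sees it,
which is the distinction the message draws.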
Heiko Stuebner [Fri, 3 Jan 2014 03:10:23 +0000 (14:10 +1100)]
rtc: add hym8563 rtc-driver
The Haoyu Microelectronics HYM8563 provides rtc and alarm functions as
well as a clock output of up to 32kHz.
Signed-off-by: Heiko Stuebner <heiko@sntech.de> Cc: Rob Herring <rob.herring@calxeda.com> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Ian Campbell <ijc+devicetree@hellion.org.uk> Cc: Grant Likely <grant.likely@linaro.org> Cc: Mike Turquette <mturquette@linaro.org> Cc: Richard Weinberger <richard.weinberger@gmail.com> Cc: Mark Brown <broonie@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Heiko Stuebner [Fri, 3 Jan 2014 03:10:22 +0000 (14:10 +1100)]
dt-bindings: add hym8563 binding
Add binding documentation for the hym8563 rtc chip.
Signed-off-by: Heiko Stuebner <heiko@sntech.de> Cc: Rob Herring <rob.herring@calxeda.com> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Ian Campbell <ijc+devicetree@hellion.org.uk> Cc: Grant Likely <grant.likely@linaro.org> Cc: Mike Turquette <mturquette@linaro.org> Cc: Richard Weinberger <richard.weinberger@gmail.com> Cc: Mark Brown <broonie@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This patch allows the driver to be enabled with devicetree.
Signed-off-by: Alexander Shiyan <shc_work@mail.ru> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The driver core clears the driver data to NULL after device_release or on
probe failure. Thus, it is not needed to manually clear the device driver
data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ian Kent [Fri, 3 Jan 2014 03:10:20 +0000 (14:10 +1100)]
autofs: fix symlinks aren't checked for expiry
The autofs4 module doesn't consider symlinks for expire as it did in the
older autofs v3 module (so it's actually a long standing regression).
The user space daemon has focused on the use of bind mounts instead of
symlinks for a long time now and that's why this has not been noticed.
But with the future addition of amd map parsing to automount(8), not to
mention amd itself (of am-utils), symlink expiry will be needed.
The direct and offset mount types can't be symlinks and the tree mounts of
version 4 were always real mounts so only indirect mounts need to expire
symlinks.
Since the current users of the autofs4 module haven't reported this as a
problem to date this patch probably isn't a candidate for backport to
stable.
Signed-off-by: Ian Kent <ikent@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Miklos Szeredi [Fri, 3 Jan 2014 03:10:20 +0000 (14:10 +1100)]
autofs4: translate pids to the right namespace for the daemon
The PID and the TGID of the process triggering the mount are sent to the
daemon. Currently the global pid values are sent (ones valid in the
initial pid namespace) but this is wrong if the autofs daemon itself is
not running in the initial pid namespace.
So send the pid values that are valid in the namespace of the autofs daemon.
The namespace to use is taken from the oz_pgrp pid pointer, which was set
at mount time to the mounting process' pid namespace.
If the pid translation fails (the triggering process is in an unrelated
pid namespace) then the automount fails with ENOENT.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: Eric Biederman <ebiederm@xmission.com> Acked-by: Ian Kent <raven@themaw.net> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
autofs4: allow autofs to work outside the initial PID namespace
Enable autofs4 to work in a "container". oz_pgrp is converted from pid_t
to struct pid and this is stored at mount time based on the "pgrp=" option
or if the option is missing then the current pgrp.
The "pgrp=" option is interpreted in the PID namespace of the current
process. This option is flawed in that it doesn't carry the namespace
information, so it should be deprecated. AFAICS the autofs daemon always
sends the current pgrp, which is the default anyway.
The oz_pgrp is also set from the AUTOFS_DEV_IOCTL_SETPIPEFD_CMD ioctl.
This ioctl sets oz_pgrp to the current pgrp. It is not allowed to change
the pid namespace.
oz_pgrp is used mainly to determine whether the process traversing the
autofs mount tree is the autofs daemon itself or not. This function now
compares the pid pointers instead of the pid_t values.
One other use of oz_pgrp is in autofs4_show_options. There it shows the
virtual pid number (i.e. the one that is valid inside the PID namespace
of the calling process).
For debugging printk convert oz_pgrp to the value in the initial pid
namespace.
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: Eric Biederman <ebiederm@xmission.com> Acked-by: Ian Kent <raven@themaw.net> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Axel Lin [Fri, 3 Jan 2014 03:10:19 +0000 (14:10 +1100)]
fs/ramfs/file-nommu.c: make ramfs_nommu_get_unmapped_area() and ramfs_nommu_mmap() static
Since commit 853ac43ab194f "shmem: unify regular and tiny shmem",
ramfs_nommu_get_unmapped_area() and ramfs_nommu_mmap() are not directly
referenced outside of file-nommu.c. Thus make them static.
Signed-off-by: Axel Lin <axel.lin@ingics.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We observed this problem has been occurring since 2.6.30 with
fs/binfmt_elf.c: create_elf_tables()->get_random_bytes(), introduced by f06295b44c296c8f ("ELF: implement AT_RANDOM for glibc PRNG seeding").
/*
* Generate 16 random bytes for userspace PRNG seeding.
*/
get_random_bytes(k_rand_bytes, sizeof(k_rand_bytes));
The patch introduces a wrapper around get_random_int() which has lower
overhead than calling get_random_bytes() directly.
With this patch applied:
$ cat /proc/sys/kernel/random/entropy_avail
2731
$ cat /proc/sys/kernel/random/entropy_avail
2802
$ cat /proc/sys/kernel/random/entropy_avail
2878
Analyzed by John Sobecki.
This has been applied on a specific Oracle kernel and has been running on
the customer's production environment (the original bug reporter) for
several months; it has worked fine until now.
Signed-off-by: Jie Liu <jeff.liu@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <aedilger@gmail.com> Cc: Alan Cox <alan@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: John Sobecki <john.sobecki@oracle.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
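Illustration (not part of the patch): one plausible shape for a wrapper
that fills a byte buffer from a cheaper per-call integer source is
sketched below. The demo_* names are hypothetical, and demo_counter() is
a deterministic placeholder, not a random number generator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cheap integer source. */
typedef unsigned int (*demo_int_source)(void);

/* Fill buf with nbytes bytes, drawing one unsigned int per chunk. */
static void demo_fill_from_ints(unsigned char *buf, size_t nbytes,
				demo_int_source next_int)
{
	assert(buf != NULL && next_int != NULL);

	while (nbytes) {
		unsigned int value = next_int();
		size_t chunk = nbytes < sizeof(value) ? nbytes : sizeof(value);

		/* The last chunk may be shorter than one integer. */
		memcpy(buf, &value, chunk);
		buf += chunk;
		nbytes -= chunk;
	}
}

/* Deterministic source for demonstration only; NOT random. */
static unsigned int demo_counter(void)
{
	static unsigned int n;

	return ++n;
}
```

A 16-byte seed would then take four calls to the integer source on a
platform where unsigned int is four bytes wide.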
Joe Perches [Fri, 3 Jan 2014 03:10:18 +0000 (14:10 +1100)]
checkpatch: improve space before tab --fix option
This test should remove all the spaces before a tab not just one space.
Substitute a tab for each 8 space block before a tab and remove less than
8 spaces before a tab.
This SPACE_BEFORE_TAB test is done after CODE_INDENT.
If there are spaces used at the beginning of a line that should be
converted to tabs, please make sure that the CODE_INDENT test and
conversion is done before this SPACE_BEFORE_TAB test and conversion.
Reported-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Joe Perches <joe@perches.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
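Illustration (not part of the patch): checkpatch itself is a Perl script,
so the following C function is only a sketch of the substitution rule
described above. demo_fix_space_before_tab() is a hypothetical name, and
the sketch ignores checkpatch's restriction to added patch lines.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy src to dst. Every run of spaces that is immediately followed by a
 * tab becomes one tab per full block of 8 spaces; leftover spaces in that
 * run are dropped. Everything else is copied unchanged.
 * dst must have room for strlen(src) + 1 bytes (the output never grows).
 */
static void demo_fix_space_before_tab(const char *src, char *dst)
{
	assert(src != NULL && dst != NULL);

	while (*src) {
		if (*src == ' ') {
			size_t run = 0;

			while (src[run] == ' ')
				run++;

			if (src[run] == '\t') {
				/* Spaces before a tab: keep whole blocks as tabs. */
				for (size_t i = 0; i < run / 8; i++)
					*dst++ = '\t';
			} else {
				/* Ordinary spaces: keep them as they are. */
				for (size_t i = 0; i < run; i++)
					*dst++ = ' ';
			}
			src += run;
		} else {
			*dst++ = *src++;
		}
	}
	*dst = '\0';
}
```

For example, three spaces before a tab are removed entirely, while ten
spaces before a tab become one extra tab.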
case blocks should end in a break/return/goto/continue.
If a fall-through is used, it should have a comment showing that it is
intentional. Ideally that comment should be something like:
"/* fall-through */"
Add a test to look for missing break statements.
This looks only at the context lines before an inserted case so it's
possible to have false positives when the context contains a close brace
and the break is before the brace and not part of the patch context.
Looking at recent patches, this is a pretty rare occurrence. The normal
kernel style uses a break as the last line of the previous block.
Signed-off-by: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
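Illustration (not part of the patch): a switch written in the style the
message above asks for, with every case ending in a break except one
commented, intentional fall-through. The enum and function names are
hypothetical.

```c
#include <assert.h>

/* Hypothetical command codes, for illustration only. */
enum demo_cmd { DEMO_RESET, DEMO_START, DEMO_STOP };

static int demo_handle(enum demo_cmd cmd)
{
	int steps = 0;

	switch (cmd) {
	case DEMO_RESET:
		steps++;	/* reset first, then continue as a start */
		/* fall-through */
	case DEMO_START:
		steps++;
		break;
	case DEMO_STOP:
		steps = -1;
		break;
	}
	return steps;
}
```

Without the comment, a reader cannot tell whether the missing break after
DEMO_RESET is deliberate or a bug.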
Joe Perches [Fri, 3 Jan 2014 03:10:16 +0000 (14:10 +1100)]
checkpatch: more comprehensive split strings warning
The current checkpatch test for split strings does not find several cases
that should be found.
For instance:
/* Else poor success; go back to mode in "active" table */
} else {
IWL_DEBUG_RATE(mvm,
- "LQ: GOING BACK TO THE OLD TABLE suc=%d cur-tpt=%d old-tpt=%d\n",
+ "GOING BACK TO THE OLD TABLE: SR %d "
+ "cur-tpt %d old-tpt %d\n",
window->success_ratio,
window->average_tpt,
lq_sta->last_tpt);
does not currently emit a warning.
Improve the test to find these cases.
Add more exceptions to reduce false positives for assembly and octal/hex
string constants.
Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ard Biesheuvel [Fri, 3 Jan 2014 03:10:16 +0000 (14:10 +1100)]
firmware/dmi_scan: generalize for use by other archs
This patch makes a couple of changes to the SMBIOS/DMI scanning
code so it can be used on other archs (such as ARM and arm64):
(a) wrap the calls to ioremap()/iounmap(), this allows the use of a
flavor of ioremap() more suitable for random unaligned access;
(b) allow the non-EFI fallback probe into hardcoded physical address
0xF0000 to be disabled.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Grant Likely <grant.likely@linaro.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Marian Chereji [Fri, 3 Jan 2014 03:10:16 +0000 (14:10 +1100)]
lib: Add CRC64 ECMA module
Add implementation of CRC64 ECMA checksum.
We have an IP Acceleration driver for Freescale network processors which
is using this CRC64. However, it still needs some work in order for it to
become upstreamable.
Signed-off-by: Marian Chereji <marian.chereji@freescale.com> Reviewed-by: Varvara Andrei-B21317 <andrei.varvara@freescale.com> Reviewed-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
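Illustration (not part of the patch): a bitwise CRC-64 using the ECMA-182
generator polynomial can be sketched as below. The parameters chosen here
(zero initial value, MSB-first processing, no final XOR) are assumptions
of this sketch; the message above does not say which variant the module
implements, and a real module would more likely be table driven.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ECMA-182 generator polynomial, normal (MSB-first) representation. */
#define DEMO_CRC64_ECMA_POLY 0x42F0E1EBA9EA3693ULL

/* Bit-at-a-time CRC-64: init 0, no reflection, no final XOR (assumed). */
static uint64_t demo_crc64_ecma(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint64_t crc = 0;

	assert(data != NULL || len == 0);

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint64_t)p[i] << 56;
		for (int bit = 0; bit < 8; bit++) {
			if (crc & (1ULL << 63))
				crc = (crc << 1) ^ DEMO_CRC64_ECMA_POLY;
			else
				crc <<= 1;
		}
	}
	return crc;
}
```

The bitwise form is slow but easy to check against a published test
vector before moving to a lookup table.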
Kees Cook [Fri, 3 Jan 2014 03:10:16 +0000 (14:10 +1100)]
test: fix sparse warnings in user_copy tests
Sparse fix for "test: check copy_to/from_user boundary validation":
To keep sparse happy with the horrible things being done with the user
memory pointers, declare both __user and non-__user cases ahead of time to
avoid needing to do the casts later.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kees Cook [Fri, 3 Jan 2014 03:10:15 +0000 (14:10 +1100)]
test: check copy_to/from_user boundary validation
To help avoid an architecture failing to correctly check kernel/user
boundaries when handling copy_to_user, copy_from_user, put_user, or
get_user, perform some simple tests and fail to load if any of them behave
unexpectedly.
Specifically, this is to make sure there is a way to notice if things like
what was fixed in 8404663f81 ("ARM: 7527/1: uaccess: explicitly check
__user pointer when !CPU_USE_DOMAINS") ever regresses again, for any
architecture.
Additionally, adds new "user" selftest target, which loads this module.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kees Cook [Fri, 3 Jan 2014 03:10:15 +0000 (14:10 +1100)]
test: add minimal module for verification testing
This is a pair of test modules I'd like to see in the tree. I am not
putting these in lkdtm, where I've been adding various tests that trigger
crashes: they don't make sense there, since they either need to be
distinctly separate or their pass/fail state doesn't need to crash the
machine.
These live in lib/ for now, along with a few other in-kernel test modules,
and use the slightly more common "test_" naming convention, instead of
"test-". We should likely standardize on the former.
The first is entirely a no-op module, designed to allow simple testing of
the module loading and verification interface. It's useful to have a
module that has no other uses or dependencies so it can be reliably used
for just testing module loading and verification.
The second is a module that exercises the user memory access functions, in
an effort to make sure that we can quickly catch any regressions in
boundary checking (e.g. like what was recently fixed on ARM).
This patch (of 2):
When doing module loading verification tests (for example, with module
signing, or LSM hooks), it is very handy to have a module that can be
built on all systems under test, isn't auto-loaded at boot, and has no
device or similar dependencies. This creates the "test_module.ko" module
for that purpose, which only reports its load and unload to printk.
Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Most mobile phones have ambient light sensors and change the backlight
brightness according to the measured lux. The brightness is therefore
changed frequently by writing to a sysfs node, and each write generates a
uevent.
Usually nothing consumes these backlight change events, yet each one makes
udev fork worker threads, which takes about 5ms. The main problem is that
this hurts other process activity, so remove the uevent.
Kay said
"Uevents are for the major, low-frequent, global device state-changes,
not for carrying-out any sort of measurement data. Subsystems which
need that should use other facilities like poll()-able sysfs file or
any other subscription-based, client-tracking interface which does not
cause overhead if it isn't used. Uevents are not the right thing to
use here, and upstream udev should not paper-over broken kernel
subsystems."
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Acked-by: Jingoo Han <jg1.han@samsung.com> Cc: Henrique de Moraes Holschuh <ibm-acpi@hmh.eng.br> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joe Perches [Fri, 3 Jan 2014 03:10:12 +0000 (14:10 +1100)]
get_maintainer: add commit author information to --rolestats
get_maintainer currently uses "Signed-off-by" style lines to find
interested parties to send patches to when the MAINTAINERS file does not
have a specific section entry with a matching file pattern.
Add statistics for commit authors and lines added and deleted to the
information provided by --rolestats.
These statistics are also emitted whenever --rolestats and --git are
selected even when there is a specified maintainer.
This can have the effect of expanding the number of people that are shown
as possible "maintainers" of a particular file because "authors",
"added_lines", and "removed_lines" are also used as criteria for the
--max-maintainers option separate from the "commit_signers".
The first "--git-max-maintainers" values of each criterion
are emitted. Any "ties" are not shown.
For example: (forcedeth does not have a named maintainer)
Old output:
$ ./scripts/get_maintainer.pl -f drivers/net/ethernet/nvidia/forcedeth.c
"David S. Miller" <davem@davemloft.net> (commit_signer:8/10=80%)
Jiri Pirko <jiri@resnulli.us> (commit_signer:2/10=20%)
Patrick McHardy <kaber@trash.net> (commit_signer:2/10=20%)
Larry Finger <Larry.Finger@lwfinger.net> (commit_signer:1/10=10%)
Peter Zijlstra <peterz@infradead.org> (commit_signer:1/10=10%)
netdev@vger.kernel.org (open list:NETWORKING DRIVERS)
linux-kernel@vger.kernel.org (open list)
New output:
$ ./scripts/get_maintainer.pl -f drivers/net/ethernet/nvidia/forcedeth.c
"David S. Miller" <davem@davemloft.net> (commit_signer:8/10=80%)
Jiri Pirko <jiri@resnulli.us> (commit_signer:2/10=20%,authored:2/10=20%,removed_lines:3/33=9%)
Patrick McHardy <kaber@trash.net> (commit_signer:2/10=20%,authored:2/10=20%,added_lines:12/95=13%,removed_lines:10/33=30%)
Larry Finger <Larry.Finger@lwfinger.net> (commit_signer:1/10=10%,authored:1/10=10%,added_lines:35/95=37%)
Peter Zijlstra <peterz@infradead.org> (commit_signer:1/10=10%)
"Peter Hüwe" <PeterHuewe@gmx.de> (authored:1/10=10%,removed_lines:15/33=45%)
Joe Perches <joe@perches.com> (authored:1/10=10%)
Neil Horman <nhorman@tuxdriver.com> (added_lines:40/95=42%)
Bill Pemberton <wfp5p@virginia.edu> (removed_lines:3/33=9%)
netdev@vger.kernel.org (open list:NETWORKING DRIVERS)
linux-kernel@vger.kernel.org (open list)
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joe Perches [Fri, 3 Jan 2014 03:10:11 +0000 (14:10 +1100)]
printk/cache: Mark printk_once test variable __read_mostly
Add #include <linux/cache.h> to define __read_mostly.
Convert cache.h to use uapi/linux/kernel.h instead
of linux/kernel.h to avoid recursive #includes.
Convert the ALIGN macro to __ALIGN_KERNEL.
printk_once only sets the bool variable tested
once so mark it __read_mostly.
Neaten the alignment so it matches the rest of the
pr_<level>_once #defines too.
Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Du, Changbin [Fri, 3 Jan 2014 03:10:11 +0000 (14:10 +1100)]
dynamic-debug-howto.txt: update since new wildcard support
Document how to use the new wildcard support.
Signed-off-by: Du, Changbin <changbin.du@gmail.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Du, Changbin [Fri, 3 Jan 2014 03:10:11 +0000 (14:10 +1100)]
dynamic_debug: add wildcard support to filter files/functions/modules
Add support for the wildcards '*' (matches zero or more characters) and
'?' (matches one character) when querying debug flags.
Now we can enable debug messages using patterns, e.g.:
1. enable debug logs in all usb drivers
echo "file drivers/usb/* +p" > <debugfs>/dynamic_debug/control
2. enable debug logs for usb xhci code
echo "file *xhci* +p" > <debugfs>/dynamic_debug/control
Signed-off-by: Du, Changbin <changbin.du@gmail.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Du, Changbin [Fri, 3 Jan 2014 03:10:10 +0000 (14:10 +1100)]
lib/parser.c: add match_wildcard function
The match_wildcard function is a simple implementation of a wildcard
matching algorithm. It supports only the two usual wildcards:
'*' - matches zero or more characters
'?' - matches one character
This algorithm is safe since it is non-recursive.
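To illustrate what a non-recursive matcher for these two wildcards can look like, here is a minimal userspace sketch. It is not the code added to lib/parser.c; wildcard_match() is a hypothetical name, and the backtracking scheme (remember the last '*' and retry from one character further on) is one common way to avoid recursion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Iterative wildcard match: '*' matches zero or more characters,
 * '?' matches exactly one. No recursion, so stack use is constant.
 */
bool wildcard_match(const char *pattern, const char *str)
{
    const char *star = NULL;    /* pattern position just after the last '*' */
    const char *resume = NULL;  /* str position where that '*' was last tried */

    assert(pattern != NULL && str != NULL);

    while (*str != '\0') {
        if (*pattern == '*') {
            /* Record the '*' and first try matching it against nothing. */
            star = ++pattern;
            resume = str;
        } else if (*pattern == '?' || *pattern == *str) {
            /* Single-character match: advance both. */
            pattern++;
            str++;
        } else if (star != NULL) {
            /* Mismatch: let the last '*' absorb one more character. */
            pattern = star;
            str = ++resume;
        } else {
            return false;
        }
    }

    /* Only trailing '*' may remain once the string is consumed. */
    while (*pattern == '*')
        pattern++;
    return *pattern == '\0';
}
```

With this sketch, wildcard_match("*xhci*", "drivers/usb/host/xhci-ring.c") is true, while wildcard_match("a?c", "ac") is false because '?' must consume one character.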
Signed-off-by: Du, Changbin <changbin.du@gmail.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Gustavo Padovan [Fri, 3 Jan 2014 03:10:10 +0000 (14:10 +1100)]
drivers/misc/ti-st/st_core.c: fix NULL dereference on protocol type check
If the type we receive is greater than ST_MAX_CHANNELS, we can't use it as
an index into the vector, since doing so would access unknown memory.
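The general pattern, validating a received type before using it as an array index, can be sketched in userspace as follows. MAX_CHANNELS, channel_table, register_channel(), dispatch_frame() and count_bytes() are hypothetical names for this example and do not reflect the actual st_core.c code or its exact comparison.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHANNELS 16   /* hypothetical table size */

typedef void (*recv_fn)(const unsigned char *data, size_t len);

/* One receive handler per protocol type; unregistered slots stay NULL. */
recv_fn channel_table[MAX_CHANNELS];

/* Example handler: just counts the bytes it has been given. */
size_t bytes_seen;

void count_bytes(const unsigned char *data, size_t len)
{
    (void)data;
    bytes_seen += len;
}

/* Returns 0 on success, -1 if type does not fit in the table. */
int register_channel(unsigned int type, recv_fn fn)
{
    if (type >= MAX_CHANNELS)
        return -1;
    channel_table[type] = fn;
    return 0;
}

/* Returns 0 if the frame was delivered, -1 if it was dropped. */
int dispatch_frame(unsigned int type, const unsigned char *data, size_t len)
{
    assert(data != NULL);

    /* Check the index before touching the table at all. */
    if (type >= MAX_CHANNELS)
        return -1;
    /* A valid index may still have no handler registered. */
    if (channel_table[type] == NULL)
        return -1;

    channel_table[type](data, len);
    return 0;
}
```

The order of the two checks matters: the NULL test on the table entry is only meaningful once the index is known to be in range.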
Kees Cook [Fri, 3 Jan 2014 03:10:10 +0000 (14:10 +1100)]
stack protector: provide -fstack-protector-strong build option
This changes the stack protector config option into a choice of "None",
"Regular", and "Strong". For "Strong", the kernel is built with
-fstack-protector-strong (gcc 4.9 and later). This option increases the
coverage of the stack protector without the heavy performance hit of
-fstack-protector-all.
For reference, the stack protector options available in gcc are:
-fstack-protector-all:
Adds the stack-canary saving prefix and stack-canary checking suffix to
_all_ function entry and exit. Results in substantial use of stack space
for saving the canary for deep stack users (e.g. historically xfs), and
measurable (though shockingly still low) performance hit due to all the
saving/checking. Really not suitable for sane systems, and was entirely
removed as an option from the kernel many years ago.
-fstack-protector:
Adds the canary save/check to functions that define an 8
(--param=ssp-buffer-size=N, N=8 by default) or more byte local char
array. Traditionally, stack overflows happened with string-based
manipulations, so this was a way to find those functions. Very few
total functions actually get the canary; no measurable performance or
size overhead.
-fstack-protector-strong
Adds the canary for a wider set of functions, since it's not just those
with strings that have ultimately been vulnerable to stack-busting. With
this superset, more functions end up with a canary, but it still remains
small compared to all functions with no measurable change in performance.
Based on the original design document, a function gets the canary when it
contains any of:
- local variable's address used as part of the RHS of an assignment or
function argument
- local variable is an array (or union containing an array), regardless
of array type or length
- uses register local variables
https://docs.google.com/a/google.com/document/d/1xXBH6rRZue4f296vGt9YQcuLVQHeE516stHwt8M9xyU
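As a small illustration of the criteria listed above, the following userspace functions show code shapes that would or would not be expected to qualify. The function names are made up for this example, and which functions actually receive a canary depends on the compiler version and on optimizations such as inlining.

```c
#include <assert.h>
#include <stddef.h>

/* No arrays and no address-taken locals: none of the listed criteria apply. */
int add(int a, int b)
{
    return a + b;
}

/*
 * A 4-byte char array is below the default 8-byte threshold described for
 * -fstack-protector, but "regardless of array type or length" covers it
 * under -fstack-protector-strong.
 */
size_t small_buffer(const char *s)
{
    char buf[4];
    size_t i = 0;
    size_t len = 0;

    /* Copy at most 3 characters, always leaving room for the terminator. */
    while (i < sizeof(buf) - 1 && s[i] != '\0') {
        buf[i] = s[i];
        i++;
    }
    buf[i] = '\0';

    while (buf[len] != '\0')
        len++;
    return len;
}

/* A non-char array also falls under the "local variable is an array" rule. */
int sum_first(int n)
{
    int values[3] = { 1, 2, 3 };
    int total = 0;

    assert(n >= 0 && n <= 3);   /* keep the index inside values[] */
    for (int i = 0; i < n; i++)
        total += values[i];
    return total;
}

/* Helper that writes through a pointer supplied by the caller. */
void store(int *out, int v)
{
    *out = v;
}

/* The address of a local variable is passed as a function argument. */
int address_taken(int v)
{
    int local = 0;

    store(&local, v);
    return local;
}
```

Compiling this file to assembly once with -fstack-protector and once with -fstack-protector-strong, and comparing which functions reference the stack guard, is one way to observe the difference in coverage.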
Comparison of "size" and "objdump" output when built with gcc-4.9 in
three configurations:
- defconfig
    11430641 text size
    36110 function bodies
- defconfig + CONFIG_CC_STACKPROTECTOR
    11468490 text size (+0.33%)
    1015 of 36110 functions stack-protected (2.81%)
- defconfig + CONFIG_CC_STACKPROTECTOR_STRONG via this patch
    11692790 text size (+2.24%)
    7401 of 36110 functions stack-protected (20.5%)
With -strong, ARM's compressed boot code now triggers stack protection, so
a static guard was added. Since this is only used during decompression
and was never used before, the exposure here is very small. Once it
switches to the full kernel, the stack guard is back to normal.
Chrome OS has been using -fstack-protector-strong for its kernel builds
for the last 8 months with no problems.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kees Cook [Fri, 3 Jan 2014 03:10:10 +0000 (14:10 +1100)]
stack protector: create HAVE_CC_STACKPROTECTOR for centralized use
Instead of duplicating the CC_STACKPROTECTOR Kconfig and Makefile logic in
each architecture, switch to using HAVE_CC_STACKPROTECTOR and keep
everything in one place. This retains the x86-specific bug verification
scripts.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Roman Gushchin [Fri, 3 Jan 2014 03:10:10 +0000 (14:10 +1100)]
kernel/smp.c: remove cpumask_ipi
After 9a46ad6 ("smp: make smp_call_function_many() use logic similar to
smp_call_function_single()"), cfd->cpumask is accessed only in
smp_call_function_many(). So there is no more need to copy it into
cfd->cpumask_ipi before putting csd into the list. The cpumask_ipi field
is obsolete and can be removed.
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru> Cc: Ingo Molnar <mingo@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Wang YanQing <udknight@gmail.com> Cc: Xie XiuQi <xiexiuqi@huawei.com> Cc: Shaohua Li <shli@fusionio.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alex Elder [Fri, 3 Jan 2014 03:10:09 +0000 (14:10 +1100)]
remove extra definitions of U32_MAX
Now that the definition is centralized in <linux/kernel.h>, the
definitions of U32_MAX (and related) elsewhere in the kernel can be
removed.
Signed-off-by: Alex Elder <elder@linaro.org> Acked-by: Sage Weil <sage@inktank.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alex Elder [Fri, 3 Jan 2014 03:10:09 +0000 (14:10 +1100)]
kernel.h: define u8, s8, u32, etc. limits
Create constants that define the maximum and minimum values
representable by the kernel types u8, s8, u16, s16, and so on.
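One way such limit constants can be written is sketched below for userspace, deriving each signed limit from the matching unsigned maximum. The typedefs stand in for the kernel's fixed-width types, and check_limits() is a hypothetical helper that compares the results against <stdint.h>; treat the exact macro bodies as an illustration rather than a copy of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's fixed-width types. */
typedef uint8_t  u8;
typedef int8_t   s8;
typedef uint16_t u16;
typedef int16_t  s16;
typedef uint32_t u32;
typedef int32_t  s32;
typedef uint64_t u64;
typedef int64_t  s64;

/* Unsigned maximum: all bits set, truncated to the target width. */
/* Signed maximum: the unsigned maximum shifted right by one bit.  */
/* Signed minimum: one below the negated signed maximum.           */
#define U8_MAX   ((u8)~0U)
#define S8_MAX   ((s8)(U8_MAX >> 1))
#define S8_MIN   ((s8)(-S8_MAX - 1))
#define U16_MAX  ((u16)~0U)
#define S16_MAX  ((s16)(U16_MAX >> 1))
#define S16_MIN  ((s16)(-S16_MAX - 1))
#define U32_MAX  ((u32)~0U)
#define S32_MAX  ((s32)(U32_MAX >> 1))
#define S32_MIN  ((s32)(-S32_MAX - 1))
#define U64_MAX  ((u64)~0ULL)
#define S64_MAX  ((s64)(U64_MAX >> 1))
#define S64_MIN  ((s64)(-S64_MAX - 1))

/* Cross-check the derived constants against the C library's limits. */
void check_limits(void)
{
    assert(U8_MAX == UINT8_MAX && S8_MAX == INT8_MAX && S8_MIN == INT8_MIN);
    assert(U16_MAX == UINT16_MAX && S16_MAX == INT16_MAX && S16_MIN == INT16_MIN);
    assert(U32_MAX == UINT32_MAX && S32_MAX == INT32_MAX && S32_MIN == INT32_MIN);
    assert(U64_MAX == UINT64_MAX && S64_MAX == INT64_MAX && S64_MIN == INT64_MIN);
}
```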
Signed-off-by: Alex Elder <elder@linaro.org> Cc: Sage Weil <sage@inktank.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alex Elder [Fri, 3 Jan 2014 03:10:09 +0000 (14:10 +1100)]
conditionally define U32_MAX
The symbol U32_MAX is defined in several spots. Change these definitions
to be conditional. This is in preparation for the next patch, which
centralizes the definition in <linux/kernel.h>.
Signed-off-by: Alex Elder <elder@linaro.org> Cc: Sage Weil <sage@inktank.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>