Ed Cashin [Thu, 29 Nov 2012 03:19:07 +0000 (14:19 +1100)]
aoe: manipulate aoedev network stats under lock
With this bugfix in place the calculation of the criterion for "lateness"
is performed under lock. Without the lock, there is a chance that one of
the non-atomic operations performed on the round trip time statistics
could be incomplete, such that an incorrect lateness criterion would be
calculated.
Without this change, the effect of the bug would be rare unecessary but
benign retransmissions.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ed Cashin [Thu, 29 Nov 2012 03:19:07 +0000 (14:19 +1100)]
aoe: err device: include MAC addresses for unexpected responses
The /dev/etherd/err character device provides low-level information about
normal but sometimes interesting AoE command retransmits and "unexpected
responses", i.e., responses for packets that have already been
retransmitted.
This change adds MAC addresses to the messages about unexpected responses,
so that when they occur, it's more easy to determine the network paths to
which they belong.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ed Cashin [Thu, 29 Nov 2012 03:19:07 +0000 (14:19 +1100)]
aoe: improve network congestion handling
The aoe driver already had some congestion handling, but it was limited in
its ability to cope with the kind of congestion that can arise on more
complex networks such as those involving paths through multiple ethernet
switches.
Some of the lessons from TCP's history of development can be applied to
improving the congestion control and avoidance on AoE storage networks.
These changes use familar concepts from Van Jacobson's "Congestion
Avoidance and Control" paper from '88, without adding significant
overhead.
This patch depends on an upcoming patch that covers the failover case when
AoE commands being retransmitted are transferred from one retransmit queue
to another. Another upcoming patch increases the timing accuracy.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ed Cashin [Thu, 29 Nov 2012 03:19:06 +0000 (14:19 +1100)]
aoe: provide ATA identify device content to user on request
Make the aoe driver follow expected behavior when the user uses ioctl to
get the ATA device identify information, allowing access to model, serial
number, etc.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ed Cashin [Thu, 29 Nov 2012 03:19:05 +0000 (14:19 +1100)]
aoe: "payload" sysfs file exports per-AoE-command data transfer size
The userland aoetools package includes an "aoe-stat" command that can
display a "payload size" column when the aoe driver exports this
information. Users can quickly see what amount of user data is
transferred inside each AoE command on the network, network headers
excluded.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ed Cashin [Thu, 29 Nov 2012 03:19:05 +0000 (14:19 +1100)]
aoe: support larger I/O requests via aoe_maxsectors module param
The GPFS filesystem is an example of an aoe user that requires the aoe
driver to support I/O request sizes larger than the default. Most users
will not need large I/O request sizes, because they would need to be split
up into multiple AoE commands anyway.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ed Cashin [Thu, 29 Nov 2012 03:19:04 +0000 (14:19 +1100)]
aoe: support the forgetting (flushing) of a user-specified AoE target
Users sometimes want to cause the aoe driver to forget a particular
previously discovered device when it is no longer online. The aoetools
provide an "aoe-flush" command that users run to perform this
administrative task. The changes below provide the support needed in the
driver.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ed Cashin [Thu, 29 Nov 2012 03:19:04 +0000 (14:19 +1100)]
aoe: update cap on outstanding commands based on config query response
The ATA over Ethernet config query response contains a "buffer count"
field reflecting the AoE target's capacity to buffer incoming AoE
commands.
By taking the current value of this field into accound, we increase
performance throughput or avoid network congestion, when the value
has increased or decreased, respectively.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ed Cashin [Thu, 29 Nov 2012 03:19:04 +0000 (14:19 +1100)]
aoe: avoid using skb member after dev_queue_xmit
After calling dev_queue_xmit it is no longer safe to access the
members of the skb.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ed Cashin [Thu, 29 Nov 2012 03:19:03 +0000 (14:19 +1100)]
Documentation/sparse.txt: document context annotations for lock checking
The context feature of sparse is used with the Linux kernel sources to
check for imbalanced uses of locks. Document the annotations defined in
include/linux/compiler.h that tell sparse what to expect when a lock is
held on function entry, exit, or both.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Christopher Li <sparse@chrisli.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Josh Triplett [Thu, 29 Nov 2012 03:19:02 +0000 (14:19 +1100)]
linux/compiler.h: add __must_hold macro for functions called with a lock held
linux/compiler.h has macros to denote functions that acquire or release
locks, but not to denote functions called with a lock held that return
with the lock still held. Add a __must_hold macro to cover that case.
Signed-off-by: Josh Triplett <josh@joshtriplett.org> Reported-by: Ed Cashin <ecashin@coraid.com> Tested-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Manfred Spraul [Thu, 29 Nov 2012 03:19:02 +0000 (14:19 +1100)]
ipc/sem.c: alternatives to preempt_disable()
ipc/sem.c uses a custom wakeup scheme that relies on preempt_disable().
On -RT, this causes increased latencies and debug warnings.
The patch adds two additional schemes:
- one built around a completion - could be better for -RT kernels
- one built around a spinlock - unfortunately it's broken
- and the current one
My preferred solution would be the spinlock implementation: RT would use
premptible spinlocks, mainline normal spinlocks. Thus both get the
optimal implementation without any special code in ipc/sem.c.
Unfortunately, I don't see how it could be fixed.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Carlos Alberto Lopez Perez <clopez@igalia.com> Cc: Rob Landley <rob@landley.net> Cc: Larry Finger <Larry.Finger@lwfinger.net> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ipc: add more comments to message copying related code
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Remvoe the redundant and confusing fill_copy(). Also add copy_msg() check
for error. In this case exit from the function have to be done instead of
break, because further code interprets any error as EAGAIN.
Also define copy_msg() for the case when CONFIG_CHECKPOINT_RESTORE is
disabled.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: James Morris <jmorris@namei.org> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ipc: convert prepare_copy() from macro to function
This code works if CONFIG_CHECKPOINT_RESTORE is disabled.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Passing and checking of msgflg to free_copy() is redundant. This patch
sets copy to NULL on declaration instead and checks for non-NULL in
free_copy().
Note: in case of copy allocation failure, error is returned immediately.
So no need to check for IS_ERR() in free_copy().
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This update fixes coding style problems (80-characters line and others).
Also, it fixes test to work with new IPC sysctls (instead of using
experimental API logic, which was throwed away and replaced by sysctls).
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
MSG_COPY feature was developed for Checkpoint/Restart In User space project
and thus wrapped in CONFIG_CHECKPOINT_RESTORE macro. But code look a bit ugly.
So this patch is an attempt to cleanup do_msgrcv() a bit and make it looks
better.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This patch is required for checkpoint/restore in userspace.
c/r requires some way to get all pending IPC messages without deleting
them from the queue (checkpoint can fail and in this case tasks will be
resumed, so queue have to be valid).
To achive this, new operation flag MSG_COPY for sys_msgrcv() system call
was introduced. If this flag was specified, then mtype is interpreted as
number of the message to copy.
If MSG_COPY is set, then kernel will allocate dummy message with passed
size, and then use new copy_msg() helper function to copy desired message
(instead of unlinking it from the queue).
Notes:
1) Return -ENOSYS if MSG_COPY is specified, but
CONFIG_CHECKPOINT_RESTORE is not set.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
WARNING: line over 80 characters
#33: FILE: include/linux/msg.h:39:
+ long (*msg_fill)(void __user *, struct msg_msg *, size_t ));
ERROR: space prohibited before that close parenthesis ')'
#33: FILE: include/linux/msg.h:39:
+ long (*msg_fill)(void __user *, struct msg_msg *, size_t ));
WARNING: line over 80 characters
#94: FILE: ipc/compat.c:368:
+ return do_msgrcv(first, uptr, second, msgtyp, third, compat_do_msg_fill);
ERROR: space prohibited before that close parenthesis ')'
#142: FILE: ipc/msg.c:774:
+ long (*msg_handler)(void __user *, struct msg_msg *, size_t ))
total: 2 errors, 2 warnings, 165 lines checked
./patches/ipc-message-queue-receive-cleanup.patch has style problems, please review.
If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.
Please run checkpatch prior to sending patches
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add 3 new variables and sysctls to tune them (by one "next_id" variable
for messages, semaphores and shared memory respectively). This variable
can be used to set desired id for next allocated IPC object. By default
it's equal to -1 and old behaviour is preserved. If this variable is
non-negative, then desired idr will be extracted from it and used as a
start value to search for free IDR slot.
Notes:
1) this patch doesn't guarantee that the new object will have desired
id. So it's up to user space how to handle new object with wrong id.
2) After a sucessful id allocation attempt, "next_id" will be set back
to -1 (if it was non-negative).
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kees Cook [Thu, 29 Nov 2012 03:18:56 +0000 (14:18 +1100)]
exec: use -ELOOP for max recursion depth
To avoid an explosion of request_module calls on a chain of abusive
scripts, fail maximum recursion with -ELOOP instead of -ENOEXEC. As soon
as maximum recursion depth is hit, the error will fail all the way back
up the chain, aborting immediately.
This also has the side-effect of stopping the user's shell from attempting
to reexecute the top-level file as a shell script. As seen in the
dash source:
The above logic was designed for running scripts automatically that lacked
the "#!" header, not to re-try failed recursion. On a legitimate -ENOEXEC,
things continue to behave as the shell expects.
Additionally, when tracking recursion, the binfmt handlers should not be
involved. The recursion being tracked is the depth of calls through
search_binary_handler(), so that function should be exclusively responsible
for tracking the depth.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: halfdog <me@halfdog.net> Cc: P J P <ppandit@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kees Cook [Thu, 29 Nov 2012 03:18:55 +0000 (14:18 +1100)]
exec: do not leave bprm->interp on stack
If a series of scripts are executed, each triggering module loading via
unprintable bytes in the script header, kernel stack contents can leak
into the command line.
Normally execution of binfmt_script and binfmt_misc happens recursively.
However, when modules are enabled, and unprintable bytes exist in the
bprm->buf, execution will restart after attempting to load matching binfmt
modules. Unfortunately, the logic in binfmt_script and binfmt_misc does
not expect to get restarted. They leave bprm->interp pointing to their
local stack. This means on restart bprm->interp is left pointing into
unused stack memory which can then be copied into the userspace argv
areas.
After additional study, it seems that both recursion and restart remains
the desirable way to handle exec with scripts, misc, and modules. As
such, we need to protect the changes to interp.
This changes the logic to require allocation for any changes to the
bprm->interp. To avoid adding a new kmalloc to every exec, the default
value is left as-is. Only when passing through binfmt_script or
binfmt_misc does an allocation take place.
For a proof of concept, see DoTest.sh from:
http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: halfdog <me@halfdog.net> Cc: P J P <ppandit@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alan Cox [Thu, 29 Nov 2012 03:18:55 +0000 (14:18 +1100)]
fork: unshare: remove dead code
If new_nsproxy is set we will always call switch_task_namespaces and then
set new_nsproxy back to NULL so the reassignment and fall through check
are redundant
Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Artem Bityutskiy [Thu, 29 Nov 2012 03:18:55 +0000 (14:18 +1100)]
proc: pid/status: show all supplementary groups
We display a list of supplementary group for each process in
/proc/<pid>/status. However, we show only the first 32 groups, not all of
them.
Although this is rare, but sometimes processes do have more than 32
supplementary groups, and this kernel limitation breaks user-space apps
that rely on the group list in /proc/<pid>/status.
Number 32 comes from the internal NGROUPS_SMALL macro which defines the
length for the internal kernel "small" groups buffer. There is no
apparent reason to limit to this value.
This patch removes the 32 groups printing limit.
The Linux kernel limits the amount of supplementary groups by NGROUPS_MAX,
which is currently set to 65536. And this is the maximum count of groups
we may possibly print.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kees Cook [Thu, 29 Nov 2012 03:18:54 +0000 (14:18 +1100)]
/proc/pid/status: add "Seccomp" field
It is currently impossible to examine the state of seccomp for a given
process. While attaching with gdb and attempting "call
prctl(PR_GET_SECCOMP,...)" will work with some situations, it is not
reliable. If the process is in seccomp mode 1, this query will kill the
process (prctl not allowed), if the process is in mode 2 with prctl not
allowed, it will similarly be killed, and in weird cases, if prctl is
filtered to return errno 0, it can look like seccomp is disabled.
When reviewing the state of running processes, there should be a way to
externally examine the seccomp mode. ("Did this build of Chrome end up
using seccomp?" "Did my distro ship ssh with seccomp enabled?")
This adds the "Seccomp" line to /proc/$pid/status.
Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: James Morris <jmorris@namei.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Thu, 29 Nov 2012 03:18:54 +0000 (14:18 +1100)]
procfs-add-vmflags-field-in-smaps-output-v4-fix
remove unneeded brakes per sfr, avoid using bloaty for_each_set_bit()
Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cyrill Gorcunov [Thu, 29 Nov 2012 03:18:54 +0000 (14:18 +1100)]
procfs: add VmFlags field in smaps output
During c/r sessions we've found that there is no way at the moment to
fetch some VMA associated flags, such as mlock() and madvise().
This leads us to a problem -- we don't know if we should call for mlock()
and/or madvise() after restore on the vma area we're bringing back to
life.
This patch intorduces a new field into "smaps" output called VmFlags,
where all set flags associated with the particular VMA is shown as two
letter mnemonics.
[ Strictly speaking for c/r we only need mlock/madvise bits but it has been
said that providing just a few flags looks somehow inconsistent. So all
flags are here now. ]
This feature is made available on CONFIG_CHECKPOINT_RESTORE=n kernels, as
other applications may start to use these fields.
The data is encoded in a somewhat awkward two letters mnemonic form, to
encourage userspace to be prepared for fields being added or removed in
the future.
[a.p.zijlstra@chello.nl: props to use for_each_set_bit]
[sfr@canb.auug.org.au: props to use array instead of struct]
[akpm@linux-foundation.org: overall redesign and simplification] Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
I suggest to hide non-existent capabilities. Here is two reasons.
* It's logically and easier for using.
* It helps to checkpoint-restore capabilities of tasks, because tasks
can be restored on another kernel, where CAP_LAST_CAP is bigger.
Signed-off-by: Andrew Vagin <avagin@openvz.org> Cc: Andrew G. Morgan <morgan@kernel.org> Reviewed-by: Serge E. Hallyn <serge.hallyn@canonical.com> Cc: Pavel Emelyanov <xemul@parallels.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 29 Nov 2012 03:18:53 +0000 (14:18 +1100)]
ptrace: introduce PTRACE_O_EXITKILL
Ptrace jailers want to be sure that the tracee can never escape
from the control. However if the tracer dies unexpectedly the
tracee continues to run in potentially unsafe mode.
Add the new ptrace option PTRACE_O_EXITKILL. If the tracer exits
it sends SIGKILL to every tracee which has this bit set.
Note that the new option is not equal to the last-option << 1. Because
currently all options have an event, and the new one starts the eventless
group. It uses the random 20 bit, so we have the room for 12 more events,
but we can also add the new eventless options below this one.
Suggested by Amnon Shiloh.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Tested-by: Amnon Shiloh <u3557@miso.sublimeip.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Chris Evans <scarybeasts@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Eldad Zack [Thu, 29 Nov 2012 03:18:53 +0000 (14:18 +1100)]
simple_strto*: annotate function as obsolete
Update the documentation for simple_strto* to reflect that it has been
obsoleted and advise the usage of kstrto*.
Signed-off-by: Eldad Zack <eldad@fogrefinery.com> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Joe Perches <joe@perches.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Rob Landley <rob@landley.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Eldad Zack [Thu, 29 Nov 2012 03:18:52 +0000 (14:18 +1100)]
kstrto*: add documentation
As Bruce Fields pointed out, kstrto* is currently lacking kerneldoc
comments. This patch adds kerneldoc comments to common variants of
kstrto*: kstrto(u)l, kstrto(u)ll and kstrto(u)int.
Signed-off-by: Eldad Zack <eldad@fogrefinery.com> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Joe Perches <joe@perches.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Rob Landley <rob@landley.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dave Reisner [Thu, 29 Nov 2012 03:18:52 +0000 (14:18 +1100)]
fs/fat: strip "cp" prefix from codepage in display
Option parsing code expects an unsigned integer for the codepage option,
but prefixes and stores this option with "cp" before passing to
load_nls(). This makes the displayed option in /proc an invalid one.
Strip the prefix when printing so that the displayed option is valid for
reuse.
Signed-off-by: Dave Reisner <dreisner@archlinux.org> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jan Kara [Thu, 29 Nov 2012 03:18:51 +0000 (14:18 +1100)]
fat: ix mount option parsing
parse_options() is supposed to return value < 0 on error however we
returned 0 (success) in a lot of cases. This actually was not a problem
in practice because match_token() used by parse_options() is clever and
catches most of the problems for us.
Signed-off-by: Jan Kara <jack@suse.cz> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jan Kara [Thu, 29 Nov 2012 03:18:51 +0000 (14:18 +1100)]
fat: provide option for setting timezone offset
So far FAT either offsets time stamps by sys_tz.minuteswest or leaves them
as they are (when tz=UTC mount option is used). However in some cases it
is useful if one can specify time stamp offset on his own (e.g. when time
zone of the camera connected is different from time zone of the computer,
or when HW clock is in UTC and thus sys_tz.minuteswest == 0).
So provide a mount option time_offset= which allows user to specify offset
in minutes that should be applied to time stamps on the filesystem.
akpm: this code would work incorrectly when used via `mount -o remount',
because cached inodes would not be updated. But fatfs's fat_remount() is
basically a no-op anyway.
Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
hfsplus: add support of manipulation by attributes file
Add support of manipulation by attributes file.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Reported-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
hfsplus: rework functionality of getting, setting and deleting of extended attributes
Rework functionality of getting, setting and deleting of extended attributes.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Reported-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
hfsplus: add functionality of manipulating by records in attributes tree
Add functionality of manipulating by records in attributes tree.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Reported-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
hfsplus: add on-disk layout declarations related to attributes tree
Add all necessary on-disk layout declarations related to attributes file.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Reported-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
WARNING: space prohibited between function name and open parenthesis '('
#66: FILE: include/uapi/linux/xattr.h:21:
+#define XATTR_MAC_OSX_PREFIX_LEN (sizeof (XATTR_MAC_OSX_PREFIX) - 1)
total: 0 errors, 1 warnings, 9 lines checked
./patches/hfsplus-add-osx-prefix-for-handling-namespace-of-mac-os-x-extended-attributes.patch has style problems, please review.
If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.
Please run checkpatch prior to sending patches
Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
hfsplus: add osx.* prefix for handling namespace of Mac OS X extended attributes
hfsplus: reworked support of extended attributes.
Current mainline implementation of hfsplus file system driver treats as
extended attributes only two fields (fdType and fdCreator) of user_info
field in file description record (struct hfsplus_cat_file). It is
possible to get or set only these two fields as extended attributes. But
HFS+ treats as com.apple.FinderInfo extended attribute an union of
user_info and finder_info fields as for file (struct hfsplus_cat_file) as
for folder (struct hfsplus_cat_folder). Moreover, current mainline
implementation of hfsplus file system driver doesn't support special
metadata file - attributes tree.
Mac OS X 10.4 and later support extended attributes by making use of the
HFS+ filesystem Attributes file B*-tree feature which allows for named
forks. Mac OS X supports only inline extended attributes, limiting their
size to 3802 bytes. Any regular file may have a list of extended
attributes. HFS+ supports an arbitrary number of named forks. Each
attribute is denoted by a name and the associated data. The name is a
null-terminated Unicode string. It is possible to list, to get, to set,
and to remove extended attributes from files or directories.
It exists some peculiarity during getting of extended attributes list by
means of getfattr utility. The getfattr utility expects prefix "user."
before any extended attribute's name. So, it ignores any names that don't
contained such prefix. Such behavior of getfattr utility results in
unexpected empty output of extended attributes list even in the case when
file (or folder) contains extended attributes. It needs to use empty
string as regular expression pattern for names matching (getfattr
--match="").
For support of extended attributes in HFS+:
1. It was added necessary on-disk layout declarations related to Attributes
tree into hfsplus_raw.h file.
2. It was added attributes.c file with implementation of functionality of
manipulation by records in Attributes tree.
3. It was reworked hfsplus_listxattr, hfsplus_getxattr, hfsplus_setxattr
functions in ioctl.c. Moreover, it was added hfsplus_removexattr method.
This patch:
Add osx.* prefix for handling namespace of Mac OS X extended attributes.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Reported-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alan Cox [Thu, 29 Nov 2012 03:18:48 +0000 (14:18 +1100)]
hfsplus: avoid crash on failed block map free
If the read fails we kmap an error code. This doesn't end well. Instead
print a critical error and pray. This mirrors the rest of the fs
behaviour with critical error cases.
Acked-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tushar Behera [Thu, 29 Nov 2012 03:18:47 +0000 (14:18 +1100)]
drivers/rtc/rtc-s3c.c: convert to use devm_* API
rtc-s3c driver is modified to use devm_request_and_ioremap() (combining
request_mem_region and ioremap), devm_clk_get() and devm_request_irq()
APIs. Since this removes the necessity of freeing the related resources
the return path is also simplified.
Signed-off-by: Tushar Behera <tushar.behera@linaro.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
err_nores label redirects to a simple return statement. Move the return
statement to caller location and remove the label.
Signed-off-by: Tushar Behera <tushar.behera@linaro.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Thierry Reding [Thu, 29 Nov 2012 03:18:46 +0000 (14:18 +1100)]
rtc: add NXP PCF8523 support
Add an RTC driver for PCF8523 chips by NXP Semiconductors. No support is
currently provided for the alarm and interrupt functions. Only the time
and date functionality is implemented.
Deepak Sikri [Thu, 29 Nov 2012 03:18:45 +0000 (14:18 +1100)]
rtc: rtc-spear: Provide flag for no support of UIE mode
The applications can set the RTC hardware to trigger interrupts in one
of three modes:
* AIE: Alarm interrupt
* UIE: Update interrupt (ie: once per second)
* PIE: Periodic interrupt (sub-second irqs)
The above defined 3 modes are to be supported in the RTC HW in form of
interrupts. The SPEAr RTC hardware does not support the later two modes.
There have been refinements in the RTC core in mainline related to
use of timer queue infrastructure to manage events in RTC. Please refer
the below mentioned patch for details:
* RTC: Rework RTC code to use timerqueue for events
* SHA ID: 6610e0893b8bc6f59b14fed7f089c5997f035f88
There have been provisions added to support hardware that do not have
support the UIE mode. Please refer the following patch.
* rtc: Provide flag for rtc devices that don't support UIE
* SHA ID: 4a649903f91232d02284d53724b0a45728111767
The patch makes use of the provision defined in the above patch to
update the hardware status of UIE mode.
Deepak Sikri [Thu, 29 Nov 2012 03:18:45 +0000 (14:18 +1100)]
rtc: rtc-spear: Add clk_{un}prepare() support
clk_{un}prepare is mandatory for platforms using common clock framework.
Because for SPEAr we don't do anything in clk_{un}prepare() calls, just
call them once in probe/remove.
Viresh Kumar [Thu, 29 Nov 2012 03:18:44 +0000 (14:18 +1100)]
rtc: rtc-spear: use devm_*() routines
Free the rtc-spear driver from tension of freeing resources :) devm_*
derivatives of multiple routines are used while allocating resources,
which would be freed automatically by kernel.
In case of error, test_init() needs to call platform_device_del() instead
of platform_device_unregister(). Otherwise, we may call
platform_device_put() twice.
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Roland Stigge [Thu, 29 Nov 2012 03:18:42 +0000 (14:18 +1100)]
drivers/rtc/rtc-imxdi: support for i.MX53
Enable support for i.MX53 in addition to i.MX25 by enabling the driver on
ARCH_MXC generally.
Signed-off-by: Roland Stigge <stigge@antcom.de> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: Russell King <linux@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Vaibhav Hiremath [Thu, 29 Nov 2012 03:18:42 +0000 (14:18 +1100)]
rtc: omap: add runtime pm support
OMAP1 RTC driver is used in multiple devices like, OMAPL138 and AM33XX.
Driver currently doesn't handle any clocks, which may be right for OMAP1
architecture but in case of AM33XX, the clock/module needs to be enabled
in order to access the registers.
So convert this driver to runtime pm, which internally handles rest.
[afzal@ti.com: handle error path] Signed-off-by: Vaibhav Hiremath <hvaibhav@ti.com> Signed-off-by: Afzal Mohammed <afzal@ti.com> Acked-by: Sekhar Nori <nsekhar@ti.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Sekhar Nori <nsekhar@ti.com> Cc: Kevin Hilman <khilman@ti.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Daniel Mack <zonque@gmail.com> Cc: Vaibhav Hiremath <hvaibhav@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Afzal Mohammed [Thu, 29 Nov 2012 03:18:42 +0000 (14:18 +1100)]
rtc: omap: depend on am33xx
rtc-omap driver can be reused for AM33xx RTC. Provide dependency in
Kconfig.
Signed-off-by: Afzal Mohammed <afzal@ti.com> Acked-by: Sekhar Nori <nsekhar@ti.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Sekhar Nori <nsekhar@ti.com> Cc: Kevin Hilman <khilman@ti.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Daniel Mack <zonque@gmail.com> Cc: Vaibhav Hiremath <hvaibhav@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Afzal Mohammed [Thu, 29 Nov 2012 03:18:42 +0000 (14:18 +1100)]
rtc: omap: dt support
Enhance rtc-omap driver with DT capability
Signed-off-by: Afzal Mohammed <afzal@ti.com> Acked-by: Sekhar Nori <nsekhar@ti.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Sekhar Nori <nsekhar@ti.com> Cc: Kevin Hilman <khilman@ti.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Daniel Mack <zonque@gmail.com> Cc: Vaibhav Hiremath <hvaibhav@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Afzal Mohammed [Thu, 29 Nov 2012 03:18:41 +0000 (14:18 +1100)]
ARM: davinci: remove rtc kicker release
rtc-omap driver is now capable of handling kicker mechanism, hence remove
kicker handling at platform level, instead provide proper device name so
that driver can handle kicker mechanism by itself
Signed-off-by: Afzal Mohammed <afzal@ti.com> Acked-by: Sekhar Nori <nsekhar@ti.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Sekhar Nori <nsekhar@ti.com> Cc: Kevin Hilman <khilman@ti.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Daniel Mack <zonque@gmail.com> Cc: Vaibhav Hiremath <hvaibhav@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Afzal Mohammed [Thu, 29 Nov 2012 03:18:41 +0000 (14:18 +1100)]
rtc: omap: kicker mechanism support
OMAP RTC IP can have kicker feature. This prevents spurious writes to
register. To write to registers kicker lock has to be released.
Procedure to do it as follows,
1. write to kick0 register, 0x83e70b13
2. write to kick1 register, 0x95a4f1e0
Writing value other than 0x83e70b13 to kick0 enables write locking, more
details about kicker mechanism can be found in section 20.3.3.5.3 of
AM335X TRM @www.ti.com/am335x
Here id table information is added and is used to distinguish those that
require kicker handling and the ones that doesn't need it. There are more
features in the newer IP's compared to legacy ones other than kicker,
which driver currently doesn't handle, supporting additional features
would be easier with the addition of id table.
Older IP (of OMAP1) doesn't have revision register as per TRM, so revision
register can't be relied always to find features, hence id table is being
used.
While at it, replace __raw_writeb/__raw_readb with writeb/readb; this
driver is used on ARMv7 (AM335X SoC)
Signed-off-by: Afzal Mohammed <afzal@ti.com> Acked-by: Sekhar Nori <nsekhar@ti.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Sekhar Nori <nsekhar@ti.com> Cc: Kevin Hilman <khilman@ti.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Daniel Mack <zonque@gmail.com> Cc: Vaibhav Hiremath <hvaibhav@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We observed this problem has been occurring since 2.6.30 with
fs/binfmt_elf.c: create_elf_tables()->get_random_bytes(), introduced by f06295b44c296c8f ("ELF: implement AT_RANDOM for glibc PRNG seeding").
/*
* Generate 16 random bytes for userspace PRNG seeding.
*/
get_random_bytes(k_rand_bytes, sizeof(k_rand_bytes));
The patch introduces a wrapper around get_random_int() which has lower
overhead than calling get_random_bytes() directly.
With this patch applied:
$ cat /proc/sys/kernel/random/entropy_avail
2731
$ cat /proc/sys/kernel/random/entropy_avail
2802
$ cat /proc/sys/kernel/random/entropy_avail
2878
Analyzed by John Sobecki.
Signed-off-by: Jie Liu <jeff.liu@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <aedilger@gmail.com> Cc: Alan Cox <alan@linux.intel.com> Cc: Arnd Bergmann <arnn@arndb.de> Cc: John Sobecki <john.sobecki@oracle.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alan Cox [Thu, 29 Nov 2012 03:18:40 +0000 (14:18 +1100)]
binfmt_elf: fix corner case kfree of uninitialized data
If elf_core_dump() is called and fill_note_info() fails in the kmalloc()
then it returns 0 but has not yet initialised all the needed fields. As a
result we do a kfree(randomness) after correctly skipping the thread data.
Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Paton J. Lewis [Thu, 29 Nov 2012 03:18:40 +0000 (14:18 +1100)]
epoll: support for disabling items, and a self-test app
It is not currently possible to reliably delete epoll items when using the
same epoll set from multiple threads. After calling epoll_ctl with
EPOLL_CTL_DEL, another thread might still be executing code related to an
event for that epoll item (in response to epoll_wait). Therefore the
deleting thread does not know when it is safe to delete resources
pertaining to the associated epoll item because another thread might be
using those resources.
The deleting thread could wait an arbitrary amount of time after calling
epoll_ctl with EPOLL_CTL_DEL and before deleting the item, but this is
inefficient and could result in the destruction of resources before
another thread is done handling an event returned by epoll_wait.
This patch enhances epoll_ctl to support EPOLL_CTL_DISABLE, which disables
an epoll item. If epoll_ctl returns -EBUSY in this case, then another
thread may handling a return from epoll_wait for this item. Otherwise if
epoll_ctl returns 0, then it is safe to delete the epoll item. This
allows multiple threads to use a mutex to determine when it is safe to
delete an epoll item and its associated resources, which allows epoll
items to be deleted both efficiently and without error in a multi-threaded
environment. Note that EPOLL_CTL_DISABLE is only useful in conjunction
with EPOLLONESHOT, and using EPOLL_CTL_DISABLE on an epoll item without
EPOLLONESHOT returns -EINVAL.
This patch also adds a new test_epoll self-test program to both
demonstrate the need for this feature and test it.
Signed-off-by: Paton J. Lewis <palewis@adobe.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jason Baron <jbaron@redhat.com> Cc: Paul Holland <pholland@adobe.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joe Perches [Thu, 29 Nov 2012 03:18:39 +0000 (14:18 +1100)]
checkpatch: allow control over line length warning, default remains 80
Some projects might want a longer line length so allow a command line
--max-line-length=n control over the long line warnings. The default line
length is 80.
Signed-off-by: Joe Perches <joe@perches.com> Cc: Constantine Shulyupin <const@makelinux.com> Cc: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tao Ma [Thu, 29 Nov 2012 03:18:38 +0000 (14:18 +1100)]
checkpatch: remove reference to feature-removal-schedule.txt
In 9c0ece069, Linus removes feature-removal-schedule.txt from
Documentation, but there is still some reference to this file. So remove
them.
Signed-off-by: Tao Ma <boyu.mt@taobao.com> Acked-by: Andy Whitcroft <apw@canonical.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kees Cook [Thu, 29 Nov 2012 03:18:38 +0000 (14:18 +1100)]
checkpatch: warn about using CONFIG_EXPERIMENTAL
This config item has not carried much meaning for a while now and is
almost always enabled by default. As agreed during the Linux kernel
summit, it is being removed. This will discourage future addition of
CONFIG_EXPERIMENTAL while it is being phased out.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andy Whitcroft <apw@canonical.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joe Perches [Thu, 29 Nov 2012 03:18:37 +0000 (14:18 +1100)]
checkpatch: warn on unnecessary line continuations
When the previous line is not a line continuation and the current line has
a line continuation but is not a #define, emit a warning.
Signed-off-by: Joe Perches <joe@perches.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Constantine Shulyupin <const@makelinux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>