Chuck Lever [Fri, 9 Aug 2013 16:47:51 +0000 (12:47 -0400)]
NFS: When displaying session slot numbers, use "%u" consistently
Clean up, since slot and sequence numbers are all unsigned anyway.
Among other things, squelch compiler warnings:
linux/fs/nfs/nfs4proc.c: In function ‘nfs4_setup_sequence’:
linux/fs/nfs/nfs4proc.c:703:2: warning: signed and unsigned type in
conditional expression [-Wsign-compare]
and
linux/fs/nfs/nfs4session.c: In function ‘nfs4_alloc_slot’:
linux/fs/nfs/nfs4session.c:151:31: warning: signed and unsigned type in
conditional expression [-Wsign-compare]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
then the asynchronous nature of the sillyrename operation means that
we can end up getting EBUSY for the rmdir() in the above test. Fix
that by ensuring that we wait for any in-progress sillyrenames
before sending the rmdir() to the server.
1) the nfs_client's rpc_client was using KRB5i/p (now tried by default)
2) the current user doesn't have valid kerberos credentials
This situation is quite common - as of now a sec=sys mount would use
krb5i for the nfs_client's rpc_client and a user would hardly be faulted
for not having run kinit.
The solution is to use the machine cred when trying to use an integrity
protected auth flavor for SECINFO.
Older servers may not support using the machine cred or an integrity
protected auth flavor for SECINFO in every circumstance, so we fall back
to using the user's cred and the filesystem's auth flavor in this case.
We run into another problem when running against linux nfs servers -
they return NFS4ERR_WRONGSEC when using integrity auth flavor (unless the
mount is also that flavor) even though that is not a valid error for
SECINFO*. Even though it's against spec, handle WRONGSEC errors on SECINFO
by falling back to using the user cred and the filesystem's auth flavor.
Andy Adamson [Wed, 14 Aug 2013 15:59:17 +0000 (11:59 -0400)]
SUNRPC refactor rpcauth_checkverf error returns
Most of the time an error from the credops crvalidate function means the
server has sent us a garbage verifier. The gss_validate function is the
exception where there is an -EACCES case if the user GSS_context on the client
has expired.
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andy Adamson [Wed, 14 Aug 2013 15:59:16 +0000 (11:59 -0400)]
NFS avoid expired credential keys for buffered writes
We must avoid buffering a WRITE that is using a credential key (e.g. a GSS
context key) that is about to expire or has expired. We currently will
paint ourselves into a corner by returning success to the applciation
for such a buffered WRITE, only to discover that we do not have permission when
we attempt to flush the WRITE (and potentially associated COMMIT) to disk.
Use the RPC layer credential key timeout and expire routines which use a
a watermark, gss_key_expire_timeo. We test the key in nfs_file_write.
If a WRITE is using a credential with a key that will expire within
watermark seconds, flush the inode in nfs_write_end and send only
NFS_FILE_SYNC WRITEs by adding nfs_ctx_key_to_expire to nfs_need_sync_write.
Note that this results in single page NFS_FILE_SYNC WRITEs.
Signed-off-by: Andy Adamson <andros@netapp.com>
[Trond: removed a pr_warn_ratelimited() for now] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andy Adamson [Wed, 14 Aug 2013 15:59:15 +0000 (11:59 -0400)]
SUNRPC new rpc_credops to test credential expiry
This patch provides the RPC layer helper functions to allow NFS to manage
data in the face of expired credentials - such as avoiding buffered WRITEs
and COMMITs when the gss context will expire before the WRITEs are flushed
and COMMITs are sent.
These helper functions enable checking the expiration of an underlying
credential key for a generic rpc credential, e.g. the gss_cred gss context
gc_expiry which for Kerberos is set to the remaining TGT lifetime.
A new rpc_authops key_timeout is only defined for the generic auth.
A new rpc_credops crkey_to_expire is only defined for the generic cred.
A new rpc_credops crkey_timeout is only defined for the gss cred.
Set a credential key expiry watermark, RPC_KEY_EXPIRE_TIMEO set to 240 seconds
as a default and can be set via a module parameter as we need to ensure there
is time for any dirty data to be flushed.
If key_timeout is called on a credential with an underlying credential key that
will expire within watermark seconds, we set the RPC_CRED_KEY_EXPIRE_SOON
flag in the generic_cred acred so that the NFS layer can clean up prior to
key expiration.
Checking a generic credential's underlying credential involves a cred lookup.
To avoid this lookup in the normal case when the underlying credential has
a key that is valid (before the watermark), a notify flag is set in
the generic credential the first time the key_timeout is called. The
generic credential then stops checking the underlying credential key expiry, and
the underlying credential (gss_cred) match routine then checks the key
expiration upon each normal use and sets a flag in the associated generic
credential only when the key expiration is within the watermark.
This in turn signals the generic credential key_timeout to perform the extra
credential lookup thereafter.
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andy Adamson [Wed, 14 Aug 2013 15:59:13 +0000 (11:59 -0400)]
SUNRPC: don't map EKEYEXPIRED to EACCES in call_refreshresult
The NFS layer needs to know when a key has expired.
This change also returns -EKEYEXPIRED to the application, and the informative
"Key has expired" error message is displayed. The user then knows that
credential renewal is required.
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: rpcauth_create needs to know about rpc_clnt clone status
Ensure that we set rpc_clnt->cl_parent before calling rpc_client_register
so that rpcauth_create can find any existing RPCSEC_GSS caches for this
transport.
Trond Myklebust [Mon, 26 Aug 2013 19:38:11 +0000 (15:38 -0400)]
SUNRPC: Add a framework to clean up management of rpc_pipefs directories
The current system requires everyone to set up notifiers, manage directory
locking, etc.
What we really want to do is have the rpc_client create its directory,
and then create all the entries.
This patch will allow the RPCSEC_GSS and NFS code to register all the
objects that they want to have appear in the directory, and then have
the sunrpc code call them back to actually create/destroy their pipefs
dentries when the rpc_client creates/destroys the parent.
Trond Myklebust [Mon, 26 Aug 2013 20:05:11 +0000 (16:05 -0400)]
RPCSEC_GSS: Fix an Oopsable condition when creating/destroying pipefs objects
If an error condition occurs on rpc_pipefs creation, or the user mounts
rpc_pipefs and then unmounts it, then the dentries in struct gss_auth
need to be reset to NULL so that a second call to gss_pipes_dentries_destroy
doesn't try to free them again.
Trond Myklebust [Mon, 26 Aug 2013 23:23:04 +0000 (19:23 -0400)]
SUNRPC: Replace clnt->cl_principal
The clnt->cl_principal is being used exclusively to store the service
target name for RPCSEC_GSS/krb5 callbacks. Replace it with something that
is stored only in the RPCSEC_GSS-specific code.
After reclaiming state that was lost, the NFS client tries to reclaim
any locks, and then checks that each one has NFS_LOCK_INITIALIZED set
(which means that the server has confirmed the lock).
However if the client holds a delegation, nfs_reclaim_locks() simply aborts
(or more accurately it called nfs_lock_reclaim() and that returns without
doing anything).
This is because when a delegation is held, the server doesn't need to
know about locks.
So if a delegation is held, NFS_LOCK_INITIALIZED is not expected, and
its absence is certainly not an error.
So don't print the warnings if NFS_DELGATED_STATE is set.
Trond Myklebust [Tue, 20 Aug 2013 16:29:27 +0000 (12:29 -0400)]
NFS: Remove the NFSv4 "open optimisation" from nfs_permission
Ever since commit 6168f62cb (Add ACCESS operation to OPEN compound)
the NFSv4 atomic open has primed the access cache, and so nfs_permission
will no longer do an RPC call on the wire.
Andy Adamson [Mon, 22 Jul 2013 22:41:23 +0000 (18:41 -0400)]
NFSv4.1 Increase NFS4_DEF_SLOT_TABLE_SIZE
Increase NFS4_DEF_SLOT_TABLE_SIZE which is used as the client ca_maxreequests
value in CREATE_SESSION. Current non-dynamic session slot server
implementations use the client ca_maxrequests as a maximum slot number: 64
session slots can handle most workloads.
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 24 Jul 2013 16:28:37 +0000 (12:28 -0400)]
NFS: Never use user credentials for lease renewal
Never try to use a non-UID 0 user credential for lease management,
as that credential can change out from under us. The server will
block NFSv4 lease recovery with NFS4ERR_CLID_INUSE.
Since the mechanism to acquire a credential for lease management
is now the same for all minor versions, replace the minor version-
specific callout with a single function.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 24 Jul 2013 16:28:28 +0000 (12:28 -0400)]
NFS: Use root's credential for lease management when keytab is missing
Commit 05f4c350 "NFS: Discover NFSv4 server trunking when mounting"
Fri Sep 14 17:24:32 2012 introduced Uniform Client String support,
which forces our NFS client to establish a client ID immediately
during a mount operation rather than waiting until a user wants to
open a file.
Normally machine credentials (eg. from a keytab) are used to perform
a mount operation that is protected by Kerberos. Before 05fc350,
SETCLIENTID used a machine credential, or fell back to a regular
user's credential if no keytab is available.
On clients that don't have a keytab, performing SETCLIENTID early
means there's no user credential to fall back on, since no regular
user has kinit'd yet. 05f4c350 seems to have broken the ability
to mount with sec=krb5 on clients that don't have a keytab in
kernels 3.7 - 3.10.
To address this regression, commit 4edaa308 (NFS: Use "krb5i" to
establish NFSv4 state whenever possible), Sat Mar 16 15:56:20 2013,
was merged in 3.10. This commit forces the NFS client to fall back
to AUTH_SYS for lease management operations if no keytab is
available.
Neil Brown noticed that, since root is required to kinit to do a
sec=krb5 mount when a client doesn't have a keytab, we can try to
use root's Kerberos credential before AUTH_SYS.
Now, when determining a principal and flavor to use for lease
management, the NFS client tries in this order:
1. Flavor: AUTH_GSS, krb5i
Principal: service principal (via keytab)
2. Flavor: AUTH_GSS, krb5i
Principal: user principal established for UID 0 (via kinit)
3. Flavor: AUTH_SYS
Principal: UID 0 / GID 0
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Jeff Layton [Fri, 2 Aug 2013 15:39:32 +0000 (11:39 -0400)]
nfs: verify open flags before allowing an atomic open
Currently, you can open a NFSv4 file with O_APPEND|O_DIRECT, but cannot
fcntl(F_SETFL,...) with those flags. This flag combination is explicitly
forbidden on NFSv3 opens, and it seems like it should also be on NFSv4.
Reported-by: Chao Ye <cye@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFSv4: Fix nfs4_init_uniform_client_string for net namespaces
Commit 6f2ea7f2a (NFS: Add nfs4_unique_id boot parameter) introduces a
boot parameter that allows client administrators to set a string
identifier for use by the EXCHANGE_ID and SETCLIENTID arguments in order
to make them more globally unique.
Unfortunately, that uniquifier is no longer globally unique in the presence
of net namespaces, since each container expects to be able to set up their
own lease when mounting a new NFSv4/4.1 partition.
The fix is to add back in the container-specific hostname in addition to
the unique id.
Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Fri, 12 Jul 2013 16:31:45 +0000 (12:31 -0400)]
NFS: Fix return type of nfs4_end_drain_session() stub
Clean up: when NFSv4.1 support is compiled out,
nfs4_end_drain_session() becomes a stub. Make the synopsis of the
stub match the synopsis of the real version of the function.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFSv4: encode_attrs should not backfill the bitmap and attribute length
The attribute length is already calculated in advance. There is no
reason why we cannot calculate the bitmap in advance too so that
we don't have to play pointer games.
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha
Pull alpha architecture fixes from Matt Turner:
"This contains mostly clean ups and fixes but also an implementation of
atomic64_dec_if_positive() and a pair of new syscalls"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
alpha: Use handle_percpu_irq for the timer interrupt
alpha: Force the user-visible HZ to a constant 1024.
alpha: Don't if-out dp264_device_interrupt.
alpha: Use __builtin_alpha_rpcc
alpha: Fix type compatibility warning for marvel_map_irq
alpha: Generate dwarf2 unwind info for various kernel entry points.
alpha: Implement atomic64_dec_if_positive
alpha: Improve atomic_add_unless
alpha: Modernize lib/mpi/longlong.h
alpha: Add kcmp and finit_module syscalls
alpha: locks: remove unused arch_*_relax operations
alpha: kernel: typo issue, using '1' instead of '11'
alpha: kernel: using memcpy() instead of strcpy()
alpha: Convert print_symbol to %pSR
Merge tag 'trace-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes and cleanups from Steven Rostedt:
"This contains fixes, optimizations and some clean ups
Some of the fixes need to go back to 3.10. They are minor, and deal
mostly with incorrect ref counting in accessing event files.
There was a couple of optimizations that should have perf perform a
bit better when accessing trace events.
And some various clean ups. Some of the clean ups are necessary to
help in a fix to a theoretical race between opening a event file and
deleting that event"
* tag 'trace-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Kill the unbalanced tr->ref++ in tracing_buffers_open()
tracing: Kill trace_array->waiter
tracing: Do not (ab)use trace_seq in event_id_read()
tracing: Simplify the iteration logic in f_start/f_next
tracing: Add ref_data to function and fgraph tracer structs
tracing: Miscellaneous fixes for trace_array ref counting
tracing: Fix error handling to ensure instances can always be removed
tracing/kprobe: Wait for disabling all running kprobe handlers
tracing/perf: Move the PERF_MAX_TRACE_SIZE check into perf_trace_buf_prepare()
tracing/syscall: Avoid perf_trace_buf_*() if sys_data->perf_events is empty
tracing/function: Avoid perf_trace_buf_*() if event_function.perf_events is empty
tracing: Typo fix on ring buffer comments
tracing: Use trace_seq_puts()/trace_seq_putc() where possible
tracing: Use correct config guard CONFIG_STACK_TRACER
Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management fixes from Zhang Rui:
"These are fixes collected over the last week, they fixes several
problems caused by the x86_pkg_temp_thermal introduced in 3.11-rc1.
Specifics:
- the x86_pkg_temp_thermal driver causes crash on systems with no
package MSR support as there is a bug in the logic to check
presence of DTHERM and PTS feature together. Added a change so
that when there is no PTS support, module doesn't get loaded.
- fix krealloc() misuse in pkg_temp_thermal_device_add().
If krealloc() returns NULL, it doesn't free the original. Thus if
we want to exit because of the krealloc() failure, we must make
sure the original one is freed.
- The error code path of the x86 package temperature thermal driver's
initialization routine makes an unbalanced call to
get_online_cpus(), which causes subsequent CPU offline operations,
and consequently system suspend, to permanently block in
cpu_hotplug_begin() on systems where get_core_online() returns an
error code.
Remove the extra get_online_cpus() to fix the problem"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
Thermal: Fix lockup of cpu_down()
Thermal: x86_pkg_temp: Limit number of pkg temp zones
Thermal: x86_pkg_temp: fix krealloc() misuse in in pkg_temp_thermal_device_add()
Thermal: x86 package temp thermal crash
Merge tag 'gpio-for-v3.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull gpio fixes from Linus Walleij:
"A first round of GPIO fixes for the v3.11 series:
- OMAP device tree boot fix
- Handle an error condition in the MSM driver
The OMAP patches have been around since around the merge window, but
since they first caused more breakage I let them boil in -next for a
while. These should be fine now"
* tag 'gpio-for-v3.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
drivers: gpio: msm: Fix the error condition for reading ngpio
gpio/omap: fix build error when OF_GPIO is not defined.
gpio/omap: auto request GPIO as input if used as IRQ via DT
gpio/omap: don't create an IRQ mapping for every GPIO on DT
Merge branch 'for-3.11/drivers' of git://git.kernel.dk/linux-block
Pull block IO driver bits from Jens Axboe:
"As I mentioned in the core block pull request, due to real life
circumstances the driver pull request would be late. Now it looks
like -rc2 late... On the plus side, apart form the rsxx update, these
are all things that I could argue could go in later in the cycle as
they are fixes and not features. So even though things are late, it's
not ALL bad.
The pull request contains:
- Updates to bcache, all bug fixes, from Kent.
- A pile of drbd bug fixes (no big features this time!).
- xen blk front/back fixes.
- rsxx driver updates, some of them deferred form 3.10. So should be
well cooked by now"
* 'for-3.11/drivers' of git://git.kernel.dk/linux-block: (63 commits)
bcache: Allocation kthread fixes
bcache: Fix GC_SECTORS_USED() calculation
bcache: Journal replay fix
bcache: Shutdown fix
bcache: Fix a sysfs splat on shutdown
bcache: Advertise that flushes are supported
bcache: check for allocation failures
bcache: Fix a dumb race
bcache: Use standard utility code
bcache: Update email address
bcache: Delete fuzz tester
bcache: Document shrinker reserve better
bcache: FUA fixes
drbd: Allow online change of al-stripes and al-stripe-size
drbd: Constants should be UPPERCASE
drbd: Ignore the exit code of a fence-peer handler if it returns too late
drbd: Fix rcu_read_lock balance on error path
drbd: fix error return code in drbd_init()
drbd: Do not sleep inside rcu
bcache: Refresh usage docs
...
Steven Rostedt [Tue, 16 Jul 2013 18:02:28 +0000 (14:02 -0400)]
Thermal: Fix lockup of cpu_down()
Commit f1a18a105 "Thermal: CPU Package temperature thermal" had code
that did a get_online_cpus(), run a loop and then do a
put_online_cpus(). The problem is that the loop had an error exit that
would skip the put_online_cpus() part.
In the error exit part of the function, it also did a get_online_cpus(),
run a loop and then put_online_cpus(). The only way to get to the error
exit part is with get_online_cpus() already performed. If this error
condition is hit, the system will be prevented from taking CPUs offline.
The process taking the CPU offline will lock up hard.
Removing the get_online_cpus() removes the lockup as the hotplug CPU
refcount is back to zero.
This was bisected with ktest.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Merge tag 'acpi-video-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI video support fixes from Rafael Wysocki:
"I'm sending a separate pull request for this as it may be somewhat
controversial. The breakage addressed here is not really new and the
fixes may not satisfy all users of the affected systems, but we've had
so much back and forth dance in this area over the last several weeks
that I think it's time to actually make some progress.
The source of the problem is that about a year ago we started to tell
BIOSes that we're compatible with Windows 8, which we really need to
do, because some systems shipping with Windows 8 are tested with it
and nothing else, so if we tell their BIOSes that we aren't compatible
with Windows 8, we expose our users to untested BIOS/AML code paths.
However, as it turns out, some Windows 8-specific AML code paths are
not tested either, because Windows 8 actually doesn't use the ACPI
methods containing them, so if we declare Windows 8 compatibility and
attempt to use those ACPI methods, things break. That occurs mostly
in the backlight support area where in particular the _BCM and _BQC
methods are plain unusable on some systems if the OS declares Windows
8 compatibility.
[ The additional twist is that they actually become usable if the OS
says it is not compatible with Windows 8, but that may cause
problems to show up elsewhere ]
Investigation carried out by Matthew Garrett indicates that what
Windows 8 does about backlight is to leave backlight control up to
individual graphics drivers. At least there's evidence that it does
that if the Intel graphics driver is used, so we've decided to follow
Windows 8 in that respect and allow i915 to control backlight (Daniel
likes that part).
The first commit from Aaron Lu makes ACPICA export the variable from
which we can infer whether or not the BIOS believes that we are
compatible with Windows 8.
The second commit from Matthew Garrett prepares the ACPI video driver
by making it initialize the ACPI backlight even if it is not going to
be used afterward (that is needed for backlight control to work on
Thinkpads).
The third commit implements the actual workaround making i915 take
over backlight control if the firmware thinks it's dealing with
Windows 8 and is based on the work of multiple developers, including
Matthew Garrett, Chun-Yi Lee, Seth Forshee, and Aaron Lu.
The final commit from Aaron Lu makes us follow Windows 8 by informing
the firmware through the _DOS method that it should not carry out
automatic brightness changes, so that brightness can be controlled by
GUI.
Hopefully, this approach will allow us to avoid using blacklists of
systems that should not declare Windows 8 compatibility just to avoid
backlight control problems in the future.
- Change from Aaron Lu makes ACPICA export a variable which can be
used by driver code to determine whether or not the BIOS believes
that we are compatible with Windows 8.
- Change from Matthew Garrett makes the ACPI video driver initialize
the ACPI backlight even if it is not going to be used afterward
(that is needed for backlight control to work on Thinkpads).
- Fix from Rafael J Wysocki implements Windows 8 backlight support
workaround making i915 take over bakclight control if the firmware
thinks it's dealing with Windows 8. Based on the work of multiple
developers including Matthew Garrett, Chun-Yi Lee, Seth Forshee,
and Aaron Lu.
- Fix from Aaron Lu makes the kernel follow Windows 8 by informing
the firmware through the _DOS method that it should not carry out
automatic brightness changes, so that brightness can be controlled
by GUI"
* tag 'acpi-video-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / video: no automatic brightness changes by win8-compatible firmware
ACPI / video / i915: No ACPI backlight if firmware expects Windows 8
ACPI / video: Always call acpi_video_init_brightness() on init
ACPICA: expose OSI version
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext[34] tmpfile bugfix from Ted Ts'o:
"Fix regression caused by commit af51a2ac36d1f which added ->tmpfile()
support (along with a similar fix for ext3)"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext3: fix a BUG when opening a file with O_TMPFILE flag
ext4: fix a BUG when opening a file with O_TMPFILE flag
Zheng Liu [Sun, 21 Jul 2013 02:03:20 +0000 (22:03 -0400)]
ext3: fix a BUG when opening a file with O_TMPFILE flag
When we try to open a file with O_TMPFILE flag, we will trigger a bug.
The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and
this check always fails because we set ->i_nlink = 1 in
inode_init_always(). We can use the following program to trigger it:
Here we couldn't call clear_nlink() directly because in d_tmpfile() we
will call inode_dec_link_count() to decrease ->i_nlink. So this commit
tries to call d_tmpfile() before ext4_orphan_add() to fix this problem.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk>
Zheng Liu [Sun, 21 Jul 2013 01:58:38 +0000 (21:58 -0400)]
ext4: fix a BUG when opening a file with O_TMPFILE flag
When we try to open a file with O_TMPFILE flag, we will trigger a bug.
The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and
this check always fails because we set ->i_nlink = 1 in
inode_init_always(). We can use the following program to trigger it:
Here we couldn't call clear_nlink() directly because in d_tmpfile() we
will call inode_dec_link_count() to decrease ->i_nlink. So this commit
tries to call d_tmpfile() before ext4_orphan_add() to fix this problem.
Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Tested-by: Darrick J. Wong <darrick.wong@oracle.com> Tested-by: Dave Jones <davej@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Merge tag 'staging-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging tree fixes from Greg KH:
"Here are a few iio driver fixes for 3.11-rc2. They are still spread
across drivers/iio and drivers/staging/iio so they are coming in
through this tree.
I've also removed the drivers/staging/csr/ driver as the developers
who originally sent it to me have moved on to other companies, and CSR
still will not send us the specs for the device, making the driver
pretty much obsolete and impossible to fix up. Deleting it now
prevents people from sending in lots of tiny codingsyle fixes that
will never go anywhere.
It also helps to offset the large lustre filesystem merge that
happened in 3.11-rc1 in the overall 3.11.0 diffstat. :)"
* tag 'staging-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: csr: remove driver
iio: lps331ap: Fix wrong in_pressure_scale output value
iio staging: fix lis3l02dq, read error handling
staging:iio:ad7291: add missing .driver_module to struct iio_info
iio: ti_am335x_adc: add missing .driver_module to struct iio_info
iio: mxs-lradc: Remove useless check in read_raw
iio: mxs-lradc: Fix misuse of iio->trig
iio: inkern: fix iio_convert_raw_to_processed_unlocked
iio: Fix iio_channel_has_info
iio:trigger: device_unregister->device_del to avoid double free
iio: dac: ad7303: fix error return code in ad7303_probe()
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"The sget() one is a long-standing bug and will need to go into -stable
(in fact, it had been originally caught in RHEL6), the other two are
3.11-only"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: constify dentry parameter in d_count()
livelock avoidance in sget()
allow O_TMPFILE to work with O_WRONLY
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 bugfixes from Ted Ts'o:
"Fixes for 3.11-rc2, sent at 5pm, in the professoinal style. :-)"
I'm not sure I like this new level of "professionalism".
9-5, people, 9-5.
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: call ext4_es_lru_add() after handling cache miss
ext4: yield during large unlinks
ext4: make the extent_status code more robust against ENOMEM failures
ext4: simplify calculation of blocks to free on error
ext4: fix error handling in ext4_ext_truncate()
Merge tag 'nfs-for-3.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Fix a regression against NFSv4 FreeBSD servers when creating a new
file
- Fix another regression in rpc_client_register()
* tag 'nfs-for-3.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Fix a regression against the FreeBSD server
SUNRPC: Fix another issue with rpc_client_register()
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next
Pull btrfs fixes from Josef Bacik:
"I'm playing the role of Chris Mason this week while he's on vacation.
There are a few critical fixes for btrfs here, all regressions and
have been tested well"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next:
Btrfs: fix wrong write offset when replacing a device
Btrfs: re-add root to dead root list if we stop dropping it
Btrfs: fix lock leak when resuming snapshot deletion
Btrfs: update drop progress before stopping snapshot dropping
gpio/omap: fix build error when OF_GPIO is not defined.
The OMAP GPIO driver check if the chip has an associated
Device Tree node using the struct gpio_chip of_node member.
But this is only build if CONFIG_OF_GPIO is defined which
leads to the following error when using omap1_defconfig:
linux/drivers/gpio/gpio-omap.c: In function 'omap_gpio_chip_init':
linux/drivers/gpio/gpio-omap.c:1080:17: error: 'struct gpio_chip' has no member named 'of_node'
linux/drivers/gpio/gpio-omap.c: In function 'omap_gpio_irq_map':
linux/drivers/gpio/gpio-omap.c:1116:16: error: 'struct gpio_chip' has no member named 'of_node'
Reported-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
gpio/omap: auto request GPIO as input if used as IRQ via DT
When an OMAP GPIO is used as an IRQ line, a call to gpio_request()
has to be made to initialize the OMAP GPIO bank before a driver
request the IRQ. Otherwise the call to request_irq() fails.
Drives should not be aware of this neither care wether an IRQ line
is a GPIO or not. They should just request the IRQ and this has to
be handled by the irq_chip driver.
With the current OMAP GPIO DT binding, if we define:
The GPIO is correctly mapped as an IRQ but a call to gpio_request()
is never made. Since a call to the custom IRQ domain .map function
handler is made for each GPIO used as an IRQ, the GPIO can be setup
and configured as input there automatically.
Changes since v3:
- Use bank->chip.of_node instead of_have_populated_dt() to check
DT or legacy boot as suggested by Jean-Christophe PLAGNIOL-VILLARD
- Add a comment that this is just a temporary solution until and
that it has to be removed once is handled by the IRQ core.
Changes since v2:
- Only make the call to gpio_request_one() conditional in the DT
case as suggested by Grant Likely.
Changes since v1:
- Split the irq domain mapping function handler and the GPIO
request in two different patches.
Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Tested-by: Enric Balletbo i Serra <eballetbo@gmail.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
gpio/omap: don't create an IRQ mapping for every GPIO on DT
When a GPIO is defined as an interrupt line using Device
Tree, a call to irq_create_of_mapping() is made that calls
irq_create_mapping(). So, is not necessary to do the mapping
for all OMAP GPIO lines and explicitly call irq_create_mapping()
on the driver probe() when booting with Device Tree.
Add a custom IRQ domain .map function handler that will be
called by irq_create_mapping() to map the GPIO lines used as IRQ.
This also allows to execute needed setup code such as configuring
a GPIO as input and enabling the GPIO bank.
Changes since v3:
- Use bank->chip.of_node instead of_have_populated_dt() to check
DT or legacy boot as suggested by Jean-Christophe PLAGNIOL-VILLARD
Changes since v2:
- Unconditionally do the IRQ setup in the .map() function and
only call irq_create_mapping() in the gpio chip init to avoid
code duplication as suggested by Grant Likely.
Changes since v1:
- Split the addition of the .map function handler and the
automatic gpio request in two different patches.
- Add GPIO IRQ setup logic to the irq domain mapping function.
- Only call irq_create_mapping for every GPIO on legacy boot.
- Only setup a GPIO IRQ on the .map function for DeviceTree boot.
Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Tested-by: Enric Balletbo i Serra <eballetbo@gmail.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Al Viro [Fri, 19 Jul 2013 23:13:55 +0000 (03:13 +0400)]
livelock avoidance in sget()
Eric Sandeen has found a nasty livelock in sget() - take a mount(2) about
to fail. The superblock is on ->fs_supers, ->s_umount is held exclusive,
->s_active is 1. Along comes two more processes, trying to mount the same
thing; sget() in each is picking that superblock, bumping ->s_count and
trying to grab ->s_umount. ->s_active is 3 now. Original mount(2)
finally gets to deactivate_locked_super() on failure; ->s_active is 2,
superblock is still ->fs_supers because shutdown will *not* happen until
->s_active hits 0. ->s_umount is dropped and now we have two processes
chasing each other:
s_active = 2, A acquired ->s_umount, B blocked
A sees that the damn thing is stillborn, does deactivate_locked_super()
s_active = 1, A drops ->s_umount, B gets it
A restarts the search and finds the same superblock. And bumps it ->s_active.
s_active = 2, B holds ->s_umount, A blocked on trying to get it
... and we are in the earlier situation with A and B switched places.
The root cause, of course, is that ->s_active should not grow until we'd
got MS_BORN. Then failing ->mount() will have deactivate_locked_super()
shut the damn thing down. Fortunately, it's easy to do - the key point
is that grab_super() is called only for superblocks currently on ->fs_supers,
so it can bump ->s_count and grab ->s_umount first, then check MS_BORN and
bump ->s_active; we must never increment ->s_count for superblocks past
->kill_sb(), but grab_super() is never called for those.
The bug is pretty old; we would've caught it by now, if not for accidental
exclusion between sget() for block filesystems; the things like cgroup or
e.g. mtd-based filesystems don't have anything of that sort, so they get
bitten. The right way to deal with that is obviously to fix sget()...
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML fixes from Richard Weinberger:
"Special thanks goes to Toralf Föster for continuously testing UML and
reporting issues!"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: remove dead code
um: siginfo cleanup
uml: Fix which_tmpdir failure when /dev/shm is a symlink, and in other edge cases
um: Fix wait_stub_done() error handling
um: Mark stub pages mapping with VM_PFNMAP
um: Fix return value of strnlen_user()
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
"MIPS fixes for 3.11. Half of then is for Netlogic the remainder
touches things across arch/mips.
Nothing really dramatic and by rc1 standards MIPS will be in fairly
good shape with this applied. Tested by building all MIPS defconfigs
of which with this pull request four platforms won't build. And yes,
it boots also on my favorite test systems"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: kvm: Kconfig: Drop HAVE_KVM dependency from VIRTUALIZATION
MIPS: Octeon: Fix DT pruning bug with pip ports
MIPS: KVM: Mark KVM_GUEST (T&E KVM) as BROKEN_ON_SMP
MIPS: tlbex: fix broken build in v3.11-rc1
MIPS: Netlogic: Add XLP PIC irqdomain
MIPS: Netlogic: Fix USB block's coherent DMA mask
MIPS: tlbex: Fix typo in r3000 tlb store handler
MIPS: BMIPS: Fix thinko to release slave TP from reset
MIPS: Delete dead invocation of exception_exit().
Merge tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64
Pull arm64 fixes from Catalin Marinas:
- Post -rc1 update to the common reboot infrastructure.
- Fixes (user cache maintenance fault handling, !COMPAT compilation,
CPU online and interrupt hanlding).
* tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
arm64: use common reboot infrastructure
arm64: mm: don't treat user cache maintenance faults as writes
arm64: add '#ifdef CONFIG_COMPAT' for aarch32_break_handler()
arm64: Only enable local interrupts after the CPU is marked online
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"An update for the BFP jit to the latest and greatest, two patches to
get kdump working again, the random-abort ptrace extention for
transactional execution, the z90crypt module alias for ap and a tiny
cleanup"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/zcrypt: Alias for new zcrypt device driver base module
s390/kdump: Allow copy_oldmem_page() copy to virtual memory
s390/kdump: Disable mmap for s390
s390/bpf,jit: add pkt_type support
s390/bpf,jit: address randomize and write protect jit code
s390/bpf,jit: use generic jit dumper
s390/bpf,jit: call module_free() from any context
s390/qdio: remove unused variable
s390/ptrace: PTRACE_TE_ABORT_RAND