All we really want is the ability to retrieve the root file handle. We no
longer need the ability to walk down the path, since that is now done in
nfs_follow_remote_path().
NFS: Add helper functions for allocating filehandles and fattr structs
NFS Filehandles and struct fattr are really too large to be allocated on
the stack. This patch adds in a couple of helper functions to allocate them
dynamically instead.
Kevin Coffman [Wed, 17 Mar 2010 17:03:05 +0000 (13:03 -0400)]
gss_krb5: Use confounder length in wrap code
All encryption types use a confounder at the beginning of the
wrap token. In all encryption types except arcfour-hmac, the
confounder is the same as the blocksize. arcfour-hmac has a
blocksize of one, but uses an eight byte confounder.
Add an entry to the crypto framework definitions for the
confounder length and change the wrap/unwrap code to use
the confounder length rather than assuming it is always
the blocksize.
Signed-off-by: Kevin Coffman <kwc@citi.umich.edu> Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Kevin Coffman [Wed, 17 Mar 2010 17:03:04 +0000 (13:03 -0400)]
gssd_krb5: More arcfour-hmac support
For the arcfour-hmac support, the make_seq_num and get_seq_num
functions need access to the kerberos context structure.
This will be used in a later patch.
Signed-off-by: Kevin Coffman <kwc@citi.umich.edu> Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Kevin Coffman [Wed, 17 Mar 2010 17:02:59 +0000 (13:02 -0400)]
gss_krb5: add support for new token formats in rfc4121
This is a step toward support for AES encryption types which are
required to use the new token formats defined in rfc4121.
Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
[SteveD: Fixed a typo in gss_verify_mic_v2()] Signed-off-by: Steve Dickson <steved@redhat.com>
[Trond: Got rid of the TEST_ROTATE/TEST_EXTRA_COUNT crap] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
gss_krb5: Add upcall info indicating supported kerberos enctypes
The text based upcall now indicates which Kerberos encryption types are
supported by the kernel rpcsecgss code. This is used by gssd to
determine which encryption types it should attempt to negotiate
when creating a context with a server.
The server principal's database and keytab encryption types are
what limits what it should negotiate. Therefore, its keytab
should be created with only the enctypes listed by this file.
Currently we support des-cbc-crc, des-cbc-md4 and des-cbc-md5
Kevin Coffman [Wed, 17 Mar 2010 17:02:54 +0000 (13:02 -0400)]
gss_krb5: handle new context format from gssd
For encryption types other than DES, gssd sends down context information
in a new format. This new format includes the information needed to
support the new Kerberos GSS-API tokens defined in rfc4121.
Signed-off-by: Kevin Coffman <kwc@citi.umich.edu> Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Kevin Coffman [Wed, 17 Mar 2010 17:02:53 +0000 (13:02 -0400)]
gss_krb5: import functionality to derive keys into the kernel
Import the code to derive Kerberos keys from a base key into the
kernel. This will allow us to change the format of the context
information sent down from gssd to include only a single key.
Signed-off-by: Kevin Coffman <kwc@citi.umich.edu> Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Kevin Coffman [Wed, 17 Mar 2010 17:02:52 +0000 (13:02 -0400)]
gss_krb5: add ability to have a keyed checksum (hmac)
Encryption types besides DES may use a keyed checksum (hmac).
Modify the make_checksum() function to allow for a key
and take care of enctype-specific processing such as truncating
the resulting hash.
Signed-off-by: Kevin Coffman <kwc@citi.umich.edu> Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Linus Torvalds [Fri, 14 May 2010 18:49:42 +0000 (11:49 -0700)]
Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
inotify: don't leak user struct on inotify release
inotify: race use after free/double free in inotify inode marks
inotify: clean up the inotify_add_watch out path
Inotify: undefined reference to `anon_inode_getfd'
Manual merge to remove duplicate "select ANON_INODES" from Kconfig file
Sergei Shtylyov [Thu, 13 May 2010 18:51:51 +0000 (22:51 +0400)]
DA830: fix USB 2.0 clock entry
DA8xx OHCI driver fails to load due to failing clk_get() call for the USB 2.0
clock. Arrange matching USB 2.0 clock by the clock name instead of the device.
(Adding another CLK() entry for "ohci.0" device won't do -- in the future I'll
also have to enable USB 2.0 clock to configure CPPI 4.1 module, in which case
I won't have any device at all.)
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Pavel Emelyanov [Wed, 12 May 2010 22:34:07 +0000 (15:34 -0700)]
inotify: don't leak user struct on inotify release
inotify_new_group() receives a get_uid-ed user_struct and saves the
reference on group->inotify_data.user. The problem is that free_uid() is
never called on it.
Issue seem to be introduced by 63c882a0 (inotify: reimplement inotify
using fsnotify) after 2.6.30.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Eric Paris <eparis@parisplace.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Tue, 11 May 2010 21:17:40 +0000 (17:17 -0400)]
inotify: race use after free/double free in inotify inode marks
There is a race in the inotify add/rm watch code. A task can find and
remove a mark which doesn't have all of it's references. This can
result in a use after free/double free situation.
Task A Task B
------------ -----------
inotify_new_watch()
allocate a mark (refcnt == 1)
add it to the idr
inotify_rm_watch()
inotify_remove_from_idr()
fsnotify_put_mark()
refcnt hits 0, free
take reference because we are on idr
[at this point it is a use after free]
[time goes on]
refcnt may hit 0 again, double free
The fix is to take the reference BEFORE the object can be found in the
idr.
Signed-off-by: Eric Paris <eparis@redhat.com> Cc: <stable@kernel.org>
Linus Torvalds [Fri, 14 May 2010 14:29:29 +0000 (07:29 -0700)]
Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze
* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Fix module loading on system with WB cache
microblaze: export assembly functions used by modules
microblaze: Remove powerpc code from Microblaze port
microblaze: Remove compilation warnings in cache macro
microblaze: export assembly functions used by modules
microblaze: fix get_user/put_user side-effects
microblaze: re-enable interrupts before calling schedule
Redirecting directly to lsm, here's the patch discussed on lkml:
http://lkml.org/lkml/2010/4/22/219
The mmap_min_addr value is useful information for an admin to see without
being root ("is my system vulnerable to kernel NULL pointer attacks?") and
its setting is trivially easy for an attacker to determine by calling
mmap() in PAGE_SIZE increments starting at 0, so trying to keep it private
has no value.
Only require CAP_SYS_RAWIO if changing the value, not reading it.
Comment from Serge :
Me, I like to write my passwords with light blue pen on dark blue
paper, pasted on my window - if you're going to get my password, you're
gonna get a headache.
Signed-off-by: Kees Cook <kees.cook@canonical.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
(cherry picked from commit 822cceec7248013821d655545ea45d1c6a9d15b3)
Linus Torvalds [Thu, 13 May 2010 21:36:19 +0000 (14:36 -0700)]
Merge branch 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PPC: Keep index within boundaries in kvmppc_44x_emul_tlbwe()
KVM: VMX: blocked-by-sti must not defer NMI injections
KVM: x86: Call vcpu_load and vcpu_put in cpuid_update
KVM: SVM: Fix wrong intercept masks on 32 bit
KVM: convert ioapic lock to spinlock
serial: imx.c: fix CTS trigger level lower to avoid lost chars
The imx CTS trigger level is left at its reset value that is 32
chars. Since the RX FIFO has 32 entries, when CTS is raised, the
FIFO already is full. However, some serial port devices first empty
their TX FIFO before stopping when CTS is raised, resulting in lost
chars.
This patch sets the trigger level lower so that other chars arrive
after CTS is raised, there is still room for 16 of them.
Alan Cox [Tue, 4 May 2010 19:42:36 +0000 (20:42 +0100)]
tty: Fix unbalanced BKL handling in error path
Arnd noted:
After the "retry_open:" label, we first get the tty_mutex
and then the BKL. However a the end of tty_open, we jump
back to retry_open with the BKL still held. If we run into
this case, the tty_open function will be left with the BKL
still held.
Jan Kara [Thu, 13 May 2010 10:52:57 +0000 (12:52 +0200)]
vfs: Fix O_NOFOLLOW behavior for paths with trailing slashes
According to specification
mkdir d; ln -s d a; open("a/", O_NOFOLLOW | O_RDONLY)
should return success but currently it returns ELOOP. This is a
regression caused by path lookup cleanup patch series.
Fix the code to ignore O_NOFOLLOW in case the provided path has trailing
slashes.
Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de> Acked-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 13 May 2010 14:35:26 +0000 (07:35 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: ice1724 - Fix ESI Maya44 capture source control
ALSA: pcm - Use pgprot_noncached() for MIPS non-coherent archs
ALSA: virtuoso: fix Xonar D1/DX front panel microphone
ALSA: hda - Add hp-dv4 model for IDT 92HD71bx
ALSA: hda - Fix mute-LED GPIO pin for HP dv series
ALSA: hda: Fix 0 dB for Lenovo models using Conexant CX20549 (Venice)
Linus Torvalds [Thu, 13 May 2010 14:28:43 +0000 (07:28 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: ad7877 - keep dma rx buffers in seperate cache lines
Input: psmouse - reset all types of mice before reconnecting
Input: elantech - use all 3 bytes when checking version
Input: iforce - fix Guillemot Jet Leader 3D entry
Input: iforce - add Guillemot Jet Leader Force Feedback
Mark Brown [Fri, 2 Apr 2010 12:08:39 +0000 (13:08 +0100)]
mfd: Clean up after WM83xx AUXADC interrupt if it arrives late
In certain circumstances, especially under heavy load, the AUXADC
completion interrupt may be detected after we've timed out waiting for
it. That conversion would still succeed but the next conversion will
see the completion that was signalled by the interrupt for the previous
conversion and therefore not wait for the AUXADC conversion to run,
causing it to report failure.
Provide a simple, non-invasive cleanup by using try_wait_for_completion()
to ensure that the completion is not signalled before we wait. Since
the AUXADC is run within a mutex we know there can only have been at
most one AUXADC interrupt outstanding. A more involved change should
follow for the next merge window.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Michal Simek [Thu, 13 May 2010 08:55:47 +0000 (10:55 +0200)]
microblaze: Remove compilation warnings in cache macro
CC arch/microblaze/kernel/cpu/cache.o
arch/microblaze/kernel/cpu/cache.c: In function '__invalidate_dcache_range_wb':
arch/microblaze/kernel/cpu/cache.c:398: warning: ISO C90 forbids mixed declarations and code
arch/microblaze/kernel/cpu/cache.c: In function '__flush_dcache_range_wb':
arch/microblaze/kernel/cpu/cache.c:509: warning: ISO C90 forbids mixed declara
With dma based spi transmission, data corruption is observed
occasionally. With dma buffers located right next to msg and
xfer fields, cache lines correctly flushed in preparation for
dma usage may be polluted again when writing to fields in the
same cache line.
Make sure cache fields used with dma do not share cache lines
with fields changed during dma handling. As both fields are part
of a struct that is allocated via kzalloc, thus cache aligned,
moving the fields to the 1st position and insert padding for
alignment does the job.
Signed-off-by: Oskar Schirmer <os@emlix.com> Signed-off-by: Daniel Glöckner <dg@emlix.com> Signed-off-by: Oliver Schneidewind <osw@emlix.com> Signed-off-by: Johannes Weiner <jw@emlix.com> Acked-by: Mike Frysinger <vapier@gentoo.org>
[dtor@mail.ru - changed to use ___cacheline_aligned as suggested
by akpm] Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Dmitry Torokhov [Thu, 13 May 2010 07:42:23 +0000 (00:42 -0700)]
Input: psmouse - reset all types of mice before reconnecting
Synaptics hardware requires resetting device after suspend to ram
in order for the device to be operational. The reset lives in
synaptics-specific reconnect handler, but it is not being invoked
if synaptics support is disabled and the device is handled as a
standard PS/2 device (bare or IntelliMouse protocol).
Let's add reset into generic reconnect handler as well.
The Microblaze implementations of get_user() and (MMU) put_user() evaluate
the address argument more than once. This causes unexpected side-effects for
invocations that include increment operators, i.e. get_user(foo, bar++).
This patch also removes the distinction between MMU and noMMU put_user().
Without the patch:
$ echo 1234567890 > /proc/sys/kernel/core_pattern
$ cat /proc/sys/kernel/core_pattern
12345
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Jan Kiszka [Tue, 11 May 2010 13:16:46 +0000 (15:16 +0200)]
KVM: VMX: blocked-by-sti must not defer NMI injections
As the processor may not consider GUEST_INTR_STATE_STI as a reason for
blocking NMI, it could return immediately with EXIT_REASON_NMI_WINDOW
when we asked for it. But as we consider this state as NMI-blocking, we
can run into an endless loop.
Resolve this by allowing NMI injection if just GUEST_INTR_STATE_STI is
active (originally suggested by Gleb). Intel confirmed that this is
safe, the processor will never complain about NMI injection in this
state.
Joerg Roedel [Wed, 5 May 2010 14:04:43 +0000 (16:04 +0200)]
KVM: SVM: Fix wrong intercept masks on 32 bit
This patch makes KVM on 32 bit SVM working again by
correcting the masks used for iret interception. With the
wrong masks the upper 32 bits of the intercepts are masked
out which leaves vmrun unintercepted. This is not legal on
svm and the vmrun fails.
Bug was introduced by commits 95ba827313 and 3cfc3092.
Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Linus Torvalds [Thu, 13 May 2010 01:48:26 +0000 (18:48 -0700)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/perf_event: Fix oops due to perf_event_do_pending call
powerpc/swiotlb: Fix off by one in determining boundary of which ops to use
Linus Torvalds [Thu, 13 May 2010 01:47:55 +0000 (18:47 -0700)]
Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] correct address of _stext with CONFIG_SHARED_KERNEL=y
[S390] ptrace: fix return value of do_syscall_trace_enter()
[S390] dasd: fix race between tasklet and dasd_sleep_on
Linus Torvalds [Thu, 13 May 2010 01:47:29 +0000 (18:47 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: preserve seq # on requeued messages after transient transport errors
ceph: fix cap removal races
ceph: zero unused message header, footer fields
ceph: fix locking for waking session requests after reconnect
ceph: resubmit requests on pg mapping change (not just primary change)
ceph: fix open file counting on snapped inodes when mds returns no caps
ceph: unregister osd request on failure
ceph: don't use writeback_control in writepages completion
ceph: unregister bdi before kill_anon_super releases device name
Linus Torvalds [Thu, 13 May 2010 01:39:45 +0000 (18:39 -0700)]
Revert "PCI: update bridge resources to get more big ranges in PCI assign unssigned"
This reverts commit 977d17bb1749517b353874ccdc9b85abc7a58c2a, because it
can cause problems with some devices not getting any resources at all
when the resource tree is re-allocated.
David Howells [Wed, 12 May 2010 14:34:03 +0000 (15:34 +0100)]
CacheFiles: Fix error handling in cachefiles_determine_cache_security()
cachefiles_determine_cache_security() is expected to return with a
security override in place. However, if set_create_files_as() fails, we
fail to do this. In this case, we should just reinstate the security
override that was set by the caller.
Furthermore, if set_create_files_as() fails, we should dispose of the
new credentials we were in the process of creating.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rwsem: Test for no active locks in __rwsem_do_wake undo code
If there are no active threasd using a semaphore, it is always correct
to unqueue blocked threads. This seems to be what was intended in the
undo code.
What was done instead, was to look for a sem count of zero - this is an
impossible situation, given that at least one thread is known to be
queued on the semaphore. The code might be correct as written, but it's
hard to reason about and it's not what was intended (otherwise the goto
out would have been unconditional).
Go for checking the active count - the alternative is not worth the
headache.
Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
According to memory-barriers.txt, an smp memory barrier in guest
should always be paired with an smp memory barrier in host,
and I quote "a lack of appropriate pairing is almost certainly an
error". In case of vhost, failure to flush out used index
update before looking at the interrupt disable flag
could result in missed interrupts, resulting in
networking hang under stress.
This might happen when flags read bypasses used index write.
So we see interrupts disabled and do not interrupt, at the
same time guest writes flags value to enable interrupt,
reads an old used index value, thinks that
used ring is empty and waits for interrupt.
Note: the barrier we pair with here is in
drivers/virtio/virtio_ring.c, function
vring_enable_cb.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Juan Quintela <quintela@redhat.com>
Russell King [Sun, 18 Apr 2010 20:25:11 +0000 (21:25 +0100)]
Inotify: undefined reference to `anon_inode_getfd'
Fix:
fs/built-in.o: In function `sys_inotify_init1':
summary.c:(.text+0x347a4): undefined reference to `anon_inode_getfd'
found by kautobuild with arms bcmring_defconfig, which ends up with
INOTIFY_USER enabled (through the 'default y') but leaves ANON_INODES
unset. However, inotify_user.c uses anon_inode_getfd().
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Eric Paris <eparis@redhat.com>
Takashi Iwai [Wed, 12 May 2010 14:43:32 +0000 (16:43 +0200)]
ALSA: ice1724 - Fix ESI Maya44 capture source control
The capture source control of maya44 was wrongly coded with the bit
shift instead of the bit mask. Also, the slot for line-in was
wrongly assigned (slot 5 instead of 4).
Reported-by: Alex Chernyshoff <alexdsp@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
Takashi Iwai [Wed, 12 May 2010 08:32:42 +0000 (10:32 +0200)]
ALSA: pcm - Use pgprot_noncached() for MIPS non-coherent archs
MIPS non-coherent archs need the noncached pgprot in mmap of PCM buffers.
But, since the coherency needs to be checked dynamically via
plat_device_is_coherent(), we need an ugly check dependent on MIPS
in ALSA core code.
This should be cleaned up in MIPS arch side (e.g. creating
dma_mmap_coherent()) in near future.
Clemens Ladisch [Tue, 11 May 2010 14:34:39 +0000 (16:34 +0200)]
ALSA: virtuoso: fix Xonar D1/DX front panel microphone
Commit 65c3ac885ce9852852b895a4a62212f62cb5f2e9 in 2.6.33 accidentally
left out the initialization of the AC97 codec FMIC2MIC bit, which broke
recording from the front panel microphone.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Cc: <stable@kernel.org> Signed-off-by: Jaroslav Kysela <perex@perex.cz> Signed-off-by: Takashi Iwai <tiwai@suse.de>
Takashi Iwai [Wed, 12 May 2010 08:16:20 +0000 (10:16 +0200)]
ALSA: hda - Add hp-dv4 model for IDT 92HD71bx
It turned out that HP dv series have inconsistent the mute-LED GPIO
mapping among various models. dv4/7 seem to use GPIO 0 while dv 5/6
seem to use GPIO 3. The previous commit 26ebe0a28986f4845b2c5bea43ac5cc0b9f27f0a
ALSA: hda - Fix mute-LED GPIO pin for HP dv series
breaks dv5/6.
This patch adds the new quirk model, hp-dv4, to handle HP dv4/7
separately from HP dv5/6.
[S390] correct address of _stext with CONFIG_SHARED_KERNEL=y
As of git commit 1844c9bc0b2fed3023551c1affe033ab38e90b9a head64.S/head31.S
are not included in head.S anymore but build as an extra object. This breaks
shared kernel support because the .org statement in head64.S/head31.S for
CONFIG_SHARED_KERNEL=y will have a different effect. The end address of the
head.text section in head.o will be added to the .org value, to compensate
for this subtract 0x11000 to get the required value of 0x100000 again.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Gerald Schaefer [Wed, 12 May 2010 07:32:12 +0000 (09:32 +0200)]
[S390] ptrace: fix return value of do_syscall_trace_enter()
strace may change the system call number, so regs->gprs[2] must not
be read before tracehook_report_syscall_entry(). This fixes a bug
where "strace -f" will hang after a vfork().
Cc: <stable@kernel.org> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Stefan Weinhuber [Wed, 12 May 2010 07:32:11 +0000 (09:32 +0200)]
[S390] dasd: fix race between tasklet and dasd_sleep_on
The various dasd_sleep_on functions use a global wait queue when
waiting for a cqr. The wait condition checks the status and devlist
fields of the cqr to determine if it is safe to continue. This
evaluation may return true, although the tasklet has not finished
processing of the cqr and the callback function has not been called
yet. When the callback is finally called, the data in the cqr may
already be invalid. The sleep_on wait condition needs a safe way to
determine if the tasklet has finished processing. Use the
callback_data field of the cqr to store a token, which is set by
the callback function itself.
Cc: <stable@kernel.org> Signed-off-by: Stefan Weinhuber <wein@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Paul Mackerras [Tue, 13 Apr 2010 20:46:04 +0000 (20:46 +0000)]
powerpc/perf_event: Fix oops due to perf_event_do_pending call
Anton Blanchard found that large POWER systems would occasionally
crash in the exception exit path when profiling with perf_events.
The symptom was that an interrupt would occur late in the exit path
when the MSR[RI] (recoverable interrupt) bit was clear. Interrupts
should be hard-disabled at this point but they were enabled. Because
the interrupt was not recoverable the system panicked.
The reason is that the exception exit path was calling
perf_event_do_pending after hard-disabling interrupts, and
perf_event_do_pending will re-enable interrupts.
The simplest and cleanest fix for this is to use the same mechanism
that 32-bit powerpc does, namely to cause a self-IPI by setting the
decrementer to 1. This means we can remove the tests in the exception
exit path and raw_local_irq_restore.
This also makes sure that the call to perf_event_do_pending from
timer_interrupt() happens within irq_enter/irq_exit. (Note that
calling perf_event_do_pending from timer_interrupt does not mean that
there is a possible 1/HZ latency; setting the decrementer to 1 ensures
that the timer interrupt will happen immediately, i.e. within one
timebase tick, which is a few nanoseconds or 10s of nanoseconds.)
Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: stable@kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Sage Weil [Wed, 12 May 2010 04:20:38 +0000 (21:20 -0700)]
ceph: preserve seq # on requeued messages after transient transport errors
If the tcp connection drops and we reconnect to reestablish a stateful
session (with the mds), we need to resend previously sent (and possibly
received) messages with the _same_ seq # so that they can be dropped on
the other end if needed. Only assign a new seq once after the message is
queued.
Sage Weil [Wed, 12 May 2010 03:56:31 +0000 (20:56 -0700)]
ceph: fix cap removal races
The iterate_session_caps helper traverses the session caps list and tries
to grab an inode reference. However, the __ceph_remove_cap was clearing
the inode backpointer _before_ removing itself from the session list,
causing a null pointer dereference.
Clear cap->ci under protection of s_cap_lock to avoid the race, and to
tightly couple the list and backpointer state. Use a local flag to
indicate whether we are releasing the cap, as cap->session may be modified
by a racing thread in iterate_session_caps.