NFSv4 mounts ignore the rsize and wsize mount options, and always use
the default transfer size for both. This seems to be because all
NFSv4 mounts are now cloned, and the cloning logic doesn't copy the
rsize and wsize settings from the parent nfs_server.
I tested Fedora's 2.6.32.11-99 and it seems to have this problem as
well, so I'm guessing that .33, .32, and perhaps older kernels have
this issue as well.
If dentry found stale happens to be a root of disconnected tree, we
can't d_drop() it; its d_hash is actually part of s_anon and d_drop()
would simply hide it from shrink_dcache_for_umount(), leading to
all sorts of fun, including busy inodes on umount and oopsen after
that.
Bug had been there since at least 2006 (commit c636eb already has it),
so it's definitely -stable fodder.
In reflink we update the id info on the disk but forgot to update
the corresponding information in the VFS inode. Update them
accordingly when we want to preserve the attributes.
Reported-by: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
For periodic endpoints, we must let the xHCI hardware know the maximum
payload an endpoint can transfer in one service interval. The xHCI
specification refers to this as the Maximum Endpoint Service Interval Time
Payload (Max ESIT Payload). This is used by the hardware for bandwidth
management and scheduling of packets.
For SuperSpeed endpoints, the maximum is calculated by multiplying the max
packet size by the number of bursts and the number of opportunities to
transfer within a service interval (the Mult field of the SuperSpeed
Endpoint companion descriptor). Devices advertise this in the
wBytesPerInterval field of their SuperSpeed Endpoint Companion Descriptor.
For high speed devices, this is taken by multiplying the max packet size by the
"number of additional transaction opportunities per microframe" (the high
bits of the wMaxPacketSize field in the endpoint descriptor).
For FS/LS devices, this is just the max packet size.
The other thing we must set in the endpoint context is the Average TRB
Length. This is supposed to be the average of the total bytes in the
transfer descriptor (TD), divided by the number of transfer request blocks
(TRBs) it takes to describe the TD. This gives the host controller an
indication of whether the driver will be enqueuing a scatter gather list
with many entries comprised of small buffers, or one contiguous buffer.
It also takes into account the number of extra TRBs you need for every TD.
This includes No-op TRBs and Link TRBs used to link ring segments
together. Some drivers may choose to chain an Event Data TRB on the end
of every TD, thus increasing the average number of TRBs per TD. The Linux
xHCI driver does not use Event Data TRBs.
In theory, if there was an API to allow drivers to state what their
bandwidth requirements are, we could set this field accurately. For now,
we set it to the same number as the Max ESIT payload.
The Average TRB Length should also be set for bulk and control endpoints,
but I have no idea how to guess what it should be.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A SuperSpeed interrupt or isochronous endpoint can define the number of
"burst transactions" it can handle in a service interval. This is
indicated by the "Mult" bits in the bmAttributes of the SuperSpeed
Endpoint Companion Descriptor. For example, if it has a max packet size
of 1024, a max burst of 11, and a mult of 3, the host may send 33
1024-byte packets in one service interval.
We must tell the xHCI host controller the number of multiple service
opportunities (mults) the device can handle when the endpoint is
installed. We do that by setting the Mult field of the Endpoint Context
before a configure endpoint command is sent down. The Mult field is
invalid for control or bulk SuperSpeed endpoints.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1371) fixes a small bug in ohci-hcd. The HCD already
knows how many ports the controller has; there's no need to go looking
at the root hub's usb_device structure to find out. Especially since
the root hub's maxchild value is set correctly only while the root hub
is bound to the hub driver.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1372) fixes a bug in the routine that chooses the
default configuration to install when a new USB device is detected.
The algorithm is supposed to look for a config whose first interface
is for a non-vendor-specific class. But the way it's currently
written, it will also accept a config with no interfaces at all, which
is not very useful. (Believe it or not, such things do exist.)
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Andrew Victor <avictor.za@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is a typo here. We should be testing "*dentry" which was just
assigned instead of "dentry". This could result in dereferencing an
ERR_PTR inside either usbfs_mkdir() or usbfs_create().
Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
While looping over the interfaces, if usb_hcd_alloc_bandwidth() fails it calls
hcd->driver->reset_bandwidth(), so there was no need to reinstate the interface
again.
If no break occurred, the index equals config->desc.bNumInterfaces. A
subsequent usb_control_msg() failure resulted in a read from
config->interface[config->desc.bNumInterfaces] at label reset_old_alts.
In either case the last interface should be skipped.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1363b) changes the way USB remote wakeup is handled
during system sleeps. It won't be enabled unless an interface driver
specifically needs it. Also, it won't be enabled during the FREEZE or
QUIESCE phases of hibernation, when the system doesn't respond to
wakeup events anyway.
This will fix problems people have reported with certain USB webcams
that generate wakeup requests when they shouldn't, and as a result
cause system suspends to fail. See
When detaching a port from the client side (usbip --detach 0),
the event thread, on the server side, is going to deadlock.
The "eh" server thread is getting USBIP_EH_RESET event and calls:
-> stub_device_reset() -> usb_reset_device()
the USB framework is then calling back _in the same "eh" thread_ :
-> stub_disconnect() -> usbip_stop_eh() -> wait_for_completion()
the "eh" thread is being asleep forever, waiting for its own completion.
This patch checks if "eh" is the current thread, in usbip_stop_eh().
Signed-off-by: Eric Lescouet <eric@lescouet.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The driver needs specific PHY and board support code for each SFC4000
board; there is no point trying to continue if it is missing.
Currently unsupported boards can trigger an 'oops'.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The original code would wait indefinitely if MAC stats DMA failed.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The request_key() system call and request_key_and_link() should make a
link from an existing key to the destination keyring (if supplied), not
just from a new key to the destination keyring.
This can be tested by:
ring=`keyctl newring fred @s`
keyctl request2 user debug:a a
keyctl request user debug:a $ring
keyctl list $ring
If it says:
keyring is empty
then it didn't work. If it shows something like:
1 key in keyring: 1070462727: --alswrv 0 0 user: debug:a
then it did.
request_key() system call is meant to recursively search all your keyrings for
the key you desire, and, optionally, if it doesn't exist, call out to userspace
to create one for you.
If request_key() finds or creates a key, it should, optionally, create a link
to that key from the destination keyring specified.
Therefore, if, after a successful call to request_key() with a desination
keyring specified, you see the destination keyring empty, the code didn't work
correctly.
If you see the found key in the keyring, then it did - which is what the patch
is required for.
Signed-off-by: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When read_buf is called to move over to the next page in the pagelist
of an NFSv4 request, it sets argp->end to essentially a random
number, certainly not an address within the page which argp->p now
points to. So subsequent calls to READ_BUF will think there is much
more than a page of spare space (the cast to u32 ensures an unsigned
comparison) so we can expect to fall off the end of the second
page.
We never encountered thsi in testing because typically the only
operations which use more than two pages are write-like operations,
which have their own decoding logic. Something like a getattr after a
write may cross a page boundary, but it would be very unusual for it to
cross another boundary after that.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 48b32a3553a54740d236b79a90f20147a25875e3 ("reiserfs: use generic
xattr handlers") introduced a problem that causes corruption when extended
attributes are replaced with a smaller value.
The issue is that the reiserfs_setattr to shrink the xattr file was moved
from before the write to after the write.
The root issue has always been in the reiserfs xattr code, but was papered
over by the fact that in the shrink case, the file would just be expanded
again while the xattr was written.
The end result is that the last 8 bytes of xattr data are lost.
Commit 677c9b2e393a0cd203bd54e9c18b012b2c73305a ("reiserfs: remove
privroot hiding in lookup") removed the magic from the lookup code to hide
the .reiserfs_priv directory since it was getting loaded at mount-time
instead. The intent was that the entry would be hidden from the user via
a poisoned d_compare, but this was faulty.
This introduced a security issue where unprivileged users could access and
modify extended attributes or ACLs belonging to other users, including
root.
This patch resolves the issue by properly hiding .reiserfs_priv. This was
the intent of the xattr poisoning code, but it appears to have never
worked as expected. This is fixed by using d_revalidate instead of
d_compare.
This patch makes -oexpose_privroot a no-op. I'm fine leaving it this way.
The effort involved in working out the corner cases wrt permissions and
caching outweigh the benefit of the feature.
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Edward Shishkin <edward.shishkin@gmail.com> Reported-by: Matt McCutchen <matt@mattmccutchen.net> Tested-by: Matt McCutchen <matt@mattmccutchen.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If a futex key happens to be located within a huge page mapped
MAP_PRIVATE, get_futex_key() can go into an infinite loop waiting for a
page->mapping that will never exist.
See https://bugzilla.redhat.com/show_bug.cgi?id=552257 for more details
about the problem.
This patch makes page->mapping a poisoned value that includes
PAGE_MAPPING_ANON mapped MAP_PRIVATE. This is enough for futex to
continue but because of PAGE_MAPPING_ANON, the poisoned value is not
dereferenced or used by futex. No other part of the VM should be
dereferencing the page->mapping of a hugetlbfs page as its page cache is
not on the LRU.
This patch fixes the problem with the test case described in the bugzilla.
[akpm@linux-foundation.org: mel cant spel] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Darren Hart <darren@dvhart.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When the addba timer expires but has no work to do,
it should not affect the state machine. If it does,
TX will not see the successfully established and we
can also crash trying to re-establish the session.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If a signal is pending (task being killed by sigkill)
__mem_cgroup_try_charge will write NULL into &mem, and css_put will oops
on null pointer dereference.
Pid: 5299, comm: largepages Tainted: G W 2.6.34-rc3 #3 Penryn1600SLI-110dB/To Be Filled By O.E.M.
RIP: 0010:[<ffffffff810fc6cc>] [<ffffffff810fc6cc>] mem_cgroup_prepare_migration+0x7c/0xc0
Fix regression caused by commit 507e2fbaaacb6f164b4125b87c5002f95143174b
("w1: w1 temp calculation overflow fix") whereby negative temperatures for
the DS18B20 are not converted properly.
When the temperature exceeds 32767 milli-degrees the temperature overflows
to -32768 millidegrees. These are both well within the -55 - +125 degree
range for the sensor.
Current code is definitely crap: Largest pitch allowed spills into
the TILING_Y bit of the fence registers ... :(
I've rewritten the limits check under the assumption that 3rd gen hw
has a 3d pitch limit of 8kb (like 2nd gen). This is supported by an
otherwise totally misleading XXX comment.
This bug mostly resulted in tiling-corrupted pixmaps because the kernel
allowed too wide buffers to be tiled. Bug brought to the light by the
xf86-video-intel 2.11 release because that unconditionally enabled
tiling for pixmaps, relying on the kernel to check things. Tiling for
the framebuffer was not affected because the ddx does some additional
checks there ensure the buffer is within hw-limits.
v2: Instead of computing the value that would be written into the
hw fence registers and then checking the limits simply check whether
the stride is above the 8kb limit. To better document the hw, add
some WARN_ONs in i915_write_fence_reg like I've done for the i830
case (using the right limits).
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=27449 Tested-by: Alexander Lam <lambchop468@gmail.com> Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[needed for stable as it's just a bunch of macros that other drm patches
need, it changes no code functionality besides adding support for a new
device type. - gregkh]
Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1369) fixes a problem in ehci-hcd. Some controllers
occasionally run into trouble when the driver reclaims siTDs too
quickly. This can happen while streaming audio; it causes the
controller to crash.
The patch changes siTD reclamation to work the same way as iTD
reclamation: Completed siTDs are stored on a list and not reused until
at least one frame has passed.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Nate Case <ncase@xes-inc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Brandon Philips noted that I had a spacing issue in my printk for the
last r8169 patch that made it quite ugly. Fix that up and add the PFX
macro to it as well so it looks like the other r8169 printks
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If we boot into a crash-kernel the gart might still be
enabled and its caches might be dirty. This can result in
undefined behavior later. Fix it by explicitly disabling the
gart hardware before initialization and flushing the caches
after enablement.
This patch increases the current hardcoded limit of NR_IOBUS_DEVS
from 6 to 200. We are hitting this limit when creating a guest with more
than 1 virtio-net device using vhost-net backend. Each virtio-net
device requires 2 such devices to service notifications from rx/tx queues.
- calculate zapped page number properly in mmu_zap_unsync_children()
- calculate freeed page number properly kvm_mmu_change_mmu_pages()
- if zapped children page it shoud restart hlist walking
Currently we set eflags.vm unconditionally when entering real mode emulation
through virtual-8086 mode, and clear it unconditionally when we enter protected
mode. The means that the following sequence
Ends up with rflags.vm clear due to KVM_SET_SREGS triggering enter_pmode().
Fix by shadowing rflags.vm (and rflags.iopl) correctly while in real mode:
reads and writes to those bits access a shadow register instead of the actual
register.
There is a quirk for AMD K8 CPUs in many Linux kernels (see
arch/x86/kernel/cpu/mcheck/mce.c:__mcheck_cpu_apply_quirks()) that
clears bit 10 in that MCE related MSR. KVM can only cope with all
zeros or all ones, so it will inject a #GP into the guest, which
will let it panic.
So lets add a quirk to the quirk and ignore this single cleared bit.
This fixes -cpu kvm64 on all machines and -cpu host on K8 machines
with some guest Linux kernels.
Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We intercept #BP while in guest debugging mode. As VM exits due to
intercepted exceptions do not necessarily come with valid
idt_vectoring, we have to update event_exit_inst_len explicitly in such
cases. At least in the absence of migration, this ensures that
re-injections of #BP will find and use the correct instruction length.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Compiling 2.6.33 with SMP enabled and HOTPLUG_CPU disabled gives me the
following link errors:
LD init/built-in.o
LD .tmp_vmlinux1
arch/powerpc/platforms/built-in.o: In function `.smp_xics_setup_cpu':
smp.c:(.devinit.text+0x88): undefined reference to `.set_cpu_current_state'
smp.c:(.devinit.text+0x94): undefined reference to `.set_default_offline_state'
arch/powerpc/platforms/built-in.o: In function `.smp_pSeries_kick_cpu':
smp.c:(.devinit.text+0x13c): undefined reference to `.set_preferred_offline_state'
smp.c:(.devinit.text+0x148): undefined reference to `.get_cpu_current_state'
smp.c:(.devinit.text+0x1a8): undefined reference to `.get_cpu_current_state'
make: *** [.tmp_vmlinux1] Error 1
The following change fixes that for me and seems to work as expected.
Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If a component device has a merge_bvec_fn then as we never call it
we must ensure we never need to. Currently this is done by setting
max_sector to 1 PAGE, however this does not stop a bio being created
with several sub-page iovecs that would violate the merge_bvec_fn.
So instead set max_phys_segments to 1 and set the segment boundary to the
same as a page boundary to ensure there is only ever one single-page
segment of IO requested at a time.
This can particularly be an issue when 'xen' is used as it is
known to submit multiple small buffers in a single bio.
It causes a NULL pointer exception on some configurations when CONFIG_TRACING is
enabled on 2.6.33. This patch fixes the problem (acknowledged by Randy who
reported the bug).
It did not appear to hurt previously because most of the accesses were done
through local_inc, which probably obfuscated the access enough that no compiler
optimizations were done. But with local_read() done when CONFIG_TRACING is
active, this becomes a problem. Non-CONFIG_TRACING is probably affected as well
(module.c contains local_set and local_read that use __module_ref_addr()), but I
guess nobody noticed because we've been lucky enough that the compiler did not
generate the inappropriate optimization pattern there.
This patch should be queued for the 2.6.29.x through 2.6.33.x stable branches.
(tested on 2.6.33.1 x86_64)
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Tested-by: Randy Dunlap <randy.dunlap@oracle.com> CC: Eric Dumazet <dada1@cosmosbay.com> CC: Rusty Russell <rusty@rustcorp.com.au> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Tejun Heo <tj@kernel.org> CC: Ingo Molnar <mingo@elte.hu> CC: Andrew Morton <akpm@linux-foundation.org> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Greg Kroah-Hartman <gregkh@suse.de> CC: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
But it's really just moving the code around. But it's enough to say that the
problems appeared before Jul 19 01:48:54 2007, which brings us back to 2.6.23.
It should be applied to stable 2.6.23.x to 2.6.33.x (or whichever of these
stable branches are still maintained).
(tested on 2.6.33.1 x86_64)
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Randy Dunlap <randy.dunlap@oracle.com> CC: Eric Dumazet <dada1@cosmosbay.com> CC: Rusty Russell <rusty@rustcorp.com.au> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Tejun Heo <tj@kernel.org> CC: Ingo Molnar <mingo@elte.hu> CC: Andrew Morton <akpm@linux-foundation.org> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Greg Kroah-Hartman <gregkh@suse.de> CC: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
access_bit_width field is u8 in ACPICA, thus 256 value written to it
becomes 0, causing divide by zero later.
Proper fix would be to remove access_bit_width at all, just because
we already have access_byte_width, which is access_bit_width / 8.
Limit access width to 64 bit for now.
https://bugzilla.kernel.org/show_bug.cgi?id=15749
fixes regression caused by the fix for:
https://bugzilla.kernel.org/show_bug.cgi?id=14667
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
xfssyncd processes a queue of work by detaching the queue and
then iterating over all the work items. It then sleeps for a
time period or until new work comes in. If new work is queued
while xfssyncd is actively processing the detached work queue,
it will not process that new work until after a sleep timeout
or the next work event queued wakes it.
Fix this by checking the work queue again before going to sleep.
Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The radix-tree code requires it's users to serialize tag updates
against other updates to the tree. While XFS protects tag updates
against each other it does not serialize them against updates of the
tree contents, which can lead to tag corruption. Fix the inode
cache to always take pag_ici_lock in exclusive mode when updating
radix tree tags.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Patrick Schreurs <patrick@news-service.com> Tested-by: Patrick Schreurs <patrick@news-service.com> Signed-off-by: Alex Elder <aelder@sgi.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The introduction of barriers to loop devices has created a new IO
order completion dependency that XFS does not handle. The loop
device implements barriers using fsync and so turns a log IO in the
XFS filesystem on the loop device into a data IO in the backing
filesystem. That is, the completion of log IOs in the loop
filesystem are now dependent on completion of data IO in the backing
filesystem.
This can cause deadlocks when a flush daemon issues a log force with
an inode locked because the IO completion of IO on the inode is
blocked by the inode lock. This in turn prevents further data IO
completion from occuring on all XFS filesystems on that CPU (due to
the shared nature of the completion queues). This then prevents the
log IO from completing because the log is waiting for data IO
completion as well.
The fix for this new completion order dependency issue is to make
the IO completion inode locking non-blocking. If the inode lock
can't be grabbed, simply requeue the IO completion back to the work
queue so that it can be processed later. This prevents the
completion queue from being blocked and allows data IO completion on
other inodes to proceed, hence avoiding completion order dependent
deadlocks.
Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
perf_events, x86: Implement Intel Westmere support
The new Intel documentation includes Westmere arch specific
event maps that are significantly different from the Nehalem
ones. Add support for this generation.
Found the CPUID model numbers on wikipedia.
Also ammend some Nehalem constraints, spotted those when looking
for the differences between Nehalem and Westmere.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Stephane Eranian <eranian@google.com>
LKML-Reference: <20100127221122.151865645@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf, x86: Enable Nehalem-EX support
According to Intel Software Devel Manual Volume 3B, the
Nehalem-EX PMU is just like regular Nehalem (except for the
uncore support, which is completely different).
Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <alpine.DEB.2.00.1004060956580.1417@cl320.eecs.utk.edu> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Youquan Song <youquan.song@linux.intel.com>
Tx ring buffers after tx_ring->next_to_use are volatile and could
change, possibly causing a crash. Stop cleaning when we hit
tx_ring->next_to_use.
Signed-off-by: Terry Loftin <terry.loftin@hp.com> Acked-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Matthew Burgess <matthew@linuxfromscratch.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is a problem if an "internal short scan" is in progress when a
mac80211 requested scan arrives. If this new scan request arrives within
the "next_scan_jiffies" period then driver will immediately return success
and complete the scan. The problem here is that the scan has not been
fully initialized at this time (is_internal_short_scan is still set to true
because of the currently running scan), which results in the scan
completion never to be sent to mac80211. At this time also, evan though the
internal short scan is still running the state (is_internal_short_scan)
will be set to false, so when the internal scan does complete then mac80211
will receive a scan completion.
Fix this by checking right away if a scan is in progress when a scan
request arrives from mac80211.
If the lower file system driver has extended attributes disabled,
ecryptfs' own access functions return -ENOSYS instead of -EOPNOTSUPP.
This breaks execution of programs in the ecryptfs mount, since the
kernel expects the latter error when checking for security
capabilities in xattrs.
Create a getattr handler for eCryptfs symlinks that is capable of
reading the lower target and decrypting its path. Prior to this patch,
a stat's st_size field would represent the strlen of the encrypted path,
while readlink() would return the strlen of the decrypted path. This
could lead to confusion in some userspace applications, since the two
values should be equal.
Since tmpfs has no persistent storage, it pins all its dentries in memory
so they have d_count=1 when other file systems would have d_count=0.
->lookup is only used to create new dentries. If the caller doesn't
instantiate it, it's freed immediately at dput(). ->readdir reads
directly from the dcache and depends on the dentries being hashed.
When an ecryptfs mount is mounted, it associates the lower file and dentry
with the ecryptfs files as they're accessed. When it's umounted and
destroys all the in-memory ecryptfs inodes, it fput's the lower_files and
d_drop's the lower_dentries. Commit 4981e081 added this and a d_delete in
2008 and several months later commit caeeeecf removed the d_delete. I
believe the d_drop() needs to be removed as well.
The d_drop effectively hides any file that has been accessed via ecryptfs
from the underlying tmpfs since it depends on it being hashed for it to
be accessible. I've removed the d_drop on my development node and see no
ill effects with basic testing on both tmpfs and persistent storage.
As a side effect, after ecryptfs d_drops the dentries on tmpfs, tmpfs
BUGs on umount. This is due to the dentries being unhashed.
tmpfs->kill_sb is kill_litter_super which calls d_genocide to drop
the reference pinning the dentry. It skips unhashed and negative dentries,
but shrink_dcache_for_umount_subtree doesn't. Since those dentries
still have an elevated d_count, we get a BUG().
This patch removes the d_drop call and fixes both issues.
This issue was reported at:
https://bugzilla.novell.com/show_bug.cgi?id=567887
Reported-by: Árpád Bíró <biroa@demasz.hu> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Dustin Kirkland <kirkland@canonical.com> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The Biostar mobo seems to give a wrong DMA position, resulting in
stuttering or skipping sounds on 2.6.34. Since the commit 7b3a177b0d4f92b3431b8dca777313a07533a710, "ALSA: pcm_lib: fix "something
must be really wrong" condition", makes the position check more strictly,
the DMA position problem is revealed more clearly now.
The fix is to use only LPIB for obtaining the position, i.e. passing
position_fix=1. This patch adds a static quirk to achieve it as default.
Reported-by: Frank Griffin <ftg@roadrunner.com> Cc: Eric Piel <Eric.Piel@tremplin-utc.net> Signed-off-by: Takashi Iwai <tiwai@suse.de> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This makes the b43 driver just automatically fall back to PIO mode when
DMA doesn't work.
The driver already told the user to do it, so rather than have the user
reload the module with a new flag, just make the driver do it
automatically. We keep the message as an indication that something is
wrong, but now just automatically fall back to the hopefully working PIO
case.
(Some post-2.6.33 merge fixups by Larry Finger <Larry.Finger@lwfinger.net>
and yours truly... -- JWL)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: John W. Linville <linville@tuxdriver.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If userencounter the "Fatal DMA Problem" with a BCM43XX device, and
still wish to use b43 as the driver, their only option is to rebuild
the kernel with CONFIG_B43_FORCE_PIO. This patch removes this option and
allows PIO mode to be selected with a load-time parameter for the module.
Note that the configuration variable CONFIG_B43_PIO is also removed.
Once the DMA problem with the BCM4312 devices is solved, this patch will
likely be reverted.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Tested-by: John Daiker <daikerjohn@gmail.com> Cc: maximilian attems <max@stro.at> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The IPoIB UD QP reports send completions to priv->send_cq, which is
usually left unarmed; it only gets armed when the number of
outstanding send requests reaches the size of the TX queue. This
arming is done only in the send path for the UD QP. However, when
sending CM packets, the net queue may be stopped for the same reasons
but no measures are taken to recover the UD path from a lockup.
Consider this scenario: a host sends high rate of both CM and UD
packets, with a TX queue length of N. If at some time the number of
outstanding UD packets is more than N/2 and the overall outstanding
packets is N-1, and CM sends a packet (making the number of
outstanding sends equal N), the TX queue will be stopped. When all
the CM packets complete, the number of outstanding packets will still
be higher than N/2 so the TX queue will not be restarted.
Fix this by calling ib_req_notify_cq() when the queue is stopped in
the CM path.
Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
While investigating a bug, I came across a possible bug in v9fs. The
problem is similar to the one reported for NFS by ASANO Masahiro in
http://lkml.org/lkml/2005/12/21/334.
v9fs_file_lock() will skip locks on file which has mode set to 02666.
This is a problem in cases where the mode of the file is changed after
a process has obtained a lock on the file. Such a lock will be skipped
during unlock and the machine will end up with a BUG in
locks_remove_flock().
v9fs_file_lock() should skip the check for mandatory locks when
unlocking a file.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In ocfs2_validate_gd_parent, we check bg_chain against the
cl_next_free_rec of the dinode. Actually in resize, we have
the chance of bg_chain == cl_next_free_rec. So add some
additional condition check for it.
I also rename paramter "clean_error" to "resize", since the
old one is not clearly enough to indicate that we should only
meet with this case in resize.
btw, the correpsonding bug is
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1230.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
ocfs2_set_acl() and ocfs2_init_acl() were setting i_mode on the in-memory
inode, but never setting it on the disk copy. Thus, acls were some times not
getting propagated between nodes. This patch fixes the issue by adding a
helper function ocfs2_acl_set_mode() which does this the right way.
ocfs2_set_acl() and ocfs2_init_acl() are then updated to call
ocfs2_acl_set_mode().
Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
dq_flags are modified non-atomically in do_set_dqblk via __set_bit calls and
atomically for example in mark_dquot_dirty or clear_dquot_dirty. Hence a
change done by an atomic operation can be overwritten by a change done by a
non-atomic one. Fix the problem by using atomic bitops even in do_set_dqblk.
Signed-off-by: Andrew Perepechko <andrew.perepechko@sun.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The Intel Architecture Optimization Reference Manual states that a short
load that follows a long store to the same object will suffer a store
forwading penalty, particularly if the two accesses use different addresses.
Trivially, a long load that follows a short store will also suffer a penalty.
__downgrade_write() in rwsem incurs both penalties: the increment operation
will not be able to reuse a recently-loaded rwsem value, and its result will
not be reused by any recently-following rwsem operation.
A comment in the code states that this is because 64-bit immediates are
special and expensive; but while they are slightly special (only a single
instruction allows them), they aren't expensive: a test shows that two loops,
one loading a 32-bit immediate and one loading a 64-bit immediate, both take
1.5 cycles per iteration.
Fix this by changing __downgrade_write to use the same add instruction on
i386 and on x86_64, so that it uses the same operand size as all the other
rwsem functions.
Signed-off-by: Avi Kivity <avi@redhat.com>
LKML-Reference: <1266049992-17419-1-git-send-email-avi@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On Sun, 17 Jan 2010, Ingo Molnar wrote:
>
> FYI, -tip testing found that these changes break the UML build:
>
> kernel/built-in.o: In function `__up_read':
> /home/mingo/tip/arch/x86/include/asm/rwsem.h:192: undefined reference to `call_rwsem_wake'
> kernel/built-in.o: In function `__up_write':
> /home/mingo/tip/arch/x86/include/asm/rwsem.h:210: undefined reference to `call_rwsem_wake'
> kernel/built-in.o: In function `__downgrade_write':
> /home/mingo/tip/arch/x86/include/asm/rwsem.h:228: undefined reference to `call_rwsem_downgrade_wake'
> kernel/built-in.o: In function `__down_read':
> /home/mingo/tip/arch/x86/include/asm/rwsem.h:112: undefined reference to `call_rwsem_down_read_failed'
> kernel/built-in.o: In function `__down_write_nested':
> /home/mingo/tip/arch/x86/include/asm/rwsem.h:154: undefined reference to `call_rwsem_down_write_failed'
> collect2: ld returned 1 exit status
Add lib/rwsem_64.o to the UML subarch objects to fix.
LKML-Reference: <alpine.LFD.2.00.1001171023440.13231@localhost.localdomain> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This one is much faster than the spinlock based fallback rwsem code,
with certain artifical benchmarks having shown 300%+ improvement on
threaded page faults etc.
Again, note the 32767-thread limit here. So this really does need that
whole "make rwsem_count_t be 64-bit and fix the BIAS values to match"
extension on top of it, but that is conceptually a totally independent
issue.
NOT TESTED! The original patch that this all was based on were tested by
KAMEZAWA Hiroyuki, but maybe I screwed up something when I created the
cleaned-up series, so caveat emptor..
Also note that it _may_ be a good idea to mark some more registers
clobbered on x86-64 in the inline asms instead of saving/restoring them.
They are inline functions, but they are only used in places where there
are not a lot of live registers _anyway_, so doing for example the
clobbers of %r8-%r11 in the asm wouldn't make the fast-path code any
worse, and would make the slow-path code smaller.
(Not that the slow-path really matters to that degree. Saving a few
unnecessary registers is the _least_ of our problems when we hit the slow
path. The instruction/cycle counting really only matters in the fast
path).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <alpine.LFD.2.00.1001121810410.17145@localhost.localdomain> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
For x86-64, 32767 threads really is not enough. Change rwsem_count_t
to a signed long, so that it is 64 bits on x86-64.
This required the following changes to the assembly code:
a) %z0 doesn't work on all versions of gcc! At least gcc 4.4.2 as
shipped with Fedora 12 emits "ll" not "q" for 64 bits, even for
integer operands. Newer gccs apparently do this correctly, but
avoid this problem by using the _ASM_ macros instead of %z.
b) 64 bits immediates are only allowed in "movq $imm,%reg"
constructs... no others. Change some of the constraints to "e",
and fix the one case where we would have had to use an invalid
immediate -- in that case, we only care about the upper half
anyway, so just access the upper half.
Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <tip-bafaecd11df15ad5b1e598adc7736afcd38ee13d@git.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The fast version of the rwsems (the code that uses xadd) has
traditionally only worked on x86-32, and as a result it mixes different
kinds of types wildly - they just all happen to be 32-bit. We have
"long", we have "__s32", and we have "int".
To make it work on x86-64, the types suddenly matter a lot more. It can
be either a 32-bit or 64-bit signed type, and both work (with the caveat
that a 32-bit counter will only have 15 bits of effective write
counters, so it's limited to 32767 users). But whatever type you
choose, it needs to be used consistently.
This makes a new 'rwsem_counter_t', that is a 32-bit signed type. For a
64-bit type, you'd need to also update the BIAS values.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <alpine.LFD.2.00.1001121755220.17145@localhost.localdomain> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This makes gcc use the right register names and instruction operand sizes
automatically for the rwsem inline asm statements.
So instead of using "(%%eax)" to specify the memory address that is the
semaphore, we use "(%1)" or similar. And instead of forcing the operation
to always be 32-bit, we use "%z0", taking the size from the actual
semaphore data structure itself.
This doesn't actually matter on x86-32, but if we want to use the same
inline asm for x86-64, we'll need to have the compiler generate the proper
64-bit names for the registers (%rax instead of %eax), and if we want to
use a 64-bit counter too (in order to avoid the 15-bit limit on the
write counter that limits concurrent users to 32767 threads), we'll need
to be able to generate instructions with "q" accesses rather than "l".
Since this header currently isn't enabled on x86-64, none of that matters,
but we do want to use the xadd version of the semaphores rather than have
to take spinlocks to do a rwsem. The mm->mmap_sem can be heavily contended
when you have lots of threads all taking page faults, and the fallback
rwsem code that uses a spinlock performs abysmally badly in that case.
[ hpa: modified the patch to skip size suffixes entirely when they are
redundant due to register operands. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <alpine.LFD.2.00.1001121613560.17145@localhost.localdomain> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Set a new DM_UEVENT_GENERATED_FLAG when returning from ioctls to
indicate that a uevent was actually generated. This tells the userspace
caller that it may need to wait for the event to be processed.
Signed-off-by: Peter Rajnoha <prajnoha@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
arch/x86/built-in.o: In function `store_cache_disable':
intel_cacheinfo.c:(.text+0xc509): undefined reference to `amd_get_nb_id'
arch/x86/built-in.o: In function `show_cache_disable':
intel_cacheinfo.c:(.text+0xc7d3): undefined reference to `amd_get_nb_id'
when CONFIG_CPU_SUP_AMD is not enabled because the amd_get_nb_id
helper is defined in AMD-specific code but also used in generic code
(intel_cacheinfo.c). Reorganize the L3 cache index disable code under
CONFIG_CPU_SUP_AMD since it is AMD-only anyway.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100218184210.GF20473@aftab> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The show/store_cache_disable routines depend unnecessarily on NUMA's
cpu_to_node and the disabling of cache indices broke when !CONFIG_NUMA.
Remove that dependency by using a helper which is always correct.
While at it, enable L3 Cache Index disable on rev D1 Istanbuls which
sport the feature too.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100218184339.GG20473@aftab> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We need to know the valid L3 indices interval when disabling them over
/sysfs. Do that when the core is brought online and add boundary checks
to the sysfs .store attribute.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1264172467-25155-6-git-send-email-bp@amd64.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>