When the addba timer expires but has no work to do,
it should not affect the state machine. If it does,
TX will not see the session as successfully established,
and we can also crash trying to re-establish the session.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If a signal is pending (task being killed by SIGKILL),
__mem_cgroup_try_charge will write NULL into &mem, and css_put will oops
on a NULL pointer dereference.
Pid: 5299, comm: largepages Tainted: G W 2.6.34-rc3 #3 Penryn1600SLI-110dB/To Be Filled By O.E.M.
RIP: 0010:[<ffffffff810fc6cc>] [<ffffffff810fc6cc>] mem_cgroup_prepare_migration+0x7c/0xc0
Fix regression caused by commit 507e2fbaaacb6f164b4125b87c5002f95143174b
("w1: w1 temp calculation overflow fix") whereby negative temperatures for
the DS18B20 are not converted properly.
When the temperature exceeds 32767 millidegrees, it overflows
to -32768 millidegrees. Both values are well within the sensor's
-55 to +125 degree range.
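The arithmetic is worth spelling out; a minimal sketch of a correct conversion, with a hypothetical helper name, assuming the DS18B20's raw 16-bit two's-complement format at 1/16 degree C per LSB:

#include <stdint.h>

/* Hypothetical helper: convert a raw DS18B20 reading to millidegrees.
 * Casting to int16_t first preserves negative readings; doing the
 * scaling in 32-bit math avoids the wrap to -32768 described above. */
static inline int32_t ds18b20_to_millidegrees(uint16_t raw)
{
	int32_t t = (int16_t)raw;	/* sign-extend the raw value */

	return t * 1000 / 16;		/* scale in 32-bit arithmetic */
}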
This patch (as1369) fixes a problem in ehci-hcd. Some controllers
occasionally run into trouble when the driver reclaims siTDs too
quickly. This can happen while streaming audio; it causes the
controller to crash.
The patch changes siTD reclamation to work the same way as iTD
reclamation: Completed siTDs are stored on a list and not reused until
at least one frame has passed.
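A minimal sketch of that deferred-reuse scheme, with hypothetical structure and function names (the driver keeps the real lists per stream):

/* Completed siTDs are stamped with the frame in which they completed
 * and parked on a free list; one is handed out again only once the
 * frame counter has moved on. */
struct sitd_entry {
	struct list_head list;
	unsigned int frame;		/* frame when this siTD completed */
};

static struct sitd_entry *sitd_get(struct list_head *free_list,
				   unsigned int now_frame)
{
	struct sitd_entry *sitd;

	if (!list_empty(free_list)) {
		sitd = list_first_entry(free_list, struct sitd_entry, list);
		if (sitd->frame != now_frame) {	/* at least one frame old */
			list_del(&sitd->list);
			return sitd;
		}
	}
	return NULL;			/* caller allocates a fresh siTD */
}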
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Nate Case <ncase@xes-inc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Brandon Philips noted that I had a spacing issue in my printk for the
last r8169 patch that made it quite ugly. Fix that up and add the PFX
macro to it as well so it looks like the other r8169 printks.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If we boot into a crash-kernel the gart might still be
enabled and its caches might be dirty. This can result in
undefined behavior later. Fix it by explicitly disabling the
gart hardware before initialization and flushing the caches
after enablement.
This patch increases the current hardcoded limit of NR_IOBUS_DEVS
from 6 to 200. We are hitting this limit when creating a guest with more
than 1 virtio-net device using vhost-net backend. Each virtio-net
device requires 2 such devices to service notifications from rx/tx queues.
- calculate the zapped page count properly in mmu_zap_unsync_children()
- calculate the freed page count properly in kvm_mmu_change_mmu_pages()
- if a child page is zapped, the hlist walk should be restarted
Currently we set eflags.vm unconditionally when entering real mode emulation
through virtual-8086 mode, and clear it unconditionally when we enter protected
mode. This means that a sequence such as KVM_SET_REGS (with rflags.vm = 1)
followed by KVM_SET_SREGS ends up with rflags.vm clear, due to KVM_SET_SREGS
triggering enter_pmode().
Fix by shadowing rflags.vm (and rflags.iopl) correctly while in real mode:
reads and writes to those bits access a shadow register instead of the actual
register.
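The read side of that shadowing looks roughly like this (a sketch along the lines of the VMX code; the rmode field names are illustrative):

/* VM and IOPL are forced by the real-mode emulation, so the guest's
 * view of them lives in a shadow instead of the hardware register. */
#define RMODE_GUEST_OWNED_EFLAGS_BITS (~(X86_EFLAGS_IOPL | X86_EFLAGS_VM))

static unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
{
	unsigned long rflags = vmcs_readl(GUEST_RFLAGS);

	if (to_vmx(vcpu)->rmode.vm86_active) {
		rflags &= RMODE_GUEST_OWNED_EFLAGS_BITS;
		rflags |= to_vmx(vcpu)->rmode.save_rflags &
			  ~RMODE_GUEST_OWNED_EFLAGS_BITS;
	}
	return rflags;
}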
There is a quirk for AMD K8 CPUs in many Linux kernels (see
arch/x86/kernel/cpu/mcheck/mce.c:__mcheck_cpu_apply_quirks()) that
clears bit 10 in that MCE-related MSR (MSR_IA32_MC4_CTL). KVM can only cope
with all zeros or all ones, so it will inject a #GP into the guest, which
will make it panic.
So let's add a quirk to the quirk and ignore this single cleared bit.
This fixes -cpu kvm64 on all machines and -cpu host on K8 machines
with some guest Linux kernels.
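The check in the MCE MSR write path then ends up roughly as follows (a sketch: all zeros, all ones, or all ones with bit 10 cleared are accepted for IA32_MCi_CTL):

	/* Only 0 or all 1s can be written to IA32_MCi_CTL, but some
	 * Linux kernels clear bit 10 in bank 4 to work around a
	 * BIOS/GART TLB issue on AMD K8s; ignore that single bit to
	 * avoid an uncaught #GP in the guest. */
	if ((offset & 0x3) == 0 &&
	    data != 0 && (data | (1 << 10)) != ~(u64)0)
		return -1;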
Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Errata:
Certain conditions on the SCSI bus may cause the 53C1030 to incorrectly signal
a SCSI_DATA_UNDERRUN to the host.
Workaround 1:
For an erratum on the LSI53C1030: when the length of the requested data
and the length of the transferred data differ for a READ or VERIFY command,
DID_SOFT_ERROR is set.
Workaround 2:
For potential trouble on the LSI53C1030: check whether the length of the
requested data equals the length of the transferred data plus the residual;
if not, the data is incorrect and MEDIUM_ERROR is set.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
After dma-mapping an SG list provided by the SCSI midlayer, iser has
to make sure the mapped SG is "aligned for RDMA" in the sense that it's
possible to produce one mapping in the HCA IOMMU which represents the
whole SG. Next, the mapped SG is formatted for registration with the HCA.
This patch rewrites the logic that does the above to make it clearer
and simpler. It also fixes a bug in the aligned-for-RDMA checks, where
only an "end" check was done and the "start" check was missing.
[commit message in RH kernel tree: "Under heavy load, without the patch,
the HCA can be programmed to write (corrupt) into pages/location which
doesn't belong to the SG associated with the actual I/O and cause a
kernel oops."]
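A sketch of what the complete alignment test looks like (hypothetical helper: every element but the first must start on a page boundary, and every element but the last must end on one):

static int sg_aligned_for_rdma(struct scatterlist *sgl, int nents)
{
	struct scatterlist *sg;
	int i;

	for_each_sg(sgl, sg, nents, i) {
		dma_addr_t addr = sg_dma_address(sg);
		unsigned int len = sg_dma_len(sg);

		if (i > 0 && (addr & ~PAGE_MASK))
			return 0;	/* the missing "start" check */
		if (i < nents - 1 && ((addr + len) & ~PAGE_MASK))
			return 0;	/* the existing "end" check */
	}
	return 1;
}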
Signed-off-by: Alexander Nezhinsky <alexandern@voltaire.com> Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The tpm_tis driver already has a list of supported pnp_device_ids.
This patch simply exports that list as a MODULE_DEVICE_TABLE() so that
the module autoloader will discover and load the module at boot time.
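The change amounts to roughly the following (a sketch; the IDs are abbreviated from the driver's existing table):

static struct pnp_device_id tpm_pnp_tbl[] = {
	{"PNP0C31", 0},			/* generic TPM */
	{"ATM1200", 0},			/* Atmel */
	{"IFX0102", 0},			/* Infineon */
	/* ... */
	{"", 0}				/* terminator */
};
MODULE_DEVICE_TABLE(pnp, tpm_pnp_tbl);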
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com> Acked-by: Rajiv Andrade <srajiv@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: James Morris <jmorris@namei.org> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Creating many small files in rapid succession on a small
filesystem can lead to spurious ENOSPC; on a 104MB filesystem:
for i in `seq 1 22500`; do
echo -n > $SCRATCH_MNT/$i
echo XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > $SCRATCH_MNT/$i
done
leads to ENOSPC even though after a sync, 40% of the fs is free
again.
This is because we reserve worst-case metadata for delalloc writes,
and when data is allocated that worst-case reservation is not
usually needed.
When freespace is low, kicking off an async writeback will start
converting that worst-case space usage into something more realistic,
almost always freeing up space to continue.
This resolves the testcase for me, and survives all 4 generic
ENOSPC tests in xfstests.
We'll still need a hard synchronous sync to squeeze out the last bit,
but this fixes things up to a large degree.
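The heart of the change is a check along these lines in the delalloc reservation path (a sketch using the helper introduced in the next entry; the counter names follow the surrounding ext4 code):

	/*
	 * Even if we don't switch away from delalloc, start pushing
	 * delalloc when half of the free blocks are consumed by
	 * worst-case dirty reservations.
	 */
	if (free_blocks < 2 * dirty_blocks)
		writeback_inodes_sb_if_idle(sb);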
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
ext4, at least, would like to start pushing on writeback if it starts
to get close to ENOSPC when reserving worst-case blocks for delalloc
writes. Writing out delalloc data will convert those worst-case
predictions into usually smaller actual usage, freeing up space
before we hit ENOSPC based on this speculation.
Thanks to Jens for suggesting the helper function and for the naming help.
I've made the helper return status on whether writeback was
started even though I don't plan to use it in the ext4 patch;
it seems like it would be potentially useful to test this
in some cases.
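The helper itself is small; a sketch of its shape:

int writeback_inodes_sb_if_idle(struct super_block *sb)
{
	if (!writeback_in_progress(sb->s_bdi)) {
		writeback_inodes_sb(sb);	/* kick async writeback */
		return 1;			/* we started it */
	}
	return 0;				/* already in progress */
}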
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Jan Kara <jack@suse.cz> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Reinette found the reason for the warnings that
happened occasionally when a hw-offloaded scan
finished; her description of the problem:
mac80211 will defer the handling of scan requests if it is
busy with management work at the time. The scan requests
are deferred and run after the work has completed. When
this occurs there are currently two problems.
* The scan request for hardware scan is not fully populated:
the band and channels to scan are not initialized.
* When the scan is queued the state is not correctly updated
to reflect that a scan is in progress. The problem here is
that when the driver completes the scan and calls
ieee80211_scan_completed() a warning will be triggered
since mac80211 was not aware that a scan was in progress.
The reason is that the queued scan work will start
the hw scan right away when the hw_scan_req struct
has already been allocated. However, in the first
pass it will not have been filled, which happens
at the same time as setting the bits. To fix this,
simply move the allocation after the pending work
test as well, so that the first iteration of the
scan work will call __ieee80211_start_scan() even
in the hardware scan case.
Bug-identified-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Cc: Chase Douglas <chase.douglas@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is insufficient. The recvfrom methods must always call
svc_xprt_received on completion so that the socket gets re-queued if
there is any more work to do. This particular path did not make that
call because it actually destroyed the svsk, making requeue pointless.
When svc_delete_socket was changed to just set a bit, we should have
added a call to svc_xprt_received.
This is the problem that b0401d7253 attempted to fix, incorrectly.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This reverts commit b0401d725334a94d57335790b8ac2404144748ee, which
moved svc_delete_xprt() outside of XPT_BUSY, and allowed it to be called
after svc_xprt_received(), removing its last reference and destroying it
after it had already been queued for future processing.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
For nfsd we provide users the option of mapping uid's to server-side
supplementary group lists. That makes sense for nfsd, but not
necessarily for other rpc users (such as the callback client).
So move that lookup to svcauth_unix_set_client, which is a
program-specific method.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If a component device has a merge_bvec_fn then, as we never call it,
we must ensure we never need to. Currently this is done by setting
max_sectors to one page, but this does not stop a bio being created
with several sub-page iovecs that would violate the merge_bvec_fn.
So instead set max_phys_segments to 1 and set the segment boundary to the
same as a page boundary to ensure there is only ever one single-page
segment of IO requested at a time.
This can particularly be an issue when 'xen' is used as it is
known to submit multiple small buffers in a single bio.
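The settings described above boil down to something like this (a sketch; the exact queue-limit helper names vary between kernel versions):

	/* one segment per bio, and never let a segment cross a page */
	blk_queue_max_phys_segments(mddev->queue, 1);
	blk_queue_segment_boundary(mddev->queue, PAGE_CACHE_SIZE - 1);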
It causes a NULL pointer exception on some configurations when CONFIG_TRACING is
enabled on 2.6.33. This patch fixes the problem (acknowledged by Randy, who
reported the bug).
It did not appear to hurt previously because most of the accesses were done
through local_inc, which probably obfuscated the access enough that no compiler
optimizations were done. But with local_read() done when CONFIG_TRACING is
active, this becomes a problem. Non-CONFIG_TRACING is probably affected as well
(module.c contains local_set and local_read that use __module_ref_addr()), but I
guess nobody noticed because we've been lucky enough that the compiler did not
generate the inappropriate optimization pattern there.
This patch should be queued for the 2.6.29.x through 2.6.33.x stable branches.
(tested on 2.6.33.1 x86_64)
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Tested-by: Randy Dunlap <randy.dunlap@oracle.com> CC: Eric Dumazet <dada1@cosmosbay.com> CC: Rusty Russell <rusty@rustcorp.com.au> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Tejun Heo <tj@kernel.org> CC: Ingo Molnar <mingo@elte.hu> CC: Andrew Morton <akpm@linux-foundation.org> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Greg Kroah-Hartman <gregkh@suse.de> CC: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
But it's really just moving the code around, and it's enough to say that the
problems appeared before Jul 19 01:48:54 2007, which brings us back to 2.6.23.
It should be applied to stable 2.6.23.x to 2.6.33.x (or whichever of these
stable branches are still maintained).
(tested on 2.6.33.1 x86_64)
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Randy Dunlap <randy.dunlap@oracle.com> CC: Eric Dumazet <dada1@cosmosbay.com> CC: Rusty Russell <rusty@rustcorp.com.au> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Tejun Heo <tj@kernel.org> CC: Ingo Molnar <mingo@elte.hu> CC: Andrew Morton <akpm@linux-foundation.org> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Greg Kroah-Hartman <gregkh@suse.de> CC: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When Wacom devices wake up from a sleep, the switch mode command
(wacom_query_tablet_data) is needed before wacom_open is called.
wacom_query_tablet_data should not be executed inside wacom_open
since wacom_open is called more than once during probe.
The access_bit_width field is u8 in ACPICA, thus a value of 256 written to it
becomes 0, causing a divide by zero later.
The proper fix would be to remove access_bit_width entirely, because
we already have access_byte_width, which is access_bit_width / 8.
Limit the access width to 64 bits for now.
https://bugzilla.kernel.org/show_bug.cgi?id=15749
This fixes a regression caused by the fix for:
https://bugzilla.kernel.org/show_bug.cgi?id=14667
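The failure mode is plain integer truncation; illustratively:

#include <stdint.h>

int main(void)
{
	uint8_t access_bit_width = 256;	/* a u8 wraps: stores 0 */

	/* any later "x / (access_bit_width / 8)" divides by zero */
	return access_bit_width;	/* returns 0 */
}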
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
perf_events, x86: Implement Intel Westmere support
The new Intel documentation includes Westmere arch specific
event maps that are significantly different from the Nehalem
ones. Add support for this generation.
Found the CPUID model numbers on wikipedia.
Also amend some Nehalem constraints; I spotted those when looking
for the differences between Nehalem and Westmere.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Stephane Eranian <eranian@google.com>
LKML-Reference: <20100127221122.151865645@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf, x86: Enable Nehalem-EX support
According to the Intel Software Developer's Manual Volume 3B, the
Nehalem-EX PMU is just like regular Nehalem (except for the
uncore support, which is completely different).
Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <alpine.DEB.2.00.1004060956580.1417@cl320.eecs.utk.edu> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Youquan Song <youquan.song@linux.intel.com>
Make sure, that TCP has a nonzero RTT estimation after three-way
handshake. Currently, a listening TCP has a value of 0 for srtt,
rttvar and rto right after the three-way handshake is completed
with TCP timestamps disabled.
This will lead to corrupt RTO recalculation and retransmission
flood when RTO is recalculated on backoff reversion as introduced
in "Revert RTO on ICMP destination unreachable"
(f1ecd5d9e7366609d640ff4040304ea197fbc618).
This behaviour can be provoked by connecting to a server which
"responds first" (like SMTP) and rejecting every packet after
the handshake with dest-unreachable, which will lead to softirq
load on the server (up to 30% per socket in some tests).
Thanks to Ilpo Jarvinen for providing debug patches and to
Denys Fedoryshchenko for reporting and testing.
Changes since v3: Removed bad characters in patchfile.
Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Official patch to fix the r8169 frame length check error.
Based on this initial thread:
http://marc.info/?l=linux-netdev&m=126202972828626&w=1
This is the official patch to fix the frame length problems in the r8169
driver. As noted in the previous thread, while this patch incurs a performance
hit on the driver, it's possible to improve performance dynamically by updating
the mtu and rx_copybreak values at runtime to return performance to what it was
for those NICs which are unaffected by the idiosyncrasy (if there are any).
Summary:
A while back Eric submitted a patch for r8169 in which the properly
allocated frame size was written to RXMaxSize to prevent the NIC from DMAing too
much data. This was done in commit fdd7b4c3302c93f6833e338903ea77245eb510b4. A
long time prior to that, however, Francois posted 126fa4b9ca5d9d7cb7d46f779ad3bd3631ca387c, which explicitly disabled the MaxSize
setting due to the fact that the hardware behaved in odd ways when overlong
frames were received on NICs supported by this driver. This was mentioned in a
security conference recently:
http://events.ccc.de/congress/2009/Fahrplan//events/3596.en.html
It seems that if we can't enable frame size filtering, then, as Eric correctly
noticed, we can find ourselves DMAing too much data to a buffer, causing
corruption. As a result, it seems that we are forced to allocate a frame which
is ready to handle a maximally sized receive.
This obviously has performance implications, so to mitigate that, this
patch does two things:
1) Raises the copybreak value to the frame allocation size, which should force
appropriately sized packets to get allocated on rx, rather than a full new 16k
buffer.
2) This patch only disables frame filtering initially (i.e., during the NIC
open); changing the MTU results in ring buffer allocation of a size in relation
to the new MTU (along with a warning indicating that this is dangerous).
Because of item (2), individuals who can't cope with the performance hit (or can
otherwise filter frames to prevent the bug), or who have hardware they are sure
is unaffected by this issue, can manually lower the copybreak and reset the mtu
such that performance is restored easily.
Signed-off-by: Neil Horman <nhorman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Tx ring buffers after tx_ring->next_to_use are volatile and could
change, possibly causing a crash. Stop cleaning when we hit
tx_ring->next_to_use.
Signed-off-by: Terry Loftin <terry.loftin@hp.com> Acked-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Matthew Burgess <matthew@linuxfromscratch.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is a problem if an "internal short scan" is in progress when a
mac80211-requested scan arrives. If this new scan request arrives within
the "next_scan_jiffies" period, the driver will immediately return success
and complete the scan. The problem here is that the scan has not been
fully initialized at this time (is_internal_short_scan is still set to true
because of the currently running scan), which results in the scan
completion never being sent to mac80211. At this time also, even though the
internal short scan is still running, the state (is_internal_short_scan)
will be set to false, so when the internal scan does complete, mac80211
will receive a scan completion.
Fix this by checking right away if a scan is in progress when a scan
request arrives from mac80211.
John Wright [Tue, 13 Apr 2010 22:55:37 +0000 (16:55 -0600)]
sched: Fix a race between ttwu() and migrate_task()
Based on commit e2912009fb7b715728311b0d8fe327a1432b3f79 upstream, but
done differently as this issue is not present in .33 or .34 kernels due
to rework in this area.
If a task is in the TASK_WAKING state, then try_to_wake_up() is working
on it, and it will place it on the correct cpu.
This commit ensures that neither migrate_task() nor __migrate_task()
calls set_task_cpu(p) while p is in the TASK_WAKING state. Otherwise,
there could be two concurrent calls to set_task_cpu(p), resulting in
the task's cfs_rq being inconsistent with its cpu.
Signed-off-by: John Wright <john.wright@hp.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If the lower file system driver has extended attributes disabled,
ecryptfs' own access functions return -ENOSYS instead of -EOPNOTSUPP.
This breaks execution of programs in the ecryptfs mount, since the
kernel expects the latter error when checking for security
capabilities in xattrs.
Create a getattr handler for eCryptfs symlinks that is capable of
reading the lower target and decrypting its path. Prior to this patch,
a stat's st_size field would represent the strlen of the encrypted path,
while readlink() would return the strlen of the decrypted path. This
could lead to confusion in some userspace applications, since the two
values should be equal.
Since tmpfs has no persistent storage, it pins all its dentries in memory
so they have d_count=1 when other file systems would have d_count=0.
->lookup is only used to create new dentries. If the caller doesn't
instantiate it, it's freed immediately at dput(). ->readdir reads
directly from the dcache and depends on the dentries being hashed.
When an ecryptfs mount is mounted, it associates the lower file and dentry
with the ecryptfs files as they're accessed. When it's umounted and
destroys all the in-memory ecryptfs inodes, it fput's the lower_files and
d_drop's the lower_dentries. Commit 4981e081 added this and a d_delete in
2008 and several months later commit caeeeecf removed the d_delete. I
believe the d_drop() needs to be removed as well.
The d_drop effectively hides any file that has been accessed via ecryptfs
from the underlying tmpfs since it depends on it being hashed for it to
be accessible. I've removed the d_drop on my development node and see no
ill effects with basic testing on both tmpfs and persistent storage.
As a side effect, after ecryptfs d_drops the dentries on tmpfs, tmpfs
BUGs on umount. This is due to the dentries being unhashed.
tmpfs->kill_sb is kill_litter_super which calls d_genocide to drop
the reference pinning the dentry. It skips unhashed and negative dentries,
but shrink_dcache_for_umount_subtree doesn't. Since those dentries
still have an elevated d_count, we get a BUG().
This patch removes the d_drop call and fixes both issues.
This issue was reported at:
https://bugzilla.novell.com/show_bug.cgi?id=567887
Reported-by: Árpád Bíró <biroa@demasz.hu> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Dustin Kirkland <kirkland@canonical.com> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 15b8dd53f5ffa changed the string in info->hardware_id from a static
array to a pointer and added a length field. But instead of changing
"sizeof(array)" to "length", we changed it to "sizeof(length)" (== 4),
which corrupts the string we're trying to null-terminate.
We no longer even need to null-terminate the string, but we *do* need to
check whether we found a HID. If there's no HID, we used to have an empty
array, but now we have a null pointer.
The combination of these defects causes this oops:
Unable to handle kernel NULL pointer dereference (address 0000000000000003)
modprobe[895]: Oops 8804682956800 [1]
ip is at zx1_gart_probe+0xd0/0xcc0 [hp_agp]
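Illustratively, the defect and the intended code differ like this (context and exact expressions are hypothetical):

	char *str = info->hardware_id.string;	/* may now be NULL */

	/* the defect: sizeof(length) is 4, so byte 3 of the string
	 * is overwritten instead of the final byte */
	str[sizeof(info->hardware_id.length) - 1] = '\0';

	/* what "sizeof(array)" should have become */
	str[info->hardware_id.length - 1] = '\0';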
The Biostar mobo seems to give a wrong DMA position, resulting in
stuttering or skipping sounds on 2.6.34. Since commit
7b3a177b0d4f92b3431b8dca777313a07533a710 ("ALSA: pcm_lib: fix "something
must be really wrong" condition") makes the position check stricter,
the DMA position problem is revealed more clearly now.
The fix is to use only LPIB for obtaining the position, i.e. passing
position_fix=1. This patch adds a static quirk to achieve it as default.
Reported-by: Frank Griffin <ftg@roadrunner.com> Cc: Eric Piel <Eric.Piel@tremplin-utc.net> Signed-off-by: Takashi Iwai <tiwai@suse.de> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This makes the b43 driver just automatically fall back to PIO mode when
DMA doesn't work.
The driver already told the user to do it, so rather than have the user
reload the module with a new flag, just make the driver do it
automatically. We keep the message as an indication that something is
wrong, but now just automatically fall back to the hopefully working PIO
case.
(Some post-2.6.33 merge fixups by Larry Finger <Larry.Finger@lwfinger.net>
and yours truly... -- JWL)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: John W. Linville <linville@tuxdriver.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If users encounter the "Fatal DMA Problem" with a BCM43XX device and
still wish to use b43 as the driver, their only option is to rebuild
the kernel with CONFIG_B43_FORCE_PIO. This patch removes this option and
allows PIO mode to be selected with a load-time parameter for the module.
Note that the configuration variable CONFIG_B43_PIO is also removed.
Once the DMA problem with the BCM4312 devices is solved, this patch will
likely be reverted.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Tested-by: John Daiker <daikerjohn@gmail.com> Cc: maximilian attems <max@stro.at> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As shown in Kernel Bugzilla #14761, doing a controller restart after a
fatal DMA error does not accomplish anything other than consume the CPU
on an affected system. Accordingly, substitute a meaningful message for
the restart.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The IPoIB UD QP reports send completions to priv->send_cq, which is
usually left unarmed; it only gets armed when the number of
outstanding send requests reaches the size of the TX queue. This
arming is done only in the send path for the UD QP. However, when
sending CM packets, the net queue may be stopped for the same reasons
but no measures are taken to recover the UD path from a lockup.
Consider this scenario: a host sends high rate of both CM and UD
packets, with a TX queue length of N. If at some time the number of
outstanding UD packets is more than N/2 and the overall outstanding
packets is N-1, and CM sends a packet (making the number of
outstanding sends equal N), the TX queue will be stopped. When all
the CM packets complete, the number of outstanding packets will still
be higher than N/2 so the TX queue will not be restarted.
Fix this by calling ib_req_notify_cq() when the queue is stopped in
the CM path.
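The fix is essentially one extra call in the CM send path when the queue fills (a sketch; the notify flags in the actual patch may differ):

	if (++priv->tx_outstanding == ipoib_sendq_size) {
		netif_stop_queue(dev);
		/* arm the normally-unarmed UD send CQ so a completion
		 * event will wake the queue back up */
		if (ib_req_notify_cq(priv->send_cq, IB_CQ_NEXT_COMP))
			ipoib_warn(priv, "request notify on send CQ failed\n");
	}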
Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The aer_inject module hangs in aer_inject() when checking the device's
error masks. The hang is due to a recursive use of the aer_inject lock.
The aer_inject() routine grabs the lock while processing the error and then
calls pci_read_config_dword to read the masks. The pci_read_config_dword
routine is earlier overridden by pci_read_aer, which among other things,
grabs the aer_inject lock.
Fix this by moving the pci_read_config_dword calls that read the masks
to before the lock is taken.
Acked-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Patterson <andrew.patterson@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The 32-bit kernel does not add padding bytes in the fc_bsg_request structure,
whereas the 64-bit kernel does. Due to this, structure elements get mismatched
between a 32-bit application and a 64-bit kernel. To resolve this, use the
packed modifier to avoid adding padding bytes.
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
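A sketch of the idea from the entry above (fields abbreviated; packing forces 32-bit and 64-bit kernels to agree on member offsets):

struct fc_bsg_request {
	uint32_t msgcode;
	union {
		struct fc_bsg_host_add_rport	h_addrport;
		struct fc_bsg_rport_els		r_els;
		/* ... */
	} rqst_data;
} __attribute__((packed));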
The Correctable/Uncorrectable Error Mask Registers are used by the PCIe AER
driver, which controls the reporting of individual errors to the PCIe RC
via PCIe error messages.
If the hardware masks a given error's reporting to the RC, the aer_inject
driver should not inject that AER error.
This patch adds the device ID necessary to support the 82576NS SerDes
adapter.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If the port state is blocked and the fast io fail tmo has
fired then this patch will fail bsg requests immediately.
This is needed if userspace is sending IOs to test the transport
like with fcping, so it will not have to wait for the dev loss tmo.
With this patch the bsg req fast io fail code behaves like the normal
and sg io/passthrough fast io fail.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Acked-By: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
While investigating a bug, I came across a possible bug in v9fs. The
problem is similar to the one reported for NFS by ASANO Masahiro in
http://lkml.org/lkml/2005/12/21/334.
v9fs_file_lock() will skip locks on a file which has its mode set to 02666.
This is a problem in cases where the mode of the file is changed after
a process has obtained a lock on the file. Such a lock will be skipped
during unlock and the machine will end up with a BUG in
locks_remove_flock().
v9fs_file_lock() should skip the check for mandatory locks when
unlocking a file.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In ocfs2_validate_gd_parent, we check bg_chain against the
cl_next_free_rec of the dinode. Actually, in resize we can have
bg_chain == cl_next_free_rec, so add an additional condition
check for it.
I also rename the parameter "clean_error" to "resize", since the
old name is not clear enough to indicate that we should only
meet with this case in resize.
btw, the corresponding bug is
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1230.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
ocfs2_set_acl() and ocfs2_init_acl() were setting i_mode on the in-memory
inode, but never setting it on the disk copy. Thus, ACLs would sometimes not
get propagated between nodes. This patch fixes the issue by adding a
helper function ocfs2_acl_set_mode() which does this the right way.
ocfs2_set_acl() and ocfs2_init_acl() are then updated to call
ocfs2_acl_set_mode().
Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jan-Matthias Braun spotted a bug which locks up the driver when the
comedi ring buffer runs empty and provided a patch. The driver would
still send the data to comedi but the reader won't wake up any more.
What's required is setting the flag COMEDI_CB_BLOCK after new data has
arrived which wakes up the reader and therefore the read() command.
I've fixed a bug in the USBDUX driver which caused timeouts while
sending commands to the boards. This was mainly because of one bulk
transfer which had a timeout of 1ms (!). I've now set all timeouts to
1000ms.
dq_flags are modified non-atomically in do_set_dqblk via __set_bit calls and
atomically for example in mark_dquot_dirty or clear_dquot_dirty. Hence a
change done by an atomic operation can be overwritten by a change done by a
non-atomic one. Fix the problem by using atomic bitops even in do_set_dqblk.
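The difference is just which bitop family is used on dq_flags; for example:

	/* non-atomic: a concurrent atomic update to dq_flags can be lost */
	__set_bit(DQ_LASTSET_B + QIF_SPACE_B, &dquot->dq_flags);

	/* atomic: safe against concurrent set_bit()/clear_bit() */
	set_bit(DQ_LASTSET_B + QIF_SPACE_B, &dquot->dq_flags);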
Signed-off-by: Andrew Perepechko <andrew.perepechko@sun.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes the problem that the system may stall if the target's
->map_rq returns DM_MAPIO_REQUEUE in map_request().
E.g. the stall happens on a 1-CPU box when a dm-mpath device with
queue_if_no_path bounces between all-paths-down and paths-up under I/O load.
When target's ->map_rq returns DM_MAPIO_REQUEUE, map_request() requeues
the request and returns to dm_request_fn(). Then, dm_request_fn()
doesn't exit the I/O dispatching loop and continues processing
the requeued request again.
This map-and-requeue loop can run with interrupts disabled,
so a 1-CPU system can stall if this situation happens.
For example, commands below can stall my 1 CPU box within 1 minute or so:
# dmsetup table mp
mp: 0 2097152 multipath 1 queue_if_no_path 0 1 1 service-time 0 1 2 8:144 1 1
# while true; do dd if=/dev/mapper/mp of=/dev/null bs=1M count=100; done &
# while true; do \
> dmsetup message mp 0 "fail_path 8:144"; \
> dmsetup suspend --noflush mp; \
> dmsetup resume mp; \
> dmsetup message mp 0 "reinstate_path 8:144"; \
> done
To fix the problem above, this patch changes dm_request_fn() to exit
the I/O dispatching loop once if a request is requeued in map_request().
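Schematically, the dispatch loop changes along these lines (a rough sketch, not the exact driver code):

static void dm_request_fn(struct request_queue *q)
{
	while (!blk_queue_plugged(q)) {
		/* ... peek at, dequeue and clone the next request ... */

		if (map_request(ti, clone, md))	/* true on DM_MAPIO_REQUEUE */
			goto requeued;	/* stop dispatching instead of
					 * re-mapping the same request in a
					 * tight loop with interrupts off */
	}
	return;

requeued:
	/* retake the queue lock; the requeued request is picked up
	 * on the next queue run instead of immediately */
	;
}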
The Intel Architecture Optimization Reference Manual states that a short
load that follows a long store to the same object will suffer a store
forwarding penalty, particularly if the two accesses use different addresses.
Trivially, a long load that follows a short store will also suffer a penalty.
__downgrade_write() in rwsem incurs both penalties: the increment operation
will not be able to reuse a recently-loaded rwsem value, and its result will
not be reused by any recently-following rwsem operation.
A comment in the code states that this is because 64-bit immediates are
special and expensive; but while they are slightly special (only a single
instruction allows them), they aren't expensive: a test shows that two loops,
one loading a 32-bit immediate and one loading a 64-bit immediate, both take
1.5 cycles per iteration.
Fix this by changing __downgrade_write to use the same add instruction on
i386 and on x86_64, so that it uses the same operand size as all the other
rwsem functions.
Signed-off-by: Avi Kivity <avi@redhat.com>
LKML-Reference: <1266049992-17419-1-git-send-email-avi@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On Sun, 17 Jan 2010, Ingo Molnar wrote:
>
> FYI, -tip testing found that these changes break the UML build:
>
> kernel/built-in.o: In function `__up_read':
> /home/mingo/tip/arch/x86/include/asm/rwsem.h:192: undefined reference to `call_rwsem_wake'
> kernel/built-in.o: In function `__up_write':
> /home/mingo/tip/arch/x86/include/asm/rwsem.h:210: undefined reference to `call_rwsem_wake'
> kernel/built-in.o: In function `__downgrade_write':
> /home/mingo/tip/arch/x86/include/asm/rwsem.h:228: undefined reference to `call_rwsem_downgrade_wake'
> kernel/built-in.o: In function `__down_read':
> /home/mingo/tip/arch/x86/include/asm/rwsem.h:112: undefined reference to `call_rwsem_down_read_failed'
> kernel/built-in.o: In function `__down_write_nested':
> /home/mingo/tip/arch/x86/include/asm/rwsem.h:154: undefined reference to `call_rwsem_down_write_failed'
> collect2: ld returned 1 exit status
Add lib/rwsem_64.o to the UML subarch objects to fix.
LKML-Reference: <alpine.LFD.2.00.1001171023440.13231@localhost.localdomain> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This one is much faster than the spinlock-based fallback rwsem code,
with certain artificial benchmarks having shown 300%+ improvement on
threaded page faults etc.
Again, note the 32767-thread limit here. So this really does need that
whole "make rwsem_count_t be 64-bit and fix the BIAS values to match"
extension on top of it, but that is conceptually a totally independent
issue.
NOT TESTED! The original patch that this all was based on was tested by
KAMEZAWA Hiroyuki, but maybe I screwed up something when I created the
cleaned-up series, so caveat emptor..
Also note that it _may_ be a good idea to mark some more registers
clobbered on x86-64 in the inline asms instead of saving/restoring them.
They are inline functions, but they are only used in places where there
are not a lot of live registers _anyway_, so doing for example the
clobbers of %r8-%r11 in the asm wouldn't make the fast-path code any
worse, and would make the slow-path code smaller.
(Not that the slow-path really matters to that degree. Saving a few
unnecessary registers is the _least_ of our problems when we hit the slow
path. The instruction/cycle counting really only matters in the fast
path).
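For reference, the xadd-flavored fast path for down_read ends up looking like this (a sketch of the x86 rwsem.h style):

static inline void __down_read(struct rw_semaphore *sem)
{
	asm volatile("# beginning down_read\n\t"
		     LOCK_PREFIX _ASM_INC "(%1)\n\t"
		     /* adds 0x00000001; result is negative only if a
		      * writer holds or is queued on the semaphore */
		     "  jns        1f\n"
		     "  call call_rwsem_down_read_failed\n"
		     "1:\n\t"
		     "# ending down_read\n\t"
		     : "+m" (sem->count)
		     : "a" (sem)
		     : "memory", "cc");
}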
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <alpine.LFD.2.00.1001121810410.17145@localhost.localdomain> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
For x86-64, 32767 threads really is not enough. Change rwsem_count_t
to a signed long, so that it is 64 bits on x86-64.
This required the following changes to the assembly code:
a) %z0 doesn't work on all versions of gcc! At least gcc 4.4.2 as
shipped with Fedora 12 emits "ll" not "q" for 64 bits, even for
integer operands. Newer gccs apparently do this correctly, but
avoid this problem by using the _ASM_ macros instead of %z.
b) 64-bit immediates are only allowed in "movq $imm,%reg"
constructs... no others. Change some of the constraints to "e",
and fix the one case where we would have had to use an invalid
immediate -- in that case, we only care about the upper half
anyway, so just access the upper half.
Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <tip-bafaecd11df15ad5b1e598adc7736afcd38ee13d@git.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The fast version of the rwsems (the code that uses xadd) has
traditionally only worked on x86-32, and as a result it mixes different
kinds of types wildly - they just all happen to be 32-bit. We have
"long", we have "__s32", and we have "int".
To make it work on x86-64, the types suddenly matter a lot more. It can
be either a 32-bit or 64-bit signed type, and both work (with the caveat
that a 32-bit counter will only have 15 bits of effective write
counters, so it's limited to 32767 users). But whatever type you
choose, it needs to be used consistently.
This makes a new 'rwsem_count_t', that is a 32-bit signed type. For a
64-bit type, you'd need to also update the BIAS values.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <alpine.LFD.2.00.1001121755220.17145@localhost.localdomain> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This makes gcc use the right register names and instruction operand sizes
automatically for the rwsem inline asm statements.
So instead of using "(%%eax)" to specify the memory address that is the
semaphore, we use "(%1)" or similar. And instead of forcing the operation
to always be 32-bit, we use "%z0", taking the size from the actual
semaphore data structure itself.
This doesn't actually matter on x86-32, but if we want to use the same
inline asm for x86-64, we'll need to have the compiler generate the proper
64-bit names for the registers (%rax instead of %eax), and if we want to
use a 64-bit counter too (in order to avoid the 15-bit limit on the
write counter that limits concurrent users to 32767 threads), we'll need
to be able to generate instructions with "q" accesses rather than "l".
Since this header currently isn't enabled on x86-64, none of that matters,
but we do want to use the xadd version of the semaphores rather than have
to take spinlocks to do a rwsem. The mm->mmap_sem can be heavily contended
when you have lots of threads all taking page faults, and the fallback
rwsem code that uses a spinlock performs abysmally in that case.
[ hpa: modified the patch to skip size suffixes entirely when they are
redundant due to register operands. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <alpine.LFD.2.00.1001121613560.17145@localhost.localdomain> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Eugene Teo [Tue, 23 Mar 2010 02:52:13 +0000 (10:52 +0800)]
vgaarb: fix "target=default" passing
Commit 77c1ff3982c6b36961725dd19e872a1c07df7f3b fixed the userspace
pointer dereference, but introduced another bug pointed out by Eugene Teo
in RH bug #564264. Instead of comparing at the point we had reached in
the string, we compared the beginning of the string to "default".
arch/x86/built-in.o: In function `store_cache_disable':
intel_cacheinfo.c:(.text+0xc509): undefined reference to `amd_get_nb_id'
arch/x86/built-in.o: In function `show_cache_disable':
intel_cacheinfo.c:(.text+0xc7d3): undefined reference to `amd_get_nb_id'
when CONFIG_CPU_SUP_AMD is not enabled because the amd_get_nb_id
helper is defined in AMD-specific code but also used in generic code
(intel_cacheinfo.c). Reorganize the L3 cache index disable code under
CONFIG_CPU_SUP_AMD since it is AMD-only anyway.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100218184210.GF20473@aftab> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The show/store_cache_disable routines depend unnecessarily on NUMA's
cpu_to_node and the disabling of cache indices broke when !CONFIG_NUMA.
Remove that dependency by using a helper which is always correct.
While at it, enable L3 Cache Index disable on rev D1 Istanbuls which
sport the feature too.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100218184339.GG20473@aftab> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We need to know the valid L3 indices interval when disabling them over
/sysfs. Do that when the core is brought online and add boundary checks
to the sysfs .store attribute.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1264172467-25155-6-git-send-email-bp@amd64.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Dave Jones <davej@redhat.com> Cc: David Airlie <airlied@linux.ie> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1264172467-25155-3-git-send-email-bp@amd64.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* Correct the masks used for writing the cache index disable indices.
* Do not turn off L3 scrubber - it is not necessary.
* Make sure wbinvd is executed on the same node where the L3 is.
* Check for out-of-bounds values written to the registers.
* Make show_cache_disable hex values unambiguous
* Check for Erratum #388
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1264172467-25155-4-git-send-email-bp@amd64.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add wbinvd_on_cpu and wbinvd_on_all_cpus stubs for executing wbinvd on a
particular CPU.
[ hpa: renamed lib/smp.c to lib/cache-smp.c ]
[ hpa: wbinvd_on_all_cpus() returns int, but wbinvd() returns
void. Thus, the former cannot be a macro for the latter,
replace with an inline function. ]
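The helpers are thin wrappers over the existing cross-call machinery; lib/cache-smp.c comes out essentially as:

#include <linux/smp.h>
#include <linux/module.h>

static void __wbinvd(void *dummy)
{
	wbinvd();			/* flush this CPU's caches */
}

void wbinvd_on_cpu(int cpu)
{
	smp_call_function_single(cpu, __wbinvd, NULL, 1);
}
EXPORT_SYMBOL(wbinvd_on_cpu);

int wbinvd_on_all_cpus(void)
{
	return on_each_cpu(__wbinvd, NULL, 1);
}
EXPORT_SYMBOL(wbinvd_on_all_cpus);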
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1264172467-25155-2-git-send-email-bp@amd64.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Disabling the IOMMU can potentially allow DMA transactions to
complete without being translated. Leave it enabled, and allow
the crash kernel to do the IOMMU reinitialization properly.
Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Hit another kdump problem, as reported by Neil Horman. When initializing
the IOMMU, we attach devices to their domains before the IOMMU is
fully (re)initialized. Attaching a device will issue some important
invalidations. In the context of the newly kexec'd kdump kernel, the
IOMMU may have stale cached data from the original kernel. Because we
do the attach too early, the invalidation commands are placed in the new
command buffer before the IOMMU is updated with that buffer. This leaves
the stale entries in the kdump context and can render devices unusable.
Simply enable the IOMMU before we do the attach.
Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>