s_syncing livelock avoidance was breaking data integrity guarantee of
sys_sync, by allowing sys_sync to skip writing or waiting for superblocks
if there is a concurrent sys_sync happening.
This livelock avoidance is much less important now that we don't have the
get_super_to_sync() call after every sb that we sync. This was replaced
by __put_super_and_need_restart.
Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix data integrity semantics required by sys_sync, by iterating over all
inodes and waiting for any writeback pages after the initial writeout.
Comments explain the exact problem.
Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Remove WB_SYNC_HOLD. The primary motiviation is the design of my
anti-starvation code for fsync. It requires taking an inode lock over the
sync operation, so we could run into lock ordering problems with multiple
inodes. It is possible to take a single global lock to solve the ordering
problem, but then that would prevent a future nice implementation of "sync
multiple inodes" based on lock order via inode address.
Seems like a backward step to remove this, but actually it is busted
anyway: we can't use the inode lists for data integrity wait: an inode can
be taken off the dirty lists but still be under writeback. In order to
satisfy data integrity semantics, we should wait for it to finish
writeback, but if we only search the dirty lists, we'll miss it.
It would be possible to have a "writeback" list, for sys_sync, I suppose.
But why complicate things by prematurely optimise? For unmounting, we
could avoid the "livelock avoidance" code, which would be easier, but
again premature IMO.
Fixing the existing data integrity problem will come next.
Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Direct IO can invalidate and sync a lot of pagecache pages in the mapping.
A 4K direct IO will actually try to sync and/or invalidate the pagecache
of the entire file, for example (which might be many GB or TB large).
Improve this by doing range syncs. Also, memory no longer has to be
unmapped to catch the dirty bits for syncing, as dirty bits would remain
coherent due to dirty mmap accounting.
This fixes the immediate DM deadlocks when doing direct IO reads to block
device with a mounted filesystem, if only by papering over the problem
somewhat rather than addressing the fsync starvation cases.
Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Chris Mason notices do_sync_mapping_range didn't actually ask for data
integrity writeout. Unfortunately, it is advertised as being usable for
data integrity operations.
This is a data integrity bug.
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Terminate the write_cache_pages loop upon encountering the first page past
end, without locking the page. Pages cannot have their index change when
we have a reference on them (truncate, eg truncate_inode_pages_range
performs the same check without the page lock).
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In write_cache_pages, if we get stuck behind another process that is
cleaning pages, we will be forced to wait for them to finish, then perform
our own writeout (if it was redirtied during the long wait), then wait for
that.
If a page under writeout is still clean, we can skip waiting for it (if
we're part of a data integrity sync, we'll be waiting for all writeout
pages afterwards, so we'll still be waiting for the other guy's write
that's cleaned the page).
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In write_cache_pages, nr_to_write is heeded even for data-integrity syncs,
so the function will return success after writing out nr_to_write pages,
even if that was not sufficient to guarantee data integrity.
The callers tend to set it to values that could break data interity
semantics easily in practice. For example, nr_to_write can be set to
mapping->nr_pages * 2, however if a file has a single, dirty page, then
fsync is called, subsequent pages might be concurrently added and dirtied,
then write_cache_pages might writeout two of these newly dirty pages,
while not writing out the old page that should have been written out.
Fix this by ignoring nr_to_write if it is a data integrity sync.
This is a data integrity bug.
The reason this has been done in the past is to avoid stalling sync
operations behind page dirtiers.
"If a file has one dirty page at offset 1000000000000000 then someone
does an fsync() and someone else gets in first and starts madly writing
pages at offset 0, we want to write that page at 1000000000000000.
Somehow."
What we do today is return success after an arbitrary amount of pages are
written, whether or not we have provided the data-integrity semantics that
the caller has asked for. Even this doesn't actually fix all stall cases
completely: in the above situation, if the file has a huge number of pages
in pagecache (but not dirty), then mapping->nrpages is going to be huge,
even if pages are being dirtied.
This change does indeed make the possibility of long stalls lager, and
that's not a good thing, but lying about data integrity is even worse. We
have to either perform the sync, or return -ELINUXISLAME so at least the
caller knows what has happened.
There are subsequent competing approaches in the works to solve the stall
problems properly, without compromising data integrity.
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In write_cache_pages, if ret signals a real error, but we still have some
pages left in the pagevec, done would be set to 1, but the remaining pages
would continue to be processed and ret will be overwritten in the process.
It could easily be overwritten with success, and thus success will be
returned even if there is an error. Thus the caller is told all writes
succeeded, wheras in reality some did not.
Fix this by bailing immediately if there is an error, and retaining the
first error code.
This is a data integrity bug.
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We'd like to break out of the loop early in many situations, however the
existing code has been setting mapping->writeback_index past the final
page in the pagevec lookup for cyclic writeback. This is a problem if we
don't process all pages up to the final page.
Currently the code mostly keeps writeback_index reasonable and hacked
around this by not breaking out of the loop or writing pages outside the
range in these cases. Keep track of a real "done index" that enables us
to terminate the loop in a much more flexible manner.
Needed by the subsequent patch to preserve writepage errors, and then
further patches to break out of the loop early for other reasons. However
there are no functional changes with this patch alone.
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In write_cache_pages, scanned == 1 is supposed to mean that cyclic
writeback has circled through zero, thus we should not circle again.
However it gets set to 1 after the first successful pagevec lookup. This
leads to cases where not enough data gets written.
Counterexample: file with first 10 pages dirty, writeback_index == 5,
nr_to_write == 10. Then the 5 last pages will be found, and scanned will
be set to 1, after writing those out, we will not cycle back to get the
first 5.
Rework this logic, now we'll always cycle unless we started off from index
0. When cycling, only write out as far as 1 page before the start page
from the first cycle (so we don't write parts of the file twice).
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When CONFIG_DMI is not enabled, dmi detection should flag that no board
could be detected (err=1) rather than another error condition (err<0).
This fixes the fallback to manual probing for all motherboards, even
those without DMI strings, when CONFIG_DMI=n.
Signed-off-by: Alistair John Strachan <alistair@devzero.co.uk> Cc: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
powerpc: is_hugepage_only_range() must account for both 4kB and 64kB slices
The subpage_prot syscall fails on second and subsequent calls for a given
region, because is_hugepage_only_range() is mis-identifying the 4 kB
slices when the process has a 64 kB page size.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
An old bug crept back into the ICMP/ICMPv6 conntrack protocols: the timeout
values are defined as unsigned longs, the sysctl's maxsize is set to
sizeof(unsigned int). Use unsigned int for the timeout values as in the
other conntrack protocols.
Reported-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 8cc784ee (netfilter: change return types of match functions
for ebtables extensions) broke ebtables matches by inverting the
sense of match/nomatch.
Reported-by: Matt Cross <matthltc@us.ibm.com> Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 55b69e91 (netfilter: implement NFPROTO_UNSPEC as a wildcard
for extensions) broke revision probing for matches and targets that
are registered with NFPROTO_UNSPEC.
Fix by continuing the search on the NFPROTO_UNSPEC list if nothing
is found on the af-specific lists.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
All p54usb devices need a explicit termination packet, in oder to finish the pending transfer properly.
Else, the firmware could freeze, or simply drop the frame.
Signed-off-by: Christian Lamparter <chunkeey@web.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1197) fixes an error introduced recently. Since a
significant number of devices can't handle Set-Interface requests, we
no longer call usb_set_interface() when a driver unbinds from an
interface, provided the interface is already in altsetting 0. However
the interface still does get disabled, and the call to
usb_set_interface() was the only thing re-enabling it. Since the
interface doesn't get re-enabled, further attempts to use it fail.
So the patch adds a call to usb_enable_interface() when a driver
unbinds and the interface is in altsetting 0. For this to work
right, the interface's endpoints have to be re-enabled but their
toggles have to be left alone. Therefore an additional argument is
added to usb_enable_endpoint() and usb_enable_interface(), a flag
indicating whether or not the endpoint toggles should be reset.
This is a forward-ported version of a patch which fixes Bugzilla
#12301.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: David Roka <roka@dawid.hu> Reported-by: Erik Ekman <erik@kryo.se> Tested-by: Erik Ekman <erik@kryo.se> Tested-by: Alon Bar-Lev <alon.barlev@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently, setting SPLICE_F_NONBLOCK on splice from a TCP socket
results in masking of EOF (RDHUP) and error conditions on the socket
by an -EAGAIN return. Move the NONBLOCK check in tcp_splice_read()
to be after the EOF and error checks to fix this.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch bumps the release number of the driver.
Signed-off-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch saves the MIER register contents before treating
interrupts, then restores them correcty at the end of the
interrupt routine.
Signed-off-by: Joe Chou <Joe.Chou@rdc.com.tw> Signed-off-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes a reverse logic in the MDIO code.
Signed-off-by: Joe Chou <Joe.Chou@rdc.com.tw> Signed-off-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
New nodes are inserted in u32_change() under rtnl_lock() with wmb(),
so without tcf_tree_lock() like in other classifiers (e.g. cls_fw).
This isn't enough without rmb() on the read side, but on the other
hand adding such barriers doesn't give any savings, so the lock is
added instead.
Reported-by: m0sia <m0sia@plotinka.ru> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If FWD-TSN chunk is received with bad stream ID, the sctp will not do the
validity check, this may cause memory overflow when overwrite the TSN of
the stream ID.
When a fib6 table dump is prematurely ended, we won't unlink
its walker from the list. This causes all sorts of grief for
other users of the list later.
Reported-by: Chris Caputo <ccaputo@alt.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jarek Poplawski [Tue, 20 Jan 2009 22:06:26 +0000 (14:06 -0800)]
pkt_sched: sch_htb: Fix deadlock in hrtimers triggered by HTB
[ Upstream commit: none
This is a quick fix for -stable purposes. Upstream fixes these
problems via a large set of invasive hrtimer changes. ]
Most probably there is a (still unproven) race in hrtimers (before
2.6.29 kernels), which causes a corruption of hrtimers rbtree. This
patch doesn't fix it, but should let HTB avoid triggering the bug.
Reported-by: Denys Fedoryschenko <denys@visp.net.lb> Reported-by: Badalian Vyacheslav <slavon@bigtelecom.ru> Reported-by: Chris Caputo <ccaputo@alt.net> Tested-by: Badalian Vyacheslav <slavon@bigtelecom.ru> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1194b) makes usb-storage set the CAPACITY_HEURISTICS flag
for all devices made by Nokia, Nikon, or Motorola. These companies
seem to include the READ CAPACITY bug in all of their devices.
Since cell phones and digital cameras rely on flash storage, which
always has an even number of sectors, setting CAPACITY_HEURISTICS
shouldn't cause any problems. Not even if the companies wise up and
start making devices without the bug.
A large number of unusual_devs entries are now unnecessary, so the
patch removes them.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1189c) adds some hacks to usb-storage for dealing with
the growing problems involving bad capacity values and last-sector
accesses:
A new flag, US_FL_CAPACITY_OK, is created to indicate that
the device is known to report its capacity correctly. An
unusual_devs entry for Linux's own File-backed Storage Gadget
is added with this flag set, since g_file_storage always
reports the correct capacity and since the capacity need
not be even (it is determined by the size of the backing
file).
An entry in unusual_devs.h which has only the CAPACITY_OK
flag set shouldn't prejudice libusual, since the device will
work perfectly well with either usb-storage or ub. So a
new macro, COMPLIANT_DEV, is added to let libusual know
about these entries.
When a last-sector access fails three times in a row and
neither the FIX_CAPACITY nor the CAPACITY_OK flag is set,
we assume the last-sector bug is present. We replace the
existing status and sense data with values that will cause
the SCSI core to fail the access immediately rather than
retry indefinitely. This should fix the difficulties
people have been having with Nokia phones.
This version of the patch differs from the version accepted into the
mainline only in that it does not trigger a WARN() when an
odd-numbered last-sector access succeeds. In a stable kernel series
we don't want to go around spamming users' logs and consoles for no
good reason.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Taken from http://bugzilla.kernel.org/show_bug.cgi?id=12397
We're doing an sprintf of an 11-char string into an 11-char buffer.
Whoops. It breaks firmware uploading.
Reported-by: Jos-Vicente Gilabert <josevteg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The changes specific for Samsung laptops seem unapplicable to other
hardware models like ASUS. The mic inputs are lost on such hardware
by the change 5d5d5f43f1b835c375de9bd270cce030d16e2871.
This patch adds back the old laptop-eapd model, and create a new
model "samsung" for the new one specific to Samsung laptops with
automatic mic selection feature.
On the Asus Xonar D2 and D2X models, the SPI chip select signal for the
fourth DAC shares its pin with the serial clock for the EEPROM that
contains the PCI subdevice ID values. It appears that when DAC
registers are written and some other unknown conditions occur (probably
noise on the EEPROM's chip select line), the EEPROM gets overwritten
with garbage, which makes it impossible to properly detect the card
later.
Therefore, we better avoid DAC register writes and make sure that the
driver works with the DAC's registers' default values. Consequently,
the sample format is now I2S instead of left-justified (no user-visible
change), and the DAC's volume/mute registers cannot be used anymore
(volume changes are now done by the software volume plugin).
sched_clock() on ia64 is based on ar.itc, so is never
completely synchronized between cpus. On some platforms
(e.g. certain models of SGI Altix) it may be running at
radically different frequencies.
Based on a patch from Dimitri Sivanich which set this
just for SN2 && GENERIC kernels ... it is needed for
all ia64 machines.
Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
will not advance vruntime, and cause lags the other way around (which we
fixed with that initial patch: 1af5f730fc1bf7c62ec9fb2d307206e18bf40a69
(sched: more accurate min_vruntime accounting).
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Mike Galbraith <efault@gmx.de> Acked-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
After XPC has been up and running on multiple partitions for any length of
time, if XPC on one of the partitions is stopped and restarted (either by
a rmmod/insmod or a system restart), it is possible for the XPCs running
on the other partitions to falsely detect a lack of heartbeat from the XPC
that was just restarted. This false detection will occur if the restarted
XPC comes up within the five-seconds preceding one of the other XPC's
heartbeat check (which occurs once every twenty seconds).
The detection of no heartbeat results in the detecting XPC deactivating
from the just restarted XPC. The only remedy is to restart one of the
XPCs and hope that one doesn't hit this five-second window on any of the
other partitions.
Signed-off-by: Dean Nelson <dcn@sgi.com> Signed-off-by: Robin Holt <holt@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David points out that the idr_remove_all() function returns unused slabs
to the kmem cache, but needs to zero them first or else they will be
uninitialized upon next use. This causes crashes which have been observed
in the firewire subsystem.
He fixed this by zeroing the object before freeing it in idr_remove_all().
But we agree that simply removing the constructor and zeroing the object
at allocation time is simpler than relying upon slab constructor machinery
and might even be faster.
There are no known codesites which trigger this bug in 2.6.27 or 2.6.28.
The post-2.6.28 firewire changes are the only known triggerer.
There might of course be not-yet-discovered triggerers in 2.6.27 and
2.6.28, and there might be out-of-tree triggerers which are added to those
kernel versions. I'll let the -stable guys decide whether they want to
backport this fix.
Reported-by: David Moore <dcm@acm.org> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Paul E. McKenney <paulmck@us.ibm.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Kristian Hgsberg <krh@redhat.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
add USB ID for the Linksys WUSB200 Wireless-G Business USB Adapter to
rt73usb.
Signed-off-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In a PCIe hierarchy with a switch present, if the link state of an
endpoint device is changed, we must check the whole hierarchy from the
endpoint device to root port, and for each link in the hierarchy, the new
link state should be configured. Previously, the implementation checked
the state but forgot to configure the links between root port to switch.
Fixes Novell bz #448987.
Signed-off-by: Shaohua Li <shaohua.li@intel.com> Tested-by: Andrew Patterson <andrew.patterson@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
John Stanley reported EOVERFLOW errors in readdir from his self-build
glibc. I traced this down to glibc enabling d_off overflow checks
in one of the about five million different getdents implementations.
In 2.6.28 Dave Woodhouse moved our readdir double buffering required
for NFS4 readdirplus into nfsd and at that point we lost the capping
of the directory offsets to 32 bit signed values. Johns glibc used
getdents64 to even implement readdir for normal 32 bit offset dirents,
and failed with EOVERFLOW only if this happens on the first dirent in
a getdents call. I managed to come up with a testcase that uses
raw getdents and does the EOVERFLOW check manually. We always hit
it with our last entry due to the special end of directory marker.
The patch below is a dumb version of just putting back the masking,
to make sure we have the same behavior as in 2.6.27 and earlier.
I will work on a better and cleaner fix for 2.6.30.
Reported-by: John Stanley <jpsinthemix@verizon.net> Tested-by: John Stanley <jpsinthemix@verizon.net> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This assertion is incorrect for lockless pagecache. By definition if we
have an unpinned page that we are trying to take a speculative reference
to, it may become the tail of a compound page at any time (if it is
freed, then reallocated as a compound page).
It was still a valid assertion for the vmscan.c LRU isolation case, but
it doesn't seem incredibly helpful... if somebody wants it, they can
put it back directly where it applies in the vmscan code.
Noise floor calibration occasionally fails on Atheros hardware.
This is not fatal and can happen if there's simply too much
noise on the air. Ignoring the calibration error is the right
thing to do here, because when the error is ignored, the hardware
will still work, whereas if the error causes the driver to bail out
of a bigger configuration function and does not configure the tx
queues or the IMR (as is the case in reset.c), the hw no longer
works properly until the next reset.
Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com> Cc: Bob Copeland <me@bobcopeland.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
While doing various error injection testing, such as cable
pulls and target moves, some issues were observed in handling
these events. This patch improves the way these events are handled
by increasing the delay waiting for the fabric to settle and also
changes the behavior of Link Up to break the CRQ to ensure everything
gets cleaned up properly on the VIOS.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Adds a delay prior to retrying a failed NPIV login. This fixes
a scenario if the backing fibre channel adapter is getting reset
due to an EEH event, NPIV login will fail. Currently, ibmvfc
retries three times very quickly, resets the CRQ and tries one
more time. If the adapter is getting reset due to EEH, this isn't
enough time. This adds a delay prior to retrying a failed NPIV
login and also increments the number of retries.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When running Active Memory Sharing, the Collaborative Memory Manager
(CMM) may mark some pages as "loaned" with the hypervisor.
Periodically, the CMM will query the hypervisor for a loan request,
which is a single signed value. When kexec'ing into a kdump kernel,
the CMM driver in the kdump kernel is not aware of the pages the
previous kernel had marked as "loaned", so the hypervisor and the CMM
driver are out of sync. This results in the CMM driver getting a
negative loan request, which can then get treated as a large unsigned
value and can cause kdump to hang due to the CMM driver inflating too
large. Since there really is no clean way for the CMM driver in the
kdump kernel to clean this up, simply disable CMM in the kdump kernel.
This fixes hangs we were seeing doing kdump with AMS.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
An XFS workload showed up a bug in the lockless pagecache patch. Basically it
would go into an "infinite" loop, although it would sometimes be able to break
out of the loop! The reason is a missing compiler barrier in the "increment
reference count unless it was zero" case of the lockless pagecache protocol in
the gang lookup functions.
This would cause the compiler to use a cached value of struct page pointer to
retry the operation with, rather than reload it. So the page might have been
removed from pagecache and freed (refcount==0) but the lookup would not correctly
notice the page is no longer in pagecache, and keep attempting to increment the
refcount and failing, until the page gets reallocated for something else. This
isn't a data corruption because the condition will be detected if the page has
been reallocated. However it can result in a lockup.
Linus points out that ACCESS_ONCE is also required in that pointer load, even
if it's absence is not causing a bug on our particular build. The most general
way to solve this is just to put an rcu_dereference in radix_tree_deref_slot.
Assembly of find_get_pages,
before:
.L220:
movq (%rbx), %rax #* ivtmp.1162, tmp82
movq (%rax), %rdi #, prephitmp.1149
.L218:
testb $1, %dil #, prephitmp.1149
jne .L217 #,
testq %rdi, %rdi # prephitmp.1149
je .L203 #,
cmpq $-1, %rdi #, prephitmp.1149
je .L217 #,
movl 8(%rdi), %esi # <variable>._count.counter, c
testl %esi, %esi # c
je .L218 #,
after:
.L212:
movq (%rbx), %rax #* ivtmp.1109, tmp81
movq (%rax), %rdi #, ret
testb $1, %dil #, ret
jne .L211 #,
testq %rdi, %rdi # ret
je .L197 #,
cmpq $-1, %rdi #, ret
je .L211 #,
movl 8(%rdi), %esi # <variable>._count.counter, c
testl %esi, %esi # c
je .L212 #,
(notice the obvious infinite loop in the first example, if page->count remains 0)
Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This enables beacons to come through on STA/IBSS.
It should fix sporadic connection issues. Right now
mac80211 expect beacons so give it beacons.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Here the intention is to get the pgd corresponding to the current process
and sync it up with the pgd in init_mm(obtained from pgd_offset_k).
However, for kernel threads current->mm is NULL and hence pgd =
pgd_offset(init_mm, address) = pgd_ref which means the fault handler
returns without setting the pgd entry in the MM structure in the context
of which the kernel thread has faulted. This could lead to never-ending
faults and busy looping of kernel threads like pdflush. So, shouldn't the
pgd = pgd_offset(current->mm ?: &init_mm, address); be pgd =
pgd_offset(current->active_mm ?: &init_mm, address);
We can use active_mm unconditionally because it should be always set.
I increased the delay step by step until loading of mvsas
reliably detected the drive 200 times in sequence. A much better
approach would be to monitor the hardware for some flag which
indicates that port detection has finished, but I do not have any
hardware documentation.
Signed-off-by: Reinhard Nissl <rnissl@gmx.de> Cc: Ke Wei <kewei@marvell.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The result from readlink is being used to index into the link name
buffer without checking whether it is a valid length. If readlink
returns an error this will fault or cause memory corruption.
aops->readpages() and its NFS helper readpage_async_filler() will only
be called to do readahead I/O for newly allocated pages. So it's not
necessary to test for the always 0 dirty/uptodate page flags.
The removal of nfs_wb_page() call also fixes a readahead bug: the NFS
readahead has been synchronous since 2.6.23, because that call will
clear PG_readahead, which is the reminder for asynchronous readahead.
More background: the PG_readahead page flag is shared with PG_reclaim,
one for read path and the other for write path. clear_page_dirty_for_io()
unconditionally clears PG_readahead to prevent possible readahead residuals,
assuming itself to be always called in the write path. However, NFS is one
and the only exception in that it _always_ calls clear_page_dirty_for_io()
in the read path, i.e. for readpages()/readpage().
This patch corrects the issue when one connects a Nokia 5200 cell
phone in data storage mode. If one uses an unpatched unusual_devs.h,
the following messages appear on /var/log/messages:
Dec 12 01:03:24 alberich kernel: usb 4-2: new full speed USB device
using uhci_hcd and address 3
Dec 12 01:03:25 alberich kernel: usb 4-2: configuration #1 chosen from 1 choice
Dec 12 01:03:25 alberich kernel: scsi10 : SCSI emulation for USB Mass
Storage devices
Dec 12 01:03:25 alberich kernel: usb 4-2: New USB device found,
idVendor=0421, idProduct=04bd
Dec 12 01:03:25 alberich kernel: usb 4-2: New USB device strings:
Mfr=1, Product=2, SerialNumber=3
Dec 12 01:03:25 alberich kernel: usb 4-2: Product: Nokia 5200
Dec 12 01:03:25 alberich kernel: usb 4-2: Manufacturer: Nokia
Dec 12 01:03:25 alberich kernel: usb 4-2: SerialNumber: 353930018354523
Dec 12 01:03:25 alberich kernel: usbcore: registered new interface driver ub
Dec 12 01:03:30 alberich kernel: scsi 10:0:0:0: Direct-Access
Nokia Nokia 5200 0000 PQ: 0 AN
SI: 4
Dec 12 01:03:30 alberich kernel: sd 10:0:0:0: [sdg] 3985409 512-byte
hardware sectors (2041 MB)
Dec 12 01:03:30 alberich kernel: sd 10:0:0:0: [sdg] Write Protect is off
Dec 12 01:03:30 alberich kernel: sd 10:0:0:0: [sdg] Assuming drive
cache: write through
Dec 12 01:03:30 alberich kernel: sd 10:0:0:0: [sdg] 3985409 512-byte
hardware sectors (2041 MB)
Dec 12 01:03:30 alberich kernel: sd 10:0:0:0: [sdg] Write Protect is off
Dec 12 01:03:30 alberich kernel: sd 10:0:0:0: [sdg] Assuming drive
cache: write through
Dec 12 01:03:30 alberich kernel: sdg: sdg1
Dec 12 01:03:30 alberich kernel: sd 10:0:0:0: [sdg] Attached SCSI removable disk
Dec 12 01:03:30 alberich kernel: sd 10:0:0:0: Attached scsi generic sg9 type 0
Dec 12 01:03:30 alberich kernel: sd 10:0:0:0: [sdg] Sense Key : No
Sense [current]
Dec 12 01:03:30 alberich kernel: sd 10:0:0:0: [sdg] Add. Sense: No
additional sense information
Dec 12 01:03:30 alberich kernel: sd 10:0:0:0: [sdg] Sense Key : No
Sense [current]
Dec 12 01:03:30 alberich kernel: sd 10:0:0:0: [sdg] Add. Sense: No
additional sense information
Dec 12 01:03:30 alberich kernel: sd 10:0:0:0: [sdg] Sense Key : No
Sense [current]
(...)
The MicroSD card in the phone remains inaccessible and finally the
cell phone turns itself off. The patch solves this problem and makes
the cell phone fully accessible:
I've found necessary to use the FL_US_CAPACITY_FIX switch, as without
it the cell phone is recognized but it went berserk when performing
low-level functions on it (a fdisk -l /dev/uba for example).
lsusb -v output follows:
Bus 004 Device 004: ID 0421:04bd Nokia Mobile Phones
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.00
bDeviceClass 0 (Defined at Interface level)
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
idVendor 0x0421 Nokia Mobile Phones
idProduct 0x04bd
bcdDevice 6.03
iManufacturer 1 Nokia
iProduct 2 Nokia 5200
iSerial 3 353930018354523
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 32
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xc0
Self Powered
MaxPower 100mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 8 Mass Storage
bInterfaceSubClass 6 SCSI
bInterfaceProtocol 80 Bulk (Zip)
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x01 EP 1 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 0
Device Status: 0x0001
Self Powered
Signed-off-by: Paulo Afonso Graner Fessel <pfessel@gmail.com> Signed-off-by: Phil Dibowitz <phil@ipom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I have another Argosy USB storage device, which has the same problem
with the Argosy USB storage device already fixed in 2.6.27.7. But this
device has another product ID (840:84), so this patch adds a new entry
into unusual_devs to fix the mount problem.
I enclose here two patches: one against 2.6.27.8, and another against
the latest linus-git tree.
The information about the Argosy device is like below:
#lsusb -v -d 840:84
Bus 005 Device 005: ID 0840:0084 Argosy Research, Inc.
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.00
bDeviceClass 0 (Defined at Interface level)
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
idVendor 0x0840 Argosy Research, Inc.
idProduct 0x0084
bcdDevice 0.01
iManufacturer 1 Generic
iProduct 2 USB 2.0 Storage Device
iSerial 3 8400000000002549
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 32
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xc0
Self Powered
MaxPower 2mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 8 Mass Storage
bInterfaceSubClass 6 SCSI
bInterfaceProtocol 80 Bulk (Zip)
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x01 EP 1 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x82 EP 2 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 0
Device Qualifier (for other device speed):
bLength 10
bDescriptorType 6
bcdUSB 2.00
bDeviceClass 0 (Defined at Interface level)
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
bNumConfigurations 1
Device Status: 0x0000
(Bus Powered)
Before the patch, dmesg returns a lot of information like below (my
dmesg is overflown):
....
[ 138.833390] sd 7:0:0:0: [sdb] Add. Sense: No additional sense information
[ 138.877631] sd 7:0:0:0: [sdb] Sense Key : No Sense [current]
[ 138.877643] sd 7:0:0:0: [sdb] Add. Sense: No additional sense information
[ 138.921906] sd 7:0:0:0: [sdb] Sense Key : No Sense [current]
[ 138.921923] sd 7:0:0:0: [sdb] Add. Sense: No additional sense information
....
After the fix, dmesg returns below information:
....
usb 5-1: new high speed USB device using ehci_hcd and address 5
usb 5-1: configuration #1 chosen from 1 choice
scsi7 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 5
usb-storage: waiting for device to settle before scanning
usb-storage: device scan complete
scsi 7:0:0:0: Direct-Access HTS54808 0M9AT00 MG4O PQ: 0 ANSI: 0
sd 7:0:0:0: [sdb] 156301488 512-byte hardware sectors (80026 MB)
sd 7:0:0:0: [sdb] Write Protect is off
sd 7:0:0:0: [sdb] Mode Sense: 03 00 00 00
sd 7:0:0:0: [sdb] Assuming drive cache: write through
sd 7:0:0:0: [sdb] 156301488 512-byte hardware sectors (80026 MB)
sd 7:0:0:0: [sdb] Write Protect is off
sd 7:0:0:0: [sdb] Mode Sense: 03 00 00 00
sd 7:0:0:0: [sdb] Assuming drive cache: write through
sdb: sdb1
sd 7:0:0:0: [sdb] Attached SCSI disk
sd 7:0:0:0: Attached scsi generic sg1 type 0
kjournald starting. Commit interval 5 seconds
EXT3 FS on sdb1, internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
Cc: Kuniyasu Suzaki <k.suzaki@aist.go.jp> Signed-off-by: Nguyen Anh Quynh <aquynh@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Since we (Analog Devices) updated our Blackfin kernel to 2.6.28, we've
seen occasional 5-second hangs from telnet. telnetd calls select with a
NULL timeout, but with the new kernel, the system call occasionally
returns 0, which causes telnet to call sleep (5). This did not happen
with earlier kernels.
The code in sys_pselect7 looks a bit strange, in particular the variable
"to" is initialized to NULL, then changed if a non-null timeout was
passed in, but not used further. It needs to be passed to
core_sys_select instead of &end_time.