Use a similar approach to the SMB session sharing. Add a list of tcons
attached to each SMB session. Move the refcount to non-atomic. Protect
all of the above with the cifs_tcp_ses_lock. Add functions to
properly find and put references to the tcons.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Cc: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We do this by abandoning the global list of SMB sessions and instead
moving to a per-server list. This entails adding a new list head to the
TCP_Server_Info struct. The refcounting for the cifsSesInfo is moved to
a non-atomic variable. We have to protect it by a lock anyway, so there's
no benefit to making it an atomic. The list and refcount are protected
by the global cifs_tcp_ses_lock.
The patch also adds a new routines to find and put SMB sessions and
that properly take and put references under the lock.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Cc: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The code that allows these structs to be shared is extremely racy.
Disable the sharing of SMB and tcon structs for now until we can
come up with a way to do this that's race free.
We want to continue to share TCP sessions, however since they are
required for multiuser mounts. For that, implement a new (hopefully
race-free) scheme. Add a new global list of TCP sessions, and take
care to get a reference to it whenever we're dealing with one.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Cc: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We're currently declaring both a sockaddr_in and sockaddr6_in on the
stack, but we really only need storage for one of them. Declare a
sockaddr struct and cast it to the proper type. Also, eliminate the
protocolType field in the TCP_Server_Info struct. It's redundant since
we have a sa_family field in the sockaddr anyway.
We may need to revisit this if SCTP is ever implemented, but for now
this will simplify the code.
CIFS over IPv6 also has a number of problems currently. This fixes all
of them that I found. Eventually, it would be nice to move more of the
code to be protocol independent, but this is a start.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Cc: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Also adds two lines missing from the previous patch (for the need reconnect flag in the
/proc/fs/cifs/DebugData handling)
The new global_cifs_sock_list is added, and initialized in init_cifs but not used yet.
Jeff Layton will be adding code in to use that and to remove the GlobalTcon and GlobalSMBSession
lists.
CC: Jeff Layton <jlayton@redhat.com> CC: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Cc: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In preparation for Jeff's big umount/mount fixes to remove the possibility of
various races in cifs mount and linked list handling of sessions, sockets and
tree connections, this patch cleans up some repetitive code in cifs_mount,
and addresses a problem with ses->status and tcon->tidStatus in which we
were overloading the "need_reconnect" state with other status in that
field. So the "need_reconnect" flag has been broken out from those
two state fields (need reconnect was not mutually exclusive from some of the
other possible tid and ses states). In addition, a few exit cases in
cifs_mount were cleaned up, and a problem with a tcon flag (for lease support)
was not being set consistently for the 2nd mount of the same share
CC: Jeff Layton <jlayton@redhat.com> CC: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is an implementation of David Miller's suggested fix in:
https://bugzilla.redhat.com/show_bug.cgi?id=470201
It has been updated to use wait_event() instead of
wait_event_interruptible().
Paraphrasing the description from the above report, it makes sendmsg()
block while UNIX garbage collection is in progress. This avoids a
situation where child processes continue to queue new FDs over a
AF_UNIX socket to a parent which is in the exit path and running
garbage collection on these FDs. This contention can result in soft
lockups and oom-killing of unrelated processes.
Signed-off-by: dann frazier <dannf@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When resizing a CQ, MTTs associated with the old CQE buffer were not
freed. As a result, if any app used resize CQ repeatedly, all MTTs
were eventually exhausted, which led to all memory registration
operations failing until the driver is reloaded.
Once the RESIZE_CQ command returns successfully from FW, FW no longer
accesses the old CQ buffer, so it is safe to deallocate the MTT
entries used by the old CQ buffer.
Finally, if the RESIZE_CQ command fails, the MTTs allocated for the
new CQEs buffer also need to be de-allocated.
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1416>.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add another model ID of a broken firmware to prevent early I/O errors
by acesses at the end of the disk. Reported at linux1394-user,
http://marc.info/?t=122670842900002
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add another model ID of a broken firmware to prevent early I/O errors
by acesses at the end of the disk. Reported at linux1394-user,
http://marc.info/?t=122670842900002
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
For some unknown reason at Steven Rostedt added in disabling of the SPE
instruction generation for e500 based PPC cores in commit 6ec562328fda585be2d7f472cfac99d3b44d362a.
We are removing it because:
1. It generates e500 kernels that don't work
2. its not the correct set of flags to do this
3. we handle this in the arch/powerpc/Makefile already
4. its unknown in talking to Steven why he did this
Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Tested-and-Acked-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When the "hpwdt" module is loaded (even if the /dev/watchdog device is not
opened), then kdump does not work. The panic kernel either does not start at
all or crash in various places.
The problem is that hpwdt_pretimeout is registered with register_die_notifier()
with the highest possible priority. Because it returns NOTIFY_STOP, the
crash_nmi_callback which is also registered with register_die_notifier()
is never executed. This causes the shutdown of other CPUs to fail.
Reverting the order is no option: The crash_nmi_callback executes HLT
and so never returns normally. Because of that, it must be executed as
last notifier, which currently is done.
So, that patch returns NOTIFY_OK to keep the crash_nmi_callback executed.
Signed-off-by: Bernhard Walle <bwalle@suse.de> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Thomas Mingarelli <thomas.mingarelli@hp.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The address provided by the SMBIOS/DMI CRU information is mapped via
ioremap() in the virtual address space. However, since the address is
executed (i.e. call'd), we need to set that pages as executable.
Without that, I get following oops on a HP ProLiant DL385 G2
machine with BIOS from 05/29/2008 when I trigger crashdump:
A mutex_unlock(&gang->aff_mutex) in spufs_create_context() is missing
in case spufs_context_open() fails. As a result, spu_create syscall
and spu_get_idle() may block.
This patch adds the mutex_unlock.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp> Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Acked-by: Andre Detsch <adetsch@br.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently, we can end up in an infinite loop if we get a signal
while the kernel has faulted in spufs_ps_fault. Eg:
alarm(1);
write(fd, some_spu_psmap_register_address, 4);
- the write's copy_from_user will fault on the ps mapping, and
signal_pending will be non-zero. Because returning from the fault
handler will never clear TIF_SIGPENDING, so we'll just keep faulting,
resulting in an unkillable process using 100% of CPU.
This change returns VM_FAULT_SIGBUS if there's a fatal signal pending,
letting us escape the loop.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[CIFS] Reduce number of socket retries in large write path
CIFS in some heavy stress conditions cifs could get EAGAIN
repeatedly in smb_send2 which led to repeated retries and eventually
failure of large writes which could lead to data corruption.
There are three changes that were suggested by various network
developers:
1) convert cifs from non-blocking to blocking tcp sendmsg
(we left in the retry on failure)
2) change cifs to not set sendbuf and rcvbuf size for the socket
(let tcp autotune the buffer sizes since that works much better
in the TCP stack now)
3) if we have a partial frame sent in smb_send2, mark the tcp
session as invalid (close the socket and reconnect) so we do
not corrupt the remaining part of the SMB with the beginning
of the next SMB.
This does not appear to hurt performance measurably and has
been run in various scenarios, but it definately removes
a corruption that we were seeing in some high stress
test cases.
Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The Belkin F5D7050rev5000de (id 050d:705e) has the Realtek RTL8187B chip
and works with the 2.6.27 driver.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Some recent Seagate harddrives have firmware bug which causes FLUSH
CACHE to timeout under certain circumstances if NCQ is being used.
This can be worked around by disabling NCQ and fixed by updating the
firmware. Implement ATA_HORKAGE_FIRMWARE_UPDATE and blacklist these
devices.
The wiki page has been updated to contain information on this issue.
Since dev->power.should_wakeup bit is used by the PCI core to
decide whether the device should wake up the system from sleep
states, set/unset this bit whenever WOL is enabled/disabled using
e1000_set_wol(). Accordingly, use device_can_wakeup() for checking
if wake-up is supported by the device.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Since dev->power.should_wakeup bit is used by the PCI core to
decide whether the device should wake up the system from sleep
states, set/unset this bit whenever WOL is enabled/disabled using
e1000_set_wol(). Accordingly, use device_can_wakeup() for checking
if wake-up is supported by the device.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Since dev->power.should_wakeup bit is used by the PCI core to
decide whether the device should wake up the system from sleep
states, set/unset this bit whenever WOL is enabled/disabled using
igb_set_wol(). Accordingly, use device_can_wakeup() for checking
if wake-up is supported by the device.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Impact: make warning message disappear - functionality unchanged
Problems with bogus IRQ0 override of those laptops should be fixed
with commits
x86: SB600: skip IRQ0 override if it is not routed to INT2 of IOAPIC
x86: SB450: skip IRQ0 override if it is not routed to INT2 of IOAPIC
that introduce early-quirks based on chipset configuration.
For further information, see
http://bugzilla.kernel.org/show_bug.cgi?id=11516
Instead of removing the related dmi-quirks completely we'd like to
keep them for (at least) one kernel version -- to double-check whether
the early-quirks really took effect. But the dmi-quirks need to be
called after early-quirks are executed. With this patch calling
sequence for dmi-quriks is changed as follows:
The NUMA code on x86_32 creates special memory mapping that allows
each node's pgdat to be located in this node's memory. For this
purpose it allocates a memory area at the end of each node's memory
and maps this area so that it is accessible with virtual addresses
belonging to low memory. As a result, if there is high memory,
these NUMA-allocated areas are physically located in high memory,
although they are mapped to low memory addresses.
Our hibernation code does not take that into account and for this
reason hibernation fails on all x86_32 systems with CONFIG_NUMA=y and
with high memory present. Fix this by adding a special mapping for
the NUMA-allocated memory areas to the temporary page tables created
during the last phase of resume.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When reserving space for the hypervisor the Xen paravirt backend adds
an extra two pages (this was carried forward from the 2.6.18-xen tree
which had them "for safety"). Depending on various CONFIG options this
can cause the boot time fixmaps to span multiple PMDs which is not
supported and triggers a WARN in early_ioremap_init().
The bad_bios_dmi_table() quirk never triggered because we do DMI setup
too late. Move it a bit earlier.
There is no real reason to reserve these two extra pages and the
fixmap already incorporates FIX_HOLE which serves the same
purpose. None of the other callers of reserve_top_address do this.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A workaround for AMD CPU family 11h erratum 311 might cause that the
P-state Status Register shows a "current P-state" which is larger than
the "current P-state limit" in P-state Current Limit Register. For the
wrong P-state value there is no ACPI _PSS object defined and
powernow-k8/cpufreq can't determine the proper CPU frequency for that
state.
As a consequence this can cause a panic during boot (potentially with
all recent kernel versions -- at least I have reproduced it with
various 2.6.27 kernels and with the current .28 series), as an
example:
In short, aftereffect of the wrong P-state is that
cpufreq_stats_update() uses "-1" as index for some array in
cpufreq_stats_update (unsigned int cpu)
{
...
if (stat->time_in_state)
stat->time_in_state[stat->last_index] =
cputime64_add(stat->time_in_state[stat->last_index],
cputime_sub(cur_time, stat->last_time));
...
}
Fortunately, the wrong P-state value is returned only if the core is
in P-state 0. This fix solves the problem by detecting the
out-of-range P-state, ignoring it, and using "0" instead.
Cc: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Now that the PCI core manages the 'name' for each individual
hotplug driver, and all drivers (except rpaphp) have been converted
to use hotplug_slot_name(), there is no need for the PCI hotplug
core to drag around its own copy of name either.
We no longer need to manage our version of hotplug_slot->name
since the PCI and hotplug core manage it on our behalf.
Update the sn_hp_slot_private_alloc() interface to fill in
the correct name for us, as that function already has all
the parameters needed to determine the name.
rpaphp tends to use slot->name directly everywhere, and doesn't
ever need slot->hotplug_slot->name.
struct hotplug_slot->name is going away, so convert rpaphp directly
manipulate its own slot->name everywhere, and don't bother touching
slot->hotplug_slot->name.
In preparation for cleaning up the various hotplug drivers
such that they don't have to manage their own 'name' parameters
anymore, we provide the following convenience functions:
pci_slot_name()
hotplug_slot_name()
These helpers will be used by individual hotplug drivers.
Prevent callers of pci_create_slot() from registering slots with
duplicate names. This condition occurs most often when PCI hotplug
drivers are loaded on platforms with broken firmware that assigns
identical names to multiple slots.
We now rename these duplicate slots on behalf of the user.
If firmware assigns the name N to multiple slots, then:
The first registered slot is assigned N
The second registered slot is assigned N-1
The third registered slot is assigned N-2
etc.
This is the permanent fix mentioned in earlier commits d6a9e9b4 and 167e782e (shpchp/pciehp: Rename duplicate slot name...).
We take advantage of the new 'hotplug' parameter in pci_create_slot()
to prevent a slot create/rename race between hotplug drivers and
detection drivers.
The hotplug driver creates the slot with its desired name, and then
releases the semaphore. Now, the detection driver tries to create
the same slot, but it already exists. We don't care about renaming,
so return the existing slot.
The detection driver creates the slot with name "X". Then the hotplug
driver tries to create the same slot, but wants the name "Y" instead.
We detect that we're trying to create the same slot and that we also
want a rename, so rename the slot to "Y" and return.
Two separate hotplug drivers are attempting to claim the slot and
are passing valid hotplug_slot args to pci_create_slot(). We detect
that the slot already has a ->hotplug callback, prevent a rename,
and return -EBUSY.
Slot detection drivers can co-exist with hotplug drivers. The names
of the detected/claimed slots may be different depending on module
load order.
For legacy reasons, we need to allow hotplug drivers to override
the slot name if a detection driver is loaded first (and they find
the same slots).
Creating and overriding slot names should be an atomic operation,
otherwise you get a locking nightmare as various drivers race to
call pci_create_slot().
pci_create_slot() is already serialized by grabbing the pci_bus_sem.
We update the API and add a 'hotplug' param, which is:
set if the caller is a hotplug driver
NULL if the caller is a detection driver
pci_create_slot() does not actually use the 'hotplug' parameter in this
patch. A later patch will add the logic that uses it.
Update pci_hp_register() to take a const char *name parameter.
The motivation for this is to clean up the individual hotplug
drivers so that each one does not have to manage its own name.
The PCI core should be the place where we manage the name.
We update the interface and all callsites first, in a
"no functional change" manner, and clean up the drivers later.
after noticing that my Netgear FA411 (PCMCIA-NIC) [1] stopped working with
the release of the 2.6.25 kernel (sidux-version), I checked the
respective driver sources and noticed that the pcnet_cs driver bailed
out with "use axnet_cs instead" for the Netgear FA411, but axnet_cs
doesn't claim this ID.
I compiled a kernel with the PCMCIA-ID for the netgear card moved to
axnet_cs from pcnet_cs which worked. I then contacted sidux-kernel
maintainer Stefan Lippers-Hollmann who turned the info into this patch
and integrated it into the kernel:
This works for me and AFAIK there were no reports of any breakage for
other devices on sidux-support.
This looks like a trivial patch, but since I have very limited
experience with kernel modifications I might be woefully wrong there.
But if there are no side effects of this patch, is it possible to get it
into the official kernel?
I can provide more detailed information on the affected hardware if
necessary.
From: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Date: Sat, 1 Nov 2008 23:53:04 +0000
Subject: PCMCIA: move PCMCIA ID for Netgear FA411 from pcnet_cs to axnet_cs:
Since kernel 2.6.25, commit 61da96be07ec860e260ca4af0199b9d48d000b80
(pcnet_cs: if AX88190-based card, printk "use axnet_cs instead" message.),
pcnet_cs bails out with "use axnet_cs instead" for the Netgear FA411, but
axnet_cs doesn't claim this ID.
Signed-off-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> Signed-off-by: Cord Walter <qord@cwalter.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We should only tell the hardware its capable of DMA'ing
to us only what we asked dev_alloc_skb(). Prior to this
it is possible a large RX'd frame could have corrupted
DMA data but for us but we were saved only because we
were previously also pci_map_single()'ing the same large
value. The issue prior to this though was we were unmapping
a smaller amount which the prior DMA patch fixed.
Signed-off-by: Bennyam Malavazi <Bennyam.Malavazi@atheros.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This should fix the SW-IOMMU bounce buffer starvation
seen ok kernel.org bugzilla 11811:
http://bugzilla.kernel.org/show_bug.cgi?id=11811
Users on MacBook Pro 3.1/MacBook v2 would see something like:
DMA: Out of SW-IOMMU space for 4224 bytes at device 0000:0b:00.0
Unfortunately its only easy to trigger on MacBook Pro 3.1/MacBook v2
so far so its difficult to debug (even with swiotlb=force).
We were pci_unmap_single()'ing less bytes than what we called
for with pci_map_single() and as such we were starving
the swiotlb from its 64MB amount of bounce buffers. We remain
consistent and now always use sc->rxbufsize for RX. While at
it we update the beacon DMA maps as well to only use the data
portion of the skb, previous to this we were pci_map_single()'ing
more data for beaconing than what we tell the hardware it can use,
therefore pushing more iotlb abuse.
Still not sure why this is so easily triggerable on
MacBook Pro 3.1, it may be the hardware configuration
tends to use more memory > 3GB mark for DMA.
Signed-off-by: Maciej Zenczykowski <zenczykowski@gmail.com> Signed-off-by: Bennyam Malavazi <Bennyam.Malavazi@atheros.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Impact: fix boot crash on AMD IOMMU if CONFIG_GART_IOMMU is off
Currently these macros evaluate to a no-op except the kernel is compiled
with GART or Calgary support. But we also need these macros when we have
SWIOTLB, VT-d or AMD IOMMU in the kernel. Since we always compile at
least with SWIOTLB we can define these macros always.
This patch is also for stable backport for the same reason the SWIOTLB
default selection patch is.
Impact: widen the reach of the low-memory-protect DMI quirk
Phoenix BIOSes variously identify their vendor as "Phoenix Technologies,
LTD" or "Phoenix Technologies LTD" (without the comma.)
This patch makes the identification string in the bad_bios_dmi_table
more general (following a suggestion by Ingo Molnar), so that both
versions are handled.
Again, the patched file compiles cleanly and the patch has been tested
successfully on my machine.
The netmos_9xx5_combo type assumes that PCI SSID provides always the
correct value for the number of parallel and serial ports, but there are
indeed broken devices with wrong numbers, which may result in Oops.
This patch simply adds the check of the array range.
The Zepto 6615WD laptop (rebranded Inventec Symphony system) needs a
key release quirk for its volume keys to work. The attached patch adds
the quirk to the atkbd driver.
This fixes a regression introduced by 2c6e6db41f01b6b4eb98809350827c9678996698
"Minimize per_cpu reservations." That patch incorrectly used information about
what CPUs are possible that was not yet initialized by ACPI. The end result
was that per_cpu structures for offline CPUs were not initialized causing a
NULL pointer reference.
Since we cannot do the full acpi_boot_init() call any earlier, the simplest
fix is to just parse the MADT for SAPIC entries early to find the CPU
info. This should also allow for some cleanup of the code added by the
"Minimize per_cpu reservations". This patch just fixes the regressions, the
cleanup will come in a later patch.
Signed-off-by: Doug Chapman <doug.chapman@hp.com> Signed-off-by: Alex Chiang <achiang@hp.com> CC: Robin Holt <holt@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
To kick the watch out we need (in this order) inode->inotify_mutex and
ih->mutex. That's fine if we have a hold on inode; however, for all
other cases we need to make damn sure we don't race with umount. We can
*NOT* just grab a reference to a watch - inotify_unmount_inodes() will
happily sail past it and we'll end with reference to inode potentially
outliving its superblock.
Ideally we just want to grab an active reference to superblock if we
can; that will make sure we won't go into inotify_umount_inodes() until
we are done. Cleanup is just deactivate_super().
However, that leaves a messy case - what if we *are* racing with
umount() and active references to superblock can't be acquired anymore?
We can bump ->s_count, grab ->s_umount, which will almost certainly wait
until the superblock is shut down and the watch in question is pining
for fjords. That's fine, but there is a problem - we might have hit the
window between ->s_active getting to 0 / ->s_count - below S_BIAS (i.e.
the moment when superblock is past the point of no return and is heading
for shutdown) and the moment when deactivate_super() acquires
->s_umount.
We could just do drop_super() yield() and retry, but that's rather
antisocial and this stuff is luser-triggerable. OTOH, having grabbed
->s_umount and having found that we'd got there first (i.e. that
->s_root is non-NULL) we know that we won't race with
inotify_umount_inodes().
So we could grab a reference to watch and do the rest as above, just
with drop_super() instead of deactivate_super(), right? Wrong. We had
to drop ih->mutex before we could grab ->s_umount. So the watch
could've been gone already.
That still can be dealt with - we need to save watch->wd, do idr_find()
and compare its result with our pointer. If they match, we either have
the damn thing still alive or we'd lost not one but two races at once,
the watch had been killed and a new one got created with the same ->wd
at the same address. That couldn't have happened in inotify_destroy(),
but inotify_rm_wd() could run into that. Still, "new one got created"
is not a problem - we have every right to kill it or leave it alone,
whatever's more convenient.
So we can use idr_find(...) == watch && watch->inode->i_sb == sb as
"grab it and kill it" check. If it's been our original watch, we are
fine, if it's a newcomer - nevermind, just pretend that we'd won the
race and kill the fscker anyway; we are safe since we know that its
superblock won't be going away.
And yes, this is far beyond mere "not very pretty"; so's the entire
concept of inotify to start with.
It has been thought that the per-user file descriptors limit would also
limit the resources that a normal user can request via the epoll
interface. Vegard Nossum reported a very simple program (a modified
version attached) that can make a normal user to request a pretty large
amount of kernel memory, well within the its maximum number of fds. To
solve such problem, default limits are now imposed, and /proc based
configuration has been introduced. A new directory has been created,
named /proc/sys/fs/epoll/ and inside there, there are two configuration
points:
max_user_instances = Maximum number of devices - per user
max_user_watches = Maximum number of "watched" fds - per user
The current default for "max_user_watches" limits the memory used by epoll
to store "watches", to 1/32 of the amount of the low RAM. As example, a
256MB 32bit machine, will have "max_user_watches" set to roughly 90000.
That should be enough to not break existing heavy epoll users. The
default value for "max_user_instances" is set to 128, that should be
enough too.
This also changes the userspace, because a new error code can now come out
from EPOLL_CTL_ADD (-ENOSPC). The EMFILE from epoll_create() was already
listed, so that should be ok.
Any user on existing parisc 32- and 64bit-kernels can easily crash
the kernel and as such enforce a DSO.
A simple testcase is available here:
http://gsyprf10.external.hp.com/~deller/crash.tgz
The problem is introduced by the fact, that the handle_interruption()
crash handler calls the show_regs() function, which in turn tries to
unwind the stack by calling parisc_show_stack(). Since the stack contains
userspace addresses, a try to unwind the stack is dangerous and useless
and leads to the crash.
The fix is trivial: For userspace processes
a) avoid to unwind the stack, and
b) avoid to resolve userspace addresses to kernel symbol names.
While touching this code, I converted print_symbol() to %pS
printk formats and made parisc_show_stack() static.
An initial patch for this was written by Kyle McMartin back in August:
http://marc.info/?l=linux-parisc&m=121805168830283&w=2
Compile and run-tested with a 64bit parisc kernel.
Signed-off-by: Helge Deller <deller@gmx.de> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A problem was found while reviewing the code after Bugzilla bug
http://bugzilla.kernel.org/show_bug.cgi?id=11796.
In ipc_addid(), the newly allocated ipc structure is inserted into the
ipcs tree (i.e made visible to readers) without locking it. This is not
correct since its initialization continues after it has been inserted in
the tree.
This patch moves the ipc structure lock initialization + locking before
the actual insertion.
kunmap() takes as argument the struct page that orginally got kmap()'d,
however the sg_miter_stop() function passed it the kernel virtual address
instead, resulting in weird stuff.
Somehow I ended up fixing this bug by accident while looking for a bug in
the same area.
Reported-by: kerneloops.org Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There are already various drivers having bigger label than 12 bytes. Most
of them fit well under 20 bytes but make column width exact so that
oversized labels don't mess up output alignment.
Signed-off-by: Jarkko Nikula <jarkko.nikula@nokia.com> Acked-by: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When booting in a direct color mode, the penguin has dirty feet, i.e.,
some pixels have the wrong color. This is caused by
fb_set_logo_directpalette() which does not initialize the last 32 palette
entries.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fixes a data corruption bug in pxa2xx_spi.c when operating in full duplex
mode with DMA and using buffers that overlap.
SPI transmit and receive buffers are allowed to be the same or to overlap.
However, this driver fails if such overlap is attempted in DMA mode
because it maps the rx and tx buffers in the wrong order. By mapping
DMA_FROM_DEVICE (read) before DMA_TO_DEVICE (write), it invalidates the
cache before flushing it, thus discarding data which should have been
transmitted.
The patch corrects the order of mapping. This bug exists in all versions
of pxa2xx_spi.c; similar bugs are in the drivers for two other SPI
controllers (au1500, imx).
A version of this patch has been tested on kernel 2.6.20 using
verification of loopback data with: random transfer length, random
bits-per-word, random positive offsets (both larger and smaller than
transfer length) between the start of the rx and tx buffers, and varying
clock rates.
Signed-off-by: Ned Forrester <nforrester@whoi.edu> Cc: Vernon Sauder <vernoninhand@gmail.com> Cc: J. Scott Merritt <merrij3@rpi.edu> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I have received some reports of out-of-memory errors on some older AMD
architectures. These errors are what I would expect to see if
crypt_stat->key were split between two separate pages. eCryptfs should
not assume that any of the memory sent through virt_to_scatterlist() is
all contained in a single page, and so this patch allocates two
scatterlist structs instead of one when processing keys. I have received
confirmation from one person affected by this bug that this patch resolves
the issue for him, and so I am submitting it for inclusion in a future
stable release.
Note that virt_to_scatterlist() runs sg_init_table() on the scatterlist
structs passed to it, so the calls to sg_init_table() in
decrypt_passphrase_encrypted_session_key() are redundant.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Reported-by: Paulo J. S. Silva <pjssilva@ime.usp.br> Cc: "Leon Woestenberg" <leon.woestenberg@gmail.com> Cc: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Impact: properly rebuild sched-domains on kmalloc() failure
When cpuset failed to generate sched domains due to kmalloc()
failure, the scheduler should fallback to the single partition
'fallback_doms' and rebuild sched domains, but now it only
destroys but not rebuilds sched domains.
The regression was introduced by:
| commit dfb512ec4834116124da61d6c1ee10fd0aa32bd6
| Author: Max Krasnyansky <maxk@qualcomm.com>
| Date: Fri Aug 29 13:11:41 2008 -0700
|
| sched: arch_reinit_sched_domains() must destroy domains to force rebuild
After the above commit, partition_sched_domains(0, NULL, NULL) will
only destroy sched domains and partition_sched_domains(1, NULL, NULL)
will create the default sched domain.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch is required for all AMD SB600 revisions to avoid USB subsystem hang
symptom. The USB subsystem hang symptom is observed when the system has
multiple USB devices connected to it. In some cases a USB hub may be required
to observe this symptom.
This patch is required for AMD SB700 south bridge revision A12 and A13 to avoid
USB subsystem hang symptom. The USB subsystem hang symptom is observed when the
system has multiple USB devices connected to it. In some cases a USB hub may be
required to observe this symptom.
This patch works around the problem by correcting the internal register setting
that will help by changing the behavior of the internal logic to avoid the
USB subsystem hang issue. The change in the behavior of the logic does not
impact the normal operation of the USB subsystem.
There's a bug in the usbmon binary reader: When using read() to fetch
the packets and a packet's data is partially read, the next read call
will once again return up to len_cap bytes of data. The b_read counter
is not regarded when determining the remaining chunk size.
So, when dumping USB data with "cat /dev/usbmon0 > usbmon.trace" while
reading from a USB storage device and analyzing the dump file
afterwards it will get out of sync after a couple of packets.
Signed-off-by: Ingo van Lil <inguin@gmx.de> Signed-off-by: Pete Zaitcev <zaitcev@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Somewhere in the conversion of the RNDIS gadget code to the new
framework, the descriptor of its data interface seems to have
been copied from the CDC Ethernet driver. Unfortunately that
means it got a nonzero altsetting ... which is incorrect. Issue
uncovered by Richard Röjfors <richard.rojfors@endian.se>.
This patch fixes that problem, and resolves at least some cases
of Windows XP bluescreening itself.
Tested-by: Richard Röjfors <richard.rojfors@endian.se>. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It turns out that atomic_inc_return() returns the *new* value
not the original one, so the logic in rndis_response_available()
kept the first RNDIS response notification from getting out.
This prevented interoperation with MS-Windows (but not Linux).
Fix this to make RNDIS behave again.
Signed-off-by: Richard Röjfors <richard.rojfors@endian.se> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de> Signed-off-by: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de> Signed-off-by: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Restart current transaction if we recieved unexpected GPEs instead
of needed ones.
http://bugzilla.kernel.org/show_bug.cgi?id=11896
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de> Signed-off-by: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de> Signed-off-by: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1155) fixes a bug in usbcore. When interfaces are
deleted, either because the device was disconnected or because of a
configuration change, the extra attribute files and child endpoint
devices may get left behind. This is because the core removes them
before calling device_del(). But during device_del(), after the
driver is unbound the core will reinstall altsetting 0 and recreate
those extra attributes and children.
The patch prevents this by adding a flag to record when the interface
is in the midst of being unregistered. When the flag is set, the
attribute files and child devices will not be created.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1165) makes a few small changes in the logic used by
ehci-hcd when it encounters a controller error:
Instead of printing out the masked status, it prints the
original status as read directly from the hardware.
It doesn't check for the STS_HALT status bit before taking
action. The mere fact that the STS_FATAL bit is set means
that something bad has happened and the controller needs to
be reset. With the old code this test could never succeed
because the STS_HALT bit was masked out from the status.
I anticipate that this will prevent the occasional "irq X: nobody cared"
problem people encounter when their EHCI controllers die.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1164) fixes a bug in the EHCI scheduler. The interval
value it uses is already in linear format, not logarithmically coded.
The existing code can sometimes crash the system by trying to divide
by zero.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fixes an obvious bug in cdc-acm by avoiding a recursive lock on
acm_start_wb()'s error path. Should apply towards 2.6.27 stable and
2.6.28.
=============================================
[ INFO: possible recursive locking detected ]
2.6.27-2-pae #109
---------------------------------------------
python/31449 is trying to acquire lock:
(&acm->write_lock){++..}, at: [<f89a0348>] acm_start_wb+0x5c/0x7b [cdc_acm]
but task is already holding lock:
(&acm->write_lock){++..}, at: [<f89a04fb>] acm_tty_write+0xe1/0x167 [cdc_acm]
other info that might help us debug this:
2 locks held by python/31449:
#0: (&tty->atomic_write_lock){--..}, at: [<c0260fae>] tty_write_lock+0x14/0x3b
#1: (&acm->write_lock){++..}, at: [<f89a04fb>] acm_tty_write+0xe1/0x167 [cdc_acm]
Add ehci_shutdown() or ohci_shutdown() calls to the USB
PS3 bus glue. ehci_shutdown() and ohci_shutdown() do some
controller specific cleanups not done by usb_remove_hcd().
Fixes errors on shutdown or reboot similar to these:
ps3-ehci-driver sb_07: HC died; cleaning up
irq 51: nobody cared (try booting with the "irqpoll" option)
This fixes a deadlock appearing with some USB peripheral drivers
when running CDC ACM gadget code.
The newish (2.6.27) CDC ACM event notification mechanism sends
messages (IN to the host) which are short enough to fit in most
FIFOs. That means that with some peripheral controller drivers
(evidently not the ones used to verify the notification code!!)
the completion callback can be issued before queue() returns.
The deadlock would come because the completion callback and the
event-issuing code shared a spinlock. Fix is trivial: drop
that lock while queueing the message.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The reason, is that ret is initialized with ENODEV instead of 0 _or_
the kmem cache is not freed in error case with no bus binding.
The difference between OF+PCI and OF only is
| 15148 804 32 15984 3e70 isp1760-of-pci.o
| 13748 676 8 14432 3860 isp1760-of.o
about 1.5 KiB.
Until there is a checkbox where the user *must* select atleast one item,
and may select multiple entries I don't make it selectable anymore.
Having a driver which can't be used under any circumstances is broken
anyway and I've seen distros shipping it that way.
Reported-by: Roland Kletzing <devzero@web.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
dpt_i2o.c::adpt_i2o_to_scsi() reads the value at (reply+5) which
should contain the length in bytes of the transferred data. This
would be correct if reply was a u32 *. However it is a void * here,
so we need to read the value at (reply+20) instead.
The value at (reply+5) is usually 0xff0000, which is apparently
'large enough' and didn't cause any trouble until 2.6.27 where
Mike Reed noted
(https://bugzilla.novell.com/show_bug.cgi?id=421330) that the
driver was incorrectly returning a SUCCESS status if the driver's
request to the firmware to abort a command failed. By doing so,
the mid-layer believed, incorrectly, that the command has
completed and has been returned (ultimately clearing
scsi_cmnd.request_buffer) yet the driver still has the command.
What should correctly happen is a mid-layer escalation
(device-reset, etc.) of recovery during which the driver will
eventually return the outstanding commands to the mid-layer.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 69961c375288bdab7604e0bb1c8d22999bb8a347 ("[PATCH] m68k/Atari:
Interrupt updates") added a BUG_ON() with an incorrect upper bound
comparison, which causes an early crash on VME boards, where IRQ_USER is
8, cnt is 192 and NR_IRQS is 200.
Reported-by: Stephen N Chivers <schivers@csc.com.au> Tested-by: Kars de Jong <jongk@linux-m68k.org> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
According to ACPI spec when the status of some device is not present
but functional, the device is valid and the children of this device
should be enumerated. It means that the device should be added to
linux acpi device tree. But the device driver for this device should not
be loaded.
The detailed info can be found in the section 6.3.7 of ACPI 3.0b spec.
_STA may return bit 0 clear (not present) with bit 3 set (device is
functional). This case is used to indicate a valid device for which no
device driver should be loaded (for example, a bridge device.).
Children of this device may be present and valid. OS should continue
enumeration below a device whose _STA returns this bit combination
http://bugzilla.kernel.org/show_bug.cgi?id=3358
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Li Shaohua <shaohua.li@intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Cc: Holger Macht <hmacht@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
cpu_coregroup_map used to grab a mutex on s390 since it was only
called from process context.
Since c7c22e4d5c1fdebfac4dba76de7d0338c2b0d832 "block: add support
for IO CPU affinity" this is not true anymore.
It now also gets called from softirq context.
To prevent possible deadlocks change this in architecture code and
use a spinlock instead of a mutex.
Not all tvaudio chips allow controlling bass/treble. So, the driver
has a table with a flag to indicate if the chip does support it.
Unfortunately, the handling of this logic were broken for a very long
time (probably since the first module version). Due to that, an OOPS
were generated for devices that don't support bass/treble.
This were the resulting OOPS message before the patch, with debug messages
enabled:
icmpmsg_put() can happily corrupt kernel memory, using a static
table and forgetting to reset an array index in a loop.
Remove the static array since its not safe without proper locking.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>