When invalid parameters are passed to apparmor_setprocattr a NULL deref
oops occurs when it tries to record an audit message. This is because
it is passing NULL for the profile parameter for aa_audit. But aa_audit
now requires that the profile passed is not NULL.
Fix this by passing the current profile on the task that is trying to
setprocattr.
Signed-off-by: Kees Cook <kees@ubuntu.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In order to make lazyinit eat approx. 10% of io bandwidth at max, we
are sleeping between zeroing each single inode table. For that purpose
we are using timer which wakes up thread when it expires. It is set
via add_timer() and this may cause troubles in the case that thread
has been woken up earlier and in next iteration we call add_timer() on
still running timer hence hitting BUG_ON in add_timer(). We could fix
that by using mod_timer() instead however we can use
schedule_timeout_interruptible() for waiting and hence simplifying
things a lot.
This commit exchange the old "waiting mechanism" with simple
schedule_timeout_interruptible(), setting the time to sleep. Hence we
do not longer need li_wait_daemon waiting queue and others, so get rid
of it.
We need to take reference to the s_li_request after we take a mutex,
because it might be freed since then, hence result in accessing old
already freed memory. Also we should protect the whole
ext4_remove_li_request() because ext4_li_info might be in the process of
being freed in ext4_lazyinit_thread().
Disk event code automatically blocks events on excl write. This is
primarily to avoid issuing polling commands while burning is in
progress. This behavior doesn't fit other types of devices with
removeable media where polling commands don't have adverse side
effects and door locking usually doesn't exist.
This patch introduces new genhd flag which controls the auto-blocking
behavior and uses it to enable auto-blocking only on optical devices.
There's a race window in xen_drop_mm_ref, where remote cpu may exit
dirty bitmap between the check on this cpu and the point where remote
cpu handles drop request. So in drop_other_mm_ref we need check
whether TLB state is still lazy before calling into leave_mm. This
bug is rarely observed in earlier kernel, but exaggerated by the
commit 831d52bc153971b70e64eccfbed2b232394f22f8
("x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask after switching mm")
which clears bitmap after changing the TLB state. the call trace is as below:
Tested-by: Maoxiaoyun<tinnycloud@hotmail.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com>
[v1: Fleshed out the git description a bit] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When we parse the raw E820, the Xen hypervisor can set "E820_RAM"
to "E820_UNUSABLE" if the mem=X argument is used. As such we
should _not_ consider the E820_UNUSABLE as an 1-1 identity
mapping, but instead use the same case as for E820_RAM.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
git commit 24bdb0b62cc82120924762ae6bc85afc8c3f2b26 (xen: do not create
the extra e820 region at an addr lower than 4G) does not take into
account that ifdef CONFIG_X86_32 instead of e820_end_of_low_ram_pfn()
find_low_pfn_range() is called (both calls are from arch/x86/kernel/setup.c).
find_low_pfn_range() behaves correctly and does not require change in
xen_extra_mem_start initialization. Additionally, if xen_extra_mem_start
is initialized in the same way as ifdef CONFIG_X86_64 then memory hotplug
support for Xen balloon driver (under development) is broken.
Signed-off-by: Daniel Kiper <dkiper@net-space.pl> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
.. when applicable. We need to track in the p2m_mfn and
p2m_mfn_p the MFNs and pointers, respectivly, for the P2M entries
that are allocated for the identity mappings. Without this,
a PV domain with an E820 that triggers the 1-1 mapping to kick in,
won't be able to be restored as the P2M won't have the identity
mappings.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
TI816X (common name for DM816x/C6A816x/AM389x family) devices configured
to boot as PCIe Endpoint have class code = 0. This makes kernel PCI bus
code to skip allocating BARs to these devices resulting into following
type of error when trying to enable them:
"Device 0000:01:00.0 not available because of resource collisions"
The device cannot be operated because of the above issue.
This patch adds a ID specific (TI VENDOR ID and 816X DEVICE ID based)
'early' fixup quirk to replace class code with
PCI_CLASS_MULTIMEDIA_VIDEO as class.
Currently, the call to nfs4_schedule_session_recovery() will actually just
result in a test of the lease when what we really want is to force a
session reset.
Currently, if the server returns NFS4ERR_EXPIRED in reply to a READ or
WRITE, but the RENEW test determines that the lease is still active, we
fail to recover and end up looping forever in a READ/WRITE + RENEW death
spiral.
The TCP connection state code depends on the state_change() callback
being called when the SYN_SENT state is set. However the networking layer
doesn't actually call us back in that case.
None of the latest GPUs had this hooked up, this is necessary for
correct operation in a lot of cases, however we should test this on a few
GPUs in these families as we've had problems in this area before.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On g4x, user interrupt in BSD ring is missed.
This is because though g4x and ironlake share the same bsd_ring,
their interrupt control interfaces have _two_ differences.
1.different irq enable/disable functions:
On g4x are i915_enable_irq and i915_disable_irq.
On ironlake are ironlake_enable_irq and ironlake_disable_irq.
2.different irq flag:
On g4x user interrupt flag in BSD ring on is I915_BSD_USER_INTERRUPT.
On ironlake is GT_BSD_USER_INTERRUPT
Old bsd_ring_get/put_irq call ring_get_irq and ring_get_irq.
ring_get_irq and ring_put_irq only call ironlake_enable/disable_irq.
So comes the irq miss on g4x.
To fix this, as other rings' code do, conditionally call different
functions(i915_enable/disable_irq and ironlake_enable/disable_irq)
and use different interrupt flags in bsd_ring_get/put_irq.
When finding or allocating a ram disk device, brd_probe() did not take
partition numbers into account so that it can result to a different
device. Consider following example (I set CONFIG_BLK_DEV_RAM_COUNT=4
for simplicity) :
$ sudo modprobe brd max_part=15
$ ls -l /dev/ram*
brw-rw---- 1 root disk 1, 0 2011-05-25 15:41 /dev/ram0
brw-rw---- 1 root disk 1, 16 2011-05-25 15:41 /dev/ram1
brw-rw---- 1 root disk 1, 32 2011-05-25 15:41 /dev/ram2
brw-rw---- 1 root disk 1, 48 2011-05-25 15:41 /dev/ram3
$ sudo mknod /dev/ram4 b 1 64
$ sudo dd if=/dev/zero of=/dev/ram4 bs=4k count=256
256+0 records in
256+0 records out 1048576 bytes (1.0 MB) copied, 0.00215578 s, 486 MB/s
namhyung@leonhard:linux$ ls -l /dev/ram*
brw-rw---- 1 root disk 1, 0 2011-05-25 15:41 /dev/ram0
brw-rw---- 1 root disk 1, 16 2011-05-25 15:41 /dev/ram1
brw-rw---- 1 root disk 1, 32 2011-05-25 15:41 /dev/ram2
brw-rw---- 1 root disk 1, 48 2011-05-25 15:41 /dev/ram3
brw-r--r-- 1 root root 1, 64 2011-05-25 15:45 /dev/ram4
brw-rw---- 1 root disk 1, 1024 2011-05-25 15:44 /dev/ram64
After this patch, /dev/ram4 - instead of /dev/ram64 - was
accessed correctly.
In addition, 'range' passed to blk_register_region() should
include all range of dev_t that RAMDISK_MAJOR can address.
It does not need to be limited by partition numbers unless
'rd_nr' param was specified.
The 'max_part' parameter controls the number of maximum partition
a brd device can have. However if a user specifies very large
value it would exceed the limitation of device minor number and
can cause a kernel panic (or, at least, produce invalid device
nodes in some cases).
On my desktop system, following command kills the kernel. On qemu,
it triggers similar oops but the kernel was alive:
$ sudo modprobe brd max_part=100000
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
IP: [<ffffffff81110a9a>] sysfs_create_dir+0x2d/0xae
PGD 7af1067 PUD 7b19067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in: brd(+)
It's currently exposed only through /proc which, besides requiring
screen-scraping, doesn't allow userspace to distinguish between two
identical ATM adapters with different ATM indexes. The ATM device index
is required when using PPPoATM on a system with multiple ATM adapters.
Signed-off-by: Dan Williams <dcbw@redhat.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
While running fsx on tmpfs with a memhog then swapoff, swapoff was hanging
(interruptibly), repeatedly failing to locate the owner of a 0xff entry in
the swap_map.
Although shmem_writepage() does abandon when it sees incoming page index
is beyond eof, there was still a window in which shmem_truncate_range()
could come in between writepage's dropping lock and updating swap_map,
find the half-completed swap_map entry, and in trying to free it,
leave it in a state that swap_shmem_alloc() could not correct.
Arguably a bug in __swap_duplicate()'s and swap_entry_free()'s handling
of the different cases, but easiest to fix by moving swap_shmem_alloc()
under cover of the lock.
More interesting than the bug: it's been there since 2.6.33, why could
I not see it with earlier kernels? The mmotm of two weeks ago seems to
have some magic for generating races, this is just one of three I found.
With yesterday's git I first saw this in mainline, bisected in search of
that magic, but the easy reproducibility evaporated. Oh well, fix the bug.
The v6 and v7 implementations of flush_kern_dcache_area do not align
the passed MVA to the size of a cacheline in the data cache. If a
misaligned address is used, only a subset of the requested area will
be flushed. This has been observed to cause failures in SMP boot where
the secondary_data initialised by the primary CPU is not cacheline
aligned, causing the secondary CPUs to read incorrect values for their
pgd and stack pointers.
This patch ensures that the base address is cacheline aligned before
flushing the d-cache.
Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Integrity errors need to be passed to the owner of the integrity
metadata for processing. Consequently EILSEQ should be passed up the
stack.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds a check that a block device has a request function
defined before it is used. Otherwise, misconfiguration can cause an oops.
Because we are allowing devices with zero size e.g. an offline multipath
device as in commit 2cd54d9bedb79a97f014e86c0da393416b264eb3
("dm: allow offline devices") there needs to be an additional check
to ensure devices are initialised. Some block devices, like a loop
device without a backing file, exist but have no request function.
Reproducer is trivial: dm-mirror on unbound loop device
(no backing file on loop devices)
Thanks to the reviews and comments by Rafael, James, Mark and Andi.
Here's version 2 of the patch incorporating your comments and also some
update to my previous patch comments.
I noticed that before entering idle state, the menu idle governor will
look up the current pm_qos target value according to the list of qos
requests received. This look up currently needs the acquisition of a
lock to access the list of qos requests to find the qos target value,
slowing down the entrance into idle state due to contention by multiple
cpus to access this list. The contention is severe when there are a lot
of cpus waking and going into idle. For example, for a simple workload
that has 32 pair of processes ping ponging messages to each other, where
64 cpu cores are active in test system, I see the following profile with
37.82% of cpu cycles spent in contention of pm_qos_lock:
A better approach will be to cache the updated pm_qos target value so
reading it does not require lock acquisition as in the patch below.
With this patch the contention for pm_qos_lock is removed and I saw a
2.2X increase in throughput for my message passing workload.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: James Bottomley <James.Bottomley@suse.de> Acked-by: mark gross <markgross@thegnar.org> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Cpuidle menu governor is using u32 as a temporary datatype for storing
nanosecond values which wrap around at 4.294 seconds. This causes errors
in predicted sleep times resulting in higher than should be C state
selection and increased power consumption. This also breaks cpuidle
state residency statistics.
Signed-off-by: Tero Kristo <tero.kristo@nokia.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
i8k uses lahf to read the flag register in 64-bit code; early x86-64
CPUs, however, lack this instruction and we get an invalid opcode
exception at runtime.
Use pushf to load the flag register into the stack instead.
Signed-off-by: Luca Tettamanti <kronos.it@gmail.com> Reported-by: Jeff Rickman <jrickman@myamigos.us> Tested-by: Jeff Rickman <jrickman@myamigos.us> Tested-by: Harry G McGavran Jr <w5pny@arrl.net> Cc: Massimo Dal Zotto <dz@debian.org> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Since this cred was not created with copy_creds(), it needs to get
initialized. Otherwise use of syscall(__NR_keyctl, KEYCTL_SESSION_TO_PARENT);
can lead to a NULL deref. Thanks to Robert for finding this.
But introduced by commit 47a150edc2a ("Cache user_ns in struct cred").
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com> Reported-by: Robert Święcki <robert@swiecki.net> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
According to Documentation/Changes, the kernel should be buildable with
GNU make 3.80+. Commit 88d7be031f9f975bb3f50a0b5ef3796a671e7edf (kbuild:
Use a single clean rule for kernel and external modules) introduced the
"$(or" construct, which requires make 3.81. This causes "make clean" to
malfunction when it is used with external modules.
Replace "$(or" with an equivalent "$(if" expression, to restore backward
compatibility.
Signed-off-by: Kevin Cernekee <cernekee@gmail.com> Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When re-mounting from R/O mode to R/W mode and the LEB count in the superblock
is not up-to date, because for the underlying UBI volume became larger, we
re-write the superblock. We allocate RAM for these purposes, but never free it.
So this is a memory leak, although very rare one.
The buffers allocated while encrypting and decrypting long filenames can
sometimes straddle two pages. In this situation, virt_to_scatterlist()
will return -ENOMEM, causing the operation to fail and the user will get
scary error messages in their logs:
kernel: ecryptfs_write_tag_70_packet: Internal error whilst attempting
to convert filename memory to scatterlist; expected rc = 1; got rc =
[-12]. block_aligned_filename_size = [272]
kernel: ecryptfs_encrypt_filename: Error attempting to generate tag 70
packet; rc = [-12]
kernel: ecryptfs_encrypt_and_encode_filename: Error attempting to
encrypt filename; rc = [-12]
kernel: ecryptfs_lookup: Error attempting to encrypt and encode
filename; rc = [-12]
The solution is to allow up to 2 scatterlist entries to be used.
eCryptfs wasn't clearing the eCryptfs inode's i_nlink after a successful
vfs_rmdir() on the lower directory. This resulted in the inode evict and
destroy paths to be missed.
Reported-by: Mark Davis <marked86@gmail.com> Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The disabling of clocks and freeing GPIO are changed
to fix the occurrence of the crash of rmmod of ehci and ohci
drivers. The GPIOs should be freed after the spin locks are
unlocked.
Signed-off-by: Keshava Munegowda <keshava_mgowda@ti.com> Acked-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
arch_ptrace() was modified to reference init_fpu() to fix up xstate
initialization, which overlooked the fact that there are configurations
that don't enable any of hard FPU support or emulation, resulting in
build errors on DSP parts.
Given that init_fpu() simply sets up the xstate slab cache and is
side-stepped entirely for the DSP case, we can simply always build in the
helper and fix up the references.
div6 clock should not use arch_flags for clk_rate_table_build,
because SH_CLK_DIV6_EXT doesn't care .arch_flags.
clk->freq_table[] will be all CPUFREQ_ENTRY_INVALID without this patch.
cx8802_blackbird_probe makes a device node for the mpeg sub-device
before it has been added to dev->drvlist. If the device is opened
during that time, the open succeeds but request_acquire cannot be
called, so the reference count remains zero. Later, when the device
is closed, the reference count becomes negative --- uh oh.
Close the race by holding core->lock during probe and not releasing
until the device is in drvlist and initialization finished.
Previously the BKL prevented this race.
Reported-by: Andreas Huber <hobrom@gmx.at> Tested-by: Andi Huber <hobrom@gmx.at> Tested-by: Marlon de Boer <marlon@hyves.nl> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The BKL conversion of this driver seems to have gone wrong.
Loading the cx88-blackbird driver deadlocks.
The cause: mpeg_ops::open in the cx2388x blackbird driver acquires the
device lock and calls the sub-driver's request_acquire, which tries to
acquire the lock again. Fix it by clarifying the semantics of
request_acquire, request_release, advise_acquire, and advise_release:
now all will rely on the caller to acquire the device lock.
Based on work by Ben Hutchings <ben@decadent.org.uk>.
The BKL conversion of this driver seems to have gone wrong. Various
uses of the sub-device and driver lists appear to be subject to race
conditions.
In particular, some functions access drvlist without a relevant lock
held, which will race against removal of drivers. Let's start with
that --- clean up by consistently protecting dev->drvlist with
dev->core->lock, noting driver functions that require the device lock
to be held or not to be held.
After this patch, there are still some races --- e.g.,
cx8802_blackbird_remove can run between the time the blackbird driver
is acquired and the time it is used in mpeg_release, and there's a
similar race in cx88_dvb_bus_ctrl. Later patches will address the
remaining known races and the deadlock noticed by Andi. This patch
just makes the semantics clearer in preparation for those later
changes.
Based on work by Ben Hutchings <ben@decadent.org.uk>.
This patch (as1467) removes the last usages of hcd->state from
usbcore. We no longer check to see if an interrupt handler finds that
a controller has died; instead we rely on host controller drivers to
make an explicit call to usb_hc_died().
This fixes a regression introduced by commit 9b37596a2e860404503a3f2a6513db60c296bfdc (USB: move usbcore away from
hcd->state). It used to be that when a controller shared an IRQ with
another device and an interrupt arrived while hcd->state was set to
HC_STATE_HALT, the interrupt handler would be skipped. The commit
removed that test; as a result the current code doesn't skip calling
the handler and ends up believing the controller has died, even though
it's only temporarily stopped. The solution is to ignore HC_STATE_HALT
following the handler's return.
As a consequence of this change, several of the host controller
drivers need to be modified. They can no longer implicitly rely on
usbcore realizing that a controller has died because of hcd->state.
The patch adds calls to usb_hc_died() in the appropriate places.
The patch also changes a few of the interrupt handlers. They don't
expect to be called when hcd->state is equal to HC_STATE_HALT, even if
the controller is still alive. Early returns were added to avoid any
confusion.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Manuel Lauss <manuel.lauss@googlemail.com> CC: Rodolfo Giometti <giometti@linux.it> CC: Olav Kongas <ok@artecdesign.ee> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The original problem encountered by people using NVIDIA chipsets was
that USB devices were not turning off when the system shut down. For
example, the LED on an optical mouse would remain on, draining a
laptop's battery. The problem was caused by a bug in the chipset; an
OHCI controller in the Reset state would continue to drive a bus reset
signal even after system shutdown. The workaround was to put the
controllers into the Suspend state instead.
It turns out that later NVIDIA chipsets do not suffer from this bug.
Instead some have the opposite bug: If a system is shut down while an
OHCI controller is in the Suspend state, USB devices remain powered!
On other systems, shutting down with a Suspended controller causes the
system to reboot immediately. Thus, working around the original bug
on some machines exposes other bugs on other machines.
The best solution seems to be to limit the workaround to OHCI
controllers with a low-numbered PCI product ID. I don't know exactly
at what point NVIDIA changed their chipsets; the value used here is a
guess. So far it was worked out okay for all the people who have
tested it.
This fixes Bugzilla #35032.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Andre "Osku" Schmidt <andre.osku.schmidt@googlemail.com> Tested-by: Yury Siamashka <yurand2@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I am sharing patch to the devices/usb/serial/option.c. This allows
operation of Huawei E353 broadband modem using the “option” driver. The
patch simply adds new constant with proper product ID and an entry to
usb_device_id. I worked on the 2.6.38.6 sources. Tested on Dell inspiron
1764 (i3 core cpu) and brand new Huawei E353 modem, Fedora 15 beta.
Looking at the type of change, i doubt it has potential to introduce
problems in other parts of kernel or the driver itself.
Signed-off-by: Marcin Galczynski <marcin@galczynski.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When the USB core wants to change to an alternate interface setting that
doesn't include an active endpoint, or de-configuring the device, the xHCI
driver needs to issue a Configure Endpoint command to tell the host to
drop some endpoints from the schedule. After the command completes, the
xHCI driver needs to free rings for any endpoints that were dropped.
Unfortunately, the xHCI driver wasn't actually freeing the endpoint rings
for dropped endpoints. The rings would be freed if the endpoint's
information was simply changed (and a new ring was installed), but dropped
endpoints never had their rings freed. This caused errors when the ring
segment DMA pool was freed when the xHCI driver was unloaded:
[ 5582.883995] xhci_hcd 0000:06:00.0: dma_pool_destroy xHCI ring segments, ffff88003371d000 busy
[ 5582.884002] xhci_hcd 0000:06:00.0: dma_pool_destroy xHCI ring segments, ffff880033716000 busy
[ 5582.884011] xhci_hcd 0000:06:00.0: dma_pool_destroy xHCI ring segments, ffff880033455000 busy
[ 5582.884018] xhci_hcd 0000:06:00.0: Freed segment pool
[ 5582.884026] xhci_hcd 0000:06:00.0: Freed device context pool
[ 5582.884033] xhci_hcd 0000:06:00.0: Freed small stream array pool
[ 5582.884038] xhci_hcd 0000:06:00.0: Freed medium stream array pool
[ 5582.884048] xhci_hcd 0000:06:00.0: xhci_stop completed - status = 1
[ 5582.884061] xhci_hcd 0000:06:00.0: USB bus 3 deregistered
[ 5582.884193] xhci_hcd 0000:06:00.0: PCI INT A disabled
Fix this issue and free endpoint rings when their endpoints are
successfully dropped.
This patch should be backported to kernels as old as 2.6.31.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When an endpoint ring is freed, it is either cached in a per-device ring
cache, or simply freed if the ring cache is full. If the ring was added
to the cache, then virt_dev->num_rings_cached is incremented. The cache
is designed to hold up to 31 endpoint rings, in array indexes 0 to 30.
When the device is freed (when the slot was disabled),
xhci_free_virt_device() is called, it would free the cached rings in
array indexes 0 to virt_dev->num_rings_cached.
Unfortunately, the original code in xhci_free_or_cache_endpoint_ring()
would put the first entry into the ring cache in array index 1, instead of
array index 0. This was caused by the second assignment to rings_cached:
This meant that when the device was freed, cached rings with indexes 0 to
N would be freed, and the last cached ring in index N+1 would not be
freed. When the driver was unloaded, this caused interesting messages
like:
xhci_hcd 0000:06:00.0: dma_pool_destroy xHCI ring segments, ffff880063040000 busy
This should be queued to stable kernels back to 2.6.33.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
introduced a bug. The USB 2.0 spec says that full speed isochronous endpoints'
bInterval must be decoded as an exponent to a power of two (e.g. interval =
2^(bInterval - 1)). Full speed interrupt endpoints, on the other hand, don't
use exponents, and the interval in frames is encoded straight into bInterval.
Dmitry's patch was supposed to fix up the full speed isochronous to parse
bInterval as an exponent, but instead it changed the *interrupt* endpoint
bInterval decoding. The isochronous endpoint encoding was the same.
This caused full speed devices with interrupt endpoints (including mice, hubs,
and USB to ethernet devices) to fail under NEC 0.96 xHCI host controllers:
[ 100.909818] xhci_hcd 0000:06:00.0: add ep 0x83, slot id 1, new drop flags = 0x0, new add flags = 0x99, new slot info = 0x38100000
[ 100.909821] xhci_hcd 0000:06:00.0: xhci_check_bandwidth called for udev ffff88011f0ea000
...
[ 100.910187] xhci_hcd 0000:06:00.0: ERROR: unexpected command completion code 0x11.
[ 100.910190] xhci_hcd 0000:06:00.0: xhci_reset_bandwidth called for udev ffff88011f0ea000
When the interrupt endpoint was added and a Configure Endpoint command was
issued to the host, the host controller would return a very odd error message
(0x11 means "Slot Not Enabled", which isn't true because the slot was enabled).
Probably the host controller was getting very confused with the bad encoding.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Cc: Dmitry Torokhov <dtor@vmware.com> Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com> Tested-by: Thomas Lindroth <thomas.lindroth@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
composite.c always sets req->length to zero
and expects function driver's setup handlers
to return the amount of bytes to be used
on req->length. If we test against req->length
w_length will always be greater than req->length
thus making us always stall that particular
SEND_ENCAPSULATED_COMMAND request.
Tested against a Windows XP SP3.
Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When the xHCI driver attempts to cancel a transfer, it issues a Stop
Endpoint command and waits for the host controller to indicate which TRB
it was in the middle of processing. The host will put an event TRB with
completion code COMP_STOP on the event ring if it stops on a control
transfer TRB (or other types of transfer TRBs). The ring handling code
is supposed to set ep->stopped_trb to the TRB that the host stopped on
when this happens.
Unfortunately, there is a long-standing bug in the control transfer
completion code. It doesn't actually check to see if COMP_STOP is set
before attempting to process the transfer based on which part of the
control TD completed. So when we get an event on the data phase of the
control TRB with COMP_STOP set, it thinks it's a normal completion of
the transfer and doesn't set ep->stopped_td or ep->stopped_trb.
When the ring handling code goes on to process the completion of the Stop
Endpoint command, it sees that ep->stopped_trb is not a part of the TD
it's trying to cancel. It thinks the hardware has its enqueue pointer
somewhere further up in the ring, and thinks it's safe to turn the control
TRBs into no-op TRBs. Since the hardware was in the middle of the control
TRBs to be cancelled, the proper software behavior is to issue a Set TR
dequeue pointer command.
It turns out that the NEC host controllers can handle active TRBs being
set to no-op TRBs after a stop endpoint command, but other host
controllers have issues with this out-of-spec software behavior. Fix this
behavior.
This patch should be backported to kernels as far back as 2.6.31, but it
may be a bit challenging, since process_ctrl_td() was introduced in some
refactoring done in 2.6.36, and some endian-safe patches added in 2.6.40
that touch the same lines.
The Droids MuIn LCD operates like a serial remote terminal.
Data received are displayed directly on the LCD. This patch
fixes the kernel null pointer oops when it is plugged in.
Add NO_DATA_INTERFACE quirk to tell the driver that "control"
and "data" interfaces are not separated for this device, which
prevents dereferencing a null pointer in the device probe code.
Signed-off-by: Erik Slagter <erik@slagter.name> Signed-off-by: Maxin B. John <maxin.john@gmail.com> Tested-by: Erik Slagter <erik@slagter.name> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes a problem where data received from the gps is sometimes
transferred incompletely to the serial port. If used in native mode now
all data received via the bulk queue will be forwarded to the serial
port.
Signed-off-by: Hermann Kneissel <herkne@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 1c6529e92b "USB: gadget: g_multi: fixed vendor and
product ID" replaced g_multi's vendor and product ID with
proper ID's from Linux Foundation. This commit now updates
INF files in the Documentation/usb directory which were
omitted in the original commit.
Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
For Tegra i2c controller to function properly new slave mode must be
enabled.
swarren notes:
In particular, I found this was needed when working on enabling the
Tegra audio driver on the Seaboard board. There are two different PCB
layouts for this board; a "clamshell" version, which works just fine
without this change, and the original non-clamshell version, which needs
this change in order for I2C to operate correctly. Without it, I2C
probing fails for some devices, e.g. with:
wm8903 0-001a: Device with ID register 0 is not a WM8903
wm8903 0-001a: asoc: failed to probe CODEC wm8903.0-001a: -19
asoc: failed to instantiate card tegra-wm8903: -19
ALSA device list:
No soundcards found.
Signed-off-by: Rakesh Iyer <riyer@nvidia.com> Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When finding or allocating a loop device, loop_probe() did not take
partition numbers into account so that it can result to a different
device. Consider following example:
$ sudo modprobe loop max_part=15
$ ls -l /dev/loop*
brw-rw---- 1 root disk 7, 0 2011-05-24 22:16 /dev/loop0
brw-rw---- 1 root disk 7, 16 2011-05-24 22:16 /dev/loop1
brw-rw---- 1 root disk 7, 32 2011-05-24 22:16 /dev/loop2
brw-rw---- 1 root disk 7, 48 2011-05-24 22:16 /dev/loop3
brw-rw---- 1 root disk 7, 64 2011-05-24 22:16 /dev/loop4
brw-rw---- 1 root disk 7, 80 2011-05-24 22:16 /dev/loop5
brw-rw---- 1 root disk 7, 96 2011-05-24 22:16 /dev/loop6
brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
$ sudo mknod /dev/loop8 b 7 128
$ sudo losetup /dev/loop8 ~/temp/disk-with-3-parts.img
$ sudo losetup -a
/dev/loop128: [0805]:278201 (/home/namhyung/temp/disk-with-3-parts.img)
$ ls -l /dev/loop*
brw-rw---- 1 root disk 7, 0 2011-05-24 22:16 /dev/loop0
brw-rw---- 1 root disk 7, 16 2011-05-24 22:16 /dev/loop1
brw-rw---- 1 root disk 7, 2048 2011-05-24 22:18 /dev/loop128
brw-rw---- 1 root disk 7, 2049 2011-05-24 22:18 /dev/loop128p1
brw-rw---- 1 root disk 7, 2050 2011-05-24 22:18 /dev/loop128p2
brw-rw---- 1 root disk 7, 2051 2011-05-24 22:18 /dev/loop128p3
brw-rw---- 1 root disk 7, 32 2011-05-24 22:16 /dev/loop2
brw-rw---- 1 root disk 7, 48 2011-05-24 22:16 /dev/loop3
brw-rw---- 1 root disk 7, 64 2011-05-24 22:16 /dev/loop4
brw-rw---- 1 root disk 7, 80 2011-05-24 22:16 /dev/loop5
brw-rw---- 1 root disk 7, 96 2011-05-24 22:16 /dev/loop6
brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
brw-r--r-- 1 root root 7, 128 2011-05-24 22:17 /dev/loop8
After this patch, /dev/loop8 - instead of /dev/loop128 - was
accessed correctly.
In addition, 'range' passed to blk_register_region() should
include all range of dev_t that LOOP_MAJOR can address. It does
not need to be limited by partition numbers unless 'max_loop'
param was specified.
The 'max_part' parameter controls the number of maximum partition
a loop block device can have. However if a user specifies very
large value it would exceed the limitation of device minor number
and can cause a kernel panic (or, at least, produce invalid
device nodes in some cases).
On my desktop system, following command kills the kernel. On qemu,
it triggers similar oops but the kernel was alive:
$ sudo modprobe loop max_part0000
------------[ cut here ]------------
kernel BUG at /media/Linux_Data/project/linux/fs/sysfs/group.c:65!
invalid opcode: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in: loop(+)
I believe I found a problem in __alloc_pages_slowpath, which allows a
process to get stuck endlessly looping, even when lots of memory is
available.
Running an I/O and memory intensive stress-test I see a 0-order page
allocation with __GFP_IO and __GFP_WAIT, running on a system with very
little free memory. Right about the same time that the stress-test gets
killed by the OOM-killer, the utility trying to allocate memory gets stuck
in __alloc_pages_slowpath even though most of the systems memory was freed
by the oom-kill of the stress-test.
The utility ends up looping from the rebalance label down through the
wait_iff_congested continiously. Because order=0,
__alloc_pages_direct_compact skips the call to get_page_from_freelist.
Because all of the reclaimable memory on the system has already been
reclaimed, __alloc_pages_direct_reclaim skips the call to
get_page_from_freelist. Since there is no __GFP_FS flag, the block with
__alloc_pages_may_oom is skipped. The loop hits the wait_iff_congested,
then jumps back to rebalance without ever trying to
get_page_from_freelist. This loop repeats infinitely.
The test case is pretty pathological. Running a mix of I/O stress-tests
that do a lot of fork() and consume all of the system memory, I can pretty
reliably hit this on 600 nodes, in about 12 hours. 32GB/node.
Signed-off-by: Andrew Barry <abarry@cray.com> Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: Rik van Riel<riel@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The device reponds with 'invalid report id' when feature report switching it
into multitouch mode is sent to it.
This has been silently ignored before 0825411ade ("HID: bt: Wait for ACK
on Sent Reports"), but since this commit, it propagates -EIO from the _raw
callback .
So let the driver ignore -EIO as response to 0xd7,0x01 report, as that's
how the device reacts in normal mode.
Sad, but following reality.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=35022
Commit f0fba2ad (ASoC: multi-component - ASoC Multi-Component Support)
broke support for Raumfeld platforms as it didn't take into account the
different hardware features on individual devices.
In particular, Raumfeld speakers have no S/PDIF output, so the members
of the snd_soc_card struct must be set dynamically.
Signed-off-by: Daniel Mack <zonque@gmail.com> Acked-by: Liam Girdwood <lrg@slimlogic.co.uk> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
ATI and AMD chipsets seem not providing the proper position-buffer
information, and it also doesn't provide FIFO register required by
VIACOMBO fix. It's better to use LPIB for these.
There are no signs of a dmic at node 0x0b, so the user is left with
an additional internal mic which does not exist. This commit removes
that non-existing mic.
BugLink: http://bugs.launchpad.net/bugs/731706 Reported-by: James Page <james.page@canonical.com> Signed-off-by: David Henningsson <david.henningsson@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fastpath can do a speculative access to a page that CONFIG_DEBUG_PAGE_ALLOC may have
marked as invalid to retrieve the pointer to the next free object.
Use probe_kernel_read in that case in order not to cause a page fault.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It has been reported on some laptops that kswapd is consuming large
amounts of CPU and not being scheduled when SLUB is enabled during large
amounts of file copying. It is expected that this is due to kswapd
missing every cond_resched() point because;
shrink_page_list() calls cond_resched() if inactive pages were isolated
which in turn may not happen if all_unreclaimable is set in
shrink_zones(). If for whatver reason, all_unreclaimable is
set on all zones, we can miss calling cond_resched().
balance_pgdat() only calls cond_resched if the zones are not
balanced. For a high-order allocation that is balanced, it
checks order-0 again. During that window, order-0 might have
become unbalanced so it loops again for order-0 and returns
that it was reclaiming for order-0 to kswapd(). It can then
find that a caller has rewoken kswapd for a high-order and
re-enters balance_pgdat() without ever calling cond_resched().
shrink_slab only calls cond_resched() if we are reclaiming slab
pages. If there are a large number of direct reclaimers, the
shrinker_rwsem can be contended and prevent kswapd calling
cond_resched().
This patch modifies the shrink_slab() case. If the semaphore is
contended, the caller will still check cond_resched(). After each
successful call into a shrinker, the check for cond_resched() remains in
case one shrinker is particularly slow.
[mgorman@suse.de: preserve call to cond_resched after each call into shrinker] Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Tested-by: Colin King <colin.king@canonical.com> Cc: Raghavendra D Prabhu <raghu.prabhu13@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Chris Mason <chris.mason@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There are a few reports of people experiencing hangs when copying large
amounts of data with kswapd using a large amount of CPU which appear to be
due to recent reclaim changes. SLUB using high orders is the trigger but
not the root cause as SLUB has been using high orders for a while. The
root cause was bugs introduced into reclaim which are addressed by the
following two patches.
Patch 1 corrects logic introduced by commit 1741c877 ("mm: kswapd:
keep kswapd awake for high-order allocations until a percentage of
the node is balanced") to allow kswapd to go to sleep when
balanced for high orders.
Patch 2 notes that it is possible for kswapd to miss every
cond_resched() and updates shrink_slab() so it'll at least reach
that scheduling point.
Chris Wood reports that these two patches in isolation are sufficient to
prevent the system hanging. AFAIK, they should also resolve similar hangs
experienced by James Bottomley.
This patch:
Johannes Weiner poined out that the logic in commit 1741c877 ("mm: kswapd:
keep kswapd awake for high-order allocations until a percentage of the
node is balanced") is backwards. Instead of allowing kswapd to go to
sleep when balancing for high order allocations, it keeps it kswapd
running uselessly.
Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Tested-by: Colin King <colin.king@canonical.com> Cc: Raghavendra D Prabhu <raghu.prabhu13@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Chris Mason <chris.mason@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If a bitmap is found to be 'stale' the events_cleared value
is set to match 'events'.
However if the array is degraded this does not get stored on disk.
This can subsequently lead to incorrect behaviour.
So change bitmap_update_sb to always update events_cleared in the
superblock from the known events_cleared.
For neatness also set ->state from ->flags.
This requires updating ->state whenever we update ->flags, which makes
sense anyway.
There is a race when creating an md device by opening /dev/mdXX.
If two processes do this at much the same time they will follow the
call path
__blkdev_get -> get_gendisk -> kobj_lookup
The first will call
-> md_probe -> md_alloc -> add_disk -> blk_register_region
and the race happens when the second gets to kobj_lookup after
add_disk has called blk_register_region but before it returns to
md_alloc.
In the case the second will not call md_probe (as the probe is already
done) but will get a handle on the gendisk, return to __blkdev_get
which will then call md_open (via the ->open) pointer.
As mddev->gendisk hasn't been set yet, md_open will think something is
wrong an return with ERESTARTSYS.
This can loop endlessly while the first thread makes no progress
through add_disk. Nothing is blocking it, but due to scheduler
behaviour it doesn't get a turn.
So this is essentially a live-lock.
We fix this by simply moving the assignment to mddev->gendisk before
the call the add_disk() so md_open doesn't get confused.
Also move blk_queue_flush earlier because add_disk should be as late
as possible.
To make sure that md_open doesn't complete until md_alloc has done all
that is needed, we take mddev->open_mutex during the last part of
md_alloc. md_open will wait for this.
This can cause a lock-up on boot so Cc:ing for stable.
For 2.6.36 and earlier a different patch will be needed as the
'blk_queue_flush' call isn't there.
Signed-off-by: NeilBrown <neilb@suse.de> Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com> Tested-by: Thomas Jarosch <thomas.jarosch@intra2net.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Move the smp_rmb after cpu_relax loop in read_seqlock and add
ACCESS_ONCE to make sure the test and return are consistent.
A multi-threaded core in the lab didn't like the update
from 2.6.35 to 2.6.36, to the point it would hang during
boot when multiple threads were active. Bisection showed af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867 (clockevents:
Remove the per cpu tick skew) as the culprit and it is
supported with stack traces showing xtime_lock waits including
tick_do_update_jiffies64 and/or update_vsyscall.
Experimentation showed the combination of cpu_relax and smp_rmb
was significantly slowing the progress of other threads sharing
the core, and this patch is effective in avoiding the hang.
A theory is the rmb is affecting the whole core while the
cpu_relax is causing a resource rebalance flush, together they
cause an interfernce cadance that is unbroken when the seqlock
reader has interrupts disabled.
At first I was confused why the refactor in 3c22cd5709e8143444a6d08682a87f4c57902df3 (kernel: optimise
seqlock) didn't affect this patch application, but after some
study that affected seqcount not seqlock. The new seqcount was
not factored back into the seqlock. I defer that the future.
While the removal of the timer interrupt offset created
contention for the xtime lock while a cpu does the
additonal work to update the system clock, the seqlock
implementation with the tight rmb spin loop goes back much
further, and is just waiting for the right trigger.
Signed-off-by: Milton Miller <miltonm@bga.com> Cc: <linuxppc-dev@lists.ozlabs.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Anton Blanchard <anton@samba.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Link: http://lkml.kernel.org/r/%3Cseqlock-rmb%40mdm.bga.com%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As Ben Hutchings discovered [1], the patch for CVE-2011-1017 (buffer
overflow in ldm_frag_add) is not sufficient. The original patch in
commit c340b1d64000 ("fs/partitions/ldm.c: fix oops caused by corrupted
partition table") does not consider that, for subsequent fragments,
previously allocated memory is used.
[1] http://lkml.org/lkml/2011/5/6/407
Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Timo Warns <warns@pre-sense.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Caused by brcmsmac.ko suppling a 0 value to Mac80211. Mac80211 subsequently
divides by this number. Bug only occurred on AMPDU traffic. This is a fix
for https://bugzilla.kernel.org/show_bug.cgi?id=32032, titled
'Divide error in minstrel_ht_tx_status followed by hang', reported by
Wouter Cloetens.
Signed-off-by: Roland Vossen <rvossen@broadcom.com> Reviewed-by: Arend van Spriel <arend@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Driver r8712u is unable to handle ad-hoc mode. The issue is that when
the driver first starts, there will not be an SSID for association.
The fix is to always call the "select and join from scan" routine when
in ad-hoc mode.
Note: Ad-hoc mode worked intermittently before. If the driver had
previously been associated, then things were OK.
Signed-off-by: Jeff Chua <jeff.chua.linux@gmail.com> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When mandatory encryption is configured in samba server on a
share (smb.conf parameter "smb encrypt = mandatory") the
server will hang up the tcp session when we try to send
the first frame after the tree connect if it is not a
QueryFSUnixInfo, this causes cifs mount to hang (it must
be killed with ctl-c). Move the QueryFSUnixInfo call
earlier in the mount sequence, and check whether the SetFSUnixInfo
fails due to mandatory encryption so we can return a sensible
error (EACCES) on mount.
In a future patch (for 2.6.40) we will support mandatory
encryption.
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
HARDIRQ_ENTER() maps to irq_enter() which calls rcu_irq_enter().
But HARDIRQ_EXIT() maps to __irq_exit() which doesn't call
rcu_irq_exit().
So for every locking selftest that simulates hardirq disabled,
we create an imbalance in the rcu extended quiescent state
internal state.
As a result, after the first missing rcu_irq_exit(), subsequent
irqs won't exit dyntick-idle mode after leaving the interrupt
handler. This means that RCU won't see the affected CPU as being
in an extended quiescent state, resulting in long grace-period
delays (as in grace periods extending for hours).
To fix this, just use __irq_enter() to simulate the hardirq
context. This is sufficient for the locking selftests as we
don't need to exit any extended quiescent state or perform
any check that irqs normally do when they wake up from idle.
As a side effect, this patch makes it possible to restore
"rcu: Decrease memory-barrier usage based on semi-formal proof",
which eventually helped finding this bug.
Reported-and-tested-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
IBS initialization is a mix of per-core register access and per-node
pci device setup. Register access should be pinned to the cpu, but pci
setup must run with preemption enabled.
This patch better separates the code into non-/preemptible sections
and fixes sleeping with preemption disabled. See bug message below.
Fixes also freeing the eilvt entry by introducing put_eilvt().
The Intel manual changed the name of the CPUID bit to match the
instruction name. We should follow suit for sanity's sake. (See Intel SDM
Volume 2, Table 3-20 "Feature Information Returned in the ECX Register".)
[ hpa: we can only do this at this time because there are currently no CPUs
with this feature on the market, hence this is pre-hardware enabling.
However, Cc:'ing stable so that stable can present a consistent ABI. ]
Signed-off-by: Kees Cook <kees.cook@canonical.com> Link: http://lkml.kernel.org/r/20110524232926.GA27728@outflux.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
UEFI stands for "Unified Extensible Firmware Interface", where "Firmware"
is an ancient African word meaning "Why do something right when you can
do it so wrong that children will weep and brave adults will cower before
you", and "UEI" is Celtic for "We missed DOS so we burned it into your
ROMs". The UEFI specification provides for runtime services (ie, another
way for the operating system to be forced to depend on the firmware) and
we rely on these for certain trivial tasks such as setting up the
bootloader. But some hardware fails to work if we attempt to use these
runtime services from physical mode, and so we have to switch into virtual
mode. So far so dreadful.
The specification makes it clear that the operating system is free to do
whatever it wants with boot services code after ExitBootServices() has been
called. SetVirtualAddressMap() can't be called until ExitBootServices() has
been. So, obviously, a whole bunch of EFI implementations call into boot
services code when we do that. Since we've been charmingly naive and
trusted that the specification may be somehow relevant to the real world,
we've already stuffed a picture of a penguin or something in that address
space. And just to make things more entertaining, we've also marked it
non-executable.
This patch allocates the boot services regions during EFI init and makes
sure that they're executable. Then, after SetVirtualAddressMap(), it
discards them and everyone lives happily ever after. Except for the ones
who have to work on EFI, who live sad lives haunted by the knowledge that
someone's eventually going to write yet another firmware specification.
[ hpa: adding this to urgent with a stable tag since it fixes currently-broken
hardware. However, I do not know what the dependencies are and so I do
not know which -stable versions this may be a candidate for. ]
introduced a read and a write to the MC4 mask msr.
Unfortunatly this MSR is not emulated by the KVM hypervisor
so that the kernel will get a #GP and crashes when applying
this workaround when running inside KVM.
This issue was reported as:
https://bugzilla.kernel.org/show_bug.cgi?id=35132
and is fixed with this patch. The change just let the kernel
ignore any #GP it gets while accessing this MSR by using the
_safe msr access methods.
Reported-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Cc: Avi Kivity <avi@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit b87cf80af3ba4b4c008b4face3c68d604e1715c6 added support for
ARAT (Always Running APIC timer) on AMD processors that are not
affected by erratum 400. This erratum is present on certain processor
families and prevents APIC timer from waking up the CPU when it
is in a deep C state, including C1E state.
Determining whether a processor is affected by this erratum may
have some corner cases and handling these cases is somewhat
complicated. In the interest of simplicity we won't claim ARAT
support on processor families below 0x12 and will go back to
broadcasting timer when going idle.
Signed-off-by: Boris Ostrovsky <ostr@amd64.org> Link: http://lkml.kernel.org/r/1306423192-19774-1-git-send-email-ostr@amd64.org Tested-by: Boris Petkov <borislav.petkov@amd.com> Cc: Hans Rosenfeld <Hans.Rosenfeld@amd.com> Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix a potential deadlock when resuming; here the calling
function has disabled interrupts, so we cannot sleep.
Change the memory allocation flag from GFP_KERNEL to GFP_ATOMIC.
TODO: We can do away with this memory allocation during resume
by reusing the ioapic suspend/resume code that uses boot time
allocated buffers, but we want to keep this -stable patch
simple.
This patch fixes a bug where task->task_execute_queue=1 was not being
cleared once se_task had been removed from se_device->execute_task_list,
resulting in an OOPs in core_tmr_lun_reset() for the task->task_active=0
case where transport_remove_task_from_execute_queue() was incorrectly
being called.
This patch fixes two cases in transport_get_task_from_execute_queue()
and transport_remove_task_from_execute_queue() to properly clear
task->task_execute_queue=0 once list_del(&task->t_execute_list) has
been called.
It also adds an explict check in transport_remove_task_from_execute_queue()
to dump_stack + return if called with task->task_execute_queue=0.
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: James Bottomley <jbottomley@parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch addresses a bug in the target core release path for HW
operation where transport_free_dev_tasks() was incorrectly being called
from transport_lun_remove_cmd() while releasing a se_cmd reference and
calling struct target_core_fabric_ops->queue_data_in().
This would result in a OOPs with HW target mode when the release of
se_task->task_sg[] would happen before pci_unmap_sg() can be called in
HW target mode fabric module code. This patch addresses the issue by
moving transport_free_dev_tasks() from transport_lun_remove_cmd() into
transport_generic_free_cmd(), and adding TRANSPORT_FREE_CMD_INTR and
transport_generic_free_cmd_intr() to allow se_cmd descriptor release
to happen fromfrom within transport_processing_thread() process context
when release of se_cmd is not possible from HW interrupt context.
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: James Bottomley <jbottomley@parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes two bugs wrt to the interrupt context usage of target
core with HW target mode drivers. It first converts the usage of struct
se_device->stats_lock in transport_get_lun_for_cmd() and core_tmr_lun_reset()
to properly use spin_lock_irq() to address an BUG with CONFIG_LOCKDEP_SUPPORT=y
enabled.
This patch also adds a 'in_interrupt()' check to allow GFP_ATOMIC usage from
core_tmr_alloc_req() to fix a 'sleeping in interrupt context' BUG with HW
target fabrics that require this logic to function.
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: James Bottomley <jbottomley@parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes a bug in transport_do_task_sg_chain() used by HW target
mode modules with sg_chain() to provide a single sg_next() walkable memory
layout for use with pci_map_sg() and friends. This patch addresses an
issue with mapping multiple small block max_sector tasks across multiple
struct se_task->task_sg[] mappings for HW target mode operation.
This was causing OOPs with (cmd->t_task->t_tasks_no > 1) I/O traffic for
HW target drivers using transport_do_task_sg_chain(), and has been tested
so far with tcm_fc(openfcoe), tcm_qla2xxx, and ib_srpt fabrics with
t_tasks_no > 1 IBLOCK backends using a smaller max_sectors to trigger the
original issue.
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Acked-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: James Bottomley <jbottomley@parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
with comment "The following patch fixes it by using the '+' operator on
the (*field) operand, marking it as read-write to gcc."
'+' was actually forgotten. This really puts it.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: James Bottomley <jbottomley@parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Modified the 10s wait time for inflight offload connections to
advance to the next state to 2s based on test result.
Modified the 20s shutdown timeout to 30s based on test result.
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <jbottomley@parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The number of chip's internal command cell, which is use to generate
SCSI cmd packets to the target, was not initialized correctly by
the driver when the sq_size is changed from the default 128.
This, in turn, will create a problem where the chip's transmit pipe
will erroneously reuse an old command cell that is no longer valid.
The fix is to correctly initialize the chip's command cell upon setup.
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <jbottomley@parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Driver was a sending a SEP request during interrupt context which
required to go to sleep.
The fix is to rearrange the code so a fake event
MPT2SAS_TURN_ON_FAULT_LED is fired from interrupt context, then later
during the kernel worker threads processing, the SEP request is issued
to firmware.