Added Vendor/Device Id of Motorola Rokr E6 (22b8:6027) so it can be
recognized by the "zaurus" USBNet driver.
Applies to Linux 3.2.13 and 2.6.39.4. Signed-off-by: Guan Xin <guanx.bac@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
TWL6030 family of PMIC use a shadow interrupt status register
while kernel processes the current interrupt event.
However, any write(0 or 1) to register INT_STS_A, INT_STS_B or
INT_STS_C clears all 3 interrupt status registers.
Since clear of the interrupt is done on 32k clk, depending on I2C
bus speed, we could in-adverently clear the status of a interrupt
status pending on shadow register in the current implementation.
This is due to the fact that multi-byte i2c write operation into
three seperate status register could result in multiple load
and clear of status and result in lost interrupts.
Instead, doing a single byte write to INT_STS_A register with 0x0
will clear all three interrupt status registers without the related
risk.
When a machine boots up, the TSC generally gets reset. However,
when kexec is used to boot into a kernel, the TSC value would be
carried over from the previous kernel. The computation of
cycns_offset in set_cyc2ns_scale is prone to an overflow, if the
machine has been up more than 208 days prior to the kexec. The
overflow happens when we multiply *scale, even though there is
enough room to store the final answer.
We fix this issue by decomposing tsc_now into the quotient and
remainder of division by CYC2NS_SCALE_FACTOR and then performing
the multiplication separately on the two components.
Refactor code to share the calculation with the previous
fix in __cycles_2_ns().
Signed-off-by: Salman Qazi <sqazi@google.com> Acked-by: John Stultz <john.stultz@linaro.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Turner <pjt@google.com> Cc: john stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/20120310004027.19291.88460.stgit@dungbeetle.mtv.corp.google.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
WARNING: at drivers/pci/pci.c:118 pci_ioremap_bar+0x24/0x52()
when probing his sound card, and sound did not work. After adding
pci=use_crs to the kernel command line, no more trouble.
Ok, we can add a quirk. dmidecode output reveals that this is an MSI
MS-7253, for which we already have a quirk, but the short-sighted
author tied the quirk to a single BIOS version, making it not kick in
on Carlos's machine with BIOS V1.2. If a later BIOS update makes it
no longer necessary to look at the _CRS info it will still be
harmless, so let's stop trying to guess which versions have and don't
have accurate _CRS tables.
Addresses https://bugtrack.alsa-project.org/alsa-bug/view.php?id=5533
Also see <https://bugzilla.kernel.org/show_bug.cgi?id=42619>.
Reported-by: Carlos Luna <caralu74@gmail.com> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the spirit of commit 29cf7a30f8a0 ("x86/PCI: use host bridge _CRS
info on ASUS M2V-MX SE"), this DMI quirk turns on "pci_use_crs" by
default on a board that needs it.
This fixes boot failures and oopses introduced in 3e3da00c01d0
("x86/pci: AMD one chain system to use pci read out res"). The quirk
is quite targetted (to a specific board and BIOS version) for two
reasons:
(1) to emphasize that this method of tackling the problem one quirk
at a time is a little insane
(2) to give BIOS vendors an opportunity to use simpler tables and
allow us to return to generic behavior (whatever that happens to
be) with a later BIOS update
In other words, I am not at all happy with having quirks like this.
But it is even worse for the kernel not to work out of the box on
these machines, so...
Commit f02e8a6596b7 ("module: Sort exported symbols") sorts symbols
placing each of them in its own elf section. This sorting and merging
into the canonical sections are done by the linker.
Unfortunately modpost to generate Module.symvers file parses vmlinux.o
(which is not linked yet) and all modules object files (which aren't
linked yet). These aren't sanitized by the linker yet. That breaks
modpost that can't detect license properly for modules.
This patch makes modpost aware of the new exported symbols structure.
[ This above is a slightly corrected version of the explanation of the
problem, copied from commit 62a2635610db ("modpost: Fix modpost's
license checking V3"). That commit fixed the problem for module
object files, but not for vmlinux.o. This patch fixes modpost for
vmlinux.o. ]
Signed-off-by: Frank Rowand <frank.rowand@am.sony.com> Signed-off-by: Alessio Igor Bogani <abogani@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The commit f02e8a6 sorts symbols placing each of them in its own elf section.
The sorting and merging into the canonical sections are done by the linker.
Unfortunately modpost to generate Module.symvers file parses vmlinux
(already linked) and all modules object files (which aren't linked yet).
These aren't sanitized by the linker yet. That breaks modpost that can't
detect license properly for modules. This patch makes modpost aware of
the new exported symbols structure.
Thanks to Arnaud Lacombe <lacombar@gmail.com> and Anders Kaseorg
<andersk@ksplice.com> for providing useful suggestions about code.
This work was supported by a hardware donation from the CE Linux Forum.
Reported-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Alessio Igor Bogani <abogani@kernel.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit bfdc0b4 adds code to restrict access to dmesg_restrict,
however, it incorrectly alters kptr_restrict rather than
dmesg_restrict.
The original patch from Richard Weinberger
(https://lkml.org/lkml/2011/3/14/362) alters dmesg_restrict as
expected, and so the patch seems to have been misapplied.
This adds the CAP_SYS_ADMIN check to both dmesg_restrict and
kptr_restrict, since both are sensitive.
There has long been a limitation using software breakpoints with a
kernel compiled with CONFIG_DEBUG_RODATA going back to 2.6.26. For
this particular patch, it will apply cleanly and has been tested all
the way back to 2.6.36.
The kprobes code uses the text_poke() function which accommodates
writing a breakpoint into a read-only page. The x86 kgdb code can
solve the problem similarly by overriding the default breakpoint
set/remove routines and using text_poke() directly.
The x86 kgdb code will first attempt to use the traditional
probe_kernel_write(), and next try using a the text_poke() function.
The break point install method is tracked such that the correct break
point removal routine will get called later on.
Cc: x86@kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Inspried-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The do_fork and sys_open tests have never worked properly on anything
other than a UP configuration with the kgdb test suite. This is
because the test suite did not fully implement the behavior of a real
debugger. A real debugger tracks the state of what thread it asked to
single step and can correctly continue other threads of execution or
conditionally stop while waiting for the original thread single step
request to return.
Below is a simple method to cause a fatal kernel oops with the kgdb
test suite on a 2 processor ARM system:
while [ 1 ] ; do ls > /dev/null 2> /dev/null; done&
while [ 1 ] ; do ls > /dev/null 2> /dev/null; done&
echo V1I1F100 > /sys/module/kgdbts/parameters/kgdbts
Very soon after starting the test the kernel will start warning with
messages like:
kgdbts: BP mismatch c002487c expected c0024878
------------[ cut here ]------------
WARNING: at drivers/misc/kgdbts.c:317 check_and_rewind_pc+0x9c/0xc4()
[<c01f6520>] (check_and_rewind_pc+0x9c/0xc4)
[<c01f595c>] (validate_simple_test+0x3c/0xc4)
[<c01f60d4>] (run_simple_test+0x1e8/0x274)
The kernel will eventually recovers, but the test suite has completely
failed to test anything useful.
This patch implements behavior similar to a real debugger that does
not rely on hardware single stepping by using only software planted
breakpoints.
In order to mimic a real debugger, the kgdb test suite now tracks the
most recent thread that was continued (cont_thread_id), with the
intent to single step just this thread. When the response to the
single step request stops in a different thread that hit the original
break point that thread will now get continued, while the debugger
waits for the thread with the single step pending. Here is a high
level description of the sequence of events.
cont_instead_of_sstep = 0;
1) set breakpoint at do_fork
2) continue
3) Save the thread id where we stop to cont_thread_id
4) Remove breakpoint at do_fork
5) Reset the PC if needed depending on kernel exception type
6) soft single step
7) Check where we stopped
if current thread != cont_thread_id {
if (here for more than 2 times for the same thead) {
### must be a really busy system, start test again ###
goto step 1
}
goto step 5
} else {
cont_instead_of_sstep = 0;
}
8) clean up and run test again if needed
9) Clear out any threads that were waiting on a break point at the
point in time the test is ended with get_cont_catch(). This
happens sometimes because breakpoints are used in place of single
stepping and some threads could have been in the debugger exception
handling queue because breakpoints were hit concurrently on
different CPUs. This also means we wait at least one second before
unplumbing the debugger connection at the very end, so as respond
to any debug threads waiting to be serviced.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The do_fork and sys_open tests have never worked properly on anything
other than a UP configuration with the kgdb test suite. This is
because the test suite did not fully implement the behavior of a real
debugger. A real debugger tracks the state of what thread it asked to
single step and can correctly continue other threads of execution or
conditionally stop while waiting for the original thread single step
request to return.
Below is a simple method to cause a fatal kernel oops with the kgdb
test suite on a 4 processor x86 system:
while [ 1 ] ; do ls > /dev/null 2> /dev/null; done&
while [ 1 ] ; do ls > /dev/null 2> /dev/null; done&
while [ 1 ] ; do ls > /dev/null 2> /dev/null; done&
while [ 1 ] ; do ls > /dev/null 2> /dev/null; done&
echo V1I1F1000 > /sys/module/kgdbts/parameters/kgdbts
Very soon after starting the test the kernel will oops with a message like:
The warn will turn to a hard kernel crash shortly after that because
the pc will not get properly rewound to the right value after hitting
a breakpoint leading to a hard lockup.
This change is broken up into 2 pieces because archs that have hw
single stepping (2.6.26 and up) need different changes than archs that
do not have hw single stepping (3.0 and up). This change implements
the correct behavior for an arch that supports hw single stepping.
A minor defect was fixed where sys_open should be do_sys_open
for the sys_open break point test. This solves the problem of running
a 64 bit with a 32 bit user space. The sys_open() never gets called
when using the 32 bit file system for the kgdb testsuite because the
32 bit binaries invoke the compat_sys_open() call leading to the test
never completing.
In order to mimic a real debugger, the kgdb test suite now tracks the
most recent thread that was continued (cont_thread_id), with the
intent to single step just this thread. When the response to the
single step request stops in a different thread that hit the original
break point that thread will now get continued, while the debugger
waits for the thread with the single step pending. Here is a high
level description of the sequence of events.
cont_instead_of_sstep = 0;
1) set breakpoint at do_fork
2) continue
3) Save the thread id where we stop to cont_thread_id
4) Remove breakpoint at do_fork
5) Reset the PC if needed depending on kernel exception type
6) if (cont_instead_of_sstep) { continue } else { single step }
7) Check where we stopped
if current thread != cont_thread_id {
cont_instead_of_sstep = 1;
goto step 5
} else {
cont_instead_of_sstep = 0;
}
8) clean up and run test again if needed
Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On x86 the kgdb test suite will oops when the kernel is compiled with
CONFIG_DEBUG_RODATA and you run the tests after boot time. This is
regression has existed since 2.6.26 by commit: b33cb815 (kgdbts: Use
HW breakpoints with CONFIG_DEBUG_RODATA).
The test suite can use hw breakpoints for all the tests, but it has to
execute the hardware breakpoint specific tests first in order to
determine that the hw breakpoints actually work. Specifically the
very first test causes an oops:
# echo V1I1 > /sys/module/kgdbts/parameters/kgdbts
kgdb: Registered I/O driver kgdbts.
kgdbts:RUN plant and detach test
Entering kdb (current=0xffff880017aa9320, pid 1078) on processor 0 due to Keyboard Entry
[0]kdb> kgdbts: ERROR PUT: end of test buffer on 'plant_and_detach_test' line 1 expected OK got $E14#aa
WARNING: at drivers/misc/kgdbts.c:730 run_simple_test+0x151/0x2c0()
[...oops clipped...]
This commit re-orders the running of the tests and puts the RODATA
check into its own function so as to correctly avoid the kernel oops
by detecting and using the hw breakpoints.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is extra state information that needs to be exposed in the
kgdb_bpt structure for tracking how a breakpoint was installed. The
debug_core only uses the the probe_kernel_write() to install
breakpoints, but this is not enough for all the archs. Some arch such
as x86 need to use text_poke() in order to install a breakpoint into a
read only page.
Passing the kgdb_bpt structure to kgdb_arch_set_breakpoint() and
kgdb_arch_remove_breakpoint() allows other archs to set the type
variable which indicates how the breakpoint was installed.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
i915_drm_thaw was not locking the mode_config lock when calling
drm_helper_resume_force_mode. When there were multiple wake sources,
this caused FDI training failure on SNB which in turn corrupted the
display.
Signed-off-by: Sean Paul <seanpaul@chromium.org> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Used to delay the frame start signal that is sent to the display planes.
Care must be taken to insure that there are enough lines during VBLANK
to support this setting.
An instance of the BIOS leaving these bits set was found in the wild,
where it caused our modesetting to go all squiffy and skewiff.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47271 Reported-and-tested-by: Eva Wang <evawang@linpus.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=43012 Reported-and-tested-by: Carl Richell <carl@system76.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mplayer -vo fbdev tries to create a screen that is twice as tall as the
allocated framebuffer for "doublebuffering". By default, and all in-tree
users, only sufficient memory is allocated and mapped to satisfy the
smallest framebuffer and the virtual size is no larger than the actual.
For these users, we should therefore reject any userspace request to
create a screen that requires a buffer larger than the framebuffer
originally allocated.
References: https://bugs.freedesktop.org/show_bug.cgi?id=38138 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Is possible that we will arm the tid_rx->reorder_timer after
del_timer_sync() in ___ieee80211_stop_rx_ba_session(). We need to stop
timer after RCU grace period finish, so move it to
ieee80211_free_tid_rx(). Timer will not be armed again, as
rcu_dereference(sta->ampdu_mlme.tid_rx[tid]) will return NULL.
Debug object detected problem with the following warning:
ODEBUG: free active (active state 0) object type: timer_list hint: sta_rx_agg_reorder_timer_expired+0x0/0xf0 [mac80211]
Bug report (with all warning messages):
https://bugzilla.redhat.com/show_bug.cgi?id=804007
Reported-by: "jan p. springer" <jsd@igroup.org> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
irq_move_masked_irq() checks the return code of
chip->irq_set_affinity() only for 0, but IRQ_SET_MASK_OK_NOCOPY is
also a valid return code, which is there to avoid a redundant copy of
the cpumask. But in case of IRQ_SET_MASK_OK_NOCOPY we not only avoid
the redundant copy, we also fail to adjust the thread affinity of an
eventually threaded interrupt handler.
Handle IRQ_SET_MASK_OK (==0) and IRQ_SET_MASK_OK_NOCOPY(==1) return
values correctly by checking the valid return values seperately.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Keping Chen <chenkeping@huawei.com> Link: http://lkml.kernel.org/r/1333120296-13563-2-git-send-email-jiang.liu@huawei.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This was lacking a comma between two supposed to be separate strings.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 64b3db22c04586997ab4be46dd5a5b99f8a2d390 (2.6.39),
"Remove use of unreliable FADT revision field" causes regression
for old P4 systems because now cst_control and other fields are
not reset to 0.
The effect is that acpi_processor_power_init will notice
cst_control != 0 and a write to CST_CNT register is performed
that should not happen. As result, the system oopses after the
"No _CST, giving up" message, sometimes in acpi_ns_internalize_name,
sometimes in acpi_ns_get_type, usually at random places. May be
during migration to CPU 1 in acpi_processor_get_throttling.
Every one of these settings help to avoid this problem:
- acpi=off
- processor.nocst=1
- maxcpus=1
The fix is to update acpi_gbl_FADT.header.length after
the original value is used to check for old revisions.
Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Cc: Jonathan Nieder <jrnieder@gmail.com> Cc: Josh Boyer <jwboyer@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During testing pci root bus removal, found some root bus bridge is not freed.
If booting with pnpacpi=off, those hostbridge could be freed without problem.
It turns out that some devices reference are not released during acpi_pnp_match.
that match should not hold one device ref during every calling.
Add pu_device calling before returning.
Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On Intel CPUs the processor typically uses the highest frequency
set by any logical CPU. When the system overheats
Linux first forces the frequency to the lowest available one
to lower the temperature.
However this was done only per logical CPU, which means all
logical CPUs in a package would need to go through this before
the frequency is actually lowered.
Worse this delay actually prevents real throttling, because
the real throttle code only proceeds when the lowest frequency
is already reached.
So when a throttle event happens force the lowest frequency
for all CPUs in the package where it happened. The per CPU
state is now kept per package, not per logical CPU. An alternative
would be to do it per cpufreq unit, but since we want to bring
down the temperature of the complete chip it's better
to do it for all.
In principle it may even make sense to do it for all CPUs,
but I kept it on the package for now.
With this change the frequency is actually lowered, which
in terms also allows real throttling to proceed.
I also removed an unnecessary per cpu variable initialization.
v2: Fix package mapping
Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The writebufsize concept was introduce by commit
"0e4ca7e mtd: add writebufsize field to mtd_info struct" and it represents
the maximum amount of data the device writes to the media at a time. This is
an important parameter for UBIFS which is used during recovery and which
basically defines how big a corruption caused by a power cut can be.
Set writebufsize to 4 because this drivers writes at max 4 bytes at a time.
The writebufsize concept was introduce by commit
"0e4ca7e mtd: add writebufsize field to mtd_info struct" and it represents
the maximum amount of data the device writes to the media at a time. This is
an important parameter for UBIFS which is used during recovery and which
basically defines how big a corruption caused by a power cut can be.
However, we forgot to set this parameter for block2mtd. Set it to PAGE_SIZE
because this is actually the amount of data we write at a time.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Joern Engel <joern@lazybastard.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The writebufsize concept was introduce by commit
"0e4ca7e mtd: add writebufsize field to mtd_info struct" and it represents
the maximum amount of data the device writes to the media at a time. This is
an important parameter for UBIFS which is used during recovery and which
basically defines how big a corruption caused by a power cut can be.
Set writebufsize to the flash page size because it is the maximum amount of
data it writes at a time.
Make CDC EEM recalculate the hard_mtu after adjusting the
hard_header_len.
Without this, usbnet adjusts the MTU down to 1494 bytes, and the host is
unable to receive standard 1500-byte frames from the device.
Tested with the Linux USB Ethernet gadget.
Cc: Oliver Neukum <oliver@neukum.name> Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If both addresses equal, nothing needs to be done. If the device is down,
then we simply copy the new address to dev->dev_addr. If the device is up,
then we add another loopback device with the new address, and if that does
not fail, we remove the loopback device with the old address. And only
then, we update the dev->dev_addr.
Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When K >= 0xFFFF0000, AND needs the two least significant bytes of K as
its operand, but EMIT2() gives it the least significant byte of K and
0x2. EMIT() should be used here to replace EMIT2().
Signed-off-by: Feiran Zhuang <zhuangfeiran@ict.ac.cn> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since 3.2.12 and 3.3, some systems are failing to boot with a BUG_ON.
Some other systems using the pata_jmicron driver fail to boot because no
disks are detected. Passing pcie_aspm=force on the kernel command line
works around it.
The cause: commit 4949be16822e ("PCI: ignore pre-1.1 ASPM quirking when
ASPM is disabled") changed the behaviour of pcie_aspm_sanity_check() to
always return 0 if aspm is disabled, in order to avoid cases where we
changed ASPM state on pre-PCIe 1.1 devices.
This skipped the secondary function of pcie_aspm_sanity_check which was
to avoid us enabling ASPM on devices that had non-PCIe children, causing
trouble later on. Move the aspm_disabled check so we continue to honour
that scenario.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=42979 and
http://bugs.debian.org/665420
Reported-by: Romain Francoise <romain@orebokech.com> # kernel panic Reported-by: Chris Holland <bandidoirlandes@gmail.com> # disk detection trouble Signed-off-by: Matthew Garrett <mjg@redhat.com> Tested-by: Hatem Masmoudi <hatem.masmoudi@gmail.com> # Dell Latitude E5520 Tested-by: janek <jan0x6c@gmail.com> # pata_jmicron with JMB362/JMB363
[jn: with more symptoms in log message] Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When DMA is enabled, sh-sci transfer begins with
uart_start()
sci_start_tx()
if (cookie_tx < 0) schedule_work()
Then, starts DMA when wq scheduled, -- (A)
process_one_work()
work_fn_rx()
cookie_tx = desc->submit_tx()
And finishes when DMA transfer ends, -- (B)
sci_dma_tx_complete()
async_tx_ack()
cookie_tx = -EINVAL
(possible another schedule_work())
This A to B sequence is not reentrant, since controlling variables
(for example, cookie_tx above) are not queues nor lists. So, they
must be invoked as A B A B..., otherwise results in kernel crash.
To ensure the sequence, sci_start_tx() seems to test if cookie_tx < 0
(represents "not used") to call schedule_work().
But cookie_tx will not be set (to a cookie, also means "used") until
in the middle of work queue scheduled function work_fn_tx().
This gap between the test and set allows the breakage of the sequence
under the very frequently call of uart_start().
Another gap between async_tx_ack() and another schedule_work() results
in the same issue, too.
This patch introduces a new condition "cookie_tx == 0" just to mark
it is "busy" and assign it within spin-locked region to fill the gaps.
There is no point in passing a zero length string here and quite a
few of that cache_parse() implementations will Oops if count is
zero.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
<asm-generic/unistd.h> was set up to use sys_sendfile() for the 32-bit
compat API instead of sys_sendfile64(), but in fact the right thing to
do is to use sys_sendfile64() in all cases. The 32-bit sendfile64() API
in glibc uses the sendfile64 syscall, so it has to be capable of doing
full 64-bit operations. But the sys_sendfile() kernel implementation
has a MAX_NON_LFS test in it which explicitly limits the offset to 2^32.
So, we need to use the sys_sendfile64() implementation in the kernel
for this case.
While running the latest Linux as guest under VMware in highly
over-committed situations, we have seen cases when the refined TSC
algorithm fails to get a valid tsc_start value in
tsc_refine_calibration_work from multiple attempts. As a result the
kernel keeps on scheduling the tsc_irqwork task for later. Subsequently
after several attempts when it gets a valid start value it goes through
the refined calibration and either bails out or uses the new results.
Given that the kernel originally read the TSC frequency from the
platform, which is the best it can get, I don't think there is much
value in refining it.
So for systems which get the TSC frequency from the platform we
should skip the refined tsc algorithm.
We can use the TSC_RELIABLE cpu cap flag to detect this, right now it is
set only on VMware and for Moorestown Penwell both of which have there
own TSC calibration methods.
Signed-off-by: Alok N Kataria <akataria@vmware.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Dirk Brandewie <dirk.brandewie@gmail.com> Cc: Alan Cox <alan@linux.intel.com>
[jstultz: Reworked to simply not schedule the refining work,
rather then scheduling the work and bombing out later] Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We call the wrong replay notify function when we use ESN replay
handling. This leads to the fact that we don't send notifications
if we use ESN. Fix this by calling the registered callbacks instead
of xfrm_replay_notify().
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some BIOS's don't setup power management correctly (what else is
new) and don't allow use of PCI Express power control. Add a special
exception module parameter to allow working around this issue.
Based on slightly different patch by Knut Petersen.
Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
no socket layer outputs a message for this error and neither should rds.
Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
napi->skb is allocated in napi_get_frags() using
netdev_alloc_skb_ip_align(), with a reserve of NET_SKB_PAD +
NET_IP_ALIGN bytes.
However, when such skb is recycled in napi_reuse_skb(), it ends with a
reserve of NET_IP_ALIGN which is suboptimal.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit f2c31e32b378 (net: fix NULL dereferences in check_peer_redir() )
added a regression in rt6_fill_node(), leading to rcu_read_lock()
imbalance.
Thats because NLA_PUT() can make a jump to nla_put_failure label.
Fix this by using nla_put()
Many thanks to Ben Greear for his help
Reported-by: Ben Greear <greearb@candelatech.com> Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thanks to Indan Zupancic to bring back this issue.
Reported-by: Matt Evans <matt@ozlabs.org> Reported-by: Indan Zupancic <indan@nul.nu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While testing L2TP functionality, I came across a bug in getsockname(). The
IP address returned within the pppol2tp_addr's addr memember was not being
set to the IP address in use. This bug is caused by using inet_sk() on the
wrong socket (the L2TP socket rather than the underlying UDP socket), and was
likely introduced during the addition of L2TPv3 support.
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Looking at hibernate overwriting I though it looked like a cursor,
so I tracked down this missing piece to stop the cursor blink
timer. I've no idea if this is sufficient to fix the hibernate
problems people are seeing, but please test it.
Both radeon and nouveau have done this for a long time.
I've run this personally all night hib/resume cycles with no fails.
Reviewed-by: Keith Packard <keithp@keithp.com> Reported-by: Petr Tesarik <kernel@tesarici.cz> Reported-by: Stanislaw Gruszka <sgruszka@redhat.com> Reported-by: Lots of misc segfaults after hibernate across the world.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=37142 Tested-by: Dave Airlie <airlied@redhat.com> Tested-by: Bojan Smojver <bojan@rexursive.com> Tested-by: Andreas Hartmann <andihartmann@01019freenet.de> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For high-speed/super-speed isochronous endpoints, the bInterval
value is used as exponent, 2^(bInterval-1). Luckily we have
usb_fill_int_urb() function that handles it correctly. So we just
call this function to fill in the RX URB.
sysfs_slab_add() calls various sysfs functions that actually may
end up in userspace doing all sorts of things.
Release the slub_lock after adding the kmem_cache structure to the list.
At that point the address of the kmem_cache is not known so we are
guaranteed exlusive access to the following modifications to the
kmem_cache structure.
If the sysfs_slab_add fails then reacquire the slub_lock to
remove the kmem_cache structure from the list.
Reported-by: Sasha Levin <levinsasha928@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When an IO error happens during inode deletion run from
xlog_recover_process_iunlinks() filesystem gets shutdown. Thus any subsequent
attempt to read buffers fails. Code in xlog_recover_process_iunlinks() does not
count with the fact that read of a buffer which was read a while ago can
really fail which results in the oops on
agi = XFS_BUF_TO_AGI(agibp);
Fix the problem by cleaning up the buffer handling in
xlog_recover_process_iunlinks() as suggested by Dave Chinner. We release buffer
lock but keep buffer reference to AG buffer. That is enough for buffer to stay
pinned in memory and we don't have to call xfs_read_agi() all the time.
Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Always set io->error to -EIO when an error is detected in dm-crypt.
There were cases where an error code would be set only if we finish
processing the last sector. If there were other encryption operations in
flight, the error would be ignored and bio would be returned with
success as if no error happened.
This bug is present in kcryptd_crypt_write_convert, kcryptd_crypt_read_convert
and kcryptd_async_done.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fixes a possible deadlock in dm-crypt's mempool use.
Currently, dm-crypt reserves a mempool of MIN_BIO_PAGES reserved pages.
It allocates first MIN_BIO_PAGES with non-failing allocation (the allocation
cannot fail and waits until the mempool is refilled). Further pages are
allocated with different gfp flags that allow failing.
Because allocations may be done in parallel, this code can deadlock. Example:
There are two processes, each tries to allocate MIN_BIO_PAGES and the processes
run simultaneously.
It may end up in a situation where each process allocates (MIN_BIO_PAGES / 2)
pages. The mempool is exhausted. Each process waits for more pages to be freed
to the mempool, which never happens.
To avoid this deadlock scenario, this patch changes the code so that only
the first page is allocated with non-failing gfp mask. Allocation of further
pages may fail.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
udf_release_file() can be called from munmap() path with mmap_sem held. Thus
we cannot take i_mutex there because that ranks above mmap_sem. Luckily,
i_mutex is not needed in udf_release_file() anymore since protection by
i_data_sem is enough to protect from races with write and truncate.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Reviewed-by: Namjae Jeon <linkinjeon@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In d_materialise_unique() there are 3 subcases to the 'aliased dentry'
case; in two subcases the inode i_lock is properly released but this
does not occur in the -ELOOP subcase.
This seems to have been introduced by commit 1836750115f2 ("fix loop
checks in d_materialise_unique()").
Signed-off-by: Michel Lespinasse <walken@google.com>
[ Added a comment, and moved the unlock to where we generate the -ELOOP,
which seems to be more natural.
You probably can't actually trigger this without a buggy network file
server - d_materialize_unique() is for finding aliases on non-local
filesystems, and the d_ancestor() case is for a hardlinked directory
loop.
But we should be robust in the case of such buggy servers anyway. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Explicitly test for an extent whose length is zero, and flag that as a
corrupted extent.
This avoids a kernel BUG_ON assertion failure.
Tested: Without this patch, the file system image found in
tests/f_ext_zero_len/image.gz in the latest e2fsprogs sources causes a
kernel panic. With this patch, an ext4 file system error is noted
instead, and the file system is marked as being corrupted.
Ext4 does not support data journalling with delayed allocation enabled.
We even do not allow to mount the file system with delayed allocation
and data journalling enabled, however it can be set via FS_IOC_SETFLAGS
so we can hit the inode with EXT4_INODE_JOURNAL_DATA set even on file
system mounted with delayed allocation (default) and that's where
problem arises. The easies way to reproduce this problem is with the
following set of commands:
Additionally it can be reproduced quite reliably with xfstests 272 and
269. In fact the above reproducer is a part of test 272.
To fix this we should ignore the EXT4_INODE_JOURNAL_DATA inode flag if
the file system is mounted with delayed allocation. This can be easily
done by fixing ext4_should_*_data() functions do ignore data journal
flag when delalloc is set (suggested by Ted). We also have to set the
appropriate address space operations for the inode (again, ignoring data
journal flag if delalloc enabled).
Additionally this commit introduces ext4_inode_journal_mode() function
because ext4_should_*_data() has already had a lot of common code and
this change is putting it all into one function so it is easier to
read.
Successfully tested with xfstests in following configurations:
journal_unmap_buffer()'s zap_buffer: code clears a lot of buffer head
state ala discard_buffer(), but does not touch _Delay or _Unwritten as
discard_buffer() does.
This can be problematic in some areas of the ext4 code which assume
that if they have found a buffer marked unwritten or delay, then it's
a live one. Perhaps those spots should check whether it is mapped
as well, but if jbd2 is going to tear down a buffer, let's really
tear it down completely.
Without this I get some fsx failures on sub-page-block filesystems
up until v3.2, at which point 4e96b2dbbf1d7e81f22047a50f862555a6cb87cb
and 189e868fa8fdca702eb9db9d8afc46b5cb9144c9 make the failures go
away, because buried within that large change is some more flag
clearing. I still think it's worth doing in jbd2, since
->invalidatepage leads here directly, and it's the right place
to clear away these flags.
There is a race between ext4 buffer write and direct_IO read with
dioread_nolock mount option enabled. The problem is that we clear
PageWriteback flag during end_io time but will do
uninitialized-to-initialized extent conversion later with dioread_nolock.
If an O_direct read request comes in during this period, ext4 will return
zero instead of the recently written data.
This patch checks whether there are any pending uninitialized-to-initialized
extent conversion requests before doing O_direct read to close the race.
Note that this is just a bandaid fix. The fundamental issue is that we
clear PageWriteback flag before we really complete an IO, which is
problem-prone. To fix the fundamental issue, we may need to implement an
extent tree cache that we can use to look up pending to-be-converted extents.
Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If create_basic_memory_bitmaps() fails, usermodehelpers are not re-enabled
before returning. Fix this. And while at it, reword the goto labels so that
they look more meaningful.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on the original patch submitted my Michael Wang
<wangyun@linux.vnet.ibm.com>.
Descriptors may not be write-back while checking TX hang with flag
FLAG2_DMA_BURST on.
So when we detect hang, we just flush the descriptor and detect
again for once.
-v2 change 1 to true and 0 to false and remove extra ()
CC: Michael Wang <wangyun@linux.vnet.ibm.com> CC: Flavio Leitner <fbl@redhat.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The D1F5 revision of the WinTV HVR-1900 uses a tda18271c2 tuner
instead of a tda18271c1 tuner as used in revision D1E9. To
account for this, we must hardcode the frontend configuration
to use the same IF frequency configuration for both revisions
of the device.
6MHz DVB-T is unaffected by this issue, as the recommended
IF Frequency configuration for 6MHz DVB-T is the same on both
c1 and c2 revisions of the tda18271 tuner.
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org> Cc: Mike Isely <isely@pobox.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The error handling in lgdt3303_read_status() and lgdt330x_read_ucblocks()
doesn't work, because i2c_read_demod_bytes() returns a u8 and (err < 0)
is always false.
The namespace cleanup path leaks a dentry which holds a reference count
on a network namespace. Keeping that network namespace from being freed
when the last user goes away. Leaving things like vlan devices in the
leaked network namespace.
If you use ip netns add for much real work this problem becomes apparent
pretty quickly. It light testing the problem hides because frequently
you simply don't notice the leak.
Use d_set_d_op() so that DCACHE_OP_* flags are set correctly.
This issue exists back to 3.0.
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Reported-by: Justin Pettit <jpettit@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The problem occurs on !CONFIG_VM86 kernels [1] when a kernel-mode task
returns from a system call with a pending signal.
A real-life scenario is a child of 'khelper' returning from a failed
kernel_execve() in ____call_usermodehelper() [ kernel/kmod.c ].
kernel_execve() fails due to a pending SIGKILL, which is the result of
"kill -9 -1" (at least, busybox's init does it upon reboot).
The loop is as follows:
* syscall_exit_work:
- work_pending: // start_of_the_loop
- work_notify_sig:
- do_notify_resume()
- do_signal()
- if (!user_mode(regs)) return;
- resume_userspace // TIF_SIGPENDING is still set
- work_pending // so we call work_pending => goto
// start_of_the_loop
More information can be found in another LKML thread:
http://www.serverphorums.com/read.php?12,457826
[1] the problem was also seen on MIPS.
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Link: http://lkml.kernel.org/r/1332448765.2299.68.camel@dimm Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
URB unlinking is always racing with its completion and tx_complete
may be called before or during running usb_unlink_urb, so tx_complete
must not clear urb->dev since it will be used in unlink path,
otherwise invalid memory accesses or usb device leak may be caused
inside usb_unlink_urb.
Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Oliver Neukum <oliver@neukum.org> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 4231d47e6fe69f061f96c98c30eaf9fb4c14b96d(net/usbnet: avoid
recursive locking in usbnet_stop()) fixes the recursive locking
problem by releasing the skb queue lock, but it makes usb_unlink_urb
racing with defer_bh, and the URB to being unlinked may be freed before
or during calling usb_unlink_urb, so use-after-free problem may be
triggerd inside usb_unlink_urb.
The patch fixes the use-after-free problem by increasing URB
reference count with skb queue lock held before calling
usb_unlink_urb, so the URB won't be freed until return from
usb_unlink_urb.
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Oliver Neukum <oliver@neukum.org> Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The problem is that for the case of priority queues, we
have to assume that __rpc_remove_wait_queue_priority will move new
elements from the tk_wait.links lists into the queue->tasks[] list.
We therefore cannot use list_for_each_entry_safe() on queue->tasks[],
since that will skip these new tasks that __rpc_remove_wait_queue_priority
is adding.
Without this fix, rpc_wake_up and rpc_wake_up_status will both fail
to wake up all functions on priority wait queues, which can result
in some nasty hangs.
The 'find_wl_entry()' function expects the maximum difference as the second
argument, not the maximum absolute value. So the "unknown" eraseblock picking
was incorrect, as Shmulik Ladkani spotted. This patch fixes the issue.
Two bad things can happen in ubi_scan():
1. If kmem_cache_create() fails we jump to out_si and call
ubi_scan_destroy_si() which calls kmem_cache_destroy().
But si->scan_leb_slab is NULL.
2. If process_eb() fails we jump to out_vidh, call
kmem_cache_destroy() and ubi_scan_destroy_si() which calls
again kmem_cache_destroy().
This patch fixes an issue when cifs_mount receives a
STATUS_BAD_NETWORK_NAME error during cifs_get_tcon but is able to
continue after an DFS ROOT referral. In this case, the return code
variable is not reset prior to trying to mount from the system referred
to. Thus, is_path_accessible is not executed and the final DFS referral
is not performed causing a mount error.
Use case: In DNS, example.com resolves to the secondary AD server
ad2.example.com Our primary domain controller is ad1.example.com and has
a DFS redirection set up from \\ad1\share\Users to \\files\share\Users.
Mounting \\example.com\share\Users fails.
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru Signed-off-by: Thomas Hadig <thomas@intapp.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we get concurrent lookups of the same inode that is not in the
per-AG inode cache, there is a race condition that triggers warnings
in unlock_new_inode() indicating that we are initialising an inode
that isn't in a the correct state for a new inode.
When we do an inode lookup via a file handle or a bulkstat, we don't
serialise lookups at a higher level through the dentry cache (i.e.
pathless lookup), and so we can get concurrent lookups of the same
inode.
The race condition is between the insertion of the inode into the
cache in the case of a cache miss and a concurrently lookup:
Thread 1 Thread 2
xfs_iget()
xfs_iget_cache_miss()
xfs_iread()
lock radix tree
radix_tree_insert()
rcu_read_lock
radix_tree_lookup
lock inode flags
XFS_INEW not set
igrab()
unlock inode flags
rcu_read_unlock
use uninitialised inode
.....
lock inode flags
set XFS_INEW
unlock inode flags
unlock radix tree
xfs_setup_inode()
inode flags = I_NEW
unlock_new_inode()
WARNING as inode flags != I_NEW
This can lead to inode corruption, inode list corruption, etc, and
is generally a bad thing to occur.
Fix this by setting XFS_INEW before inserting the inode into the
radix tree. This will ensure any concurrent lookup will find the new
inode with XFS_INEW set and that forces the lookup to wait until the
XFS_INEW flag is removed before allowing the lookup to succeed.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a setattr() fails because of an NFS4ERR_OPENMODE error, it is
probably due to us holding a read delegation. Ensure that the
recovery routines return that delegation in this case.
If we know that the delegation stateid is bad or revoked, we need to
remove that delegation as soon as possible, and then mark all the
stateids that relied on that delegation for recovery. We cannot use
the delegation as part of the recovery process.
Also note that NFSv4.1 uses a different error code (NFS4ERR_DELEG_REVOKED)
to indicate that the delegation was revoked.
Finally, ensure that setlk() and setattr() can both recover safely from
a revoked delegation.
On hosts without this patch, 32bit guests will crash (and 64bit guests
may behave in a wrong way) for example by simply executing following
nasm-demo-application:
[bits 32]
global _start
SECTION .text
_start: syscall
(I tested it with winxp and linux - both always crashed)
The reason seems a missing "invalid opcode"-trap (int6) for the
syscall opcode "0f05", which is not available on Intel CPUs
within non-longmodes, as also on some AMD CPUs within legacy-mode.
(depending on CPU vendor, MSR_EFER and cpuid)
Because previous mentioned OSs may not engage corresponding
syscall target-registers (STAR, LSTAR, CSTAR), they remain
NULL and (non trapping) syscalls are leading to multiple
faults and finally crashs.
Depending on the architecture (AMD or Intel) pretended by
guests, various checks according to vendor's documentation
are implemented to overcome the current issue and behave
like the CPUs physical counterparts.
[mtosatti: cleanup/beautify code]
Signed-off-by: Stephan Baerwolf <stephan.baerwolf@tu-ilmenau.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In order to be able to proceed checks on CPU-specific properties
within the emulator, function "get_cpuid" is introduced.
With "get_cpuid" it is possible to virtually call the guests
"cpuid"-opcode without changing the VM's context.
[mtosatti: cleanup/beautify code]
Signed-off-by: Stephan Baerwolf <stephan.baerwolf@tu-ilmenau.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
handle_ir_buffer_fill() assumed that a completed descriptor would be
indicated by a non-zero transfer_status (as in most other descriptors).
However, this field is written by the controller as soon as (the end of)
the first packet has been written into the buffer. As a consequence, if
we happen to run into such a descriptor when the interrupt handler is
executed after such a packet has completed, the descriptor would be
taken out of the list of active descriptors as soon as the buffer had
been partially filled, so the event for the buffer being completely
filled would never be sent.
To fix this, handle descriptors only when they have been completely
filled, i.e., when res_count == 0. (This also matches the condition
that is reported by the controller with an interrupt.)
Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to the HT6560H datasheet, the recovery timing field is 4-bit wide,
with a value of 0 meaning 16 cycles. Correct obvious thinko in the recovery
field mask.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The old code did (MSB << 8) & 0xff, which always evaluates to 0. Just use
get_unaligned_be16() so we don't have to worry about whether our open-coded
version is correct or not.
Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The meanings of these fields are specific to SPI-5 (see 6.4.3).
For SCSI transport protocols other than the SCSI Parallel
Interface, these fields are reserved.
We don't have a SPI fabric module, so we should never set these bits.
(The comment was misleading, since it only mentioned Sync but the
actual code set WBUS16 too).
Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The hardware only takes 27 bits for the offset, so larger offsets are
truncated, and the hardware cursor shows random bits other than the intended
ones.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
If RAID1 or RAID10 is used under LVM or some other stacking
block device, it is possible to enter a deadlock during
resync or recovery.
This can happen if the upper level block device creates
two requests to the RAID1 or RAID10. The first request gets
processed, blocks recovery and queue requests for underlying
requests in current->bio_list. A resync request then starts
which will wait for those requests and block new IO.
But then the second request to the RAID1/10 will be attempted
and it cannot progress until the resync request completes,
which cannot progress until the underlying device requests complete,
which are on a queue behind that second request.
So allow that second request to proceed even though there is
a resync request about to start.
This is suitable for any -stable kernel.
Reported-by: Ray Morris <support@bettercgi.com> Tested-by: Ray Morris <support@bettercgi.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When commit 69e51b449d383e (md/bitmap: separate out loading a bitmap...)
created bitmap_load, it missed calling it after bitmap_create when a
bitmap is created through the sysfs interface.
So if a bitmap is added this way, we don't allocate memory properly
and can crash.
This is suitable for any -stable release since 2.6.35. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fixes a bug in tcm_fc where fc_exch memory from fc_exch_mgr->ep_pool
is currently being leaked by ft_send_resp_status() usage. Following current
code in ft_queue_status() response path, using lport->tt.seq_send() needs to be
followed by a lport->tt.exch_done() in order to release fc_exch memory back into
libfc_em kmem_cache.
ft_send_resp_status() code is currently used in pre submit se_cmd ft_send_work()
error exceptions, TM request setup exceptions, and main TM response callback
path in ft_queue_tm_resp(). This bugfix addresses the leak in these cases.
Cc: Mark D Rustad <mark.d.rustad@intel.com> Cc: Kiran Patil <kiran.patil@intel.com> Cc: Robert Love <robert.w.love@intel.com> Cc: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The USB graphics card driver delays the unregistering of the framebuffer
device to a workqueue, which breaks the userspace visible remove uevent
sequence. Recent userspace tools started to support USB graphics card
hotplug out-of-the-box and rely on proper events sent by the kernel.
The framebuffer device is a direct child of the USB interface which is
removed immediately after the USB .disconnect() callback. But the fb device
in /sys stays around until its final cleanup, at a time where all the parent
devices have been removed already.
To work around that, we remove the sysfs fb device directly in the USB
.disconnect() callback and leave only the cleanup of the internal fb
data to the delayed work.