Many cards based on CY7C68300A/B/C use the USB ID 04b4:6830 but only the
B and C variants (EZ-USB AT2LP) support the ATA Command Block
functionality, according to the data sheets. The A variant (EZ-USB AT2)
locks up if ATACB is attempted, until a typical 30 seconds timeout runs
out and a USB reset is performed.
https://bugs.launchpad.net/bugs/428469
It seems that one way to spot a CY7C68300A (at least where the card
manufacturer left Cypress' EEPROM default vaules, against Cypress'
recommendations) is to look at the USB string descriptor indices.
A http://media.digikey.com/pdf/Data%20Sheets/Cypress%20PDFs/CY7C68300A.pdf
B http://www.farnell.com/datasheets/43456.pdf
C http://www.cypress.com/?rID=14189
Note that a CY7C68300B/C chip appears as CY7C68300A if it is running
in Backward Compatibility Mode, and if ATACB would be supported in this
case there is anyway no way to tell which chip it really is.
For 5 years my external USB drive has been locking up for half a minute
when plugged in and ata_id is run by udev, or anytime hdparm or similar
is run on it.
Finally looking at the /correct/ datasheet I think I found the reason. I
am aware the quirk in this patch is a bit hacky, but the hardware
manufacturers haven't made it easy for us.
The current ST Micro Connect Lite uses the FT4232H hi-speed quad USB
UART FTDI chip. It is also possible to drive STM reference targets
populated with an on-board JTAG debugger based on the FT2232H chip with
the same STMicroelectronics tools.
For this reason, the ST Micro Connect Lite PIDs should be
ST_STMCLT_2232_PID: 0x3746
ST_STMCLT_4232_PID: 0x3747
Signed-off-by: Adrian Thomasset <adrian.thomasset@st.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fox the Kconfig documentation for CONFIG_EXT4_DEBUG to match the
change made by commit a0b30c1229: ext4: use module parameters instead
of debugfs for mballoc_debug
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
After a server reboot, the reclaimer thread will recover all the existing
locks. For locks that are blocked, however, it will change the value
of block->b_status to nlm_lck_denied_grace_period in order to signal that
they need to wake up and resend the original blocking lock request.
Due to a bug, however, the block->b_status never gets reset after the
blocked locks have been woken up, and so the process goes into an
infinite loop of resends until the blocked lock is satisfied.
Reported-by: Marc Eshel <eshel@us.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
As discussed in this thread
http://lists.freedesktop.org/archives/dri-devel/2013-April/037411.html
GMBUS based DVO transmitter detection seems to be unreliable which could
result in an unusable DVO port.
The attached patch fixes this by falling back to bit banging mode for
the time DVO transmitter detection is in progress.
Signed-off-by: David Müller <d.mueller@elsoft.ch> Tested-by: David Müller <d.mueller@elsoft.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When the stations are being restored because of unassoc
RXON, the LQ cmd may not have been initialized because it
is initialized only after association.
Sending zeroed LQ_CMD makes the fw unhappy: it raises
SYSASSERT_2078.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com>
[move zero_lq and make static const] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Backlight cleanup in the eDP connector destroy callback caused the
backlight device to be removed on some systems that first initialized LVDS
and then attempted to initialize eDP. Prevent multiple backlight
initializations, and ensure backlight cleanup is only done once by moving
it to modeset cleanup.
A small wrinkle is the introduced asymmetry in backlight
setup/cleanup. This could be solved by adding refcounting, but it seems
overkill considering that there should only ever be one backlight device.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=55701 Signed-off-by: Jani Nikula <jani.nikula@intel.com> Tested-by: Peter Verthez <peter.verthez@skynet.be> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[bwh: Backported to 3.2:
- Adjust context
- s/dev_priv->backlight\.device/dev_priv->backlight/] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In order to fully serialize access to the fenced region and the update
to the fence register we need to take extreme measures on SNB+, and
manually flush writes to memory prior to writing the fence register in
conjunction with the memory barriers placed around the register write.
Fixes i-g-t/gem_fence_thrash
v2: Bring a bigger gun
v3: Switch the bigger gun for heavier bullets (Arjan van de Ven)
v4: Remove changes for working generations.
v5: Reduce to a per-cpu wbinvd() call prior to updating the fences.
v6: Rewrite comments to ellide forgotten history.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Tested-by: Jon Bloomfield <jon.bloomfield@intel.com> (v2) Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[bwh: Backported to 3.2: insert the cache flush in i915_gem_object_get_fence()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When usbfs receives a ctrl-request from userspace it calls check_ctrlrecip,
which for a request with USB_RECIP_ENDPOINT tries to map this to an interface
to see if this interface is claimed, except for ctrl-requests with a type of
USB_TYPE_VENDOR.
When trying to use this device: http://www.akaipro.com/eiepro
redirected to a Windows vm running on qemu on top of Linux.
The windows driver makes a ctrl-req with USB_TYPE_CLASS and
USB_RECIP_ENDPOINT with index 0, and the mapping of the endpoint (0) to
the interface fails since ep 0 is the ctrl endpoint and thus never is
part of an interface.
This patch fixes this ctrl-req failing by skipping the checkintf call for
USB_RECIP_ENDPOINT ctrl-reqs on the ctrl endpoint.
Reported-by: Dave Stikkolorum <d.r.stikkolorum@hhs.nl> Tested-by: Dave Stikkolorum <d.r.stikkolorum@hhs.nl> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The hardware revision of the codec is based at 0x40. Subtract that
before convering to ASCII. The same as it is done for 98095.
Signed-off-by: Dylan Reid <dgreid@chromium.org> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
smpboot: Booting Node 0 Processor 1 APIC 0x2
installing Xen timer for CPU 1
BUG: sleeping function called from invalid context at /home/konrad/ssd/konrad/linux/mm/slab.c:3179
in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1
Pid: 0, comm: swapper/1 Not tainted 3.9.0-rc6upstream-00001-g3884fad #1
Call Trace:
[<ffffffff810c1fea>] __might_sleep+0xda/0x100
[<ffffffff81194617>] __kmalloc_track_caller+0x1e7/0x2c0
[<ffffffff81303758>] ? kasprintf+0x38/0x40
[<ffffffff813036eb>] kvasprintf+0x5b/0x90
[<ffffffff81303758>] kasprintf+0x38/0x40
[<ffffffff81044510>] xen_setup_timer+0x30/0xb0
[<ffffffff810445af>] xen_hvm_setup_cpu_clockevents+0x1f/0x30
[<ffffffff81666d0a>] start_secondary+0x19c/0x1a8
The solution to that is use kasprintf in the CPU hotplug path
that 'online's the CPU. That is, do it in in xen_hvm_cpu_notify,
and remove the call to in xen_hvm_setup_cpu_clockevents.
Unfortunatly the later is not a good idea as the bootup path
does not use xen_hvm_cpu_notify so we would end up never allocating
timer%d interrupt lines when booting. As such add the check for
atomic() to continue.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
While we don't use the spinlock interrupt line (see for details
commit f10cd522c5fbfec9ae3cc01967868c9c2401ed23 -
xen: disable PV spinlocks on HVM) - we should still do the proper
init / deinit sequence. We did not do that correctly and for the
CPU init for PVHVM guest we would allocate an interrupt line - but
failed to deallocate the old interrupt line.
This resulted in leakage of an irq_desc but more importantly this splat
as we online an offlined CPU:
There is the oddity of the 'spinlock1' still being present.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In the PVHVM path when we do CPU online/offline path we would
leak the timer%d IRQ line everytime we do a offline event. The
online path (xen_hvm_setup_cpu_clockevents via
x86_cpuinit.setup_percpu_clockev) would allocate a new interrupt
line for the timer%d.
But we would still use the old interrupt line leading to:
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When a full scan 2.4 and 5 GHz scan is scheduled, but then the 2.4 GHz
part of the scan disables a 5.2 GHz channel due to, e.g. receiving
country or frequency information, that 5.2 GHz channel might already
be in the list of channels to scan next. Then, when the driver checks
if it should do a passive scan, that will return false and attempt an
active scan. This is not only wrong but can also lead to the iwlwifi
device firmware crashing since it checks regulatory as well.
Fix this by not setting the channel flags to just disabled but rather
OR'ing in the disabled flag. That way, even if the race happens, the
channel will be scanned passively which is still (mostly) correct.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Commit b51306c (PCI: Set device power state to PCI_D0 for device
without native PM support) modified pci_platform_power_transition()
by adding code causing dev->current_state for devices that don't
support native PCI PM but are power-manageable by the platform to be
changed to PCI_D0 regardless of the value returned by the preceding
platform_pci_set_power_state(). In particular, that also is done
if the platform_pci_set_power_state() has been successful, which
causes the correct power state of the device set by
pci_update_current_state() in that case to be overwritten by PCI_D0.
Fix that mistake by making the fallback to PCI_D0 only happen if
the platform_pci_set_power_state() has returned an error.
[bhelgaas: folded in Yinghai's simplification, added URL & stable info]
Reference: http://lkml.kernel.org/r/27806FC4E5928A408B78E88BBC67A2306F466BBA@ORSMSX101.amr.corp.intel.com Reported-by: Chris J. Benenati <chris.j.benenati@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The serial core uses device_find_child() but does not drop the reference to
the retrieved child after using it. This patch add the missing put_device().
What I have done to test this issue.
I used a machine with an AMBA PL011 serial driver. I tested the patch on
next-20120408 because the last branch [next-20120415] does not boot on this
board.
For test purpose, I added some pr_info() messages to print the refcount
after device_find_child() (lines: 1937,2009), and after put_device()
(lines: 1947, 2021).
Commit 88a8516a2128 (ALSA: usbaudio: implement USB autosuspend)
introduced autopm for all USB audio/MIDI devices. However, many MIDI
devices, such as synthesizers, do not merely transmit MIDI messages but
use their MIDI inputs to control other functions. With autopm, these
devices would get powered down as soon as the last MIDI port device is
closed on the host.
Even some plain MIDI interfaces could get broken: they automatically
send Active Sensing messages while powered up, but as soon as these
messages cease, the receiving device would interpret this as an
accidental disconnection.
Commit f5f165418cab (ALSA: usb-audio: Fix missing autopm for MIDI input)
introduced another regression: some devices (e.g. the Roland GAIA SH-01)
are self-powered but do a reset whenever the USB interface's power state
changes.
To work around all this, just disable autopm for all USB MIDI devices.
Reported-by: Laurens Holst Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When recording at 176.2KHz or 192Khz, the device adds a 32-bit length
header to the capture packets, which obviously needs to be ignored for
recording to work properly.
Also, the last byte of the length header being interpreted as L0 of
the first sample caused spikes every 0.5ms, resulting in a loud 16KHz
tone (about the highest 'B' on a piano) being present throughout
captures.
Tested at all sample rates on an E-Mu 0404USB, and tested for
regressions on a generic USB headset.
Signed-off-by: Calvin Owens <jcalvinowens@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: adjust filenames, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Don't actually close any opens until we don't need them at all.
This means being left with write access when it's not really necessary,
but that's better than putting a file that might still have posix locks
held on it, as we have been.
Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
One can trigger an overflow when using ktime_add_ns() on a 32bit
architecture not supporting CONFIG_KTIME_SCALAR.
When passing a very high value for u64 nsec, e.g. 7881299347898368000
the do_div() function converts this value to seconds (7881299347) which
is still to high to pass to the ktime_set() function as long. The result
in is a negative value.
The problem on my system occurs in the tick-sched.c,
tick_nohz_stop_sched_tick() when time_delta is set to
timekeeping_max_deferment(). The check for time_delta < KTIME_MAX is
valid, thus ktime_add_ns() is called with a too large value resulting in
a negative expire value. This leads to an endless loop in the ticker code:
time_delta: 7881299347898368000
expires = ktime_add_ns(last_update, time_delta)
expires: negative value
This fix caps the value to KTIME_MAX.
This error doesn't occurs on 64bit or architectures supporting
CONFIG_KTIME_SCALAR (e.g. ARM, x86-32).
Signed-off-by: David Engraf <david.engraf@sysgo.com>
[jstultz: Minor tweaks to commit message & header] Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When we change the system time to a low value like this, the value of
timekeeper->offs_real will be a negative value.
It seems that the WARN occurs because an hrtimer has been started in the time
between the releasing of the timekeeper lock and the IPI call (via a call to
on_each_cpu) in clock_was_set() in the do_settimeofday() code. The end result
is that a REALTIME_CLOCK timer has been added with softexpires = expires =
KTIME_MAX. The hrtimer_interrupt() fires/is called and the loop at
kernel/hrtimer.c:1289 is executed. In this loop the code subtracts the
clock base's offset (which was set to timekeeper->offs_real in
do_settimeofday()) from the current hrtimer_cpu_base->expiry value (which
was KTIME_MAX):
KTIME_MAX - (a negative value) = overflow
A simple check for an overflow can resolve this problem. Using KTIME_MAX
instead of the overflow value will result in the hrtimer function being run,
and the reprogramming of the timer after that.
Cc: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
[jstultz: Tweaked commit subject] Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The "Mobile Sandy Bridge CPUs" in the Fujitsu Esprimo Q900
mini desktop PCs are probably misleading the LVDS detection
code in intel_lvds_supported. Nothing is connected to the
LVDS ports in these systems.
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In order to demonstrace this issue one should mount ext4 with mount -o
discard option on SSD disk. This makes callback longer and race
window becomes wider.
In order to fix this we should mark transaction as finished only after
callbacks have completed
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
[bwh: Backported to 3.2: s/jbd2_journal_free_transaction/kfree/] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In the case where an inode has a very stale transaction id (tid) in
i_datasync_tid or i_sync_tid, it's possible that after a very large
(2**31) number of transactions, that the tid number space might wrap,
causing tid_geq()'s calculations to fail.
Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
attempted to fix this problem, but it only avoided kjournald spinning
forever by fixing the logic in jbd2_log_start_commit().
Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
that might call jbd2_log_start_commit() with a stale tid, those
functions will subsequently call jbd2_log_wait_commit() with the same
stale tid, and then wait for a very long time. To fix this, we
replace the calls to jbd2_log_start_commit() and
jbd2_log_wait_commit() with a call to a new function,
jbd2_complete_transaction(), which will correctly handle stale tid's.
As a bonus, jbd2_complete_transaction() will avoid locking
j_state_lock for writing unless a commit needs to be started. This
should have a small (but probably not measurable) improvement for
ext4's scalability.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: Ben Hutchings <ben@decadent.org.uk> Reported-by: George Barnett <gbarnett@atlassian.com>
[bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
we should return error status directly when nfs4_preprocess_stateid_op
return error.
Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The inode->i_mutex isn't hold when updating filp->f_pos
in read()/write(), so the filp->f_pos might be read as
0 or 1 in readdir() when there is concurrent read()/write()
on this same file, then may cause use after free in readdir().
The bug can be reproduced with Li Zefan's test code on the
link:
This patch fixes the use after free under this situation.
Reported-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: file position is child inode number, not hash] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
But this was hideously wrong. First of all these initializations
are now down far too late. Specifically after all the other cpus
have been brought up and initialized their own CMC vectors from
smp_callin(). Also ia64_mca_late_init() may be called from any cpu
so the line:
ia64_mca_cmc_vector_setup(); /* Setup vector on BSP */
is generally not executed on the BSP, and so the CMC vector isn't
setup at all on that processor.
Make use of the arch_early_irq_init() hook to get this code executed
at just the right moment: not too early, not too late.
Reported-by: Fred Hartnett <fred.hartnett@hp.com> Tested-by: Fred Hartnett <fred.hartnett@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Monitors seem to prefer it. Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=37696
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[bwh: Backported to 3.2:
- Adjust context
- Add to pll->flags, not radeon_crtc->pll_flags] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Found problem on system that firmware that could handle pci aer.
Firmware get error reporting after pci injecting error, before os boots.
But after os boots, firmware can not get report anymore, even pci=noaer
is passed.
Root cause: BIOS _OSC has problem with query bit checking.
It turns out that BIOS vendor is copying example code from ACPI Spec.
In ACPI Spec 5.0, page 290:
If (Not(And(CDW1,1))) // Query flag clear?
{ // Disable GPEs for features granted native control.
If (And(CTRL,0x01)) // Hot plug control granted?
{
Store(0,HPCE) // clear the hot plug SCI enable bit
Store(1,HPCS) // clear the hot plug SCI status bit
}
...
}
When Query flag is set, And(CDW1,1) will be 1, Not(1) will return 0xfffffffe.
So it will get into code path that should be for control set only.
BIOS acpi code should be changed to "If (LEqual(And(CDW1,1), 0)))"
Current kernel code is using _OSC query to notify firmware about support
from OS and then use _OSC to set control bits.
During query support, current code is using all possible controls.
So will execute code that should be only for control set stage.
That will have problem when pci=noaer or aer firmware_first is used.
As firmware have that control set for os aer already in query support stage,
but later will not os aer handling.
We should avoid passing all possible controls, just use osc_control_set
instead.
That should workaround BIOS bugs with affected systems on the field
as more bios vendors are copying sample code from ACPI spec.
Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The 3rd parameter of flex_array_prealloc() is the number of elements,
not the index of the last element.
The effect of the bug is, when opening cgroup.procs, a flex array will
be allocated and all elements of the array is allocated with
GFP_KERNEL flag, but the last one is GFP_ATOMIC, and if we fail to
allocate memory for it, it'll trigger a BUG_ON().
Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The list of output registers is
: "=r"(ret) : "r"(iha), "r"(pte):"memory");
The constraint "=r" means that the GCC has to maintain that these vars
are in registers and contain valid info when the program flow leaves
the assembly block (output registers).
But "=r" also means that GCC can put them in registers that are used
as input registers. Input registers are iha, pte on the example.
If the predicate p7 is true, the 8th assembly instruction
"(p7) mov %0=r0;"
is the first one which writes to a register which is maintained by the
register constraints; it sets %0. %0 means the first register operand;
it is ret here.
This instruction might overwrite the %2 register (pte) which is needed
by the next instruction:
"(p7) st8 [%2]=r9;;"
Whether it really happens depends on how GCC decides what registers it
uses and how it optimizes the code.
The attached patch fixes the register operand constraints in
arch/ia64/kvm/vtlb.c.
The register constraints should be
: "=&r"(ret) : "r"(iha), "r"(pte):"memory");
The & means that GCC must not use any of the input registers to place
this output register in.
This is Debian bug#702639
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702639).
The patch is applicable on Kernel 3.9-rc1, 3.2.35 and many other versions.
Signed-off-by: Stephan Schreiber <info@fs-driver.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The list of output registers is
: "=r" (r8), "=r" (prev)
The constraint "=r" means that the GCC has to maintain that these vars
are in registers and contain valid info when the program flow leaves
the assembly block (output registers).
But "=r" also means that GCC can put them in registers that are used
as input registers. Input registers are uaddr, newval, oldval on the
example.
The second assembly instruction
" mov %0=r0 \n"
is the first one which writes to a register; it sets %0 to 0. %0 means
the first register operand; it is r8 here. (The r0 is read-only and
always 0 on the Itanium; it can be used if an immediate zero value is
needed.)
This instruction might overwrite one of the other registers which are
still needed.
Whether it really happens depends on how GCC decides what registers it
uses and how it optimizes the code.
The objdump utility can give us disassembly.
The futex_atomic_cmpxchg_inatomic() function is inline, so we have to
look for a module that uses the funtion. This is the
cmpxchg_futex_value_locked() function in
kernel/futex.c:
static int cmpxchg_futex_value_locked(u32 *curval, u32 __user *uaddr,
u32 uval, u32 newval)
{
int ret;
pagefault_disable();
ret = futex_atomic_cmpxchg_inatomic(curval, uaddr, uval, newval);
pagefault_enable();
return ret;
}
Now the disassembly. At first from the Kernel package 3.2.23 which has
been compiled with GCC 4.4, remeber this Kernel seemed to work:
objdump -d linux-3.2.23/debian/build/build_ia64_none_mckinley/kernel/futex.o
The lines
2b0: 0a 00 00 00 22 00 [MMI] mf;;
2b6: 80 00 00 00 42 00 mov r8=r0
2bc: 00 00 04 00 nop.i 0x0
2c0: 0b 00 20 40 2a 04 [MMI] mov.m ar.ccv=r8;;
2c6: 10 1a 85 22 20 00 cmpxchg4.acq r33=[r33],r35,ar.ccv
2cc: 00 00 04 00 nop.i 0x0;;
are the instructions of the assembly block.
The line
2b6: 80 00 00 00 42 00 mov r8=r0
sets the r8 register to 0 and after that
2c0: 0b 00 20 40 2a 04 [MMI] mov.m ar.ccv=r8;;
prepares the 'oldvalue' for the cmpxchg but it takes it from r8. This
is wrong.
What happened here is what I explained above: An input register is
overwritten which is still needed.
The register operand constraints in futex.h are wrong.
(The problem doesn't occur when the Kernel is compiled with GCC 4.6.)
The attached patch fixes the register operand constraints in futex.h.
The code after patching of it:
static inline int
futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
u32 oldval, u32 newval)
{
if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
return -EFAULT;
I also initialized the 'r8' var with the C programming language.
The _asm qualifier on the definition of the 'r8' var forces GCC to use
the r8 processor register for it.
I don't believe that we should use inline assembly for zeroing out a
local variable.
The constraint is
"+r" (r8)
what means that it is both an input register and an output register.
Note that the page fault handler will modify the r8 register which
will be the return value of the function.
The real fix is
"=&r" (prev)
The & means that GCC must not use any of the input registers to place
this output register in.
Patched the Kernel 3.2.23 and compiled it with GCC4.4:
Much better.
There is a
270: 05 40 00 00 00 e1 [MLX] mov r8=r0
which was generated by C code r8 = 0. Below
2b6: 00 10 81 54 08 00 mov.m ar.ccv=r34
what means that oldval is no longer overwritten.
This is Debian bug#702641
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702641).
The patch is applicable on Kernel 3.9-rc1, 3.2.23 and many other versions.
Signed-off-by: Stephan Schreiber <info@fs-driver.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ftrace_dump() had a lot of issues. What ftrace_dump() does, is when
ftrace_dump_on_oops is set (via a kernel parameter or sysctl), it
will dump out the ftrace buffers to the console when either a oops,
panic, or a sysrq-z occurs.
This was written a long time ago when ftrace was fragile to recursion.
But it wasn't written well even for that.
There's a possible deadlock that can occur if a ftrace_dump() is happening
and an NMI triggers another dump. This is because it grabs a lock
before checking if the dump ran.
It also totally disables ftrace, and tracing for no good reasons.
As the ring_buffer now checks if it is read via a oops or NMI, where
there's a chance that the buffer gets corrupted, it will disable
itself. No need to have ftrace_dump() do the same.
ftrace_dump() is now cleaned up where it uses an atomic counter to
make sure only one dump happens at a time. A simple atomic_inc_return()
is enough that is needed for both other CPUs and NMIs. No need for
a spinlock, as if one CPU is running the dump, no other CPU needs
to do it too.
The tracing_on variable is turned off and not turned on. The original
code did this, but it wasn't pretty. By just disabling this variable
we get the result of not seeing traces that happen between crashes.
For sysrq-z, it doesn't get turned on, but the user can always write
a '1' to the tracing_on file. If they are using sysrq-z, then they should
know about tracing_on.
The new code is much easier to read and less error prone. No more
deadlock possibility when an NMI triggers here.
Reported-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently, the depth reported in the stack tracer stack_trace file
does not match the stack_max_size file. This is because the stack_max_size
includes the overhead of stack tracer itself while the depth does not.
The first time a max is triggered, a calculation is not performed that
figures out the overhead of the stack tracer and subtracts it from
the stack_max_size variable. The overhead is stored and is subtracted
from the reported stack size for comparing for a new max.
Now the stack_max_size corresponds to the reported depth:
While testing against and older gcc on x86 that uses mcount instead
of fentry, I found that pasing in ip + MCOUNT_INSN_SIZE let the
stack trace show one more function deep which was missing before.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When gcc 4.6 on x86 is used, the function tracer will use the new
option -mfentry which does a call to "fentry" at every function
instead of "mcount". The significance of this is that fentry is
called as the first operation of the function instead of the mcount
usage of being called after the stack.
This causes the stack tracer to show some bogus results for the size
of the last function traced, as well as showing "ftrace_call" instead
of the function. This is due to the stack frame not being set up
by the function that is about to be traced.
The 216 size for ftrace_call includes both the ftrace_call stack
(which includes the saving of registers it does), as well as the
stack size of the parent.
To fix this, if CC_USING_FENTRY is defined, then the stack_tracer
will reserve the first item in stack_dump_trace[] array when
calling save_stack_trace(), and it will fill it in with the parent ip.
Then the code will look for the parent pointer on the stack and
give the real size of the parent's stack pointer:
I'm Cc'ing stable, although it's not urgent, as it only shows bogus
size for item #0, the rest of the trace is legit. It should still be
corrected in previous stable releases.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Use the stack of stack_trace_call() instead of check_stack() as
the test pointer for max stack size. It makes it a bit cleaner
and a little more accurate.
Adding stable, as a later fix depends on this patch.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
dprintk() shouldn't access @ring after it's unmapped.
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: keep the second argument to kunmap_atomic()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The current code does not set the msg_namelen member to 0 and therefore
makes net/socket.c leak the local sockaddr_storage variable to userland
-- 128 bytes of kernel stack memory. Fix that.
Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
try_to_wake_up_local() should only be invoked to wake up another
task in the same runqueue and BUG_ON()s are used to enforce the
rule. Missing try_to_wake_up_local() can stall workqueue
execution but such stalls are likely to be finite either by
another work item being queued or the one blocked getting
unblocked. There's no reason to trigger BUG while holding rq
lock crashing the whole system.
Convert BUG_ON()s in try_to_wake_up_local() to WARN_ON_ONCE()s.
Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130318192234.GD3042@htj.dyndns.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This patch adds support for kvm_gfn_to_hva_cache_init functions for
reads and writes that will cross a page. If the range falls within
the same memslot, then this will be a fast operation. If the range
is split between two memslots, then the slower kvm_read_guest and
kvm_write_guest are used.
Tested: Test against kvm_clock unit tests.
Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
[bwh: Backported to 3.2:
- Drop change in lapic.c
- Keep using __gfn_to_memslot() in kvm_gfn_to_hva_cache_init()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If the guest specifies a IOAPIC_REG_SELECT with an invalid value and follows
that with a read of the IOAPIC_REG_WINDOW KVM does not properly validate
that request. ioapic_read_indirect contains an
ASSERT(redir_index < IOAPIC_NUM_PINS), but the ASSERT has no effect in
non-debug builds. In recent kernels this allows a guest to cause a kernel
oops by reading invalid memory. In older kernels (pre-3.3) this allows a
guest to read from large ranges of host memory.
Tested: tested against apic unit tests.
Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
There is a potential use after free issue with the handling of
MSR_KVM_SYSTEM_TIME. If the guest specifies a GPA in a movable or removable
memory such as frame buffers then KVM might continue to write to that
address even after it's removed via KVM_SET_USER_MEMORY_REGION. KVM pins
the page in memory so it's unlikely to cause an issue, but if the user
space component re-purposes the memory previously used for the guest, then
the guest will be able to corrupt that memory.
Tested: Tested against kvmclock unit test
Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
[bwh: Backported to 3.2:
- Adjust context
- We do not implement the PVCLOCK_GUEST_STOPPED flag] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If the guest sets the GPA of the time_page so that the request to update the
time straddles a page then KVM will write onto an incorrect page. The
write is done byusing kmap atomic to get a pointer to the page for the time
structure and then performing a memcpy to that page starting at an offset
that the guest controls. Well behaved guests always provide a 32-byte aligned
address, however a malicious guest could use this to corrupt host kernel
memory.
Tested: Tested against kvmclock unit test.
Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Okay so Alan's patch handled the case where there was no registered fbcon,
however the other path entered in set_con2fb_map pit.
In there we called fbcon_takeover, but we also took the console lock in a couple
of places. So push the console lock out to the callers of set_con2fb_map,
this means fbmem and switcheroo needed to take the lock around the fb notifier
entry points that lead to this.
This should fix the efifb regression seen by Maarten.
Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Tested-by: Lu Hua <huax.lu@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This code was broken because it assumed that all MTD devices were map-based.
Disable it for now, until it can be fixed properly for the next merge window.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
[bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
It would cause no link after suspending or shutdowning when the
nic changes the speed to 10M and connects to a link partner which
forces the speed to 100M.
Check the link partner ability to determine which speed to set.
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
As found by gcc-4.8, the QUEUE_SYSFS_BIT_FNS macro creates functions
that use a value generated by queue_var_store independent of whether
that value was set or not.
block/blk-sysfs.c: In function 'queue_store_nonrot':
block/blk-sysfs.c:244:385: warning: 'val' may be used uninitialized in this function [-Wmaybe-uninitialized]
Unlike most other such warnings, this one is not a false positive,
writing any non-number string into the sysfs files indeed has
an undefined result, rather than returning an error.
* Added the device ID to the modalias list and assinged ALC662 patches
for it
* Added 4 port support for the device ID 0671 in alc662_parse_auto_config
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44156 Reported-by: Alan Zimmerman <alan.zimm@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=31522#c35
[Note: There are more than one broken setups in the bug. This fixes one.] Reported-by: Martins <andrissr@inbox.lv> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59628 Reported-by: Roland Gruber <post@rolandgruber.de> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
NCR machines with LVDS panels using Intel chipsets need to have the
QUIRK_INVERT_BRIGHTNESS bit set.
Unfortunately NCR doesn't set a meaningful subvendor/subdevice ID,
therefore we add a DMI dependent quirk list.
Signed-off-by: Egbert Eich <eich@suse.de>
[danvet: fixup whitespace fail.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Jani Nikula <jani.nikula@intel.com>
[bwh: Backported to 3.2:
- Adjust context
- Add #include <linux/dmi.h>] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Mark the Acer Aspire 5734Z that this machines requires the module to
invert the panel backlight brightness value after reading from and prior
to writing to the PCI configuration space.
Signed-off-by: Carsten Emde <C.Emde@osadl.org> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
A machine may need to invert the panel backlight brightness value. This
patch adds the infrastructure for a quirk to do so.
Signed-off-by: Carsten Emde <C.Emde@osadl.org> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Following the documentation of the Legacy Backlight Brightness (LBB)
Register in the configuration space of some Intel PCI graphics adapters,
setting the LBB register with the value 0x0 causes the backlight to be
turned off, and 0xFF causes the backlight to be set to 100% intensity
(http://download.intel.com/embedded/processors/Whitepaper/324567.pdf).
The Acer Aspire 5734Z, however, turns the backlight off at 0xFF and sets
it to maximum intensity at 0. In consequence, the screen of this systems
becomes dark at an early boot stage which makes it unusable. The same
inversion applies to the BLC_PWM_CTL I915 register. This problem was
introduced in kernel version 2.6.38 when the PCI device of this system
was first supported by the i915 KMS module.
This patch adds a parameter to the i915 module to enable inversion of
the brightness variable (i915.invert_brightness).
Signed-off-by: Carsten Emde <C.Emde@osadl.org> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Btrfs uses page_mkwrite to ensure stable pages during
crc calculations and mmap workloads. We call clear_page_dirty_for_io
before we do any crcs, and this forces any application with the file
mapped to wait for the crc to finish before it is allowed to change
the file.
With compression on, the clear_page_dirty_for_io step is happening after
we've compressed the pages. This means the applications might be
changing the pages while we are compressing them, and some of those
modifications might not hit the disk.
This commit adds the clear_page_dirty_for_io before compression starts
and makes sure to redirty the page if we have to fallback to
uncompressed IO as well.
Signed-off-by: Chris Mason <chris.mason@fusionio.com> Reported-by: Alexandre Oliva <oliva@gnu.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
De-account the accumulative dirty counters on page redirty.
Page redirties (very common in ext4) will introduce mismatch between
counters (a) and (b)
a) NR_DIRTIED, BDI_DIRTIED, tsk->nr_dirtied
b) NR_WRITTEN, BDI_WRITTEN
This will introduce systematic errors in balanced_rate and result in
dirty page position errors (ie. the dirty pages are no longer balanced
around the global/bdi setpoints).
Acked-by: Jan Kara <jack@suse.cz> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The return code from the registration of the thermal class is used to
unallocate resources, but this failure isn't passed back to the caller of
thermal_init. Return this failure back to the caller.
This bug was introduced in changeset 4cb18728 which overwrote the return code
when the variable was re-used to catch the return code of the registration of
the genetlink thermal socket family.
Signed-off-by: Richard Guy Briggs <rbriggs@redhat.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Commit 257b5358b32f ("scm: Capture the full credentials of the scm
sender") changed the credentials passing code to pass in the effective
uid/gid instead of the real uid/gid.
Obviously this doesn't matter most of the time (since normally they are
the same), but it results in differences for suid binaries when the wrong
uid/gid ends up being used.
This just undoes that (presumably unintentional) part of the commit.
Reported-by: Andy Lutomirski <luto@amacapital.net> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: scm_set_cred() does user namespace conversion
of euid/egid using cred_to_ucred(). Add and use cred_real_to_ucred() to
do the same thing for real uid/gid.] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
With applying the previous patch "hugetlbfs: stop setting VM_DONTDUMP in
initializing vma(VM_HUGETLB)" to reenable hugepage coredump, if a memory
error happens on a hugepage and the affected processes try to access the
error hugepage, we hit VM_BUG_ON(atomic_read(&page->_count) <= 0) in
get_page().
The reason for this bug is that coredump-related code doesn't recognise
"hugepage hwpoison entry" with which a pmd entry is replaced when a memory
error occurs on a hugepage.
In other words, physical address information is stored in different bit
layout between hugepage hwpoison entry and pmd entry, so
follow_hugetlb_page() which is called in get_dump_page() returns a wrong
page from a given address.
With this patch, we can call hugetlb_fault() and take proper actions (we
wait for migration entries, fail with VM_FAULT_HWPOISON_LARGE for
hwpoisoned entries,) and as the result we can dump all hugepages except
for hwpoisoned ones.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Rik van Riel <riel@redhat.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Events may be created with attr->disabled == 1 and attr->enable_on_exec
== 1, which confuses the group validation code because events with the
PERF_EVENT_STATE_OFF are not considered candidates for scheduling, which
may lead to failure at group scheduling time.
This patch fixes the validation check for ARM, so that events in the
OFF state are still considered when enable_on_exec is true.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Jiri Olsa <jolsa@redhat.com> Reported-by: Sudeep KarkadaNagesha <Sudeep.KarkadaNagesha@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
On Feroceon the L2 cache becomes non-coherent with the CPU
when the L1 caches are disabled. Thus the L2 needs to be invalidated
after both L1 caches are disabled.
On kexec before the starting the code for relocation the kernel,
the L1 caches are disabled in cpu_froc_fin (cpu_v7_proc_fin for Feroceon),
but after L2 cache is never invalidated, because inv_all is not set
in cache-feroceon-l2.c.
So kernel relocation and decompression may has (and usually has) errors.
Setting the function enables L2 invalidation and fixes the issue.
Signed-off-by: Illia Ragozin <illia.ragozin@grapecom.com> Acked-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The hardware parsing of Control Wrapper Frames needs to be disabled, as
it has been causing spurious decryption error reports. The initvals for
other chips have been updated to disable it, but AR9580 was left out for
some reason.
Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
To get correct endianes on little endian cpus (like arm) while reading device
tree properties, this patch replaces of_get_property() with
of_property_read_u32(). While there use of_property_read_bool() for the
handling of the boolean "nxp,no-comparator-bypass" property.
Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Since the firmware has been open sourced, the minor version has been
bumped to 1.4 and the API/ABI will stay compatible across further 1.x
releases.
Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Let's do the changes properly and fix the same problem everywhere, not
just for one case.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[bwh: Backported to 3.2: mohawk doesn't support suspend at all] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Revert commit 62a3ddef6181 ("vfs: fix spinning prevention in prune_icache_sb").
This commit doesn't look right: since we are looking at the tail of the
list (sb->s_inode_lru.prev) if we want to skip an inode, we should put
it back at the head of the list instead of the tail, otherwise we will
keep spinning on it.
Discovered when investigating why prune_icache_sb came top in perf
reports of a swapping load.
Anatol Pomozov identified a race condition that hits module unloading
and re-loading. To quote Anatol:
"This is a race codition that exists between kset_find_obj() and
kobject_put(). kset_find_obj() might return kobject that has refcount
equal to 0 if this kobject is freeing by kobject_put() in other
thread.
Here is timeline for the crash in case if kset_find_obj() searches for
an object tht nobody holds and other thread is doing kobject_put() on
the same kobject:
THREAD A (calls kset_find_obj()) THREAD B (calls kobject_put())
splin_lock()
atomic_dec_return(kobj->kref), counter gets zero here
... starts kobject cleanup ....
spin_lock() // WAIT thread A in kobj_kset_leave()
iterate over kset->list
atomic_inc(kobj->kref) (counter becomes 1)
spin_unlock()
spin_lock() // taken
// it does not know that thread A increased counter so it
remove obj from list
spin_unlock()
vfree(module) // frees module object with containing kobj
// kobj points to freed memory area!!
kobject_put(kobj) // OOPS!!!!
The race above happens because module.c tries to use kset_find_obj()
when somebody unloads module. The module.c code was introduced in
commit 6494a93d55fa"
Anatol supplied a patch specific for module.c that worked around the
problem by simply not using kset_find_obj() at all, but rather than make
a local band-aid, this just fixes kset_find_obj() to be thread-safe
using the proper model of refusing the get a new reference if the
refcount has already dropped to zero.
See examples of this proper refcount handling not only in the kref
documentation, but in various other equivalent uses of this pattern by
grepping for atomic_inc_not_zero().
[ Side note: the module race does indicate that module loading and
unloading is not properly serialized wrt sysfs information using the
module mutex. That may require further thought, but this is the
correct fix at the kobject layer regardless. ]
Reported-analyzed-and-tested-by: Anatol Pomozov <anatol.pomozov@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This function is intended to simplify locking around refcounting for
objects that can be looked up from a lookup structure, and which are
removed from that lookup structure in the object destructor.
Operations on such objects require at least a read lock around
lookup + kref_get, and a write lock around kref_put + remove from lookup
structure. Furthermore, RCU implementations become extremely tricky.
With a lookup followed by a kref_get_unless_zero *with return value check*
locking in the kref_put path can be deferred to the actual removal from
the lookup structure and RCU lookups become trivial.
v2: Formatting fixes.
v3: Invert the return value.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
[bwh: Backported to 3.2:
- Adjust context
- Add #include <linux/atomic.h>] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
While trying to track down a tree log replay bug I noticed that fsck was always
complaining about nbytes not being right for our fsynced file. That is because
the new fsync stuff doesn't wait for ordered extents to complete, so the inodes
nbytes are not necessarily updated properly when we log it. So to fix this we
need to set nbytes to whatever it is on the inode that is on disk, so when we
replay the extents we can just add the bytes that are being added as we replay
the extent. This makes it work for the case that we have the wrong nbytes or
the case that we logged everything and nbytes is actually correct. With this
I'm no longer getting nbytes errors out of btrfsck.
Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
[bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Currently set_ftrace_pid and set_graph_function files use seq_lseek
for their fops. However seq_open() is called only for FMODE_READ in
the fops->open() so that if an user tries to seek one of those file
when she open it for writing, it sees NULL seq_file and then panic.
It can be easily reproduced with following command:
$ cd /sys/kernel/debug/tracing
$ echo 1234 | sudo tee -a set_ftrace_pid
In this example, GNU coreutils' tee opens the file with fopen(, "a")
and then the fopen() internally calls lseek().
Link: http://lkml.kernel.org/r/1365663302-2170-1-git-send-email-namhyung@kernel.org Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Namhyung Kim <namhyung.kim@lge.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bwh: Backported to 3.2: ftrace_regex_lseek() is static] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This patch fixes a bug where a handful of informational / control CDBs
that should be allowed during ALUA access state Standby/Offline/Transition
where incorrectly returning CHECK_CONDITION + ASCQ_04H_ALUA_TG_PT_*.
This includes INQUIRY + REPORT_LUNS, which would end up preventing LUN
registration when LUN scanning occured during these ALUA access states.
Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This patch fixes the MAINTENANCE_IN service action type checks to only
look at the proper lower 5 bits of cdb byte 1. This addresses the case
where MI_REPORT_TARGET_PGS w/ extended header using the upper three bits of
cdb byte 1 was not processed correctly in transport_generic_cmd_sequencer,
as well as the three cases for standby, unavailable, and transition ALUA
primary access state checks.
Also add MAINTENANCE_IN to the excluded list in transport_generic_prepare_cdb()
to prevent the PARAMETER DATA FORMAT bits from being cleared.
Cc: Hannes Reinecke <hare@suse.de> Cc: Rob Evers <revers@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Invoking arch_flush_lazy_mmu_mode() results in calls to
preempt_enable()/disable() which may have performance impact.
Since lazy MMU is not used on bare metal we can patch away
arch_flush_lazy_mmu_mode() so that it is never called in such
environment.
[ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU
updates" may cause a minor performance regression on
bare metal. This patch resolves that performance regression. It is
somewhat unclear to me if this is a good -stable candidate. ]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1364045796-10720-2-git-send-email-konrad.wilk@oracle.com Tested-by: Josh Boyer <jwboyer@redhat.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
when lazy MMU updates are enabled, because set_pgd effects are being
deferred.
One instance of this problem is during process mm cleanup with memory
cgroups enabled. The chain of events is as follows:
- zap_pte_range enables lazy MMU updates
- zap_pte_range eventually calls mem_cgroup_charge_statistics,
which accesses the vmalloc'd mem_cgroup per-cpu stat area
- vmalloc_fault is triggered which tries to sync the corresponding
PGD entry with set_pgd, but the update is deferred
- vmalloc_fault oopses due to a mismatch in the PUD entries
In UP and non-preempt respectively, the spinlocks and preemption
disable/enable points are stubbed out entirely, because there is no
regular code that can ever hit the kind of concurrency they are meant to
protect against.
However, while there is no regular code that can cause scheduling, we
_do_ end up having some exceptional (literally!) code that can do so,
and that we need to make sure does not ever get moved into the critical
region by the compiler.
In particular, get_user() and put_user() is generally implemented as
inline asm statements (even if the inline asm may then make a call
instruction to call out-of-line), and can obviously cause a page fault
and IO as a result. If that inline asm has been scheduled into the
middle of a preemption-safe (or spinlock-protected) code region, we
obviously lose.
Now, admittedly this is *very* unlikely to actually ever happen, and
we've not seen examples of actual bugs related to this. But partly
exactly because it's so hard to trigger and the resulting bug is so
subtle, we should be extra careful to get this right.
So make sure that even when preemption is disabled, and we don't have to
generate any actual *code* to explicitly tell the system that we are in
a preemption-disabled region, we need to at least tell the compiler not
to move things around the critical region.
This patch grew out of the same discussion that caused commits 79e5f05edcbf ("ARC: Add implicit compiler barrier to raw_local_irq*
functions") and 3e2e0d2c222b ("tile: comment assumption about
__insn_mtspr for <asm/irqflags.h>") to come about.
Note for stable: use discretion when/if applying this. As mentioned,
this bug may never have actually bitten anybody, and gcc may never have
done the required code motion for it to possibly ever trigger in
practice.
Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: drop sched_preempt_enable_no_resched()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The Charge Pump needs the DSP clock to work properly, without it the
bypass to HP/LINEOUT is not working properly. This requirement is not
mentioned in the datasheet but has been confirmed by Mark Brown from
Wolfson.
Signed-off-by: Alban Bedel <alban.bedel@avionic-design.de> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
As commit 40dc166c (PM / Core: Introduce struct syscore_ops for core
subsystems PM) say, syscore_ops operations should be carried with one
CPU on-line and interrupts disabled. However, after commit f96972f2d
(kernel/sys.c: call disable_nonboot_cpus() in kernel_restart()),
syscore_shutdown() is called before disable_nonboot_cpus(), so break
the rules. We have a MIPS machine with a 8259A PIC, and there is an
external timer (HPET) linked at 8259A. Since 8259A has been shutdown
too early (by syscore_shutdown()), disable_nonboot_cpus() runs without
timer interrupt, so it hangs and reboot fails. This patch call
syscore_shutdown() a little later (after disable_nonboot_cpus()) to
avoid reboot failure, this is the same way as poweroff does.
For consistency, add disable_nonboot_cpus() to kernel_halt().
Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
If we reenable ftrace via syctl, we currently set ftrace_trace_function
based on the previous simplistic algorithm. This is inconsistent with
what update_ftrace_function does. So better call that helper instead.
Link: http://lkml.kernel.org/r/5151D26F.1070702@siemens.com Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The sched_clock_remote() implementation has the following inatomicity
problem on 32bit systems when accessing the remote scd->clock, which
is a 64bit value.
While the update of scd->clock is using an atomic64 mechanism, the
readout on the remote cpu is not, which can cause completely bogus
readouts.
It is a quite rare problem, because it requires the update to hit the
narrow race window between the low/high readout and the update must go
across the 32bit boundary.
The resulting misbehaviour is, that CPU1 will see the sched_clock on
CPU1 ~4 seconds ahead of it's own and update CPU1s sched_clock value
to this bogus timestamp. This stays that way due to the clamping
implementation for about 4 seconds until the synchronization with
CLOCK_MONOTONIC undoes the problem.
The issue is hard to observe, because it might only result in a less
accurate SCHED_OTHER timeslicing behaviour. To create observable
damage on realtime scheduling classes, it is necessary that the bogus
update of CPU1 sched_clock happens in the context of an realtime
thread, which then gets charged 4 seconds of RT runtime, which results
in the RT throttler mechanism to trigger and prevent scheduling of RT
tasks for a little less than 4 seconds. So this is quite unlikely as
well.
The issue was quite hard to decode as the reproduction time is between
2 days and 3 weeks and intrusive tracing makes it less likely, but the
following trace recorded with trace_clock=global, which uses
sched_clock_local(), gave the final hint:
What happens is that CPU0 goes idle and invokes
sched_clock_idle_sleep_event() which invokes sched_clock_local() and
CPU1 runs a remote wakeup for CPU0 at the same time, which invokes
sched_remote_clock(). The time jump gets propagated to CPU0 via
sched_remote_clock() and stays stale on both cores for ~4 seconds.
There are only two other possibilities, which could cause a stale
sched clock:
1) ktime_get() which reads out CLOCK_MONOTONIC returns a sporadic
wrong value.
2) sched_clock() which reads the TSC returns a sporadic wrong value.
#1 can be excluded because sched_clock would continue to increase for
one jiffy and then go stale.
#2 can be excluded because it would not make the clock jump
forward. It would just result in a stale sched_clock for one jiffy.
After quite some brain twisting and finding the same pattern on other
traces, sched_clock_remote() remained the only place which could cause
such a problem and as explained above it's indeed racy on 32bit
systems.
So while on 64bit systems the readout is atomic, we need to verify the
remote readout on 32bit machines. We need to protect the local->clock
readout in sched_clock_remote() on 32bit as well because an NMI could
hit between the low and the high readout, call sched_clock_local() and
modify local->clock.
Thanks to Siegfried Wulsch for bearing with my debug requests and
going through the tedious tasks of running a bunch of reproducer
systems to generate the debug information which let me decode the
issue.
Reported-by: Siegfried Wulsch <Siegfried.Wulsch@rovema.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304051544160.21884@ionos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Some versions of pHyp will perform the adjunct partition test before the
ANDCOND test. The result of this is that H_RESOURCE can be returned and
cause the BUG_ON condition to occur. The HPTE is not removed. So add a
check for H_RESOURCE, it is ok if this HPTE is not removed as
pSeries_lpar_hpte_remove is looking for an HPTE to remove and not a
specific HPTE to remove. So it is ok to just move on to the next slot
and try again.
Signed-off-by: Michael Wolf <mjw@linux.vnet.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>