Increase the retry count for the hard reset function to 100 but
shorten the time out period to 500 ms. See the comment for
ahci_highbank_hardreset for the reasons why those values were
chosen.
For some reason, a lot of port-multipliers have issues with softreset.
SIMG [34]7x series port-multipliers have been quite erratic in this
regard. I recall that it was better with some firmware revisions and
the current list of quirks worked fine for a while. I think it got
worse with later firmwares or maybe my test coverage wasn't good
enough. Anyways, HPA is reporting that his 3726 setup suffers SRST
failures and then the PMP gets confused and fails to probe the last
port.
The hope was that we try to stick to the standard as much as possible
and soonish the PMPs and their firmwares will improve in quality, so
the quirk list was kept to minimum. Well, it seems like that's never
gonna happen.
Let's set NO_SRST for all [34]7x PMPs so that whatever remaining
userbase of the device suffer the least. Maybe we should do the same
for 57xx's but unfortunately I don't have any device left to test and
I'm not even sure 57xx's have ever been made widely available, so
let's leave those alone for now.
There are some SATA controllers which have both devices 0 and 1 but this module
just zeroes out the taskfile and then sets ATA_TFLAG_DEVICE (not sure that's needed)
which could lead to a wrong device being selected just before issuing command.
Thus we should call ata_tf_init() which sets up the device register value
properly, like all other users of ata_exec_internal() do...
Driver displays wrong alarms for temperature attributes.
Turns out that temperature alarm bits are not fixed, but determined
by temperature source mapping. To fix the problem, walk through
the temperature sources to determine the correct alarm bit associated
with a given attribute.
We've got bug reports that module loading gets stuck on Debian systems
with the 3.10 kernel. The debugging session revealed that the initial
registration of OSS sequencer clients gets stuck at module loading time,
because it again involves request_module() at the init phase. This is
triggered only by the special --install stuff Debian is using, but it's
still not good to have such loops.
As a workaround, call the registration part asynchronously. This is a
better approach irrespective of the hang fix, anyway.
add_control_with_pfx() in hda_generic.c assumes a shorter name string
for the control element, and this resulted in the truncation of the
long but valid string like "Headphone Surround Switch" in the middle.
This patch aligns the max size to the actual limit of snd_ctl_elem_id,
44.
Some VIA codecs like VT1708S have Mic boost amps in the mic pins but
they aren't exposed in the capability bits. In the past driver code,
we overrode the pin caps and created mic boost controls forcibly.
During the transition to the generic parser, we lost the mic boost controls
although the pin caps are still overridden, because the generic parser
code checks the widget caps, too.
So this patch adds a new helper function to allow the override of the
given widget capability bits, and makes the VIA codec driver add the
missing input-amp capability bit.
When a selection to a converter MUX is changed in hdmi_pcm_open(), it
should be cached so that the given connection can be restored properly
at PM resume. We need just to replace the corresponding
snd_hda_codec_write() call with snd_hda_codec_write_cache().
The refactoring by commit 9040d102 introduced the new function
snd_hda_check_power_state(). This function is supposed to return true
if the state has already reached the target state, but it actually
returns false in that case. An utterly stupid typo while copy & paste.
Fortunately this didn't influence behavior much because powering up
AFG usually powers up the child widgets, too. But the finer power
control must have been broken by this bug.
ad1884_fixup_hp_eapd() tries to set the NID for controlling the
speaker EAPD from the pin configuration. But the current code can't
work as expected since it sets spec->eapd_nid before calling the
generic parser where the autocfg pins are set up.
This patch changes the function to set spec->eapd_nid after the
generic parser call while it sets the vmaster hook unconditionally. The
spec->eapd_nid check is moved into the hook function itself instead.
When reading IIO_CHAN_INFO_OFFSET, the return value of iio_channel_read() for
success will be IIO_VAL*, so checking for 0 is not correct.
Without this fix the offset applied by iio drivers will be ignored when
converting a raw value to one in appropriate base units (e.g. mV) in
IIO client drivers that use iio_convert_raw_to_processed, including
iio-hwmon.
Since the info_mask split, iio_channel_has_info() is not working correctly.
info_mask_separate and info_mask_shared_by_type are bitmasks, so it is not
possible to compare them directly with the iio_chan_info_enum enum. Correct
that using the BIT() macro.
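The difference can be sketched in user space as follows. This is only an
illustration: the enum values, the reduced chan_spec structure and the
has_info_*() helpers are simplified stand-ins, not the kernel definitions,
and the "broken" variant is just one plausible form of the mistake.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel's iio_chan_info_enum values. */
enum chan_info {
    CHAN_INFO_RAW = 0,
    CHAN_INFO_PROCESSED,
    CHAN_INFO_SCALE,
    CHAN_INFO_OFFSET,
};

#define BIT(n) (1UL << (n))

/* Hypothetical, reduced channel descriptor: one bit per enum value. */
struct chan_spec {
    unsigned long info_mask_separate;
    unsigned long info_mask_shared_by_type;
};

/* Broken variant: tests the mask against the raw enum value. */
static bool has_info_broken(const struct chan_spec *c, enum chan_info type)
{
    return (c->info_mask_separate & type) ||
           (c->info_mask_shared_by_type & type);
}

/* Fixed variant: turns the enum value into a bit before testing. */
static bool has_info_fixed(const struct chan_spec *c, enum chan_info type)
{
    return (c->info_mask_separate & BIT(type)) ||
           (c->info_mask_shared_by_type & BIT(type));
}
```

With only the OFFSET bit set in info_mask_separate, the broken variant
reports no OFFSET info, while the fixed variant finds it.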
On arm64, cache maintenance faults appear as data aborts with the CM
bit set in the ESR. The WnR bit, usually used to distinguish between
faulting loads and stores, always reads as 1 and (slightly confusingly)
the instructions are treated as reads by the architecture.
This patch fixes our fault handling code to treat cache maintenance
faults in the same way as loads.
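A minimal sketch of the resulting check, assuming the usual data abort
syndrome layout (WnR in bit 6, CM in bit 8). The macro and function names
are chosen for illustration and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define ESR_WNR (UINT32_C(1) << 6)  /* Write-not-Read */
#define ESR_CM  (UINT32_C(1) << 8)  /* cache maintenance fault */

/*
 * A fault counts as a write only if WnR is set and the fault did not
 * come from a cache maintenance instruction (where WnR always reads 1).
 */
static bool fault_is_write(uint32_t esr)
{
    return (esr & ESR_WNR) && !(esr & ESR_CM);
}
```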
commit 2f7021a8 "cpufreq: protect 'policy->cpus' from offlining
during __gov_queue_work()" caused a regression in CPU hotplug,
because it led to a deadlock between the cpufreq governor worker thread
and the CPU hotplug writer task.
Lockdep splat corresponding to this deadlock is shown below:
[ 60.277396] ======================================================
[ 60.277400] [ INFO: possible circular locking dependency detected ]
[ 60.277407] 3.10.0-rc7-dbg-01385-g241fd04-dirty #1744 Not tainted
[ 60.277411] -------------------------------------------------------
[ 60.277417] bash/2225 is trying to acquire lock:
[ 60.277422] ((&(&j_cdbs->work)->work)){+.+...}, at: [<ffffffff810621b5>] flush_work+0x5/0x280
[ 60.277444] but task is already holding lock:
[ 60.277449] (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81042d8b>] cpu_hotplug_begin+0x2b/0x60
[ 60.277465] which lock already depends on the new lock.
The intention of that commit was to avoid warnings during CPU
hotplug, which indicated that offline CPUs were getting IPIs from the
cpufreq governor's work items. But the real root-cause of that
problem was commit a66b2e5 (cpufreq: Preserve sysfs files across
suspend/resume) because it totally skipped all the cpufreq callbacks
during CPU hotplug in the suspend/resume path, and hence it never
actually shut down the cpufreq governor's worker threads during CPU
offline in the suspend/resume path.
Reflecting back, the reason why we never suspected that commit as the
root-cause earlier, was that the original issue was reported with
just the halt command and nobody had brought in suspend/resume to the
equation.
The reason for _that_ in turn, as it turns out, is that earlier
halt/shutdown was being done by disabling non-boot CPUs while tasks
were frozen, just like suspend/resume.... but commit cf7df378a
(reboot: migrate shutdown/reboot to boot cpu) which came somewhere
along that very same time changed that logic: shutdown/halt no longer
takes CPUs offline. Thus, the test-cases for reproducing the bug
were vastly different and thus we went totally off the trail.
Overall, it was one hell of a confusion with so many commits
affecting each other and also affecting the symptoms of the problems
in subtle ways. Finally, now since the original problematic commit
(a66b2e5) has been completely reverted, revert this intermediate fix
too (2f7021a8), to fix the CPU hotplug deadlock. Phew!
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Tested-by: Peter Wu <lekensteyn@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a66b2e (cpufreq: Preserve sysfs files across suspend/resume)
has unfortunately caused several things in the cpufreq subsystem to
break subtly after a suspend/resume cycle.
The intention of that patch was to retain the file permissions of the
cpufreq related sysfs files across suspend/resume. To achieve that,
the commit completely removed the calls to cpufreq_add_dev() and
__cpufreq_remove_dev() during suspend/resume transitions. But the
problem is that those functions do 2 kinds of things:
1. Low-level initialization/tear-down that are critical to the
correct functioning of cpufreq-core.
2. Kobject and sysfs related initialization/teardown.
Ideally we should have reorganized the code to cleanly separate these
two responsibilities, and skipped only the sysfs related parts during
suspend/resume. Since we skipped the entire callbacks instead (which
also included some CPU and cpufreq-specific critical components),
cpufreq subsystem started behaving erratically after suspend/resume.
So revert the commit to fix the regression. We'll revisit and address
the original goal of that commit separately, since it involves quite a
bit of careful code reorganization and appears to be non-trivial.
(While reverting the commit, note that another commit f51e1eb
(cpufreq: Fix cpufreq regression after suspend/resume) already
reverted part of the original set of changes. So revert only the
remaining ones).
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In power_pmu_enable() we still enable the PMU even if we have zero
events. This should have no effect but doesn't make much sense. Instead
just return after telling the hypervisor that we are not using the PMCs.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In pmu_disable() we disable the PMU by setting the FC (Freeze Counters)
bit in MMCR0. In order to do this we have to read/modify/write MMCR0.
It's possible that we read a value from MMCR0 which has PMAO (PMU Alert
Occurred) set. When we write that value back it will cause an interrupt
to occur. We will then end up in the PMU interrupt handler even though
we are supposed to have just disabled the PMU.
We can avoid this by making sure we never write PMAO back. We should not
lose interrupts because when the PMU is re-enabled the overflowed values
will cause another interrupt.
We also reorder the clearing of SAMPLE_ENABLE so that is done after the
PMU is frozen. Otherwise there is a small window between the clearing of
SAMPLE_ENABLE and the setting of FC where we could take an interrupt and
incorrectly see SAMPLE_ENABLE not set. This would for example change the
logic in perf_read_regs().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
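The read/modify/write described in the pmu_disable() change above can be
sketched like this. The bit values and the helper name are assumptions made
for illustration; the real definitions live in the platform headers, and the
real code works on the SPR rather than on a plain integer.

```c
#include <stdint.h>

/* Bit values assumed for illustration only. */
#define MMCR0_FC   UINT64_C(0x80000000)  /* Freeze Counters */
#define MMCR0_PMAO UINT64_C(0x00000080)  /* PMU Alert Occurred */

/* Compute the value to write back to MMCR0 when freezing the counters. */
static uint64_t mmcr0_freeze_value(uint64_t mmcr0_read)
{
    uint64_t val = mmcr0_read;

    val |= MMCR0_FC;     /* freeze the counters */
    val &= ~MMCR0_PMAO;  /* never write PMAO back, it would raise an interrupt */
    return val;
}
```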
A mistake we have made in the past is that we pull out the fields we
need from the event code, but don't check that there are no unknown bits
set. This means that we can't ever assign meaning to those unknown bits
in future.
Although we have once again failed to do this at release, it is still
early days for Power8 so I think we can still slip this in and get away
with it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
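The idea of rejecting unknown event code bits can be sketched as below. The
field layout (a 4-bit PMC field and an 8-bit selector) is entirely
hypothetical and only serves to show the pattern of building a mask of known
bits and refusing anything outside it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event-code layout, for illustration only. */
#define EV_PMC_SHIFT   16
#define EV_PMC_MASK    UINT64_C(0xf)
#define EV_PSEL_SHIFT  0
#define EV_PSEL_MASK   UINT64_C(0xff)

/* Every bit that currently has an assigned meaning. */
#define EV_VALID_MASK \
    ((EV_PMC_MASK << EV_PMC_SHIFT) | (EV_PSEL_MASK << EV_PSEL_SHIFT))

/* Reject events with bits set outside the known fields. */
static bool event_is_valid(uint64_t event)
{
    return (event & ~EV_VALID_MASK) == 0;
}

/* Extract the PMC field; only meaningful after event_is_valid(). */
static unsigned int event_pmc(uint64_t event)
{
    return (unsigned int)((event >> EV_PMC_SHIFT) & EV_PMC_MASK);
}
```

Because unknown bits are refused today, they can be given a meaning later
without changing the behavior of any event code that was previously accepted.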
The topology update code that updates the cpu node registration in sysfs
should not be called while in stop_machine(). The register/unregister
calls take a lock and may sleep.
This patch moves these calls outside of the call to stop_machine().
smp_release_cpus() is a normal function called in normal environments,
but it references the __initdata variable spinning_secondaries.
The annotation of spinning_secondaries needs to be modified to match
smp_release_cpus().
The related warning (the linker reports boot_paca.33377, but it should be
spinning_secondaries):
WARNING: arch/powerpc/kernel/built-in.o(.text+0x23176): Section mismatch in reference from the function .smp_release_cpus() to the variable .init.data:boot_paca.33377
The function .smp_release_cpus() references
the variable __initdata boot_paca.33377.
This is often because .smp_release_cpus lacks a __initdata
annotation or the annotation of boot_paca.33377 is wrong.
WARNING: arch/powerpc/kernel/built-in.o(.text+0x231fe): Section mismatch in reference from the function .smp_release_cpus() to the variable .init.data:boot_paca.33377
The function .smp_release_cpus() references
the variable __initdata boot_paca.33377.
This is often because .smp_release_cpus lacks a __initdata
annotation or the annotation of boot_paca.33377 is wrong.
KVMTEST is a macro which checks whether we are taking an exception from
guest context, if so we branch out of line and eventually call into the
KVM code to handle the switch.
When running real guests on bare metal (HV KVM) the hardware ensures
that we never take a relocation on exception when transitioning from
guest to host. For PR KVM we disable relocation on exceptions ourself in
kvmppc_core_init_vm(), as of commit a413f47 "Disable relocation on
exceptions whenever PR KVM is active".
So convert all the RELON macros to use NOTEST, and drop the remaining
KVM_HANDLER() definitions we have for 0xe40 and 0xe80.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We have relocation on exception handlers defined for h_data_storage and
h_instr_storage. However we will never take relocation on exceptions for
these because they can only come from a guest, and we never take
relocation on exceptions when we transition from guest to host.
We also have a handler for hmi_exception (Hypervisor Maintenance) which
is defined in the architecture to never be delivered with relocation on,
see v2.07 Book III-S section 6.5.
So remove the handlers, leaving a branch to self just to be double extra
paranoid.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently we only restore signals which are transactionally suspended but it's
possible that the transaction can be restored even when it's active. Most
likely this will result in a transactional rollback by the hardware as the
transaction will have been doomed by an earlier treclaim.
The current code is a legacy of earlier kernel implementations which did
software rollback of active transactions in the kernel. That code has now gone
but we didn't correctly fix up this part of the signals code which still makes
assumptions based on having software rollback.
This changes the signal return code to always restore both contexts on 64 bit
signal return. It also ensures that the MSR TM bits are properly restored from
the signal context which they are not currently.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently we only restore signals which are transactionally suspended but it's
possible that the transaction can be restored even when it's active. Most
likely this will result in a transactional rollback by the hardware as the
transaction will have been doomed by an earlier treclaim.
The current code is a legacy of earlier kernel implementations which did
software rollback of active transactions in the kernel. That code has now gone
but we didn't correctly fix up this part of the signals code which still makes
assumptions based on having software rollback.
This changes the signal return code to always restore both contexts on 32 bit
rt signal return.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently we clear out the MSR TM bits on signal return assuming that the
signal should never return to an active transaction.
This is bogus as the user may do this. It's most likely the transaction will
be doomed due to a treclaim but that's a problem for the HW not the kernel.
The current code is a legacy of earlier kernel implementations which did
software rollback of active transactions in the kernel. That code has now gone
but we didn't correctly fix up this part of the signals code which still makes
the assumption that it must be returning to a suspended transaction.
This pulls out both MSR TM bits from the user supplied context rather than just
setting TM suspend. We pull out only the bits needed to ensure the user can't
do anything dangerous to the MSR.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently sys_sigreturn() is TM unaware. Therefore, if we take a 32 bit signal
without SIGINFO (non RT) inside a transaction, on signal return we don't
restore the signal frame correctly.
This checks if the signal frame being restored is an active transaction, and
if so, it copies the additional state to ptregs so it can be restored.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The MSR TM controls are in the top 32 bits of the MSR hence on 32 bit signals,
we stick the top half of the MSR in the checkpointed signal context so that the
user can access it.
Unfortunately, we don't currently write anything to the checkpointed signal
context when coming in from a non transactional process and hence the top MSR
bits can contain junk.
This updates the 32 bit signal handling code to always write something to the
top MSR bits so that users know if the process is transactional or not and the
kernel can use it on signal return.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
So because those things always end up in trainwrecks... In 7846de406
we moved back the iommu initialization earlier, essentially undoing
37f02195b which was causing us endless trouble... except that in the
meantime we had merged 959c9bdd58 (to work around the original breakage)
which is now ... broken :-)
This fixes it by doing a partial revert of the latter (we keep the
ppc_md. path which will be needed in the hotplug case, which happens
also during some EEH error recovery situations).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Data Address Watchpoint Register (DAWR) on POWER8 can take a 512
byte range but this range must not cross a 512 byte boundary.
Unfortunately we were off by one when calculating the end of the region,
hence we were not allowing some breakpoint regions which were actually
valid. This fixes this error.
Signed-off-by: Michael Neuling <mikey@neuling.org> Reported-by: Edjunior Barbosa Machado <emachado@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
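The boundary check for the DAWR range can be illustrated with the following
sketch. The function names are made up for this example, and the second
function only shows one way such an off-by-one can arise (using the exclusive
end instead of the last byte); it is not a copy of the old kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

#define DAWR_RANGE 512u

/* True if [addr, addr + len) fits inside one 512-byte aligned window. */
static bool dawr_range_ok(uint64_t addr, uint32_t len)
{
    if (len == 0 || len > DAWR_RANGE)
        return false;

    uint64_t last = addr + len - 1;  /* last byte covered, inclusive */
    return (addr / DAWR_RANGE) == (last / DAWR_RANGE);
}

/* Off-by-one variant: compares against the first byte past the range. */
static bool dawr_range_ok_off_by_one(uint64_t addr, uint32_t len)
{
    if (len == 0 || len > DAWR_RANGE)
        return false;

    uint64_t end = addr + len;  /* exclusive end, one byte too far */
    return (addr / DAWR_RANGE) == (end / DAWR_RANGE);
}
```

A range such as addr 0, len 512 ends exactly at the boundary and is valid, but
the off-by-one variant rejects it.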
The smallest match region for both the DABR and DAWR is 8 bytes, so the
kernel needs to filter matches when users want to look at regions smaller than
this.
Currently we set the length of PPC_BREAKPOINT_MODE_EXACT breakpoints to 8.
This is wrong as in exact mode we should only match on 1 address, hence the
length should be 1.
This ensures that the kernel will filter out any exact mode hardware breakpoint
matches on any addresses other than the requested one.
Signed-off-by: Michael Neuling <mikey@neuling.org> Reported-by: Edjunior Barbosa Machado <emachado@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When mounting an UBIFS R/W volume, we have the message:
UBIFS: mounted UBI device 0, volume 1, name "rootfs"(null)
With this patch, we'll have:
UBIFS: mounted UBI device 0, volume 1, name "rootfs"
Which is, I think, what was intended.
This is RH bug 970891
Uppercasing of username during calculation of ntlmv2 hash fails
because UniStrupr function does not handle big endian wchars.
Also fix a comment in the same code to reflect its correct usage.
[To make it easier for stable (rather than require 2nd patch) fixed
this patch of Shirish's to remove endian warning generated
by sparse -- steve f.]
Reported-by: steve <sanpatr1@in.ibm.com> Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The logic for the memory-remove code fails to correctly account the
Total High Memory when a memory block which contains High Memory is
offlined as shown in the example below. The following patch fixes it.
mem_cgroup_css_online calls mem_cgroup_put if memcg_init_kmem fails.
This is not correct because only memcg_propagate_kmem takes an
additional reference while mem_cgroup_sockets_init is allowed to fail as
well (although no current implementation fails) but it doesn't take any
reference. This all suggests that it should be memcg_propagate_kmem
that should clean up after itself so this patch moves mem_cgroup_put
over there.
Unfortunately this is not that easy (as pointed out by Li Zefan) because
memcg_kmem_mark_dead marks the group dead (KMEM_ACCOUNTED_DEAD) if it is
marked active (KMEM_ACCOUNTED_ACTIVE) which is the case even if
memcg_propagate_kmem fails so the additional reference is dropped in
that case in kmem_cgroup_destroy which means that the reference would be
dropped two times.
The easiest way then would be to simply remove mem_cgroup_put from
mem_cgroup_css_online and rely on kmem_cgroup_destroy doing the right
thing.
Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Li Zefan <lizefan@huawei.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Glauber Costa <glommer@openvz.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 83db0384 (mm/ARM: use common help functions to free reserved
pages) broke booting on the Assabet by trying to convert a PFN to
a virtual address using the __va() macro. This macro takes the
physical address, not a PFN. Fix this.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
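The PFN versus physical address mix-up from the __va() fix above can be
sketched in user space as follows. PAGE_SHIFT, PHYS_OFFSET and PAGE_OFFSET
are hypothetical values picked for the example, and the *_sketch() helpers
are stand-ins for the kernel macros, not their real definitions.

```c
#include <stdint.h>

/* Illustrative constants; real values depend on the platform. */
#define PAGE_SHIFT   12
#define PHYS_OFFSET  UINT32_C(0x80000000)  /* hypothetical start of RAM */
#define PAGE_OFFSET  UINT32_C(0xc0000000)  /* hypothetical start of the linear map */

/* Stand-in for __va(): expects a physical address. */
static uint32_t phys_to_virt_sketch(uint32_t phys)
{
    return phys - PHYS_OFFSET + PAGE_OFFSET;
}

/* A PFN is a page index, so it must be shifted to become an address. */
static uint32_t pfn_to_phys_sketch(uint32_t pfn)
{
    return pfn << PAGE_SHIFT;
}

/* Correct conversion: PFN -> physical address -> virtual address. */
static uint32_t pfn_to_virt_sketch(uint32_t pfn)
{
    return phys_to_virt_sketch(pfn_to_phys_sketch(pfn));
}
```

Passing the PFN 0x80001 straight to phys_to_virt_sketch() yields 0x40080001
with these constants, while the correct conversion yields 0xc0001000.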
Signed-off-by: Takanari Hayama <taki@igel.co.jp> Acked-by: Magnus Damm <damm@opensource.se>
[ horms+renesas@verge.net.au: Add information about commit and version
this bug was added in ] Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the local timer frequency changes, twd_update_frequency() should be
run on all CPUs, including the calling one; otherwise the twd frequency
will not get updated and the local timer will not run correctly.
smp_call_function() runs the function on all other CPUs but not on the
calling CPU, which is not correct here. Use on_each_cpu() instead to fix
this issue.
Acked-by: Linus Walleij <linus.walleij@linaro.org> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Rob Herring <rob.herring@calxeda.com> Cc: Shawn Guo <shawn.guo@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Jason Liu <r64343@freescale.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Looking into the active_asids array is not enough, as we also need
to look into the reserved_asids array (they both represent processes
that are currently running).
Also, not holding the ASID allocator lock is racy, as another CPU
could schedule that process and trigger a rollover, making the erratum
workaround miss an IPI.
Exposing this outside of context.c is a little ugly on the side, so
let's define a new entry point that the erratum workaround can call
to obtain the cpumask.
Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On a CPU that never ran anything, both the active and reserved ASID
fields are set to zero. In this case the ASID_TO_IDX() macro will
return -1, which is not a very useful value to index a bitmap.
Instead of trying to offset the ASID so that ASID #1 is actually
bit 0 in the asid_map bitmap, just always ignore bit 0 and start
the search from bit 1. This makes the code a bit more readable,
and without risk of OoB access.
Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
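The "ignore bit 0, search from bit 1" approach from the ASID bitmap change
above can be sketched like this. The array-of-flags map, NUM_ASIDS and the
helper names are assumptions for illustration; the kernel uses a real bitmap
and its own helpers.

```c
#include <stdbool.h>

#define NUM_ASIDS 256u  /* illustrative size */

/* One flag per ASID, indexed directly by ASID; index 0 is never handed out. */
struct asid_map {
    bool used[NUM_ASIDS];
};

/* Mark an ASID as in use. ASID 0 lands on the ignored slot, so it is harmless. */
static void asid_mark(struct asid_map *m, unsigned int asid)
{
    if (asid < NUM_ASIDS)
        m->used[asid] = true;
}

/* Return the first free ASID, searching from 1; 0 means the map is full. */
static unsigned int asid_alloc(struct asid_map *m)
{
    for (unsigned int asid = 1; asid < NUM_ASIDS; asid++) {
        if (!m->used[asid]) {
            m->used[asid] = true;
            return asid;
        }
    }
    return 0;
}
```

Since the ASID is used as the index without any offset, a CPU whose ASID
field is still zero can never produce a negative index.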
When a CPU is running a process, the ASID for that process is
held in a per-CPU variable (the "active ASIDs" array). When
the ASID allocator handles a rollover, it copies the active
ASIDs into a "reserved ASIDs" array to ensure that a process
currently running on another CPU will continue to run unaffected.
The active array is zero-ed to indicate that a rollover occurred.
Because of this mechanism, a reserved ASID is only remembered for
a single rollover. A subsequent rollover will completely refill
the reserved ASIDs array.
In a severely oversubscribed environment where a CPU can be
prevented from running for extended periods of time (think virtual
machines), the above has a horrible side effect:
[P{a} denotes process P running with ASID a]
CPU-0 CPU-1
A{x} [active = <x 0>]
[suspended] runs B{y} [active = <x y>]
[rollover:
active = <0 0>
reserved = <x y>]
runs B{y} [active = <0 y>
reserved = <x y>]
[rollover:
active = <0 0>
reserved = <0 y>]
runs C{x} [active = <0 x>]
[resumes]
runs A{x}
At that stage, both A and C have the same ASID, with deadly
consequences.
The fix is to preserve reserved ASIDs across rollovers if
the CPU doesn't have an active ASID when the rollover occurs.
Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
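The two-rollover scenario from the ASID diagram above can be replayed with a
small single-threaded model. Everything here (the two-CPU asid_state, the
rollover_*() helpers) is a simplified assumption for illustration; the real
allocator uses per-CPU variables and atomic operations, which this sketch
ignores.

```c
#include <stdbool.h>

#define NR_CPUS 2

/* Per-CPU active and reserved ASIDs; 0 means "none". */
struct asid_state {
    unsigned int active[NR_CPUS];
    unsigned int reserved[NR_CPUS];
};

/* Old behavior: every rollover overwrites the reserved array. */
static void rollover_old(struct asid_state *s)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        s->reserved[cpu] = s->active[cpu];
        s->active[cpu] = 0;
    }
}

/* Fixed behavior: keep the old reservation if the CPU has no active ASID. */
static void rollover_fixed(struct asid_state *s)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        unsigned int asid = s->active[cpu];

        s->active[cpu] = 0;
        if (asid == 0)  /* CPU did not run since the previous rollover */
            asid = s->reserved[cpu];
        s->reserved[cpu] = asid;
    }
}

/* True if some CPU still holds a reservation for this ASID. */
static bool asid_is_reserved(const struct asid_state *s, unsigned int asid)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (asid != 0 && s->reserved[cpu] == asid)
            return true;
    return false;
}
```

Starting from active = <x y>, rolling over, letting only CPU-1 run again and
rolling over a second time loses the reservation for x with rollover_old(),
so x could be handed to another process. With rollover_fixed(), x stays
reserved.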
With this change, we no longer lose the innermost entry in the user-mode
part of the call chain. See also the x86 port, which includes the ip.
It's possible to partially work around this problem by post-processing
the data to use the PERF_SAMPLE_IP value, but this works only if the CPU
wasn't in the kernel when the sample was taken.
Signed-off-by: Jed Davis <jld@mozilla.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The filesystem should not be marked inconsistent if ext4_free_blocks()
is not able to allocate memory. Unfortunately some callers (most
notably ext4_truncate) don't have a way to reflect an error back up to
the VFS. And even if we did, most userspace applications won't deal
with most system calls returning ENOMEM anyway.
We now print mount options in a generic fashion in
ext4_show_options(), so we shouldn't be explicitly printing the
{usr,grp}quota options in ext4_show_quota_options().
Without this patch, /proc/mounts can look like this:
The function ext4_get_group_number() was introduced as an optimization
in commit bd86298e60b8. Unfortunately, this commit incorrectly
calculates the group number for file systems with a 1k block size (when
calculate the group number for file systems with a 1k block size (when
s_first_data_block is 1 instead of zero). This could cause the
following kernel BUG:
The arithmetic adding delalloc blocks to the number of used blocks in
ext4_getattr() can easily overflow on 32-bit archs as we first multiply
the number of blocks by the blocksize and then divide back by 512. Make the
arithmetic smarter and also use the proper type (unsigned long long
instead of unsigned long).
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
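The overflow in the ext4_getattr() arithmetic above can be reproduced in a
small sketch, with uint32_t standing in for a 32-bit unsigned long. The
function names are made up, and the "safe" form is one possible way to avoid
the intermediate byte count, not necessarily the exact expression used in the
patch. It assumes blkbits is at least 9.

```c
#include <stdint.h>

/* Overflow-prone: builds a byte count in 32 bits, then divides by 512. */
static uint32_t sectors_naive(uint32_t blocks, unsigned int blkbits)
{
    return (blocks << blkbits) >> 9;
}

/* Safer: widen first and shift by the difference, so no byte count is formed. */
static uint64_t sectors_safe(uint32_t blocks, unsigned int blkbits)
{
    return (uint64_t)blocks << (blkbits - 9);
}
```

For 2^20 blocks of 4 KiB (blkbits = 12), the naive form wraps to 0 while the
safe form gives 8388608 sectors.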
On 32-bit architectures with 32-bit sector_t computation of data offset
in ext4_xattr_fiemap() can overflow resulting in reporting bogus data
location. Fix the problem by typing block number to proper type before
shifting.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ext4_lblk_t is just u32 so multiplying it by blocksize can easily
overflow for files larger than 4 GB. Fix that by properly typing the
block offsets before shifting.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
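The effect of typing the block offset before shifting, as in the ext4_lblk_t
fix above, can be shown with this sketch. lblk_t and the function names are
stand-ins chosen for the example.

```c
#include <stdint.h>

typedef uint32_t lblk_t;  /* stand-in for ext4_lblk_t */

/* Shift happens in 32 bits and wraps before the result is widened. */
static uint64_t byte_offset_wrong(lblk_t lblk, unsigned int blkbits)
{
    return lblk << blkbits;
}

/* Widen to 64 bits first, then shift. */
static uint64_t byte_offset_right(lblk_t lblk, unsigned int blkbits)
{
    return (uint64_t)lblk << blkbits;
}
```

Logical block 2^20 with 4 KiB blocks sits at byte offset 4 GiB; the first
variant returns 0 for it, the second returns 4294967296.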
On 32-bit archs when sector_t is defined as 32-bit, the logic computing
the data offset in ext4_inline_data_fiemap() can overflow. Fix that by
properly typing the shifted value.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is another bug in the tree mod log stuff in that we're calling
tree_mod_log_free_eb every single time a block is cow'ed. The problem with this
is that if this block is shared by multiple snapshots we will call this multiple
times per block, so if we go to rewind the mod log for this block we'll BUG_ON()
in __tree_mod_log_rewind because we try to rewind a free twice. We only want to
call tree_mod_log_free_eb if we are actually freeing the block. With this patch
I no longer hit the panic in __tree_mod_log_rewind. Thanks,
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We need to hold the tree mod log lock in __tree_mod_log_rewind since we walk
forward in the tree mod entries, otherwise we'll end up with random entries and
trip the BUG_ON() at the front of __tree_mod_log_rewind. This fixes the panics
people were seeing when running
find /whatever -type f -exec btrfs fi defrag {} \;
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes bugzilla 57491. If we take a snapshot of a fs with an unlink ongoing
and then try to send that root we will run into problems. When comparing with a
parent root we will search the parent's and the send root's commit_root, which if
we've just created the snapshot will include the file that needs to be evicted
by the orphan cleanup. So when we find a changed extent we will try and copy
that info into the send stream, but when we lookup the inode we use the normal
root, which no longer has the inode because the orphan cleanup deleted it. The
best solution I have for this is to check our otransid with the generation of
the commit root and if they match just commit the transaction again, that way we
get the changes from the orphan cleanup. With this patch the reproducer I made
for this bugzilla no longer returns ESTALE when trying to do the send. Thanks,
Reported-by: Chris Wilson <jakdaw@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Direct comparison of jiffies-related values does not work in the
wrap-around case. Replace it with time_is_after_jiffies().
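The wrap-around problem can be illustrated in user space. The sketch below is not the kernel implementation; jiffies_t, naive_after() and wrap_safe_after() are hypothetical names that only mirror the idea behind the time_after() family, namely comparing the signed difference of two counters rather than the counters themselves.

```c
#include <limits.h>
#include <stdbool.h>

/* User-space stand-in for the kernel's jiffies counter (assumption). */
typedef unsigned long jiffies_t;

/* Naive comparison: gives the wrong answer once the counter wraps. */
static bool naive_after(jiffies_t a, jiffies_t b)
{
	return a > b;
}

/*
 * Wrap-safe comparison: true if a is later than b, provided the two
 * values are less than half the counter range apart. The unsigned
 * subtraction wraps, and the signed cast recovers the ordering.
 */
static bool wrap_safe_after(jiffies_t a, jiffies_t b)
{
	return (long)(b - a) < 0;
}
```

With now = ULONG_MAX - 5 and a timeout of now + 10, the timeout value wraps to 4. The naive test claims the timeout is not in the future, while the signed-difference test still reports that it is.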
Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/519BC066.5080600@acm.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ata_link_online() check in ahci_error_intr() is unnecessary. It should
be removed; otherwise it may lead to a lockup with an FBS-enabled PMP.
http://marc.info/?l=linux-ide&m=137050421603272&w=2
When the queue is unmapped while it was so loaded that
mac80211's queue was stopped, we need to wake the queue after
having freed all the packets in the queue.
Not doing so can result in weird stuff like:
* run lots of traffic (mac80211's queue gets stopped)
* RFKILL
* de-assert RFKILL
* no traffic
When a queue is disabled, it frees all its entries. Later,
the op_mode might still get notifications from the firmware
that trigger freeing of entries in the tx queue. The transport
should be prepared for these races and know to ignore
reclaim calls on queues that have been disabled and whose
entries have been freed.
Commit 4f535093cf "PCI: Put pci_dev in device tree as early as possible"
moves device registering from pci_bus_add_devices() to pci_device_add().
That causes problems for virtual functions because device_add(&virtfn->dev)
is called before setting the virtfn->is_virtfn flag, which then causes Xen
to report PCI virtual functions as PCI physical functions.
Fix it by setting virtfn->is_virtfn before calling pci_device_add().
[Jiang Liu]: Move the setting of virtfn->is_virtfn ahead further for better
readability and modify changelog.
Currently, when a disconnect is requested by the user (via NBD_DISCONNECT
ioctl) the return from NBD_DO_IT is undefined (it is usually one of
several error codes). This means that nbd-client does not know if a
manual disconnect was performed or whether a network error occurred.
Because of this, nbd-client's persist mode (which tries to reconnect after
error, but not after manual disconnect) does not always work correctly.
This change fixes this by causing NBD_DO_IT to always return 0 if a user
requests a disconnect. This means that nbd-client can correctly either
persist the connection (if an error occurred) or disconnect (if the user
requested it).
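A client-side decision based on this return value could look roughly like the sketch below. It is an illustration only, not nbd-client's actual code; should_reconnect() and its parameters are hypothetical names.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether a client in persist mode should
 * reconnect after the NBD_DO_IT ioctl returns.
 *
 * do_it_ret: return value of the NBD_DO_IT ioctl
 *            (0 after a user-requested disconnect, per this change).
 * persist:   whether the client was started in persist mode.
 */
static bool should_reconnect(int do_it_ret, bool persist)
{
	if (do_it_ret == 0)
		return false;	/* manual disconnect: stay disconnected */
	return persist;		/* error: reconnect only in persist mode */
}
```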
Signed-off-by: Paul Clements <paul.clements@steeleye.com> Acked-by: Rob Landley <rob@landley.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Inlined xattr shares the free space of the inode block with inlined data or
the data extent record, so the size of the latter two should be adjusted when
inlined xattr is enabled. See ocfs2_xattr_ibody_init(). But this isn't
done properly during reflink. For an inode with inlined data, its max inlined
data size is adjusted in ocfs2_duplicate_inline_data(), no problem. But
for an inode with a data extent record, its record count isn't adjusted. Fix
it, or the data extent record and inlined xattr may overwrite each other,
causing data corruption or xattr failure.
One panic caused by this bug in our test environment is the following:
arch/c6x/mm/init.c: In function `paging_init':
arch/c6x/mm/init.c:46:2: error: implicit declaration of function `set_fs' [-Werror=implicit-function-declaration]
arch/c6x/mm/init.c:46:9: error: `KERNEL_DS' undeclared (first use in this function)
arch/c6x/mm/init.c:46:9: note: each undeclared identifier is reported only once for each function it appears in
Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The June 2013 Macbook Air (13'') has a new trackpad protocol; four new
values are inserted in the header, and the mode switch is no longer
needed. This patch adds support for the new devices.
Reported-and-tested-by: Brad Ford <plymouthffl@gmail.com> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch adds keyboard support for MacbookAir6,2 as WELLSPRING8
(0x0291, 0x0292, 0x0293). The touchpad is handled in a separate
bcm5974 patch, as usual.
Reported-and-tested-by: Brad Ford <plymouthffl@gmail.com> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The role of acpi_bus_update_power() is to update the given ACPI
device object's power.state field to reflect the current physical
state of the device (as inferred from the configuration of power
resources and _PSC, if available). For this purpose it calls
acpi_device_set_power() that should update the power resources'
reference counters and set power.state as appropriate. However,
that doesn't work if the "new" state is D1, D2 or D3hot and
the current value of power.state means D3cold, because in that
case acpi_device_set_power() will refuse to transition the device
from D3cold to non-D0.
To address this problem, make acpi_bus_update_power() call
acpi_power_transition() directly to update the power resources'
reference counters and only use acpi_device_set_power() to put
the device into D0 if the current physical state of it cannot
be determined.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Previous implementation incorrectly used the ACPI 5.0 extended
sleep registers if they were simply populated. This caused
problems on some non-HW-reduced machines. As per the ACPI spec,
they should only be used if the HW-reduced bit is set. Lv Zheng,
ACPICA BZ 1020.
Reported-by: Daniel Rowe <bart@fathom13.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=54181
References: https://bugs.acpica.org/show_bug.cgi?id=1020 Bisected-by: Brint E. Kriebel <kernel@bekit.net> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
HP Folio 13's BIOS defines CMOS RTC Operation Region and the EC's
_REG method will access that region. To allow the CMOS RTC region
handler to be installed before the EC _REG method is first invoked,
add ec_skip_dsdt_scan() as HP Folio 13's callback to ec_dmi_table.
References: https://bugzilla.kernel.org/show_bug.cgi?id=54621 Reported-and-tested-by: Stefan Nagy <public@stefan-nagy.at> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On HP Folio 13-2000, the BIOS defines a CMOS RTC Operation Region and
the EC's _REG method accesses that region. Thus an appropriate
address space handler must be registered for that region before the
EC driver is loaded.
Introduce a mechanism for adding CMOS RTC address space handlers.
Register an ACPI scan handler for CMOS RTC devices such that, when
a device of that kind is detected during an ACPI namespace scan, a
common CMOS RTC operation region address space handler will be
installed for it.
References: https://bugzilla.kernel.org/show_bug.cgi?id=54621 Reported-and-tested-by: Stefan Nagy <public@stefan-nagy.at> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 02725e7471b8 ('genirq: Use irq_get/put functions'),
inadvertently changed can_request_irq() to return 0 for IRQs that have
no action. This causes pcibios_lookup_irq() to select only IRQs that
already have an action with IRQF_SHARED set, or to fail if there are
none. Change can_request_irq() to return 1 for IRQs that have no
action (if the first two conditions are met).
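The intended decision logic can be modelled in user space as follows. This is a simplified sketch, not the kernel function: the model_* types, the can_request field and the MODEL_IRQF_SHARED value are assumptions standing in for struct irq_desc, irq_settings_can_request() and IRQF_SHARED, and locking is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_IRQF_SHARED 0x80UL	/* flag value chosen for this model */

struct model_action {
	unsigned long flags;
};

struct model_desc {
	bool can_request;			/* stands in for irq_settings_can_request() */
	const struct model_action *action;	/* NULL when no handler is installed */
};

/* Returns 1 if the IRQ may be requested with irqflags, 0 otherwise. */
static int model_can_request_irq(const struct model_desc *desc,
				 unsigned long irqflags)
{
	if (!desc)
		return 0;	/* first condition: descriptor exists */
	if (!desc->can_request)
		return 0;	/* second condition: requests are allowed */
	if (!desc->action)
		return 1;	/* no action yet: the IRQ is free to use */
	/* Otherwise both the existing handler and the caller must share. */
	return (irqflags & desc->action->flags & MODEL_IRQF_SHARED) ? 1 : 0;
}
```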
There are two toolstacks that can instruct the Xen PCI frontend
and backend to change states: 'xm' (Python code with a daemon),
and 'xl' (C library - does not keep state changes).
With the 'xm', the path to disconnect a single PCI device (xm pci-detach
<guest> <BDF>) is:
[xen-pcifront ignores the 2,3 state changes and only acts when
4 (Connected) has been reached]
Note that this is for a _single_ PCI device. If there were two
PCI devices and only one was disconnected 'xm' would show the same
state changes.
The problem is that git commit 3d925320e9e2de162bd138bf97816bda8c3f71be
("xen/pcifront: Use Xen-SWIOTLB when initting if required") introduced
a mechanism to initialize the SWIOTLB when the Xen PCI front moves to
Connected state. It also had an aggressive seatbelt check that
would warn the user if one tried to change to Connected state without
first hitting the Closing state:
pcifront pci-0: PCI frontend already installed!
However, that code can be relaxed and we can continue on working
even if the frontend is instructed to be the 'Connected' state with
no devices and then gets tickled to be in 'Connected' state again.
In other words, this 4(Connected)->5(Closing)->4(Connected) state
was expected, while 4(Connected)->.... anything but 5(Closing)->4(Connected)
was not. This patch removes that aggressive check and allows
Xen pcifront to work with the 'xl' toolstack (for one or more
PCI devices) and with 'xm' toolstack (for more than two PCI
devices).
Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org
[v2: Added in the description about two PCI devices] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
... because the "clock_event_device framework" already accounts for idle
time through the "event_handler" function pointer in
xen_timer_interrupt().
The patch is intended as the completion of [1]. It should fix the double
idle times seen in PV guests' /proc/stat [2]. It should be orthogonal to
stolen time accounting (the removed code seems to be isolated).
John took the time to retest this patch on top of v3.10 and reported:
"idle time is correctly incremented for pv and hvm for the normal
case, nohz=off and nohz=idle." so let's put this patch in.
Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: John Haxby <john.haxby@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ExitBootServices is absolutely supposed to return a failure if any
ExitBootServices event handler changes the memory map. Basically the
get_map loop should run again if ExitBootServices returns an error the
first time. I would say it would be fair that if ExitBootServices gives
an error the second time then Linux would be fine in returning control
back to BIOS.
The second change is the following line:
again:
size += sizeof(*mem_map) * 2;
Originally you were incrementing it by the size of one memory map entry.
The issue here is all related to the low_alloc routine you are using.
In this routine you are making allocations to get the memory map itself.
Doing this allocation or allocations can affect the memory map by more
than one record.
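The retry behaviour described above can be sketched with a mock firmware in user space. Nothing here is the EFI stub code; struct fw_mock and the mock_* helpers are hypothetical and only model a firmware whose memory map key changes when an ExitBootServices event handler alters the map.

```c
#include <stdbool.h>

/* Hypothetical firmware model: map_key changes whenever the map changes. */
struct fw_mock {
	int map_changes_pending;	/* handlers that will still alter the map */
	unsigned long map_key;
};

/* Models fetching the memory map: returns the current map key. */
static unsigned long mock_get_memory_map(struct fw_mock *fw)
{
	return fw->map_key;
}

/* Models ExitBootServices: fails if a handler changed the map. */
static bool mock_exit_boot_services(struct fw_mock *fw, unsigned long key)
{
	if (fw->map_changes_pending > 0) {
		fw->map_changes_pending--;
		fw->map_key++;		/* the caller's key is now stale */
		return false;
	}
	return key == fw->map_key;
}

/* Fetch the map and try to exit; on failure refetch and try once more. */
static bool exit_boot_with_retry(struct fw_mock *fw)
{
	for (int attempt = 0; attempt < 2; attempt++) {
		unsigned long key = mock_get_memory_map(fw);

		if (mock_exit_boot_services(fw, key))
			return true;
	}
	return false;	/* second failure: give up and return control */
}
```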
The LMMIO length reported by PAT and the length given by the LBA MASK
register are not consistent. This leads e.g. to a not-working ATI FireGL
card with the radeon DRM driver since the memory can't be mapped.
Fix this by correctly adjusting the resource sizes.
I still see the occasional random segv on rp3440. Looking at one of
these (a code 15), it appeared the problem must be with the cache
handling of anonymous pages. Reviewing this, I noticed that the space
register %sr1 might be being clobbered when we flush an anonymous page.
Register %sr1 is used for TLB purges in a couple of places. These
purges are needed on PA8800 and PA8900 processors to ensure cache
consistency of flushed cache lines.
The solution here is simply to move the %sr1 load into the TLB lock
region needed to ensure that one purge executes at a time on SMP
systems. This was already the case for one use. After a few days of
operation, I haven't had a random segv on my rp3440.
Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some architectures (e.g. powerpc built with CONFIG_PPC_256K_PAGES=y
CONFIG_FORCE_MAX_ZONEORDER=11) get PAGE_SHIFT + MAX_ORDER > 26.
In 3.10 kernels, CONFIG_LOCKDEP=y with PAGE_SHIFT + MAX_ORDER > 26 makes
init_lock_keys() dereference beyond kmalloc_caches[26].
This leads to an unbootable system (kernel panic at initializing SLAB)
if one of kmalloc_caches[26...PAGE_SHIFT+MAX_ORDER-1] is not NULL.
Fix this by making sure that init_lock_keys() does not dereference beyond
kmalloc_caches[26] arrays.
Signed-off-by: Christoph Lameter <cl@linux.com> Reported-by: Tetsuo Handa <penguin-kernel@I-Love.SAKURA.ne.jp> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reason for this crash is a gcc miscompilation in the fault handlers of
pa_memcpy() which return the fault address instead of the copied bytes.
Since this seems to be a generic problem with gcc-4.7.x (and below), it's
better to simplify the fault handlers in pa_memcpy to avoid this problem.
In addition, John David Anglin wrote:
There is no gcc PR as pa_memcpy is not legitimate C code. There is an
implicit assumption that certain variables will contain correct values
when an exception occurs and the code randomly jumps to one of the
exception blocks. There is no guarantee of this. If a PR was filed, it
would likely be marked as invalid.
Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
task->cgroups is a RCU pointer pointing to struct css_set. A task
switches to a different css_set on cgroup migration but a css_set
doesn't change once created and its pointers to cgroup_subsys_states
aren't RCU protected.
task_subsys_state[_check]() is the macro to acquire css given a task
and subsys_id pair. It RCU-dereferences task->cgroups->subsys[] not
task->cgroups, so the RCU pointer task->cgroups ends up being
dereferenced without read_barrier_depends() after it. It's broken.
Fix it by introducing task_css_set[_check]() which does
RCU-dereference on task->cgroups. task_subsys_state[_check]() is
reimplemented to directly dereference ->subsys[] of the css_set
returned from task_css_set[_check]().
This removes some of sparse RCU warnings in cgroup.
v2: Fixed unbalanced parenthesis and there's no need to use
rcu_dereference_raw() when !CONFIG_PROVE_RCU. Both spotted by Li.