At the moment the boot CPU stores the cpuinfo long before the
PERCPU areas are initialised by the kernel. This could be problematic
as the non-boot CPU data structures might get copied with the data
from the boot CPU, giving us no chance to detect if a particular CPU
updated its cpuinfo. This patch delays the boot cpu store to
smp_prepare_boot_cpu().
Also kills the setup_processor() which no longer does meaningful
work.
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Tested-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
arm64: Delay ELF HWCAP initialisation until all CPUs are up
Delay the ELF HWCAP initialisation until all the (enabled) CPUs are
up, i.e, smp_cpus_done(). This is in preparation for detecting the
common features across the CPUS and creating a consistent ELF HWCAP
for the system.
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Tested-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
At early boot, we print the CPU version/revision. On a heterogeneous
system, we could have different types of CPUs. Print the CPU info for
all active cpus. Also, the secondary CPUs prints the message only when
they turn online.
Also, remove the redundant 'revision' information which doesn't
make any sense without the 'variant' field.
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Tested-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Catalin Marinas [Tue, 20 Oct 2015 13:59:20 +0000 (14:59 +0100)]
arm64: Make 36-bit VA depend on EXPERT
Commit 215399392fe4 (arm64: 36 bit VA) introduced 36-bit VA support for
the arm64 kernel when the 16KB page configuration is enabled. While this
is a valid hardware configuration, it's not something we want to
encourage since it reduces the memory (and I/O) range that the kernel
can access. Make this depend on EXPERT to avoid complaints of Linux not
mapping the whole RAM, especially on platforms following the ARM
recommended memory map.
Jungseok Lee [Sat, 17 Oct 2015 14:28:11 +0000 (14:28 +0000)]
arm64: Synchronise dump_backtrace() with perf callchain
Unlike perf callchain relying on walk_stackframe(), dump_backtrace()
has its own backtrace logic. A major difference between them is the
moment a symbol is recorded. Perf writes down a symbol *before*
calling unwind_frame(), but dump_backtrace() prints it out *after*
unwind_frame(). As a result, the last valid symbol cannot be hooked
in case of dump_backtrace(). This patch addresses the issue as
synchronising dump_backtrace() with perf callchain.
Note that this patch does not cover a case where MMU is disabled. The
last stack frame of swapper, for example, has PC in a form of physical
address. Unfortunately, a simple conversion using phys_to_virt() cannot
cover all scenarios since PC is retrieved from LR - 4, not LR. It is
a big tradeoff to change both head.S and unwind_frame() for only a few
of symbols in *.S. Thus, this hunk does not take care of the case.
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: James Morse <james.morse@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Jungseok Lee <jungseoklee85@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently, if cpuidle is disabled or not supported, powertop reports
zero wakeups and zero events. This is due to the cpu_idle tracepoints
are missing.
This patch is to make cpu_idle tracepoints always available even if
cpuidle is disabled or not supported.
This patch turns on the 16K page support in the kernel. We
support 48bit VA (4 level page tables) and 47bit VA (3 level
page tables).
With 16K we can map 128 entries using contiguous bit hint
at level 3 to map 2M using single TLB entry.
TODO: 16K supports 32 contiguous entries at level 2 to get us
1G(which is not yet supported by the infrastructure). That should
be a separate patch altogether.
Cc: Will Deacon <will.deacon@arm.com> Cc: Jeremy Linton <jeremy.linton@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Steve Capper <steve.capper@linaro.org> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Ard Biesheuvel [Mon, 19 Oct 2015 13:19:36 +0000 (14:19 +0100)]
arm64: Add page size to the kernel image header
This patch adds the page size to the arm64 kernel image header
so that one can infer the PAGESIZE used by the kernel. This will
be helpful to diagnose failures to boot the kernel with page size
not supported by the CPU.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We use !CONFIG_ARM64_64K_PAGES for CONFIG_ARM64_4K_PAGES
(and vice versa) in code. It all worked well, so far since
we only had two options. Now, with the introduction of 16K,
these cases will break. This patch cleans up the code to
use the required CONFIG symbol expression without the assumption
that !64K => 4K (and vice versa)
Cc: Will Deacon <will.deacon@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
At the moment, we only support maximum of 3-level page table for
swapper. With 48bit VA, 64K has only 3 levels and 4K uses section
mapping. Add support for 4-level page table for swapper, needed
by 16K pages.
Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
arm64: Calculate size for idmap_pg_dir at compile time
Now that we can calculate the number of levels required for
mapping a va width, reserve exact number of pages that would
be required to cover the idmap. The idmap should be able to handle
the maximum physical address size supported.
Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We use section maps with 4K page size to create the swapper/idmaps.
So far we have used !64K or 4K checks to handle the case where we
use the section maps.
This patch adds a new symbol, ARM64_SWAPPER_USES_SECTION_MAPS, to
handle cases where we use section maps, instead of using the page size
symbols.
Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Salyzyn [Tue, 13 Oct 2015 21:30:51 +0000 (14:30 -0700)]
arm64: AArch32 user space PC alignment exception
ARMv7 does not have a PC alignment exception. ARMv8 AArch32
user space however can produce a PC alignment exception. Add
handler so that we do not dump an unexpected stack trace in
the logs.
Signed-off-by: Mark Salyzyn <salyzyn@android.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
yalin wang [Mon, 12 Oct 2015 06:52:59 +0000 (14:52 +0800)]
arm64: add kc_offset_to_vaddr and kc_vaddr_to_offset macro
This patch add kc_offset_to_vaddr() and kc_vaddr_to_offset(),
the default version doesn't work on arm64, because arm64 kernel address
is below the PAGE_OFFSET, like module address and vmemmap address are
all below PAGE_OFFSET address.
Signed-off-by: yalin wang <yalin.wang2010@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
yalin wang [Mon, 12 Oct 2015 02:28:18 +0000 (10:28 +0800)]
arm64: ioremap: add ioremap_cache macro
Add ioremap_cache macro, because some code will test if this macro
is defined or not, and will generate a generric version if not defined,
for example, memremap.c do like this.
Signed-off-by: yalin wang <yalin.wang2010@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Will Deacon [Tue, 13 Oct 2015 13:01:06 +0000 (14:01 +0100)]
arm64: kasan: fix issues reported by sparse
Sparse reports some new issues introduced by the kasan patches:
arch/arm64/mm/kasan_init.c:91:13: warning: no previous prototype for
'kasan_early_init' [-Wmissing-prototypes] void __init kasan_early_init(void)
^
arch/arm64/mm/kasan_init.c:91:13: warning: symbol 'kasan_early_init'
was not declared. Should it be static? [sparse]
This patch resolves the problem by adding a prototype for
kasan_early_init and marking the function as asmlinkage, since it's only
called from head.S.
Andrey Ryabinin [Mon, 12 Oct 2015 15:52:58 +0000 (18:52 +0300)]
arm64: add KASAN support
This patch adds arch specific code for kernel address sanitizer
(see Documentation/kasan.txt).
1/8 of kernel addresses reserved for shadow memory. There was no
big enough hole for this, so virtual addresses for shadow were
stolen from vmalloc area.
At early boot stage the whole shadow region populated with just
one physical page (kasan_zero_page). Later, this page reused
as readonly zero shadow for some memory that KASan currently
don't track (vmalloc).
After mapping the physical memory, pages for shadow memory are
allocated and mapped.
Functions like memset/memmove/memcpy do a lot of memory accesses.
If bad pointer passed to one of these function it is important
to catch this. Compiler's instrumentation cannot do this since
these functions are written in assembly.
KASan replaces memory functions with manually instrumented variants.
Original functions declared as weak symbols so strong definitions
in mm/kasan/kasan.c could replace them. Original functions have aliases
with '__' prefix in name, so we could call non-instrumented variant
if needed.
Some files built without kasan instrumentation (e.g. mm/slub.c).
Original mem* function replaced (via #define) with prefixed variants
to disable memory access checks for such files.
Commit 654672d4ba1a ("locking/atomics: Add _{acquire|release|relaxed}()
variants of some atomic operation") introduced a relaxed atomic API to
Linux that maps nicely onto the arm64 memory model, including the new
ARMv8.1 atomic instructions.
This patch hooks up the API to our relaxed atomic instructions, rather
than have them all expand to the full-barrier variants as they do
currently.
Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Ard Biesheuvel [Thu, 8 Oct 2015 19:02:04 +0000 (20:02 +0100)]
arm64/efi: isolate EFI stub from the kernel proper
Since arm64 does not use a builtin decompressor, the EFI stub is built
into the kernel proper. So far, this has been working fine, but actually,
since the stub is in fact a PE/COFF relocatable binary that is executed
at an unknown offset in the 1:1 mapping provided by the UEFI firmware, we
should not be seamlessly sharing code with the kernel proper, which is a
position dependent executable linked at a high virtual offset.
So instead, separate the contents of libstub and its dependencies, by
putting them into their own namespace by prefixing all of its symbols
with __efistub. This way, we have tight control over what parts of the
kernel proper are referenced by the stub.
Ard Biesheuvel [Thu, 8 Oct 2015 19:02:03 +0000 (20:02 +0100)]
arm64: use ENDPIPROC() to annotate position independent assembler routines
For more control over which functions are called with the MMU off or
with the UEFI 1:1 mapping active, annotate some assembler routines as
position independent. This is done by introducing ENDPIPROC(), which
replaces the ENDPROC() declaration of those routines.
With the stub to kernel interface being promoted to a proper interface
so that other agents than the stub can boot the kernel proper in EFI
mode, we can remove the linux,uefi-stub-kern-ver field, considering
that its original purpose was to prevent this from happening in the
first place.
Catalin Marinas [Mon, 12 Oct 2015 11:10:53 +0000 (12:10 +0100)]
arm64: Fix missing #include in hw_breakpoint.c
A prior commit used to detect the hw breakpoint ABI behaviour based on
the target state missed the asm/compat.h include and the build fails
with !CONFIG_COMPAT.
Fixes: 8f48c0629049 ("arm64: hw_breakpoint: use target state to determine ABI behaviour") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Yang Yingliang [Thu, 24 Sep 2015 09:32:14 +0000 (17:32 +0800)]
arm64: fix a migrating irq bug when hotplug cpu
When cpu is disabled, all irqs will be migratged to another cpu.
In some cases, a new affinity is different, the old affinity need
to be updated and if irq_set_affinity's return value is IRQ_SET_MASK_OK_DONE,
the old affinity can not be updated. Fix it by using irq_do_set_affinity.
And migrating interrupts is a core code matter, so use the generic
function irq_migrate_all_off_this_cpu() to migrate interrupts in
kernel/irq/migration.c.
Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Russell King - ARM Linux <linux@arm.linux.org.uk> Cc: Hanjun Guo <hanjun.guo@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Jeremy Linton [Wed, 7 Oct 2015 17:00:25 +0000 (12:00 -0500)]
arm64: Mark kernel page ranges contiguous
With 64k pages, the next larger segment size is 512M. The linux
kernel also uses different protection flags to cover its code and data.
Because of this requirement, the vast majority of the kernel code and
data structures end up being mapped with 64k pages instead of the larger
pages common with a 4k page kernel.
Recent ARM processors support a contiguous bit in the
page tables which allows the a TLB to cover a range larger than a
single PTE if that range is mapped into physically contiguous
ram.
So, for the kernel its a good idea to set this flag. Some basic
micro benchmarks show it can significantly reduce the number of
L1 dTLB refills.
Add boot option to enable/disable CONT marking, as well as fix a
bug found by Steve Capper.
Jeremy Linton [Wed, 7 Oct 2015 17:00:22 +0000 (12:00 -0500)]
arm64: Default kernel pages should be contiguous
The default page attributes for a PMD being broken should have the CONT bit
set. Create a new definition for an early boot range of PTE's that are
contiguous.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Rutland [Fri, 2 Oct 2015 09:55:07 +0000 (10:55 +0100)]
MAINTAINERS: update ARM PMU profiling and debugging for arm64
Will Deacon maintains the profiling and debugging code under both
arch/arm and arch/arm64. Update MAINTAINERS to reflect this, in
preparation for adding myself as a reviewer of said code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Rutland [Fri, 2 Oct 2015 09:55:03 +0000 (10:55 +0100)]
arm64: perf: move to shared arm_pmu framework
Now that the arm_pmu framework has been factored out to drivers/perf we
can make use of it for arm64, gaining support for heterogeneous PMUs
and unifying the two codebases before they diverge further.
The as yet unused PMU name for PMUv3 is changed to armv8_pmuv3, matching
the style previously applied to the 32-bit PMUs.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Will Deacon [Wed, 7 Oct 2015 10:37:36 +0000 (11:37 +0100)]
arm64: hw_breakpoint: use target state to determine ABI behaviour
The arm64 hw_breakpoint interface is slightly less flexible than its
32-bit counterpart, thanks to some changes in the architecture rendering
unaligned watchpoint addresses obselete for AArch64.
However, in a multi-arch environment (i.e. debugging a 32-bit target
with a 64-bit GDB under a 64-bit kernel), we need to provide a feature
compatible interface to GDB in order for debugging to function correctly.
This patch adds a new helper, is_compat_bp, to our hw_breakpoint
implementation which changes the interface behaviour based on the
architecture of the debug target as opposed to the debugger itself.
This allows debugged to function as expected for multi-arch
configurations without relying on deprecated architectural behaviours
when debugging native applications.
Cc: Yao Qi <yao.qi@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Will Deacon [Tue, 6 Oct 2015 17:46:30 +0000 (18:46 +0100)]
arm64: mm: remove dsb from update_mmu_cache
update_mmu_cache() consists of a dsb(ishst) instruction so that new user
mappings are guaranteed to be visible to the page table walker on
exception return.
In reality this can be a very expensive operation which is rarely needed.
Removing this barrier shows a modest improvement in hackbench scores and
, in the worst case, we re-take the user fault and establish that there
was nothing to do.
Will Deacon [Tue, 6 Oct 2015 17:46:29 +0000 (18:46 +0100)]
arm64: tlb: remove redundant barrier from __flush_tlb_pgtable
__flush_tlb_pgtable is used to invalidate intermediate page table
entries after they have been cleared and are about to be freed. Since
pXd_clear imply memory barriers, we don't need the extra one here.
Will Deacon [Tue, 6 Oct 2015 17:46:27 +0000 (18:46 +0100)]
arm64: switch_mm: simplify mm and CPU checks
switch_mm performs some checks to try and avoid entering the ASID
allocator:
(1) If we're switching to the init_mm (no user mappings), then simply
set a reserved TTBR0 value with no page table (the zero page)
(2) If prev == next *and* the mm_cpumask indicates that we've run on
this CPU before, then we can skip the allocator.
However, there is plenty of redundancy here. With the new ASID allocator,
if prev == next, then we know that our ASID is valid and do not need to
worry about re-allocation. Consequently, we can drop the mm_cpumask check
in (2) and move the prev == next check before the init_mm check, since
if prev == next == init_mm then there's nothing to do.
Will Deacon [Tue, 6 Oct 2015 17:46:26 +0000 (18:46 +0100)]
arm64: tlbflush: avoid flushing when fullmm == 1
The TLB gather code sets fullmm=1 when tearing down the entire address
space for an mm_struct on exit or execve. Given that the ASID allocator
will never re-allocate a dirty ASID, this flushing is not needed and can
simply be avoided in the flushing code.
Will Deacon [Tue, 6 Oct 2015 17:46:24 +0000 (18:46 +0100)]
arm64: mm: rewrite ASID allocator and MM context-switching code
Our current switch_mm implementation suffers from a number of problems:
(1) The ASID allocator relies on IPIs to synchronise the CPUs on a
rollover event
(2) Because of (1), we cannot allocate ASIDs with interrupts disabled
and therefore make use of a TIF_SWITCH_MM flag to postpone the
actual switch to finish_arch_post_lock_switch
(3) We run context switch with a reserved (invalid) TTBR0 value, even
though the ASID and pgd are updated atomically
(4) We take a global spinlock (cpu_asid_lock) during context-switch
(5) We use h/w broadcast TLB operations when they are not required
(e.g. in flush_context)
This patch addresses these problems by rewriting the ASID algorithm to
match the bitmap-based arch/arm/ implementation more closely. This in
turn allows us to remove much of the complications surrounding switch_mm,
including the ugly thread flag.
Will Deacon [Tue, 6 Oct 2015 17:46:23 +0000 (18:46 +0100)]
arm64: flush: use local TLB and I-cache invalidation
There are a number of places where a single CPU is running with a
private page-table and we need to perform maintenance on the TLB and
I-cache in order to ensure correctness, but do not require the operation
to be broadcast to other CPUs.
This patch adds local variants of tlb_flush_all and __flush_icache_all
to support these use-cases and updates the callers respectively.
__local_flush_icache_all also implies an isb, since it is intended to be
used synchronously.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: David Daney <david.daney@cavium.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Will Deacon [Tue, 6 Oct 2015 17:46:22 +0000 (18:46 +0100)]
arm64: proc: de-scope TLBI operation during cold boot
When cold-booting a CPU, we must invalidate any junk entries from the
local TLB prior to enabling the MMU. This doesn't require broadcasting
within the inner-shareable domain, so de-scope the operation to apply
only to the local CPU.
Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Feng Kan [Wed, 23 Sep 2015 18:55:39 +0000 (11:55 -0700)]
arm64: copy_to-from-in_user optimization using copy template
This patch optimize copy_to-from-in_user for arm 64bit architecture. The
copy template is used as template file for all the copy*.S files. Minor
change was made to it to accommodate the copy to/from/in user files.
Feng Kan [Wed, 23 Sep 2015 18:55:38 +0000 (11:55 -0700)]
arm64: Change memcpy in kernel to use the copy template file
This converts the memcpy.S to use the copy template file. The copy
template file was based originally on the memcpy.S
Signed-off-by: Feng Kan <fkan@apm.com> Signed-off-by: Balamurugan Shanmugam <bshanmugam@apm.com>
[catalin.marinas@arm.com: removed tmp3(w) .req statements as they are not used] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Linus Torvalds [Sun, 4 Oct 2015 15:31:13 +0000 (16:31 +0100)]
Merge branch 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull strscpy string copy function implementation from Chris Metcalf.
Chris sent this during the merge window, but I waffled back and forth on
the pull request, which is why it's going in only now.
The new "strscpy()" function is definitely easier to use and more secure
than either strncpy() or strlcpy(), both of which are horrible nasty
interfaces that have serious and irredeemable problems.
strncpy() has a useless return value, and doesn't NUL-terminate an
overlong result. To make matters worse, it pads a short result with
zeroes, which is a performance disaster if you have big buffers.
strlcpy(), by contrast, is a mis-designed "fix" for strlcpy(), lacking
the insane NUL padding, but having a differently broken return value
which returns the original length of the source string. Which means
that it will read characters past the count from the source buffer, and
you have to trust the source to be properly terminated. It also makes
error handling fragile, since the test for overflow is unnecessarily
subtle.
strscpy() avoids both these problems, guaranteeing the NUL termination
(but not excessive padding) if the destination size wasn't zero, and
making the overflow condition very obvious by returning -E2BIG. It also
doesn't read past the size of the source, and can thus be used for
untrusted source data too.
So why did I waffle about this for so long?
Every time we introduce a new-and-improved interface, people start doing
these interminable series of trivial conversion patches.
And every time that happens, somebody does some silly mistake, and the
conversion patch to the improved interface actually makes things worse.
Because the patch is mindnumbing and trivial, nobody has the attention
span to look at it carefully, and it's usually done over large swatches
of source code which means that not every conversion gets tested.
So I'm pulling the strscpy() support because it *is* a better interface.
But I will refuse to pull mindless conversion patches. Use this in
places where it makes sense, but don't do trivial patches to fix things
that aren't actually known to be broken.
* 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
tile: use global strscpy() rather than private copy
string: provide strscpy()
Make asm/word-at-a-time.h available on all architectures
Linus Torvalds [Sun, 4 Oct 2015 10:47:28 +0000 (11:47 +0100)]
Merge tag 'md/4.3-fixes' of git://neil.brown.name/md
Pull md fixes from Neil Brown:
"Assorted fixes for md in 4.3-rc.
Two tagged for -stable, and one is really a cleanup to match and
improve kmemcache interface.
* tag 'md/4.3-fixes' of git://neil.brown.name/md:
md/bitmap: don't pass -1 to bitmap_storage_alloc.
md/raid1: Avoid raid1 resync getting stuck
md: drop null test before destroy functions
md: clear CHANGE_PENDING in readonly array
md/raid0: apply base queue limits *before* disk_stack_limits
md/raid5: don't index beyond end of array in need_this_block().
raid5: update analysis state for failed stripe
md: wait for pending superblock updates before switching to read-only
Linus Torvalds [Sun, 4 Oct 2015 10:41:58 +0000 (11:41 +0100)]
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
"This week's round of MIPS fixes:
- Fix JZ4740 build
- Fix fallback to GFP_DMA
- FP seccomp in case of ENOSYS
- Fix bootmem panic
- A number of FP and CPS fixes
- Wire up new syscalls
- Make sure BPF assembler objects can properly be disassembled
- Fix BPF assembler code for MIPS I"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: scall: Always run the seccomp syscall filters
MIPS: Octeon: Fix kernel panic on startup from memory corruption
MIPS: Fix R2300 FP context switch handling
MIPS: Fix octeon FP context switch handling
MIPS: BPF: Fix load delay slots.
MIPS: BPF: Do all exports of symbols with FEXPORT().
MIPS: Fix the build on jz4740 after removing the custom gpio.h
MIPS: CPS: #ifdef on CONFIG_MIPS_MT_SMP rather than CONFIG_MIPS_MT
MIPS: CPS: Don't include MT code in non-MT kernels.
MIPS: CPS: Stop dangling delay slot from has_mt.
MIPS: dma-default: Fix 32-bit fall back to GFP_DMA
MIPS: Wire up userfaultfd and membarrier syscalls.
Linus Torvalds [Sun, 4 Oct 2015 10:40:09 +0000 (11:40 +0100)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"This update contains:
- Fix for a long standing race affecting /proc/irq/NNN
- One line fix for ARM GICV3-ITS counting the wrong data
- Warning silencing in ARM GICV3-ITS. Another GCC trying to be
overly clever issue"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Count additional LPIs for the aliased devices
irqchip/gic-v3-its: Silence warning when its_lpi_alloc_chunks gets inlined
genirq: Fix race in register_irq_proc()
MIPS: scall: Always run the seccomp syscall filters
The MIPS syscall handler code used to return -ENOSYS on invalid
syscalls. Whilst this is expected, it caused problems for seccomp
filters because the said filters never had the change to run since
the code returned -ENOSYS before triggering them. This caused
problems on the chromium testsuite for filters looking for invalid
syscalls. This has now changed and the seccomp filters are always
run even if the syscall is invalid. We return -ENOSYS once we
return from the seccomp filters. Moreover, similar codepaths have
been merged in the process which simplifies somewhat the overall
syscall code.
Linus Torvalds [Sat, 3 Oct 2015 14:53:05 +0000 (10:53 -0400)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Fixes all around the map: W+X kernel mapping fix, WCHAN fixes, two
build failure fixes for corner case configs, x32 header fix and a
speling fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds
x86/mm: Set NX on gap between __ex_table and rodata
x86/kexec: Fix kexec crash in syscall kexec_file_load()
x86/process: Unify 32bit and 64bit implementations of get_wchan()
x86/process: Add proper bound checks in 64bit get_wchan()
x86, efi, kasan: Fix build failure on !KASAN && KMEMCHECK=y kernels
x86/hyperv: Fix the build in the !CONFIG_KEXEC_CORE case
x86/cpufeatures: Correct spelling of the HWP_NOTIFY flag
Linus Torvalds [Sat, 3 Oct 2015 14:46:41 +0000 (10:46 -0400)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"Two EFI fixes: one for x86, one for ARM, fixing a boot crash bug that
can trigger under newer EFI firmware"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
arm64/efi: Fix boot crash by not padding between EFI_MEMORY_RUNTIME regions
x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, instead of top-down
Linus Torvalds [Sat, 3 Oct 2015 14:39:31 +0000 (10:39 -0400)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Bunch of fixes all over the place, all pretty small: amdgpu, i915,
exynos, one qxl and one vmwgfx.
There is also a bunch of mst fixes, I left some cleanups in the series
as I didn't think it was worth splitting up the tested series"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (37 commits)
drm/dp/mst: add some defines for logical/physical ports
drm/dp/mst: drop cancel work sync in the mstb destroy path (v2)
drm/dp/mst: split connector registration into two parts (v2)
drm/dp/mst: update the link_address_sent before sending the link address (v3)
drm/dp/mst: fixup handling hotplug on port removal.
drm/dp/mst: don't pass port into the path builder function
drm/radeon: drop radeon_fb_helper_set_par
drm: handle cursor_set2 in restore_fbdev_mode
drm/exynos: Staticize local function in exynos_drm_gem.c
drm/exynos: fimd: actually disable dp clock
drm/exynos: dp: remove suspend/resume functions
drm/qxl: recreate the primary surface when the bo is not primary
drm/amdgpu: only print meaningful VM faults
drm/amdgpu/cgs: remove import_gpu_mem
drm/i915: Call non-locking version of drm_kms_helper_poll_enable(), v2
drm: Add a non-locking version of drm_kms_helper_poll_enable(), v2
drm/vmwgfx: Fix a command submission hang regression
drm/exynos: remove unused mode_fixup() code
drm/exynos: remove decon_mode_fixup()
drm/exynos: remove fimd_mode_fixup()
...
Linus Torvalds [Fri, 2 Oct 2015 21:53:25 +0000 (17:53 -0400)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input layer fixes from Dmitry Torokhov:
"Fixes for two recent regressions (in Synaptics PS/2 and uinput
drivers) and some more driver fixups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Revert "Input: synaptics - fix handling of disabling gesture mode"
Input: psmouse - fix data race in __ps2_command
Input: elan_i2c - add all valid ic type for i2c/smbus
Input: zhenhua - ensure we have BITREVERSE
Input: omap4-keypad - fix memory leak
Input: serio - fix blocking of parport
Input: uinput - fix crash when using ABS events
Input: elan_i2c - expand maximum product_id form 0xFF to 0xFFFF
Input: elan_i2c - add ic type 0x03
Input: elan_i2c - don't require known iap version
Input: imx6ul_tsc - fix controller name
Input: imx6ul_tsc - use the preferred method for kzalloc()
Input: imx6ul_tsc - check for negative return value
Input: imx6ul_tsc - propagate the errors
Input: walkera0701 - fix abs() calculations on 64 bit values
Input: mms114 - remove unneded semicolons
Input: pm8941-pwrkey - remove unneded semicolon
Input: fix typo in MT documentation
Input: cyapa - fix address of Gen3 devices in device tree documentation
Linus Torvalds [Fri, 2 Oct 2015 18:51:46 +0000 (14:51 -0400)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
"Summary:
- Fix for accidental modification of arguments of syscall functions
- Wire up new syscalls
- Update defconfigs"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k/defconfig: Update defconfigs for v4.3-rc1
m68k: Define asmlinkage_protect
m68k: Wire up membarrier
m68k: Wire up userfaultfd
m68k: Wire up direct socket calls
Marc Zyngier [Fri, 2 Oct 2015 15:44:06 +0000 (16:44 +0100)]
irqchip/gic-v3-its: Count additional LPIs for the aliased devices
When configuring the interrupt mapping for a new device, we
iterate over all the possible aliases to account for their
maximum MSI allocation. This was introduced by e8137f4f5088
("irqchip: gicv3-its: Iterate over PCI aliases to generate ITS configuration").
Turns out that the code doing that is a bit braindead, and repeatedly
accounts for the same device over and over.
Fix this by counting the actual alias that is passed to us by the
core code.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Alex Shi <alex.shi@linaro.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: David Daney <ddaney.cavm@gmail.com> Cc: Jason Cooper <jason@lakedaemon.net> Link: http://lkml.kernel.org/r/1443800646-8074-3-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Marc Zyngier [Fri, 2 Oct 2015 15:44:05 +0000 (16:44 +0100)]
irqchip/gic-v3-its: Silence warning when its_lpi_alloc_chunks gets inlined
More agressive inlining in recent versions of GCC have uncovered
a new set of warnings:
drivers/irqchip/irq-gic-v3-its.c: In function its_msi_prepare:
drivers/irqchip/irq-gic-v3-its.c:1148:26: warning: lpi_base may be used
uninitialized in this function [-Wmaybe-uninitialized]
dev->event_map.lpi_base = lpi_base;
^
drivers/irqchip/irq-gic-v3-its.c:1116:6: note: lpi_base was declared here
int lpi_base;
^
drivers/irqchip/irq-gic-v3-its.c:1149:25: warning: nr_lpis may be used
uninitialized in this function [-Wmaybe-uninitialized]
dev->event_map.nr_lpis = nr_lpis;
^
drivers/irqchip/irq-gic-v3-its.c:1117:6: note: nr_lpis was declared here
int nr_lpis;
^
The warning is fairly benign (there is no code path that could
actually use uninitialized variables), but let's silence it anyway
by zeroing the variables on the error path.
Reported-by: Alex Shi <alex.shi@linaro.org> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: David Daney <ddaney.cavm@gmail.com> Cc: Jason Cooper <jason@lakedaemon.net> Link: http://lkml.kernel.org/r/1443800646-8074-2-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Linus Torvalds [Fri, 2 Oct 2015 18:46:15 +0000 (14:46 -0400)]
Merge tag 'dmaengine-fix-4.3-rc4' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"This contains fixes spread throughout the drivers, and also fixes one
more instance of privatecnt in dmaengine.
Driver fixes summary:
- bunch of pxa_dma fixes for reuse of descriptor issue, residue and
no-requestor
- odd fixes in xgene, idma, sun4i and zxdma
- at_xdmac fixes for cleaning descriptor and block addr mode"
* tag 'dmaengine-fix-4.3-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: pxa_dma: fix residue corner case
dmaengine: pxa_dma: fix the no-requestor case
dmaengine: zxdma: Fix off-by-one for testing valid pchan request
dmaengine: at_xdmac: clean used descriptor
dmaengine: at_xdmac: change block increment addressing mode
dmaengine: dw: properly read DWC_PARAMS register
dmaengine: xgene-dma: Fix overwritting DMA tx ring
dmaengine: fix balance of privatecnt
dmaengine: sun4i: fix unsafe list iteration
dmaengine: idma64: improve residue estimation
dmaengine: xgene-dma: fix handling xgene_dma_get_ring_size result
dmaengine: pxa_dma: fix initial list move
Linus Torvalds [Fri, 2 Oct 2015 18:40:57 +0000 (14:40 -0400)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Another week, another round of fixes.
These have been brewing for a bit and in various iterations, but I
feel pretty comfortable about the quality of them. They fix real
issues. The pull request is mostly blk-mq related, and the only one
not fixing a real bug, is the tag iterator abstraction from Christoph.
But it's pretty trivial, and we'll need it for another fix soon.
Apart from the blk-mq fixes, there's an NVMe affinity fix from Keith,
and a single fix for xen-blkback from Roger fixing failure to free
requests on disconnect"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: factor out a helper to iterate all tags for a request_queue
blk-mq: fix racy updates of rq->errors
blk-mq: fix deadlock when reading cpu_list
blk-mq: avoid inserting requests before establishing new mapping
blk-mq: fix q->mq_usage_counter access race
blk-mq: Fix use after of free q->mq_map
blk-mq: fix sysfs registration/unregistration race
blk-mq: avoid setting hctx->tags->cpumask before allocation
NVMe: Set affinity after allocating request queues
xen/blkback: free requests on disconnection
Dmitry Torokhov [Fri, 2 Oct 2015 17:31:32 +0000 (10:31 -0700)]
Revert "Input: synaptics - fix handling of disabling gesture mode"
This reverts commit e51e38494a8ecc18650efb0c840600637891de2c: we
actually do want the device to work in extended W mode, as this is the
mode that allows us receiving multiple contact information.
Matt Bennett [Wed, 30 Sep 2015 04:40:42 +0000 (17:40 +1300)]
MIPS: Octeon: Fix kernel panic on startup from memory corruption
During development it was found that a number of builds would panic
during the kernel init process, more specifically in 'delayed_fput()'.
The panic showed the kernel trying to access a memory address of
'0xb7fdc00' while traversing the 'delayed_fput_list' structure.
Comparing this memory address to the value of the pointer used on
builds that did not panic confirmed that the pointer on crashing
builds must have been corrupted at some stage earlier in the init
process.
By traversing the list earlier and earlier in the code it was found
that 'plat_mem_setup()' was responsible for corrupting the list.
Specifically the line:
Where 'new_ent_addr'=0x4800000 (the address of 'delayed_fput_list')
and the second argument (size)=0xb7fdc00 (the address causing the
kernel panic). The job of this part of 'plat_mem_setup()' is to
allocate chunks of memory for the kernel to use. At the start of
each chunk of memory the size of the chunk is written, hence the
value 0xb7fdc00 is written onto memory at 0x4800000, therefore the
kernel panics when it goes back to access 'delayed_fput_list' later
on in the initialisation process.
On builds that were not crashing it was found that the compiler had
placed 'delayed_fput_list' at 0x4800008, meaning it wasn't corrupted
(but something else in memory was overwritten).
As can be seen in the first function call above the code begins to
allocate chunks of memory beginning from the symbol '__init_end'.
The MIPS linker script (vmlinux.lds.S) however defines the .bss
section to begin after '__init_end'. Therefore memory within the
.bss section is allocated to the kernel to use (System.map shows
'delayed_fput_list' and other kernel structures to be in .bss).
To stop the kernel panic (and the .bss section being corrupted)
memory should begin being allocated from the symbol '_end'.
Signed-off-by: Matt Bennett <matt.bennett@alliedtelesis.co.nz> Acked-by: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Cc: aleksey.makarov@auriga.com
Patchwork: https://patchwork.linux-mips.org/patch/11251/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Paul Burton [Mon, 21 Sep 2015 17:07:42 +0000 (10:07 -0700)]
MIPS: Fix R2300 FP context switch handling
Commit 1a3d59579b9f ("MIPS: Tidy up FPU context switching") removed FP
context saving from the asm-written resume function in favour of reusing
existing code to perform the same task. However it only removed the FP
context saving code from the r4k_switch.S implementation of resume.
Remove it from the r2300_switch.S implementation too in order to prevent
attempting to save the FP context twice, which would likely lead to an
exception from the second save because the FPU had already been disabled
by the first save.
This patch has only been build tested, using rbtx49xx_defconfig.
Fixes: 1a3d59579b9f ("MIPS: Tidy up FPU context switching") Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: linux-kernel@vger.kernel.org Cc: Manuel Lauss <manuel.lauss@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/11167/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Paul Burton [Mon, 21 Sep 2015 17:07:41 +0000 (10:07 -0700)]
MIPS: Fix octeon FP context switch handling
Commit 1a3d59579b9f ("MIPS: Tidy up FPU context switching") removed FP
context saving from the asm-written resume function in favour of reusing
existing code to perform the same task. However it only removed the FP
context saving code from the r4k_switch.S implementation of resume.
Octeon uses its own implementation in octeon_switch.S, so remove FP
context saving there too in order to prevent attempting to save context
twice. That formerly led to an exception from the second save as follows
because the FPU had already been disabled by the first save:
Linus Torvalds [Fri, 2 Oct 2015 12:03:04 +0000 (08:03 -0400)]
Merge tag 'mmc-v4.3-rc3' of git://git.linaro.org/people/ulf.hansson/mmc
Pull MMC fixes from Ulf Hansson:
"Here are some mmc fixes intended for v4.3 rc4:
MMC core:
- Allow users of mmc_of_parse() to succeed when CONFIG_GPIOLIB is
unset
- Prevent infinite loop of re-tuning for CRC-errors for CMD19 and
CMD21
* tag 'mmc-v4.3-rc3' of git://git.linaro.org/people/ulf.hansson/mmc:
mmc: core: fix dead loop of mmc_retune
mmc: pxamci: fix card detect with slot-gpio API
mmc: sunxi: Fix clk-delay settings
mmc: core: Don't return an error for CD/WP GPIOs when GPIOLIB is unset
Linus Torvalds [Fri, 2 Oct 2015 11:59:29 +0000 (07:59 -0400)]
Merge git://git.infradead.org/intel-iommu
Pull IOVA fixes from David Woodhouse:
"The main fix here is the first one, fixing the over-allocation of
size-aligned requests. The other patches simply make the existing
IOVA code available to users other than the Intel VT-d driver, with no
functional change.
I concede the latter really *should* have been submitted during the
merge window, but since it's basically risk-free and people are
waiting to build on top of it and it's my fault I didn't get it in, I
(and they) would be grateful if you'd take it"
* git://git.infradead.org/intel-iommu:
iommu: Make the iova library a module
iommu: iova: Export symbols
iommu: iova: Move iova cache management to the iova library
iommu/iova: Avoid over-allocating when size-aligned
This is because when using function graph tracer, if the traced
function return value is in multi regs ([x0-x7]), return_to_handler
may corrupt them. So in return_to_handler, the parameter regs should
be protected properly.
Cc: <stable@vger.kernel.org> # 3.18+ Signed-off-by: Li Bin <huawei.libin@huawei.com> Acked-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Ralf Baechle [Fri, 2 Oct 2015 07:48:57 +0000 (09:48 +0200)]
MIPS: BPF: Fix load delay slots.
The entire bpf_jit_asm.S is written in noreorder mode because "we know
better" according to a comment. This also prevented the assembler from
throwing in the required NOPs for MIPS I processors which have no
load-use interlock, thus the load's consumer might end up using the
old value of the register from prior to the load.
Fixed by putting the assembler in reorder mode for just the affected
load instructions. This is not enough for gas to actually try to be
clever by looking at the next instruction and inserting a nop only
when needed but as the comment said "we know better", so getting gas
to unconditionally emit a NOP is just right in this case and prevents
adding further ifdefery.
Ben Hutchings [Thu, 1 Oct 2015 00:40:43 +0000 (01:40 +0100)]
x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds
On x32, gcc predefines __x86_64__ but long is only 32-bit. Use
__ILP32__ to distinguish x32.
Fixes this compiler error in perf:
tools/include/asm-generic/bitops/__ffs.h: In function '__ffs':
tools/include/asm-generic/bitops/__ffs.h:19:8: error: right shift count >= width of type [-Werror=shift-count-overflow]
word >>= 32;
^
This isn't sufficient to build perf for x32, though.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1443660043.2730.15.camel@decadent.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
NeilBrown [Thu, 1 Oct 2015 06:03:38 +0000 (16:03 +1000)]
md/bitmap: don't pass -1 to bitmap_storage_alloc.
Passing -1 to bitmap_storage_alloc() causes page->index to be set to
-1, which is quite problematic.
So only pass ->cluster_slot if mddev_is_clustered().
Fixes: b97e92574c0b ("Use separate bitmaps for each nodes in the cluster") Cc: stable@vger.kernel.org (v4.1+) Signed-off-by: NeilBrown <neilb@suse.com>
close_sync() needs to set conf->next_resync to a large, but safe value
below MaxSector and use it to determine whether or not to set
start_next_window in wait_barrier()
Shaohua Li [Fri, 18 Sep 2015 17:20:12 +0000 (10:20 -0700)]
md: clear CHANGE_PENDING in readonly array
If faulty disks of an array are more than allowed degraded number, the
array enters error handling. It will be marked as read-only with
MD_CHANGE_PENDING/RECOVERY_NEEDED set. But currently recovery doesn't
clear CHANGE_PENDING bit for read-only array. If MD_CHANGE_PENDING is
set for a raid5 array, all returned IO will be hold on a list till the
bit is clear. But recovery nevery clears this bit, the IO is always in
pending state and nevery finish. This has bad effects like upper layer
can't get an IO error and the array can't be stopped.
Fixes: c3cce6cda162 ("md/raid5: ensure device failure recorded before write request returns.") Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: NeilBrown <neilb@suse.com>
md/raid0: apply base queue limits *before* disk_stack_limits
Calling e.g. blk_queue_max_hw_sectors() after calls to
disk_stack_limits() discards the settings determined by
disk_stack_limits().
So we need to make those calls first.
Fixes: 199dc6ed5179 ("md/raid0: update queue parameter in a safer location.") Cc: stable@vger.kernel.org (v2.6.35+ - please apply with 199dc6ed5179). Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com>
md/raid5: don't index beyond end of array in need_this_block().
When need_this_block probably shouldn't be called when there
are more than 2 failed devices, we really don't want it to try
indexing beyond the end of the failed_num[] of fdev[] arrays.
So limit the loops to at most 2 iterations.
Reported-by: Shaohua Li <shli@fb.com> Signed-off-by: NeilBrown <neilb@suse.de>
Shaohua Li [Fri, 18 Sep 2015 17:20:13 +0000 (10:20 -0700)]
raid5: update analysis state for failed stripe
handle_failed_stripe() makes the stripe fail, eg, all IO will return
with a failure, but it doesn't update stripe_head_state. Later
handle_stripe() has special handling for raid6 for handle_stripe_fill().
That check before handle_stripe_fill() doesn't skip the failed stripe
and we get a kernel crash in need_this_block. This patch clear the
analysis state to make sure no functions wrongly called after
handle_failed_stripe()
Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: NeilBrown <neilb@suse.com>
md: wait for pending superblock updates before switching to read-only
If a superblock update is pending, wait for it to complete before
letting md_set_readonly() switch to readonly.
Otherwise we might lose important information about a device having
failed.
For external arrays, waiting for superblock updates can wait on
user-space, so in that case, just return an error.
Reported-and-tested-by: Shaohua Li <shli@fb.com> Signed-off-by: NeilBrown <neilb@suse.com>
Stephen Smalley [Thu, 1 Oct 2015 13:04:22 +0000 (09:04 -0400)]
x86/mm: Set NX on gap between __ex_table and rodata
Unused space between the end of __ex_table and the start of
rodata can be left W+x in the kernel page tables. Extend the
setting of the NX bit to cover this gap by starting from
text_end rather than rodata_start.
Before:
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000 16M pmd
0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd
0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte
0xffffffff81754000-0xffffffff81800000 688K RW GLB x pte
0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd
0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte
0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte
0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd
0xffffffff82200000-0xffffffffa0000000 478M pmd
After:
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000 16M pmd
0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd
0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte
0xffffffff81754000-0xffffffff81800000 688K RW GLB NX pte
0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd
0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte
0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte
0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd
0xffffffff82200000-0xffffffffa0000000 478M pmd
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-sds@tycho.nsa.gov Signed-off-by: Ingo Molnar <mingo@kernel.org>
x86/kexec: Fix kexec crash in syscall kexec_file_load()
The original bug is a page fault crash that sometimes happens
on big machines when preparing ELF headers:
BUG: unable to handle kernel paging request at ffffc90613fc9000
IP: [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260
The bug is caused by us under-counting the number of memory ranges
and subsequently not allocating enough ELF header space for them.
The bug is typically masked on smaller systems, because the ELF header
allocation is rounded up to the next page.
This patch modifies the code in fill_up_crash_elf_data() by using
walk_system_ram_res() instead of walk_system_ram_range() to correctly
count the max number of crash memory ranges. That's because the
walk_system_ram_range() filters out small memory regions that
reside in the same page, but walk_system_ram_res() does not.
Here's how I found the bug:
After tracing prepare_elf64_headers() and prepare_elf64_ram_headers_callback(),
the code uses walk_system_ram_res() to fill-in crash memory regions information
to the program header, so it counts those small memory regions that
reside in a page area.
But, when the kernel was using walk_system_ram_range() in
fill_up_crash_elf_data() to count the number of crash memory regions,
it filters out small regions.
I printed those small memory regions, for example:
kexec: Get nr_ram ranges. vaddr=0xffff880077592258 paddr=0x77592258, sz=0xdc0
Based on the code in walk_system_ram_range(), this memory region
will be filtered out:
So, the max_nr_ranges that's counted by the kernel doesn't include
small memory regions - causing us to under-allocate the required space.
That causes the page fault crash that happens in a later code path
when preparing ELF headers.
This bug is not easy to reproduce on small machines that have few
CPUs, because the allocated page aligned ELF buffer has more free
space to cover those small memory regions' PT_LOAD headers.
Signed-off-by: Lee, Chun-Yi <jlee@suse.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: kexec@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1443531537-29436-1-git-send-email-jlee@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>