arm/tegra: add support for tegra30 based board cardhu
Add support for the tegra30 based cardhu development board. Cardhu is a tablet
formfactor reference design for tegra30. The patch provides a device tree for
the board, updates Makefile.boot to build the dtb, includes the platform in
Kconfig and updates board-dt.c.
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com> Acked-by: Stephen Warren <swarren@nvidia.com> Acked-by: Colin Cross <ccross@android.com> Signed-off-by: Olof Johansson <olof@lixom.net>
Add support for tegra30 SoC. This includes a device tree compatible type for
this SoC ("nvidia,tegra30") and adds L2 cache initialization for this new SoC.
The clock framework is still missing, which prevents most drivers from working.
The basic IRQs are the same, so remove the dependency on
CONFIG_ARCH_TEGRA_2x_SOC.
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com> Acked-by: Stephen Warren <swarren@nvidia.com> Acked-by: Colin Cross <ccross@android.com> Signed-off-by: Olof Johansson <olof@lixom.net>
arm/tegra: pinmux tables and definitions for tegra30
Define the pinmuxing and pindrive tables for tegra30. The pinmux table defines
the available functions for each pinmux group. The pindrive table defines the
default pullup or pulldowns for each group.
Derived from code by Scott Williams (scwilliams@nvidia.com)
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com> Acked-by: Stephen Warren <swarren@nvidia.com> Acked-by: Colin Cross <ccross@android.com> Signed-off-by: Olof Johansson <olof@lixom.net>
arm/tegra: add new fields to struct tegra_pingroup_desc
Add new fields to struct tegra_pingroup_desc to support new hardware features
introduced in the tegra30 SoC. The pinmux driver won't use those fields yet,
but the tegra30 pinmux tables will already provide the necessary data.
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com> Acked-by: Stephen Warren <swarren@nvidia.com> Acked-by: Colin Cross <ccross@android.com> Signed-off-by: Olof Johansson <olof@lixom.net>
arm/tegra: prepare pinmux code for multiple tegra variants
This patch modifies the pinmux code to be useable for multiple tegra variants.
Some tegra20 specific constants will be replaced by variables which will be
initialized to the appropriate value at runtime.
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com> Acked-by: Stephen Warren <swarren@nvidia.com> Acked-by: Colin Cross <ccross@android.com> Signed-off-by: Olof Johansson <olof@lixom.net>
Rename pinmux-t2.h and pinmux-t2-tables.c to the new tegra naming. This file
will be reworked somewhat in the next patch to support multiple tegra SoC
types.
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com> Acked-by: Stephen Warren <swarren@nvidia.com> Acked-by: Colin Cross <ccross@android.com> Signed-off-by: Olof Johansson <olof@lixom.net>
Generalize L2 cache initialization and discover L2 cache associativity at
runtime.
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com> Acked-by: Stephen Warren <swarren@nvidia.com> Acked-by: Colin Cross <ccross@android.com> Signed-off-by: Olof Johansson <olof@lixom.net>
Use PMC reset rather then CAR system reset as recommended by the hardware
team.
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com> Acked-by: Stephen Warren <swarren@nvidia.com> Acked-by: Colin Cross <ccross@android.com> Signed-off-by: Olof Johansson <olof@lixom.net>
arm/tegra: rename board-dt.c to board-dt-tegra20.c
Tegra20 based boards will be handled by the current board-dt.c file. Tegra30
based boards will be handled by a new board-dt-tegra30.c file. Hence rename
the existing board-dt.c to board-dt-tegra20.c to reflect its use.
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com> Acked-by: Stephen Warren <swarren@nvidia.com> Acked-by: Colin Cross <ccross@android.com> Signed-off-by: Olof Johansson <olof@lixom.net>
arm/tegra: prepare early init for multiple tegra variants
This patch splits the early init code in a common and a tegra20 specific part.
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com> Acked-by: Stephen Warren <swarren@nvidia.com> Acked-by: Colin Cross <ccross@android.com> Signed-off-by: Olof Johansson <olof@lixom.net>
don't export clk_measure_input_freq as its functionality is also available
using clk_get_rate().
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com> Acked-by: Stephen Warren <swarren@nvidia.com> Acked-by: Colin Cross <ccross@android.com> Signed-off-by: Olof Johansson <olof@lixom.net>
arm/tegra: prepare clock code for multiple tegra variants
Rework the tegra20 clock code to support multiple tegra variants :
* remove tegra2_periph_reset_assert/tegra2_periph_reset_deassert. This
functionality should be in clock.c.
* remove tegra_sdmmc_tap_delay and export tegra2_sdmmc_tap_delay
directly. This feature is handled inside the sdmmc block from tegra30
onwards. So there is no need for support in the clock code beyond
tegra20. There are no in tree users of this function.
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com> Acked-by: Stephen Warren <swarren@nvidia.com> Acked-by: Colin Cross <ccross@android.com> Signed-off-by: Olof Johansson <olof@lixom.net>
* add a dependency to ARCH_TEGRA_2x_SOC in Kconfig to all tegra20 based boards
and TEGRA_PCI
* make powergating dependent on ARCH_TEGRA_2x_SOC
* remove dependency on ARCH_TEGRA_2x_SOC for clock.c
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com> Acked-by: Stephen Warren <swarren@nvidia.com> Acked-by: Colin Cross <ccross@android.com> Signed-off-by: Olof Johansson <olof@lixom.net>
The timer and rtc-timer clocks aren't gated by default, so there is no reason
to crash the system if the dummy enable call failed.
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com> Acked-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Olof Johansson <olof@lixom.net>
This patch adds the initial device tree for tegra30
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com> Acked-by: Stephen Warren <swarren@nvidia.com> Acked-by: Colin Cross <ccross@android.com> Signed-off-by: Olof Johansson <olof@lixom.net>
Will Deacon [Mon, 6 Jun 2011 14:49:23 +0000 (15:49 +0100)]
ARM: stop: execute platform callback from cpu_stop code
Sending IPI_CPU_STOP to a CPU causes it to execute a busy cpu_relax
loop forever. This makes it impossible to kexec successfully on an SMP
system since the secondary CPUs do not reset.
This patch adds a callback to platform_cpu_kill, defined when
CONFIG_HOTPLUG_CPU=y, from the ipi_cpu_stop handling code. This function
currently just returns 1 on all platforms that define it but allows them
to do something more sophisticated in the future.
Will Deacon [Mon, 6 Jun 2011 11:28:54 +0000 (12:28 +0100)]
ARM: reset: implement soft_restart for jumping to a physical address
Tools such as kexec and CPU hotplug require a way to reset the processor
and branch to some code in physical space. This requires various bits of
jiggery pokery with the caches and MMU which, when it goes wrong, tends
to lock up the system.
This patch fleshes out the soft_restart implementation so that it
branches to the reset code using the identity mapping. This requires us
to change to a temporary stack, held within the kernel image as a static
array, to avoid conflicting with the new view of memory.
Will Deacon [Wed, 8 Jun 2011 14:29:00 +0000 (15:29 +0100)]
ARM: lib: add call_with_stack function for safely changing stack
When disabling the MMU, it is necessary to take out a 1:1 identity map
of the reset code so that it can safely be executed with and without
the MMU active. To avoid the situation where the physical address of the
reset code aliases with the virtual address of the active stack (which
cannot be included in the 1:1 mapping), it is desirable to change to a
new stack at a location which is less likely to alias.
This code adds a new lib function, call_with_stack:
which changes the stack to point at the sp parameter, before invoking
fn(arg) with the new stack selected.
Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Dave Martin <dave.martin@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
Jamie Iles [Thu, 1 Dec 2011 10:16:46 +0000 (11:16 +0100)]
ARM: 7183/1: vic: register the VIC for ST-modified VIC's
When probing the VIC, the ST variant has a different probing method to
account for the extra interrupts which meant we didn't previously call
vic_register() which registered the irq_domain.
Acked-by: Linus Walleij <linus.walleij@stericsson.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Jamie Iles <jamie@jamieiles.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Linus Torvalds [Fri, 9 Dec 2011 22:45:44 +0000 (14:45 -0800)]
Merge git://git.samba.org/sfrench/cifs-2.6
* git://git.samba.org/sfrench/cifs-2.6:
cifs: check for NULL last_entry before calling cifs_save_resume_key
cifs: attempt to freeze while looping on a receive attempt
cifs: Fix sparse warning when calling cifs_strtoUCS
CIFS: Add descriptions to the brlock cache functions
Linus Torvalds [Fri, 9 Dec 2011 22:45:12 +0000 (14:45 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, efi: Calling __pa() with an ioremap()ed address is invalid
x86, hpet: Immediately disable HPET timer 1 if rtc irq is masked
x86/intel_mid: Kconfig select fix
x86/intel_mid: Fix the Kconfig for MID selection
Linus Torvalds [Fri, 9 Dec 2011 22:41:50 +0000 (14:41 -0800)]
Merge branch 'spi/for-3.2' of git://git.pengutronix.de/git/wsa/linux-2.6
* 'spi/for-3.2' of git://git.pengutronix.de/git/wsa/linux-2.6:
spi/gpio: fix section mismatch warning
spi/fsl-espi: disable CONFIG_SPI_FSL_ESPI=m build
spi/nuc900: Include linux/module.h
spi/ath79: fix compile error due to missing include
Linus Torvalds [Fri, 9 Dec 2011 16:18:08 +0000 (08:18 -0800)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
md: raid5 crash during degradation
md/raid5: never wait for bad-block acks on failed device.
md: ensure new badblocks are handled promptly.
md: bad blocks shouldn't cause a Blocked status on a Faulty device.
md: take a reference to mddev during sysfs access.
md: refine interpretation of "hold_active == UNTIL_IOCTL".
md/lock: ensure updates to page_attrs are properly locked.
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
arch/tile: use new generic {enable,disable}_percpu_irq() routines
drivers/net/ethernet/tile: use skb_frag_page() API
asm-generic/unistd.h: support new process_vm_{readv,write} syscalls
arch/tile: fix double-free bug in homecache_free_pages()
arch/tile: add a few #includes and an EXPORT to catch up with kernel changes.
Linus Torvalds [Fri, 9 Dec 2011 16:07:42 +0000 (08:07 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Fix lost speaker volume controls
ALSA: hda/realtek - Create "Bass Speaker" for two speaker pins
ALSA: hda/realtek - Don't create extra controls with channel suffix
ALSA: hda - Fix remaining VREF mute-LED NID check in post-3.1 changes
ALSA: hda - Fix GPIO LED setup for IDT 92HD75 codecs
ASoC: Provide a more complete DMA driver stub
ASoC: Remove references to corgi and spitz from machine driver document
ASoC: Make SND_SOC_MX27VIS_AIC32X4 depend on I2C
ASoC: Fix dependency for SND_SOC_RAUMFELD and SND_PXA2XX_SOC_HX4700
ASoC: uda1380: Return proper error in uda1380_modinit failure path
ASoC: kirkwood: Make SND_KIRKWOOD_SOC_OPENRD and SND_KIRKWOOD_SOC_T5325 depend on I2C
ASoC: Mark WM8994 ADC muxes as virtual
ALSA: hda/realtek - Fix Oops in alc_mux_select()
ALSA: sis7019 - give slow codecs more time to reset
Linus Torvalds [Fri, 9 Dec 2011 16:07:24 +0000 (08:07 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Do no try to schedule task events if there are none
lockdep, kmemcheck: Annotate ->lock in lockdep_init_map()
perf header: Use event_name() to get an event name
perf stat: Failure with "Operation not supported"
Modify initialization of PCIe capability registers in Tsi721 mport driver:
- change Completion Timeout value to avoid unexpected data transfer
aborts during intensive traffic.
- replace hardcoded offset of PCIe capability block by making it use the
common function.
This patch is applicable to kernel versions starting from 3.2-rc1.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug fix for Tsi721 RapidIO mport driver: Tsi721 supports four RapidIO
mailboxes (MBOX0 - MBOX3) as defined by RapidIO specification. Mailbox
resources has to be properly reported to allow use of all available
mailboxes (initial version reports only MBOX0).
This patch is applicable to kernel versions staring from 3.2-rc1.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Thu, 8 Dec 2011 22:34:32 +0000 (14:34 -0800)]
procfs: do not overflow get_{idle,iowait}_time for nohz
Since commit a25cac5198d4 ("proc: Consider NO_HZ when printing idle and
iowait times") we are reporting idle/io_wait time also while a CPU is
tickless. We rely on get_{idle,iowait}_time functions to retrieve
proper data.
These functions, however, use usecs_to_cputime to translate micro
seconds time to cputime64_t. This is just an alias to usecs_to_jiffies
which reduces the data type from u64 to unsigned int and also checks
whether the given parameter overflows jiffies_to_usecs(MAX_JIFFY_OFFSET)
and returns MAX_JIFFY_OFFSET in that case.
When we overflow depends on CONFIG_HZ but especially for CONFIG_HZ_300
it is quite low (1431649781) so we are getting MAX_JIFFY_OFFSET for
>3000s! until we overflow unsigned int. Just for reference
CONFIG_HZ_100 has an overflow window around 20s, CONFIG_HZ_250 ~8s and
CONFIG_HZ_1000 ~2s.
This results in a bug when people saw [h]top going mad reporting 100%
CPU usage even though there was basically no CPU load. The reason was
simply that /proc/stat stopped reporting idle/io_wait changes (and
reported MAX_JIFFY_OFFSET) and so the only change happening was for user
system time.
Let's use nsecs_to_jiffies64 instead which doesn't reduce the precision
to 32b type and it is much more appropriate for cumulative time values
(unlike usecs_to_jiffies which intended for timeout calculations).
Signed-off-by: Michal Hocko <mhocko@suse.cz> Tested-by: Artem S. Tashkinov <t.artem@mailcity.com> Cc: Dave Jones <davej@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Thu, 8 Dec 2011 22:34:30 +0000 (14:34 -0800)]
mm: vmalloc: check for page allocation failure before vmlist insertion
Commit f5252e00 ("mm: avoid null pointer access in vm_struct via
/proc/vmallocinfo") adds newly allocated vm_structs to the vmlist after
it is fully initialised. Unfortunately, it did not check that
__vmalloc_area_node() successfully populated the area. In the event of
allocation failure, the vmalloc area is freed but the pointer to freed
memory is inserted into the vmlist leading to a a crash later in
get_vmalloc_info().
This patch adds a check for ____vmalloc_area_node() failure within
__vmalloc_node_range. It does not use "goto fail" as in the previous
error path as a warning was already displayed by __vmalloc_area_node()
before it called vfree in its failure path.
Credit goes to Luciano Chavez for doing all the real work of identifying
exactly where the problem was.
Michal Hocko [Thu, 8 Dec 2011 22:34:27 +0000 (14:34 -0800)]
mm: Ensure that pfn_valid() is called once per pageblock when reserving pageblocks
setup_zone_migrate_reserve() expects that zone->start_pfn starts at
pageblock_nr_pages aligned pfn otherwise we could access beyond an
existing memblock resulting in the following panic if
CONFIG_HOLES_IN_ZONE is not configured and we do not check pfn_valid:
We crashed in pageblock_is_reserved() when accessing pfn 0xc0000 because
highstart_pfn = 0x36ffe.
The issue was introduced in 3.0-rc1 by 6d3163ce ("mm: check if any page
in a pageblock is reserved before marking it MIGRATE_RESERVE").
Make sure that start_pfn is always aligned to pageblock_nr_pages to
ensure that pfn_valid s always called at the start of each pageblock.
Architectures with holes in pageblocks will be correctly handled by
pfn_valid_within in pageblock_is_reserved.
Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Dang Bo <bdang@vmware.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Arve Hjnnevg <arve@android.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> [3.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Youquan Song [Thu, 8 Dec 2011 22:34:18 +0000 (14:34 -0800)]
thp: set compound tail page _count to zero
Commit 70b50f94f1644 ("mm: thp: tail page refcounting fix") keeps all
page_tail->_count zero at all times. But the current kernel does not
set page_tail->_count to zero if a 1GB page is utilized. So when an
IOMMU 1GB page is used by KVM, it wil result in a kernel oops because a
tail page's _count does not equal zero.
Youquan Song [Thu, 8 Dec 2011 22:34:16 +0000 (14:34 -0800)]
thp: add compound tail page _mapcount when mapped
With the 3.2-rc kernel, IOMMU 2M pages in KVM works. But when I tried
to use IOMMU 1GB pages in KVM, I encountered an oops and the 1GB page
failed to be used.
The root cause is that 1GB page allocation calls gup_huge_pud() while 2M
page calls gup_huge_pmd. If compound pages are used and the page is a
tail page, gup_huge_pmd() increases _mapcount to record tail page are
mapped while gup_huge_pud does not do that.
So when the mapped page is relesed, it will result in kernel oops
because the page is not marked mapped.
This patch add tail process for compound page in 1GB huge page which
keeps the same process as 2M page.
Peter Zijlstra [Thu, 8 Dec 2011 22:34:13 +0000 (14:34 -0800)]
printk: avoid double lock acquire
Commit 4f2a8d3cf5e ("printk: Fix console_sem vs logbuf_lock unlock race")
introduced another silly bug where we would want to acquire an already
held lock. Avoid this.
Reported-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
More players joined to memory cgroup developments and Johannes' great work
changed internal design of memory cgroup dramatically. And he will do
more works. Michal Hokko did many bug fixes and know memory cgroup very
well. Daisuke Nishimura helped us very much but he seems busy now.
Thanks to his works.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
khugepaged can sometimes cause suspend to fail, requiring that the user
retry the suspend operation.
Use wait_event_freezable_timeout() instead of
schedule_timeout_interruptible() to avoid missing freezer wakeups. A
try_to_freeze() would have been needed in the khugepaged_alloc_hugepage
tight loop too in case of the allocation failing repeatedly, and
wait_event_freezable_timeout will provide it too.
khugepaged would still freeze just fine by trying again the next minute
but it's better if it freezes immediately.
Reported-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Tested-by: Jiri Slaby <jslaby@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Cc: "Rafael J. Wysocki" <rjw@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A shrinker function can return -1, means that it cannot do anything
without a risk of deadlock. For example prune_super() does this if it
cannot grab a superblock refrence, even if nr_to_scan=0. Currently we
interpret this -1 as a ULONG_MAX size shrinker and evaluate `total_scan'
according to this. So the next time around this shrinker can cause
really big pressure. Let's skip such shrinkers instead.
Also make total_scan signed, otherwise the check (total_scan < 0) below
never works.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matt Fleming [Fri, 18 Nov 2011 13:09:11 +0000 (13:09 +0000)]
x86, efi: Calling __pa() with an ioremap()ed address is invalid
If we encounter an efi_memory_desc_t without EFI_MEMORY_WB set
in ->attribute we currently call set_memory_uc(), which in turn
calls __pa() on a potentially ioremap'd address.
On CONFIG_X86_32 this is invalid, resulting in the following
oops on some machines:
BUG: unable to handle kernel paging request at f7f22280
IP: [<c10257b9>] reserve_ram_pages_type+0x89/0x210
[...]
A better approach to this problem is to map the memory region
with the correct attributes from the start, instead of modifying
it after the fact. The uncached case can be handled by
ioremap_nocache() and the cached by ioremap_cache().
Despite first impressions, it's not possible to use
ioremap_cache() to map all cached memory regions on
CONFIG_X86_64 because EFI_RUNTIME_SERVICES_DATA regions really
don't like being mapped into the vmalloc space, as detailed in
the following bug report,
Therefore, we need to ensure that any EFI_RUNTIME_SERVICES_DATA
regions are covered by the direct kernel mapping table on
CONFIG_X86_64. To accomplish this we now map E820_RESERVED_EFI
regions via the direct kernel mapping with the initial call to
init_memory_mapping() in setup_arch(), whereas previously these
regions wouldn't be mapped if they were after the last E820_RAM
region until efi_ioremap() was called. Doing it this way allows
us to delete efi_ioremap() completely.
Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Huang Ying <huang.ying.caritas@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console-pimps.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jeff Layton [Fri, 2 Dec 2011 01:23:34 +0000 (20:23 -0500)]
cifs: check for NULL last_entry before calling cifs_save_resume_key
Prior to commit eaf35b1, cifs_save_resume_key had some NULL pointer
checks at the top. It turns out that at least one of those NULL
pointer checks is needed after all.
When the LastNameOffset in a FIND reply appears to be beyond the end of
the buffer, CIFSFindFirst and CIFSFindNext will set srch_inf.last_entry
to NULL. Since eaf35b1, the code will now oops in this situation.
Fix this by having the callers check for a NULL last entry pointer
before calling cifs_save_resume_key. No change is needed for the
call site in cifs_readdir as it's not reachable with a NULL
current_entry pointer.
Cc: stable@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Reported-by: Adam G. Metzler <adamgmetzler@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
Jeff Layton [Fri, 2 Dec 2011 01:22:41 +0000 (20:22 -0500)]
cifs: attempt to freeze while looping on a receive attempt
In the recent overhaul of the demultiplex thread receive path, I
neglected to ensure that we attempt to freeze on each pass through the
receive loop.
Reported-and-Tested-by: Woody Suwalski <terraluna977@gmail.com> Reported-and-Tested-by: Adam Williamson <awilliam@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
Linus Torvalds [Thu, 8 Dec 2011 21:18:59 +0000 (13:18 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: drop spin lock when memory alloc fails
Btrfs: check if the to-be-added device is writable
Btrfs: try cluster but don't advance in search list
Btrfs: try to allocate from cluster even at LOOP_NO_EMPTY_SIZE
Tetsuo Handa [Thu, 8 Dec 2011 12:24:06 +0000 (21:24 +0900)]
TOMOYO: Fix pathname handling of disconnected paths.
Current tomoyo_realpath_from_path() implementation returns strange pathname
when calculating pathname of a file which belongs to lazy unmounted tree.
Use local pathname rather than strange absolute pathname in that case.
Also, this patch fixes a regression by commit 02125a82 "fix apparmor
dereferencing potentially freed dentry, sanitize __d_path() API".
Mark Langsdorf [Fri, 18 Nov 2011 15:33:06 +0000 (16:33 +0100)]
x86, hpet: Immediately disable HPET timer 1 if rtc irq is masked
When HPET is operating in RTC mode, the TN_ENABLE bit on timer1
controls whether the HPET or the RTC delivers interrupts to irq8. When
the system goes into suspend, the RTC driver sends a signal to the
HPET driver so that the HPET releases control of irq8, allowing the
RTC to wake the system from suspend. The switchover is accomplished by
a write to the HPET configuration registers which currently only
occurs while servicing the HPET interrupt.
On some systems, I have seen the system suspend before an HPET
interrupt occurs, preventing the write to the HPET configuration
register and leaving the HPET in control of the irq8. As the HPET is
not active during suspend, it does not generate a wake signal and RTC
alarms do not work.
This patch forces the HPET driver to immediately transfer control of
the irq8 channel to the RTC instead of waiting until the next
interrupt event.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Link: http://lkml.kernel.org/r/20111118153306.GB16319@alberich.amd.com Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
Li Zefan [Thu, 8 Dec 2011 01:08:40 +0000 (20:08 -0500)]
Btrfs: check if the to-be-added device is writable
If we call ioctl(BTRFS_IOC_ADD_DEV) directly, we'll succeed in adding
a readonly device to a btrfs filesystem, and btrfs will write to
that device, emitting kernel errors:
[ 3109.833692] lost page write due to I/O error on loop2
[ 3109.833720] lost page write due to I/O error on loop2
...
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Alexandre Oliva [Thu, 8 Dec 2011 01:08:40 +0000 (20:08 -0500)]
Btrfs: try cluster but don't advance in search list
When we find an existing cluster, we switch to its block group as the
current block group, possibly skipping multiple blocks in the process.
Furthermore, under heavy contention, multiple threads may fail to
allocate from a cluster and then release just-created clusters just to
proceed to create new ones in a different block group.
This patch tries to allocate from an existing cluster regardless of its
block group, and doesn't switch to that group, instead proceeding to
try to allocate a cluster from the group it was iterating before the
attempt.
Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Will Deacon [Tue, 22 Nov 2011 17:30:32 +0000 (17:30 +0000)]
ARM: LPAE: mark memory banks with start > ULONG_MAX as highmem
Memory banks living outside of the 32-bit physical address
space do not have a 1:1 pa <-> va mapping and therefore the
__va macro may wrap.
This patch ensures that such banks are marked as highmem so
that the Kernel doesn't try to split them up when it sees that
the wrapped virtual address overlaps the vmalloc space.
Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Nicolas Pitre <nico@linaro.org>
Catalin Marinas [Tue, 22 Nov 2011 17:30:32 +0000 (17:30 +0000)]
ARM: LPAE: Add identity mapping support for the 3-level page table format
With LPAE, the pgd is a separate page table with entries pointing to the
pmd. The identity_mapping_add() function needs to ensure that the pgd is
populated before populating the pmd level. The do..while blocks now loop
over the pmd in order to have the same implementation for the two page
table formats. The pmd_addr_end() definition has been removed and the
generic one used instead. The pmd clean-up is done in the pgd_free()
function.
Catalin Marinas [Tue, 22 Nov 2011 17:30:31 +0000 (17:30 +0000)]
ARM: LPAE: Add context switching support
With LPAE, TTBRx registers are 64-bit. The ASID is stored in TTBR0
rather than a separate Context ID register. This patch makes the
necessary changes to handle context switching on LPAE.
Catalin Marinas [Tue, 22 Nov 2011 17:30:31 +0000 (17:30 +0000)]
ARM: LPAE: Add fault handling support
The DFSR and IFSR register format is different when LPAE is enabled. In
addition, DFSR and IFSR have similar definitions for the fault type.
This modifies the fault code to correctly handle the new format.
Catalin Marinas [Tue, 22 Nov 2011 17:30:29 +0000 (17:30 +0000)]
ARM: LPAE: Invalidate the TLB before freeing the PMD
Similar to the PTE freeing, this patch introduced __pmd_free_tlb() which
invalidates the TLB before freeing a PMD page. This is needed because on
newer processors the entry in the upper page table may be cached by the
TLB and point to random data after the PMD has been freed.
Catalin Marinas [Tue, 22 Nov 2011 17:30:29 +0000 (17:30 +0000)]
ARM: LPAE: MMU setup for the 3-level page table format
This patch adds the MMU initialisation for the LPAE page table format.
The swapper_pg_dir size with LPAE is 5 rather than 4 pages. A new
proc-v7-3level.S file contains the TTB initialisation, context switch
and PTE setting code with the LPAE. The TTBRx split is based on the
PAGE_OFFSET with TTBR1 used for the kernel mappings. The 36-bit mappings
(supersections) and a few other memory types in mmu.c are conditionally
compiled.
Catalin Marinas [Tue, 22 Nov 2011 17:30:29 +0000 (17:30 +0000)]
ARM: LPAE: Page table maintenance for the 3-level format
This patch modifies the pgd/pmd/pte manipulation functions to support
the 3-level page table format. Since there is no need for an 'ext'
argument to cpu_set_pte_ext(), this patch conditionally defines a
different prototype for this function when CONFIG_ARM_LPAE.
The patch also introduces the L_PGD_SWAPPER flag to mark pgd entries
pointing to pmd tables pre-allocated in the swapper_pg_dir and avoid
trying to free them at run-time. This flag is 0 with the classic page
table format.
Catalin Marinas [Tue, 22 Nov 2011 17:30:29 +0000 (17:30 +0000)]
ARM: LPAE: Introduce the 3-level page table format definitions
This patch introduces the pgtable-3level*.h files with definitions
specific to the LPAE page table format (3 levels of page tables).
Each table is 4KB and has 512 64-bit entries. An entry can point to a
40-bit physical address. The young, write and exec software bits share
the corresponding hardware bits (negated). Other software bits use spare
bits in the PTE.
The patch also changes some variable types from unsigned long or int to
pteval_t or pgprot_t.
Will Deacon [Tue, 22 Nov 2011 17:30:28 +0000 (17:30 +0000)]
ARM: LPAE: add ISBs around MMU enabling code
Before we enable the MMU, we must ensure that the TTBR registers contain
sane values. After the MMU has been enabled, we jump to the *virtual*
address of the following function, so we also need to ensure that the
SCTLR write has taken effect.
This patch adds ISB instructions around the SCTLR write to ensure the
visibility of the above.
Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Catalin Marinas [Tue, 22 Nov 2011 17:30:28 +0000 (17:30 +0000)]
ARM: LPAE: Factor out classic-MMU specific code into proc-v7-2level.S
This patch modifies the proc-v7.S file so that it only contains code
shared between classic MMU and LPAE. The non-common code is factored out
into a separate file.
Catalin Marinas [Tue, 22 Nov 2011 17:30:28 +0000 (17:30 +0000)]
ARM: LPAE: Move the FSR definitions to separate files
The FSR structure is different with LPAE and this patch moves the
classic MMU specific definition to a separate fsr-2level.c file that is
included in fault.c. It also moves the fsr_fs and FSR bits to the
fault.h file.
Catalin Marinas [Tue, 22 Nov 2011 17:30:28 +0000 (17:30 +0000)]
ARM: LPAE: Move page table maintenance macros to pgtable-2level.h
The page table maintenance macros need to be duplicated between the
classic and the LPAE MMU so this patch moves those that are not common
to the pgtable-2level.h file.
Russell King [Tue, 22 Nov 2011 17:30:28 +0000 (17:30 +0000)]
ARM: pgtable: switch to use pgtable-nopud.h
Nick Piggin noted upon introducing 4level-fixup.h:
| Add a temporary "fallback" header so architectures can run with
| the 4level pagetables patch without modification. All architectures
| should be converted to use the folding headers (include/asm-generic/
| pgtable-nop?d.h) as soon as possible, and the fallback header removed.
This makes ARM compliant with this statement.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Catalin Marinas [Tue, 22 Nov 2011 17:30:27 +0000 (17:30 +0000)]
ARM: pgtable: Fix compiler warning in ioremap.c introduced by nopud
With the arch/arm code conversion to pgtable-nopud.h, the section and
supersection (un|re)map code triggers compiler warnings on UP systems.
This is caused by pmd_offset() being given a pgd_t argument rather than
a pud_t one. This patch makes the necessary conversion with the
assumption that the pud is folded into the pgd. The page table setting
code only loops over the pmd which is enough with the classic page
tables. This code is not compiled when LPAE is enabled.
entry-macro.S contains some stale code for chips before Tegra20 that
apparently didn't use an ARM GIC. All chips supported by mainline use
an ARM GIC, so rip out the stale code.
Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Olof Johansson <olof@lixom.net>
NeilBrown [Thu, 8 Dec 2011 05:26:08 +0000 (16:26 +1100)]
md: ensure new badblocks are handled promptly.
When we mark blocks as bad we need them to be acknowledged by the
metadata handler promptly.
For an in-kernel metadata handler that was already being done. But
for an external metadata handler we need to alert it of the change by
sending a notification through the sysfs file. This adds that
notification.
NeilBrown [Thu, 8 Dec 2011 05:22:48 +0000 (16:22 +1100)]
md: bad blocks shouldn't cause a Blocked status on a Faulty device.
Once a device is marked Faulty the badblocks - whether acknowledged or
not - become irrelevant. So they shouldn't cause the device to be
marked as Blocked.
Without this patch, a process might write "-blocked" to clear the
Blocked status, but while that will correctly fail the device, it
won't remove the apparent 'blocked' status.
arm/tegra: convert tegra20 to GIC devicetree binding
Convert tegra20 IRQ intialization to the GIC devicetree binding. Modify the
interrupt definitions in the dts files according to
Documentation/devicetree/bindings/arm/gic.txt
v3 (swarren):
* Moved of_irq_init() call into board-dt.c to avoid ifdef'ing it.
- Even with a dummy replacement if !CONFIG_OF, the reference from
tegra_dt_irq_match[] to gic_of_init() would still have to be ifdef'd
- It's plausible that tegra_dt_irq_match[] may need to contain more
entries in the future, and defining what they are seems more suitable
for board-dt.c than irq.c
v2 (swarren):
* Removed some stale GIC init code from board-dt.c
* Undid some accidental 0x -> 0x0 search/replace.
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com> Acked-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Stephen Warren <swarren@nvidia.com>
[olof: added include of <asm/hardware/gic.h> for compile to pass] Signed-off-by: Olof Johansson <olof@lixom.net>
PPI_NR is never used in arch/arm/mach-tegra/irq.c. Remove it.
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com> Acked-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Olof Johansson <olof@lixom.net>
NeilBrown [Thu, 8 Dec 2011 04:49:46 +0000 (15:49 +1100)]
md: take a reference to mddev during sysfs access.
When we are accessing an mddev via sysfs we know that the
mddev cannot disappear because it has an embedded kobj which
is refcounted by sysfs.
And we also take the mddev_lock.
However this is not enough.
The final mddev_put could have been called and the
mddev_delayed_delete is waiting for sysfs to let go so it can destroy
the kobj and mddev.
In this state there are a lot of changes that should not be attempted.
To to guard against this we:
- initialise mddev->all_mddevs in on last put so the state can be
easily detected.
- in md_attr_show and md_attr_store, check ->all_mddevs under
all_mddevs_lock and mddev_get the mddev if it still appears to
be active.
This means that if we get to sysfs as the mddev is being deleted we
will get -EBUSY.
rdev_attr_store and rdev_attr_show are similar but already have
sufficient protection. They check that rdev->mddev still points to
mddev after taking mddev_lock. As this is cleared before delayed
removal which can only be requested under the mddev_lock, this
ensure the rdev and mddev are still alive.
NeilBrown [Thu, 8 Dec 2011 04:49:12 +0000 (15:49 +1100)]
md: refine interpretation of "hold_active == UNTIL_IOCTL".
We like md devices to disappear when they really are not needed.
However it is not possible to tell from the current state whether it
is needed or not. We can only tell from recent history of changes.
In particular immediately after we create an md device it looks very
similar to immediately after we have finished with it.
So we always preserve a newly created md device until something
significant happens. This state is stored in 'hold_active'.
The normal case is to keep it until an ioctl happens, as that will
normally either activate it, or explicitly de-activate it. If it
doesn't then it was probably created by mistake and it is now time to
get rid of it.
We can also modify an array via sysfs (instead of via ioctl) and we
currently treat any change via sysfs like an ioctl as a sign that if
it now isn't more active, it should be destroyed.
However this is not appropriate as changes made via sysfs are more
gradual so we should look for a more definitive change.
So this patch only clears 'hold_active' from UNTIL_IOCTL to clear when
the array_state is changed via sysfs. Other changes via sysfs
are ignored.
Stephen Warren [Mon, 21 Nov 2011 21:44:11 +0000 (14:44 -0700)]
arm/dt: tegra: Fix SDHCI nodes to match board files
Mark any SDHCI controllers that aren't registered by the board files as
disabled in the device-tree files.
In practice, these controllers:
* Have nothing hooked up to them at all, or
* For ports intended for SDIO usage, the drivers for anything that might
be attached are not in the device-tree yet. If/when drivers appear, the
SD/MMC port can be re-enabled.
The only possible exception is TrimSlice's mico SD slot, but that wasn't
enabled in the board files before anyway, and doesn't work when all the
SDHCI controllers are enabled anyway.
Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Olof Johansson <olof@lixom.net>
Stephen Warren [Mon, 21 Nov 2011 21:44:10 +0000 (14:44 -0700)]
arm/dt: tegra: Fix serial nodes to match board files
Mark any serial ports that aren't registered by the board files as disabled
in the device-tree files.
In practice, none of the now-disabled ports ended up succeeding device
probing because of the missing clock-frequency property. However,
explicitly marking the devices disabled has the advantage of squashing
the dev_warn() the failed probe causes, and documenting that we intend
the port not to be used, rather than accidentally left out the property.
Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Olof Johansson <olof@lixom.net>
Stephen Warren [Mon, 21 Nov 2011 21:44:09 +0000 (14:44 -0700)]
arm/dt: tegra: Fix I2C nodes to match board files
With board files, all I2C busses run at 400KHz. Fix the device-tree
to be consistent with this. It's possible this is incorrect, but at
least it keeps the board files and device-tree consistent.
Also, disable any I2C controllers that the board files don't register,
also for consistency.
Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Olof Johansson <olof@lixom.net>