Xiaotian Feng [Thu, 15 Nov 2012 02:37:08 +0000 (13:37 +1100)]
tasklet: ignore disabled tasklet in tasklet_action()
We hit an issue where ksoftirqd used 100% of a CPU. perf top showed the
kernel busy in tasklet_action(), but no actual work was being done. The
kernel dump showed only one disabled tasklet on tasklet_vec.
tasklet_action() may run after a tasklet has been disabled, which leaves
the disabled tasklet on tasklet_vec. tasklet_action() does not run a
disabled tasklet; it places it on the tail of tasklet_vec and raises the
softirq for it again. Things get worse if a device driver calls
tasklet_disable() in its device remove/close code: the disabled tasklet
stays on the vec, __raise_softirq_irqoff() is called over and over, and
ksoftirqd keeps waking up even though no tasklets need to be handled.
This patch introduces a new TASKLET_STATE_HI bit to indicate HI_SOFTIRQ.
In tasklet_action(), simply ignore the disabled tasklet and don't raise
the softirq. In my previous patch I removed tasklet_hi_enable(), since
it is the same as tasklet_enable(), so only tasklet_enable() needs to be
modified: if the tasklet state changes from disabled to enabled, use
__tasklet_schedule() to put it on the right vec.
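The behaviour change can be pictured with a small userspace model. This is
only a sketch: struct model_tasklet, model_tasklet_action() and
model_tasklet_enable() are hypothetical names invented here, and the real
list handling, locking and state bits in kernel/softirq.c are not
reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's struct tasklet_struct. */
struct model_tasklet {
    struct model_tasklet *next;
    int disable_count;   /* > 0 means the tasklet is disabled */
    bool scheduled;      /* set while the tasklet is pending */
    int runs;            /* how many times the handler has "run" */
};

/*
 * Walk a pending list once, as a softirq handler would.
 * Returns how many times the softirq would have to be raised again.
 */
static int model_tasklet_action(struct model_tasklet *list, bool ignore_disabled)
{
    int reraised = 0;

    for (struct model_tasklet *t = list; t != NULL; t = t->next) {
        assert(t->disable_count >= 0);
        if (t->disable_count == 0) {
            t->scheduled = false;
            t->runs++;
        } else if (!ignore_disabled) {
            reraised++;  /* old behaviour: requeue and raise again */
        }
        /* patched behaviour: a disabled tasklet is skipped, no raise */
    }
    return reraised;
}

/*
 * Model of the enable side: returns true when the caller must put the
 * tasklet back on a pending list because it was scheduled while disabled.
 */
static bool model_tasklet_enable(struct model_tasklet *t)
{
    assert(t->disable_count > 0);
    t->disable_count--;
    return t->disable_count == 0 && t->scheduled;
}
```

In this model a single disabled tasklet makes the old path report one
re-raise on every pass, which corresponds to the busy loop described
above; the patched path reports none until the tasklet is enabled.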
Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The block control group, infiniband, xfs, crypto, 802.11, netfilter.
Nothing quite so fundamental as fs/namespace.c but definitely in
multiplatform-code that should work, and is already broken on those
architectures.
Looking at the implementation of atomic64_add_return in lib/atomic64.c the
code looks as efficient as these kinds of things get.
Which leads me to the conclusion that we need atomic64 support on all
architectures.
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Hannes Reinecke [Thu, 15 Nov 2012 02:37:07 +0000 (13:37 +1100)]
fs/pstore/ram.c: fix up section annotations
The compiler complained about missing section annotations. Fix it.
Signed-off-by: Hannes Reinecke <hare@suse.de> Cc: Anton Vorontsov <cbouatmailru@gmail.com> Cc: Colin Cross <ccross@android.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
How is the compiler even handling exported functions that are marked
inline? Anyway, these shouldn't be inline because of that, so remove that
marking.
Based on a larger patch by Mark Charlebois to get LLVM to build the
kernel.
Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mark Charlebois <mcharleb@qualcomm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: hank <pyu@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The use of defined() on arrays and hashes has been deprecated since perl
5.6, but until 5.17.6 it only warned on lexicals, not package globals.
Signed-off-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alan Cox [Thu, 15 Nov 2012 02:37:07 +0000 (13:37 +1100)]
irq: tsk->comm is an array
The array check is useless so remove it.
[akpm@linux-foundation.org: remove comment, per David] Signed-off-by: Alan Cox <alan@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jean Delvare [Thu, 15 Nov 2012 02:37:06 +0000 (13:37 +1100)]
drm/i915: optimize DIV_ROUND_CLOSEST() call
DIV_ROUND_CLOSEST is faster if the compiler knows it will only be dealing
with unsigned dividends.
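As a rough illustration of why the dividend's signedness matters, the two
functions below are hypothetical stand-ins written for this note, not the
kernel macro itself. The signed variant has to test the sign of the
dividend to pick the rounding direction; the unsigned variant needs only
one formula.

```c
#include <assert.h>

/* Signed dividend: the rounding direction depends on the sign of x. */
static int model_div_round_closest_signed(int x, int divisor)
{
    assert(divisor > 0);
    return x > 0 ? (x + divisor / 2) / divisor
                 : (x - divisor / 2) / divisor;
}

/* Unsigned dividend: one formula, no sign test needed. */
static unsigned int model_div_round_closest_unsigned(unsigned int x,
                                                     unsigned int divisor)
{
    assert(divisor > 0);
    return (x + divisor / 2) / divisor;
}
```

When the compiler can see that the dividend type is unsigned, the sign
test is a compile-time constant and the second branch can be dropped.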
Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
pcmcia: move unbind/rebind into dev_pm_ops.complete
Move the device rebind procedures for cardbus devices from the pm.resume
into the pm.complete callback.
The reason for moving the code is: "[...] The PM code needs to send
suspend and resume messages to every device in the right order, and it
can't do that if new devices are being added at the same time. [...]"
However the situation really isn't quite that rigid. In particular,
adding new children during a resume callback shouldn't cause much of
problem because the children don't need to be resumed anyway (since they
were never suspended). On the other hand, if you do it you will get a
dev_warn() from the PM core, something like 'parent should not be
sleeping'.
Still, it is considered bad form and should be avoided if possible."
(Alan Stern's full comment about the topic can
be found here: <https://lkml.org/lkml/2012/7/10/254>)
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Greg KH <greg@kroah.com> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wen Congyang [Thu, 15 Nov 2012 02:37:05 +0000 (13:37 +1100)]
x86: make 'mem=' option to work for efi platform
The mem= boot option currently works only in a non-EFI environment. If
the user specifies add_efi_memmap, it does not work in an EFI environment.
In the EFI environment, we call e820_add_region() to add the memory map,
so modifying __e820_add_region() makes the mem= boot option work for EFI
as well.
Note: Only E820_RAM is limited, and BOOT_SERVICES_{CODE,DATA} are always
mapped (if their address is >= mem_limit, the memory won't be freed in
efi_free_boot_services()).
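The limiting rule can be sketched as a pure function. Everything named
here (model_limited_size(), enum model_region_type, and the convention
that a limit of 0 means "no limit") is an assumption made for the
illustration and is not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical region types for this sketch; only "RAM" is limited. */
enum model_region_type { MODEL_RAM, MODEL_RESERVED };

/*
 * Return the size that would actually be added for a region once a
 * mem= style limit is applied. mem_limit == 0 means "no limit".
 */
static uint64_t model_limited_size(uint64_t start, uint64_t size,
                                   enum model_region_type type,
                                   uint64_t mem_limit)
{
    assert(size <= UINT64_MAX - start);   /* region must not wrap */

    if (type != MODEL_RAM || mem_limit == 0)
        return size;                      /* non-RAM is never trimmed */
    if (start >= mem_limit)
        return 0;                         /* entirely above the limit */
    if (start + size > mem_limit)
        return mem_limit - start;         /* straddles the limit */
    return size;
}
```

Applying the check at the point where regions are added means every
source of memory map entries goes through the same rule.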
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Rob Landley <rob@landley.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Yasuaki ISIMATU <isimatu.yasuaki@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Randy Dunlap [Thu, 15 Nov 2012 02:37:05 +0000 (13:37 +1100)]
olpc: fix olpc-xo1-sci.c build errors
Fix build errors when CONFIG_INPUT=m. This is not pretty, but all of the
OLPC kconfig options are bool instead of tristate.
arch/x86/built-in.o: In function `send_lid_state':
olpc-xo1-sci.c:(.text+0x1d323): undefined reference to `input_event'
olpc-xo1-sci.c:(.text+0x1d338): undefined reference to `input_event'
arch/x86/built-in.o: In function `free_ebook_switch':
olpc-xo1-sci.c:(.text+0x1d529): undefined reference to `input_unregister_device'
olpc-xo1-sci.c:(.text+0x1d533): undefined reference to `input_free_device'
arch/x86/built-in.o: In function `free_power_button':
olpc-xo1-sci.c:(.text+0x1d549): undefined reference to `input_unregister_device'
olpc-xo1-sci.c:(.text+0x1d553): undefined reference to `input_free_device'
arch/x86/built-in.o: In function `send_ebook_state':
olpc-xo1-sci.c:(.text+0x1d632): undefined reference to `input_event'
olpc-xo1-sci.c:(.text+0x1d647): undefined reference to `input_event'
arch/x86/built-in.o: In function `xo1_sci_intr':
olpc-xo1-sci.c:(.text+0x1d78e): undefined reference to `input_event'
olpc-xo1-sci.c:(.text+0x1d7a3): undefined reference to `input_event'
olpc-xo1-sci.c:(.text+0x1d7be): undefined reference to `input_event'
arch/x86/built-in.o:olpc-xo1-sci.c:(.text+0x1d7d3): more undefined references to `input_event' follow
arch/x86/built-in.o: In function `free_lid_switch':
olpc-xo1-sci.c:(.text+0x1d7fd): undefined reference to `input_unregister_device'
olpc-xo1-sci.c:(.text+0x1d807): undefined reference to `input_free_device'
arch/x86/built-in.o: In function `setup_lid_switch':
olpc-xo1-sci.c:(.devinit.text+0x155): undefined reference to `input_allocate_device'
olpc-xo1-sci.c:(.devinit.text+0x1a4): undefined reference to `input_register_device'
olpc-xo1-sci.c:(.devinit.text+0x1ce): undefined reference to `input_unregister_device'
olpc-xo1-sci.c:(.devinit.text+0x1d8): undefined reference to `input_free_device'
arch/x86/built-in.o: In function `xo1_sci_probe':
olpc-xo1-sci.c:(.devinit.text+0x235): undefined reference to `input_allocate_device'
olpc-xo1-sci.c:(.devinit.text+0x285): undefined reference to `input_register_device'
olpc-xo1-sci.c:(.devinit.text+0x299): undefined reference to `input_free_device'
olpc-xo1-sci.c:(.devinit.text+0x2e1): undefined reference to `input_register_device'
olpc-xo1-sci.c:(.devinit.text+0x2f5): undefined reference to `input_free_device'
olpc-xo1-sci.c:(.devinit.text+0x54c): undefined reference to `input_allocate_device'
In the long run, fixing this driver kconfig to be tristate instead of bool
would be a very good change.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Andres Salomon <dilinger@queued.net> Cc: Chris Ball <cjb@laptop.org> Cc: Jon Nettleton <jon.nettleton@gmail.com> Cc: Daniel Drake <dsd@laptop.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alex Shi [Thu, 15 Nov 2012 02:37:05 +0000 (13:37 +1100)]
arch/x86/platform/uv: fix incorrect tlb flush all issue
The TLB flush optimization code has a logic error on the UV platform. It
doesn't flush the full range at all, since it simply ignores its 'end'
parameter (and hence also the "all" indicator) in the
uv_flush_tlb_others() function.
Cliff's notes:
: I tested the patch on a UV. It has the effect of either clearing 1 or all
: TLBs in a cpu. I added some debugging to test for the cases when clearing
: all TLBs is overkill, and in practice it happens very seldom.
Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Alex Shi <alex.shi@intel.com> Signed-off-by: Cliff Wickman <cpw@sgi.com> Tested-by: Cliff Wickman <cpw@sgi.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Thu, 15 Nov 2012 02:37:04 +0000 (13:37 +1100)]
arch/x86/tools/insn_sanity.c: identify source of messages
The kernel build prints:
Building modules, stage 2.
TEST posttest
MODPOST 3821 modules
TEST posttest
Success: decoded and checked 1000000 random instructions with 0 errors (seed:0xaac4bc47)
CC arch/x86/boot/a20.o
CC arch/x86/boot/cmdline.o
AS arch/x86/boot/copy.o
HOSTCC arch/x86/boot/mkcpustr
CC arch/x86/boot/cpucheck.o
CC arch/x86/boot/early_serial_console.o
which is irritating because you don't know what program is proudly
pronouncing its success.
So, as described in "console mode programming user interface guidelines
version 101" which doesn't exist, change this program to identify the
source of its messages.
Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wen Congyang [Thu, 15 Nov 2012 02:37:04 +0000 (13:37 +1100)]
x86 numa: don't check if node is NUMA_NO_NODE
If we aren't debugging per_cpu maps, the cpu's node is stored in per_cpu
variable numa_node. If `node' is NUMA_NO_NODE, it means the caller wants
to clear the cpu's node. So we should also call set_cpu_numa_node() in
this case.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Shérab [Thu, 15 Nov 2012 02:37:04 +0000 (13:37 +1100)]
arch/x86/platform/iris/iris.c: register a platform device and a platform driver
This makes the iris driver use the platform API, so it is properly exposed
in /sys.
[akpm@linux-foundation.org: remove commented-out code, add missing space to printk, clean up code layout] Signed-off-by: Shérab <Sebastien.Hinderer@ens-lyon.org> Cc: Len Brown <lenb@kernel.org> Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ingo Molnar <mingo@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wen Congyang [Thu, 15 Nov 2012 02:37:03 +0000 (13:37 +1100)]
x86 cpu_hotplug: unmap cpu2node when the cpu is hotremoved
When a cpu is hotplugged, we call acpi_map_cpu2node() in
_acpi_map_lsapic() to store the cpu's node. But we don't clear the cpu's
node in acpi_unmap_lsapic() when this cpu is hotremoved. If the node is
also hotremoved, we will get the following messages:
The reason is that the cpu's node is not NUMA_NO_NODE, so we call
alloc_pages_exact_node() to allocate memory on the node, but the node is
offline.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
MITSUNARI Shigeo [Thu, 15 Nov 2012 02:37:03 +0000 (13:37 +1100)]
fs/block_dev.c: page cache wrongly left invalidated after revalidate_disk()
We found that bdev->bd_invalidated was left set once revalidate_disk() is
called, which results in a page cache flush every time that device is
opened.
Specifically, we found this problem on an MD block device. Once we resize
an MD device, mdadm --monitor periodically flushes all page cache for that
device every 60 or 1000 seconds when it opens the device.
This bug has existed since at least 3.2.0 up to the latest kernel (3.6.2).
Patch is attached.
The following steps will reproduce the problem.
1. Prepare a block device (e.g. /dev/sdb).
2. Create two partitions.
NeilBrown [Thu, 15 Nov 2012 02:37:02 +0000 (13:37 +1100)]
vfs: d_obtain_alias() needs to use "/" as default name.
NFS appears to use d_obtain_alias() to create the root dentry rather than
d_make_root. This can cause 'prepend_path()' to complain that the root
has a weird name if an NFS filesystem is lazily unmounted. e.g. if
"/mnt" is an NFS mount then
{ cd /mnt; umount -l /mnt ; ls -l /proc/self/cwd; }
will cause a WARN message like
WARNING: at /home/git/linux/fs/dcache.c:2624 prepend_path+0x1d7/0x1e0()
...
Root dentry has weird name <>
to appear in kernel logs.
So change d_obtain_alias() to use "/" rather than "" as the anonymous
name.
Signed-off-by: NeilBrown <neilb@suse.de> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This patch below does what Paul McKenney suggested in the previous thread.
Signed-off-by: Dave Jones <davej@redhat.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Paul Moore <paul@paul-moore.com> Cc: Eric Paris <eparis@parisplace.org> Cc: James Morris <jmorris@namei.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Corey Minyard [Thu, 15 Nov 2012 02:37:02 +0000 (13:37 +1100)]
CRIS: Fix I/O macros
The inb/outb macros for CRIS are broken in a number of ways: they are
missing () around parameters and they have an unprotected if statement in
them. This was breaking the compile of IPMI on CRIS and thus I was being
annoyed by build regressions, so I fixed them.
Plus I don't think they would have worked at all, since the data values
were missing "&" and the outsl had a "3" instead of a "4" for the size.
From what I can tell, this stuff is not used at all, so this can't be any
more broken than it was before, anyway.
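The two macro problems named above (missing parentheses and a bare if)
are generic C pitfalls. The sketch below shows one common way to avoid
them; MODEL_OUTB and the model_port array are hypothetical stand-ins and
do not reproduce the CRIS accessors.

```c
#include <assert.h>

#define MODEL_PORTS 4u

/* Hypothetical backing store standing in for real I/O ports. */
static unsigned char model_port[MODEL_PORTS];

/*
 * Parameters are parenthesised and evaluated once, and the if is
 * wrapped in do { } while (0) so the macro behaves as one statement.
 */
#define MODEL_OUTB(val, port)                          \
    do {                                               \
        unsigned int port_ = (port);                   \
        if (port_ < MODEL_PORTS)                       \
            model_port[port_] = (unsigned char)(val);  \
    } while (0)

/*
 * With a bare "if" inside the macro, the else below would attach to the
 * macro's if instead of "if (enabled)".
 */
static int model_write_if_enabled(int enabled, unsigned int port, int val)
{
    if (enabled)
        MODEL_OUTB(val, port);
    else
        return -1;
    return 0;
}

static unsigned char model_read_back(unsigned int port)
{
    assert(port < MODEL_PORTS);
    return model_port[port];
}
```

Passing expressions such as 1 + 1 or 0x10 | 0x01 as arguments is safe
here because each parameter is parenthesised where it is used.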
Mel Gorman [Thu, 15 Nov 2012 02:37:01 +0000 (13:37 +1100)]
mm: revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures"
Jiri Slaby reported the following:
(It's an effective revert of "mm: vmscan: scale number of pages
reclaimed by reclaim/compaction based on failures".) Given kswapd
had hours of runtime in ps/top output yesterday in the morning
and after the revert it's now 2 minutes in sum for the last 24h,
I would say, it's gone.
The intention of the patch in question was to compensate for the loss of
lumpy reclaim. Part of the reason lumpy reclaim worked is because it
aggressively reclaimed pages and this patch was meant to be a sane
compromise.
When compaction fails, it gets deferred, and both compaction and
reclaim/compaction are deferred to avoid excessive reclaim. However, since
commit c6543459 ("mm: remove __GFP_NO_KSWAPD"), kswapd is woken up each
time and continues reclaiming which was not taken into account when the
patch was developed.
Attempts to address the problem ended up just changing the shape of the
problem instead of fixing it. The release window gets closer and while a
THP allocation failing is not a major problem, kswapd chewing up a lot of
CPU is. This patch reverts "mm: vmscan: scale number of pages reclaimed
by reclaim/compaction based on failures" and will be revisited in the
future.
7b540d0646ce ("proc_map_files_readdir(): don't bother with grabbing
files") switched proc_map_files_readdir() to use @f_mode directly instead
of grabbing a @file reference, but at the same time the test for @vm_file
presence was lost, leading to a NULL dereference. The patch brings the
test back.
The whole proc_map_files feature is wrapped in CONFIG_CHECKPOINT_RESTORE
(which is set to 'n' by default), so the bug doesn't affect regular
kernels.
The regression is 3.7-rc1 only as far as I can tell.
[gorcunov@openvz.org: provided changelog] Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jiang Liu [Thu, 15 Nov 2012 02:37:01 +0000 (13:37 +1100)]
mm: fix a regression with HIGHMEM
Changeset 7f1290f2f2 ("mm: fix-up zone present pages") tried to fix an
issue when calculating zone->present_pages, but it causes a regression on
32-bit systems with HIGHMEM. With that changeset,
reset_zone_present_pages() resets all zone->present_pages to zero, and
fixup_zone_present_pages() is called to recalculate zone->present_pages
when the boot allocator frees core memory pages into the buddy allocator.
Because highmem pages are not freed by the bootmem allocator, all highmem
zones' present_pages become zero.
Actually there's no need to recalculate present_pages for the highmem zone
because the bootmem allocator never allocates pages from it. So fix the
regression by skipping highmem in reset_zone_present_pages() and
fixup_zone_present_pages().
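A toy model of the fix, assuming a two-zone layout: the types and
function names below (struct model_zone, model_reset_present_pages(),
model_fixup_present_pages()) are hypothetical and only mirror the idea of
skipping highmem in both the reset and the fixup step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical zone model; the names below are not the kernel's. */
enum model_zone_type {
    MODEL_ZONE_NORMAL,
    MODEL_ZONE_HIGHMEM,
    MODEL_NR_ZONES
};

struct model_zone {
    unsigned long present_pages;
};

/* Reset every zone except highmem, which boot-time freeing never refills. */
static void model_reset_present_pages(struct model_zone *zones, size_t nr)
{
    assert(nr == MODEL_NR_ZONES);
    for (size_t i = 0; i < nr; i++) {
        if (i == MODEL_ZONE_HIGHMEM)
            continue;
        zones[i].present_pages = 0;
    }
}

/* Credit pages freed by the boot allocator back to a non-highmem zone. */
static void model_fixup_present_pages(struct model_zone *zones,
                                      enum model_zone_type type,
                                      unsigned long freed_pages)
{
    assert(type < MODEL_NR_ZONES);
    if (type == MODEL_ZONE_HIGHMEM)
        return;
    zones[type].present_pages += freed_pages;
}
```

Without the skip in the reset step, the highmem count would be zeroed and
nothing would ever add to it again, which is the regression described
above.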
Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Reported-by: Maciej Rutecki <maciej.rutecki@gmail.com> Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com> Tested-by: Chris Clayton <chris2553@googlemail.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Hugh Dickins [Thu, 15 Nov 2012 02:37:00 +0000 (13:37 +1100)]
tmpfs: change final i_blocks BUG to WARNING
Under a particular load on one machine, I have hit shmem_evict_inode()'s
BUG_ON(inode->i_blocks), enough times to narrow it down to a particular
race between swapout and eviction.
It comes from the "if (freed > 0)" asymmetry in shmem_recalc_inode(), and
the lack of coherent locking between mapping's nrpages and shmem's swapped
count. There's a window in shmem_writepage(), between lowering nrpages in
shmem_delete_from_page_cache() and then raising swapped count, when the
freed count appears to be +1 when it should be 0, and then the asymmetry
stops it from being corrected with -1 before hitting the BUG.
One answer is coherent locking: using tree_lock throughout, without
info->lock; reasonable, but the raw_spin_lock in percpu_counter_add() on
used_blocks makes that messier than expected. Another answer may be a
further effort to eliminate the weird shmem_recalc_inode() altogether, but
previous attempts at that failed.
So far undecided, but for now change the BUG_ON to WARN_ON: in usual
circumstances it remains a useful consistency check.
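The difference between a fatal check and a warn-and-continue check can be
sketched in userspace. model_warn_on() and model_evict() are hypothetical
helpers written for this note; they imitate only the "report and carry
on" behaviour, not the kernel's WARN_ON implementation.

```c
#include <assert.h>
#include <stdio.h>

static int model_warnings;   /* counts warnings emitted by this sketch */

/*
 * Userspace stand-in for a warn-and-continue check: report the failed
 * invariant, count it, and hand the condition back to the caller.
 */
static int model_warn_on(int condition, const char *what)
{
    assert(what != NULL);
    if (condition) {
        model_warnings++;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return condition != 0;
}

/* Hypothetical eviction path: a stale block count is reported, not fatal. */
static int model_evict(unsigned long i_blocks)
{
    model_warn_on(i_blocks != 0, "i_blocks not zero at eviction");
    return 1;   /* eviction still completes */
}
```

The consistency check still fires and is visible in the log, but the
caller keeps running instead of halting.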
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Thanks to Johannes for pointing to truncation: free_swap_and_cache() only
does a trylock on the page, so the page lock we've held since before
confirming swap is not enough to protect against truncation.
What cleanup is needed in this case? Just delete_from_swap_cache(), which
takes care of the memcg uncharge.
Reported-by: Dave Jones <davej@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.com> Signed-off-by: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Randy Dunlap [Thu, 15 Nov 2012 02:37:00 +0000 (13:37 +1100)]
rapidio: fix kernel-doc warnings
Fix rapidio kernel-doc warnings:
Warning(drivers/rapidio/rio.c:415): No description found for parameter 'local'
Warning(drivers/rapidio/rio.c:415): Excess function parameter 'lstart' description in 'rio_map_inb_region'
Warning(include/linux/rio.h:290): No description found for parameter 'switches'
Warning(include/linux/rio.h:290): No description found for parameter 'destid_table'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Matt Porter <mporter@kernel.crashing.org> Acked-by: Alexandre Bounine <alexandre.bounine@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ian Munsie [Thu, 8 Nov 2012 05:40:28 +0000 (16:40 +1100)]
powerpc: Disable relocation on exceptions when kexecing
Since we don't know if the new kernel we are kexecing into has been
built to support relocation on exceptions, we disable them before we
kexec.
We do NOT disable them if we are execing a kdump kernel, because we
want to change as little state as possible and it is likely that we are
execing ourselves and will be able to handle them anyway.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Ian Munsie [Thu, 8 Nov 2012 05:03:14 +0000 (16:03 +1100)]
powerpc: Enable relocation on during exceptions at boot
We currently do this synchronously at boot from setup_arch. On a large
system this could hypothetically take a little while to complete, so
currently we will give up if we are asked to wait for more than a second
in total.
If we actually start hitting that timeout in practice we can always move
this code into a kernel thread to take care of it in the background.
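The "give up after a second in total" policy can be sketched as a retry
loop with a delay budget. The fake hypervisor, the return codes and the
function names below are all hypothetical; a real caller would issue the
hcall and actually wait between attempts.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_H_SUCCESS    0L
#define MODEL_H_LONG_BUSY  1L

/* Hypothetical fake hypervisor: answers "busy" a fixed number of times. */
struct model_hv {
    int busy_replies_left;
    unsigned int busy_msecs;   /* delay requested with each busy reply */
    int calls;
};

static long model_hcall(struct model_hv *hv)
{
    hv->calls++;
    if (hv->busy_replies_left > 0) {
        hv->busy_replies_left--;
        return MODEL_H_LONG_BUSY;
    }
    return MODEL_H_SUCCESS;
}

/*
 * Retry while the hypervisor reports busy, but give up once the total
 * requested delay exceeds budget_msecs. Returns the last return code.
 */
static long model_enable_with_budget(struct model_hv *hv,
                                     unsigned int budget_msecs)
{
    unsigned int total_delay = 0;

    assert(hv != NULL);
    for (;;) {
        long rc = model_hcall(hv);

        if (rc != MODEL_H_LONG_BUSY)
            return rc;
        total_delay += hv->busy_msecs;
        if (total_delay > budget_msecs)
            return rc;
        /* a real caller would wait hv->busy_msecs here before retrying */
    }
}
```

With a 1000 ms budget, three busy replies of 300 ms each still lead to
success on the fourth call, while repeated 400 ms replies cause the loop
to give up on the third.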
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Ian Munsie [Thu, 8 Nov 2012 05:10:29 +0000 (16:10 +1100)]
powerpc: Move get_longbusy_msecs into hvcall.h and remove duplicate function
I am going to use this in the next patch, better to have this code in
one place rather than three.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Ian Munsie [Thu, 8 Nov 2012 04:57:04 +0000 (15:57 +1100)]
powerpc: Add wrappers to enable/disable relocation on exceptions
These wrappers hide the parameters that have to be passed to H_SET_MODE
to enable/disable relocation on during exceptions.
As noted in the comments, since these have partition wide scope, they
may take some time to complete and must be periodically retried until
H_SUCCESS is returned.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Ian Munsie [Tue, 6 Nov 2012 05:15:17 +0000 (16:15 +1100)]
powerpc: Add set_mode hcall
This new hcall in POWER8 is used to set various resource mode registers.
For example, it can set the address translation mode on interrupt (note:
partition-wide scope).
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Neuling [Fri, 2 Nov 2012 05:41:58 +0000 (16:41 +1100)]
powerpc: Setup relocation on exceptions for bare metal systems
This turns on the MMU on exceptions via the AIL field in the LPCR.
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Neuling [Mon, 5 Nov 2012 03:40:18 +0000 (14:40 +1100)]
powerpc: Move initial mfspr LPCR out of __init_LPCR
We want to change what's initially set in the LPCR, so start by taking the move
from LPCR out of the function and into the caller.
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Neuling [Fri, 2 Nov 2012 06:21:43 +0000 (17:21 +1100)]
powerpc: Add relocation on exception vector handlers
POWER8/v2.07 allows exceptions to be taken with the MMU still on.
A new set of exception vectors is added at 0xc000_0000_0000_4xxx. When the HW
takes us here, MSR IR/DR will be set already and we no longer need a costly
RFID to turn the MMU back on again.
The original 0x0 based exception vectors remain for when the HW can't leave the
MMU on. Examples of this are when we can't trust the current MMU mappings,
like when we are changing from guest to hypervisor (HV 0 -> 1) or when the MMU
was off already. In these cases the HW will take us to the original 0x0 based
exception vectors with the MMU off as before.
This uses the new macros added previously to implement these new exception
vectors at 0xc000_0000_0000_4xxx. We exit these exception vectors using
mflr/blr (rather than mtspr SRR0/RFID), since we don't need the costly MMU
switch anymore.
This moves the __end_interrupts marker down past these new 0x4000 vectors since
they will need to be copied down to 0x0 when the kernel is not at 0x0.
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Neuling [Fri, 2 Nov 2012 06:21:28 +0000 (17:21 +1100)]
powerpc: Add new macros needed for relocation on exceptions
POWER8/v2.07 allows exceptions to be taken with the MMU still on.
A new set of exception vectors is added at 0xc000_0000_0000_4xxx. When the HW
takes us here, MSR IR/DR will be set already and we no longer need a costly
RFID to turn the MMU back on again.
The original 0x0 based exception vectors remain for when the HW can't leave the
MMU on. Examples of this are when we can't trust the current MMU mappings,
like when we are changing from guest to hypervisor (HV 0 -> 1) or when the MMU
was off already. In these cases the HW will take us to the original 0x0 based
exception vectors with the MMU off as before.
The below macros are copies of the macros used at the 0x0 offset but modified
to handle the MMU being on. In these macros we use the link register to jump
to the secondary handlers rather than using RFID (RFID was also used to turn
on the MMU).
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Neuling [Fri, 2 Nov 2012 06:16:01 +0000 (17:16 +1100)]
powerpc: Turn syscall handler into macros
This turns the syscall handler into macros as we are going to want to reuse
them later.
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If we build a kernel with CONFIG_RELOCATABLE=y CONFIG_CRASH_DUMP=n,
the kernel fails when we run at a non zero offset. It turns out
we were incorrectly wrapping some of the relocatable kernel code
with CONFIG_CRASH_DUMP.
Signed-off-by: Anton Blanchard <anton@samba.org> Cc: <stable@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>