Thomas Gleixner [Thu, 24 Mar 2011 20:41:57 +0000 (21:41 +0100)]
x86: DT: Cleanup namespace and call irq_set_irq_type() unconditional
That call escaped the name space cleanup. Fix it up.
We really want to call there. The chip might have changed since the
irq was setup initially. So let the core code and the chip decide what
to do. The status is just an unreliable snapshot.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Thomas Gleixner [Thu, 24 Mar 2011 21:53:10 +0000 (22:53 +0100)]
x86: DT: Fix return condition in irq_create_of_mapping()
The xlate() function returns 0 or a negative error code. Returning the
error code blindly will be seen as an huge irq number by the calling
function because irq_create_of_mapping() returns an unsigned value.
Return 0 (NO_IRQ) as required.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
xen: update mask_rw_pte after kernel page tables init changes
After "x86-64, mm: Put early page table high" already existing kernel
page table pages can be mapped using early_ioremap too so we need to
update mask_rw_pte to make sure these pages are still mapped RO.
The reason why we have to do that is explain by the commit message of fef5ba797991f9335bcfc295942b684f9bf613a1:
"Xen requires that all pages containing pagetable entries to be mapped
read-only. If pages used for the initial pagetable are already mapped
then we can change the mapping to RO. However, if they are initially
unmapped, we need to make sure that when they are later mapped, they
are also mapped RO.
..SNIP..
the pagetable setup code early_ioremaps the pages to write their
entries, so we must make sure that mappings created in the early_ioremap
fixmap area are mapped RW. (Those mappings are removed before the pages
are presented to Xen as pagetable pages.)"
We accomplish all this in mask_rw_pte by mapping RO all the pages mapped
using early_ioremap apart from the last one that has been allocated
because it is not a page table page yet (it has not been hooked into the
page tables yet).
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Fri, 18 Feb 2011 11:30:30 +0000 (11:30 +0000)]
x86: Cleanup highmap after brk is concluded
Now cleanup_highmap actually is in two steps: one is early in head64.c
and only clears above _end; a second one is in init_memory_mapping() and
tries to clean from _brk_end to _end.
It should check if those boundaries are PMD_SIZE aligned but currently
does not.
Also init_memory_mapping() is called several times for numa or memory
hotplug, so we really should not handle initial kernel mappings there.
This patch moves cleanup_highmap() down after _brk_end is settled so
we can do everything in one step.
Also we honor max_pfn_mapped in the implementation of cleanup_highmap.
Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Shaohua Li [Wed, 16 Mar 2011 03:37:29 +0000 (11:37 +0800)]
x86: Flush TLB if PGD entry is changed in i386 PAE mode
According to intel CPU manual, every time PGD entry is changed in i386 PAE
mode, we need do a full TLB flush. Current code follows this and there is
comment for this too in the code.
But current code misses the multi-threaded case. A changed page table
might be used by several CPUs, every such CPU should flush TLB. Usually
this isn't a problem, because we prepopulate all PGD entries at process
fork. But when the process does munmap and follows new mmap, this issue
will be triggered.
When it happens, some CPUs keep doing page faults:
Namhyung Kim [Fri, 18 Mar 2011 02:40:06 +0000 (11:40 +0900)]
x86, dumpstack: Correct stack dump info when frame pointer is available
Current stack dump code scans entire stack and check each entry
contains a pointer to kernel code. If CONFIG_FRAME_POINTER=y it
could mark whether the pointer is valid or not based on value of
the frame pointer. Invalid entries could be preceded by '?' sign.
However this was not going to happen because scan start point
was always higher than the frame pointer so that they could not
meet.
Commit 9c0729dc8062 ("x86: Eliminate bp argument from the stack
tracing routines") delayed bp acquisition point, so the bp was
read in lower frame, thus all of the entries were marked
invalid.
This patch fixes this by reverting above commit while retaining
stack_frame() helper as suggested by Frederic Weisbecker.
Jon Nettleton [Wed, 16 Mar 2011 15:32:47 +0000 (15:32 +0000)]
x86: Use PentiumPro-optimized partial_csum() on VIA C7
Testing on the OLPC XO-1.5 (VIA C7-M 1000MHz CPU) shows a partial_csum()
speed increase by a factor of 1.5 when we switch to the Pentium-optimized
version.
Signed-off-by: Daniel Drake <dsd@laptop.org> Cc: dilinger@queued.net Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus Torvalds [Wed, 16 Mar 2011 15:10:07 +0000 (08:10 -0700)]
Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
smp: Document transitivity for memory barriers.
rcu: add comment saying why DEBUG_OBJECTS_RCU_HEAD depends on PREEMPT.
rcupdate: remove dead code
rcu: add documentation saying which RCU flavor to choose
rcutorture: Get rid of duplicate sched.h include
rcu: call __rcu_read_unlock() in exit_rcu for tiny RCU
Linus Torvalds [Wed, 16 Mar 2011 15:04:07 +0000 (08:04 -0700)]
Increase OSF partition limit from 8 to 18
It turns out that while a maximum of 8 partitions may be what people
"should" have had, you can actually fit up to 18 entries(*) in a sector.
And some people clearly were taking advantage of that, like Michael
Cree, who had ten partitions on one of his OSF disks.
(*) The OSF partition data starts at byte offset 64 in the first sector,
and the array of 16-byte partition entries start at offset 148 in
the on-disk partition structure.
iprune_sem is continously giving us lockdep warnings because we do take it in
read mode in the reclaim path, but we're also doing non-NOFS allocations under
it taken in write mode.
Taking a bit deeper look at it I think it's fixable quite trivially:
- for invalidate_inodes we do not need iprune_sem at all. We have an active
reference on the superblock, so the filesystem is not going away until it
has finished.
- for evict_inodes we do need it, to make sure prune_icache has done it's
work before we tear down the superblock. But there is no reason to
hold it over the actual reclaim operation - it's enough to cycle through
it after the actual reclaim to make sure we wait for any pending
prune_icache to complete. We just have to remove the WARN_ON for
otherwise busy inodes as they can actually happen now.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Boris Ostrovsky [Tue, 15 Mar 2011 16:13:44 +0000 (12:13 -0400)]
x86, AMD: Set ARAT feature on AMD processors
Support for Always Running APIC timer (ARAT) was introduced in
commit db954b5898dd3ef3ef93f4144158ea8f97deb058. This feature
allows us to avoid switching timers from LAPIC to something else
(e.g. HPET) and go into timer broadcasts when entering deep
C-states.
AMD processors don't provide a CPUID bit for that feature but
they also keep APIC timers running in deep C-states (except for
cases when the processor is affected by erratum 400). Therefore
we should set ARAT feature bit on AMD CPUs.
Tested-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Mark Langsdorf <mark.langsdorf@amd.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
LKML-Reference: <1300205624-4813-1-git-send-email-ostr@amd64.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Andreas Herrmann [Tue, 15 Mar 2011 14:31:37 +0000 (15:31 +0100)]
x86, quirk: Fix SB600 revision check
Commit 7f74f8f28a2bd9db9404f7d364e2097a0c42cc12
(x86 quirk: Fix polarity for IRQ0 pin2 override on SB800
systems) introduced a regression. It removed some SB600 specific
code to determine the revision ID without adapting a
corresponding revision ID check for SB600.
Linus Torvalds [Wed, 16 Mar 2011 03:01:36 +0000 (20:01 -0700)]
Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
x86: Clean up apic.c and apic.h
x86: Remove superflous goal definition of tsc_sync
x86: dt: Correct local apic documentation in device tree bindings
x86: dt: Cleanup local apic setup
x86: dt: Fix OLPC=y/INTEL_CE=n build
rtc: cmos: Add OF bindings
x86: ce4100: Use OF to setup devices
x86: ioapic: Add OF bindings for IO_APIC
x86: dtb: Add generic bus probe
x86: dtb: Add support for PCI devices backed by dtb nodes
x86: dtb: Add device tree support for HPET
x86: dtb: Add early parsing of IO_APIC
x86: dtb: Add irq domain abstraction
x86: dtb: Add a device tree for CE4100
x86: Add device tree support
x86: e820: Remove conditional early mapping in parse_e820_ext
x86: OLPC: Make OLPC=n build again
x86: OLPC: Remove extra OLPC_OPENFIRMWARE_DT indirection
x86: OLPC: Cleanup config maze completely
x86: OLPC: Hide OLPC_OPENFIRMWARE config switch
...
Fix up conflicts in arch/x86/platform/ce4100/ce4100.c
Linus Torvalds [Wed, 16 Mar 2011 02:49:10 +0000 (19:49 -0700)]
Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (93 commits)
x86, tlb, UV: Do small micro-optimization for native_flush_tlb_others()
x86-64, NUMA: Don't call numa_set_distanc() for all possible node combinations during emulation
x86-64, NUMA: Don't assume phys node 0 is always online in numa_emulation()
x86-64, NUMA: Clean up initmem_init()
x86-64, NUMA: Fix numa_emulation code with node0 without RAM
x86-64, NUMA: Revert NUMA affine page table allocation
x86: Work around old gas bug
x86-64, NUMA: Better explain numa_distance handling
x86-64, NUMA: Fix distance table handling
mm: Move early_node_map[] reverse scan helpers under HAVE_MEMBLOCK
x86-64, NUMA: Fix size of numa_distance array
x86: Rename e820_table_* to pgt_buf_*
bootmem: Move __alloc_memory_core_early() to nobootmem.c
bootmem: Move contig_page_data definition to bootmem.c/nobootmem.c
bootmem: Separate out CONFIG_NO_BOOTMEM code into nobootmem.c
x86-64, NUMA: Seperate out numa_alloc_distance() from numa_set_distance()
x86-64, NUMA: Add proper function comments to global functions
x86-64, NUMA: Move NUMA emulation into numa_emulation.c
x86-64, NUMA: Prepare numa_emulation() for moving NUMA emulation into a separate file
x86-64, NUMA: Do not scan two times for setup_node_bootmem()
...
Linus Torvalds [Wed, 16 Mar 2011 02:41:42 +0000 (19:41 -0700)]
Merge branch 'x86-mem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-64, mem: Convert memmove() to assembly file and fix return value bug
Linus Torvalds [Wed, 16 Mar 2011 02:40:35 +0000 (19:40 -0700)]
Merge branch 'um-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'um-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
um: Select GENERIC_HARDIRQS_NO_DEPRECATED
um: Use proper accessors in show_interrupts()
um: Convert irq_chips to new functions
um: Remove stale irq_chip.end
Linus Torvalds [Wed, 16 Mar 2011 02:23:40 +0000 (19:23 -0700)]
Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (116 commits)
x86: Enable forced interrupt threading support
x86: Mark low level interrupts IRQF_NO_THREAD
x86: Use generic show_interrupts
x86: ioapic: Avoid redundant lookup of irq_cfg
x86: ioapic: Use new move_irq functions
x86: Use the proper accessors in fixup_irqs()
x86: ioapic: Use irq_data->state
x86: ioapic: Simplify irq chip and handler setup
x86: Cleanup the genirq name space
genirq: Add chip flag to force mask on suspend
genirq: Add desc->irq_data accessor
genirq: Add comments to Kconfig switches
genirq: Fixup fasteoi handler for oneshot mode
genirq: Provide forced interrupt threading
sched: Switch wait_task_inactive to schedule_hrtimeout()
genirq: Add IRQF_NO_THREAD
genirq: Allow shared oneshot interrupts
genirq: Prepare the handling of shared oneshot interrupts
genirq: Make warning in handle_percpu_event useful
x86: ioapic: Move trigger defines to io_apic.h
...
Fix up trivial(?) conflicts in arch/x86/pci/xen.c due to genirq name
space changes clashing with the Xen cleanups. The set_irq_msi() had
moved to xen_bind_pirq_msi_to_irq().
Linus Torvalds [Wed, 16 Mar 2011 02:16:00 +0000 (19:16 -0700)]
Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Combine printk()s in show_regs_common()
x86: Don't call dump_stack() from arch_trigger_all_cpu_backtrace_handler()
Linus Torvalds [Wed, 16 Mar 2011 02:00:53 +0000 (19:00 -0700)]
Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix and clean up generic_processor_info()
x86: Don't copy per_cpu cpuinfo for BSP two times
x86: Move llc_shared_map out of cpu_info
Linus Torvalds [Wed, 16 Mar 2011 01:59:56 +0000 (18:59 -0700)]
Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, binutils, xen: Fix another wrong size directive
x86: Remove dead config option X86_CPU
x86: Really print supported CPUs if PROCESSOR_SELECT=y
x86: Fix a bogus unwind annotation in lib/semaphore_32.S
um, x86-64: Fix UML build after adding CFI annotations to lib/rwsem_64.S
x86: Remove unused bits from lib/thunk_*.S
x86: Use {push,pop}_cfi in more places
x86-64: Add CFI annotations to lib/rwsem_64.S
x86, asm: Cleanup unnecssary macros in asm-offsets.c
x86, system.h: Drop unused __SAVE/__RESTORE macros
x86: Use bitmap library functions
x86: Partly unify asm-offsets_{32,64}.c
x86: Reduce back the alignment of the per-CPU data section
Linus Torvalds [Wed, 16 Mar 2011 01:59:21 +0000 (18:59 -0700)]
Merge branch 'timers-rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
RTC: Fix up rtc.txt documentation to reflect changes to generic rtc layer
RTC: sa1100: Update the sa1100 RTC driver.
RTC: Fix the cross interrupt issue on rtc-test.
RTC: Remove UIE and PIE information from the sa1100 driver proc.
RTC: Include information about UIE and PIE in RTC driver proc.
RTC: Clean out UIE icotl implementations
RTC: Cleanup rtc_class_ops->update_irq_enable()
RTC: Cleanup rtc_class_ops->irq_set_freq()
RTC: Cleanup rtc_class_ops->irq_set_state
RTC: Initialize kernel state from RTC
Linus Torvalds [Wed, 16 Mar 2011 01:53:35 +0000 (18:53 -0700)]
Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (62 commits)
posix-clocks: Check write permissions in posix syscalls
hrtimer: Remove empty hrtimer_init_hres_timer()
hrtimer: Update hrtimer->state documentation
hrtimer: Update base[CLOCK_BOOTTIME].offset correctly
timers: Export CLOCK_BOOTTIME via the posix timers interface
timers: Add CLOCK_BOOTTIME hrtimer base
time: Extend get_xtime_and_monotonic_offset() to also return sleep
time: Introduce get_monotonic_boottime and ktime_get_boottime
hrtimers: extend hrtimer base code to handle more then 2 clockids
ntp: Remove redundant and incorrect parameter check
mn10300: Switch do_timer() to xtimer_update()
posix clocks: Introduce dynamic clocks
posix-timers: Cleanup namespace
posix-timers: Add support for fd based clocks
x86: Add clock_adjtime for x86
posix-timers: Introduce a syscall for clock tuning.
time: Splitout compat timex accessors
ntp: Add ADJ_SETOFFSET mode bit
time: Introduce timekeeping_inject_offset
posix-timer: Update comment
...
Fix up new system-call-related conflicts in
arch/x86/ia32/ia32entry.S
arch/x86/include/asm/unistd_32.h
arch/x86/include/asm/unistd_64.h
arch/x86/kernel/syscall_table_32.S
(name_to_handle_at()/open_by_handle_at() vs clock_adjtime()), and some
due to movement of get_jiffies_64() in:
kernel/time.c
Linus Torvalds [Wed, 16 Mar 2011 01:37:30 +0000 (18:37 -0700)]
Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
sched: Resched proper CPU on yield_to()
sched: Allow users with sufficient RLIMIT_NICE to change from SCHED_IDLE policy
sched: Allow SCHED_BATCH to preempt SCHED_IDLE tasks
sched: Clean up the IRQ_TIME_ACCOUNTING code
sched: Add #ifdef around irq time accounting functions
sched, autogroup: Stop claiming ownership of the root task group
sched, autogroup: Stop going ahead if autogroup is disabled
sched, autogroup, sysctl: Use proc_dointvec_minmax() instead
sched: Fix the group_imb logic
sched: Clean up some f_b_g() comments
sched: Clean up remnants of sd_idle
sched: Wholesale removal of sd_idle logic
sched: Add yield_to(task, preempt) functionality
sched: Use a buddy to implement yield_task_fair()
sched: Limit the scope of clear_buddies
sched: Check the right ->nr_running in yield_task_fair()
sched: Avoid expensive initial update_cfs_load(), on UP too
sched: Fix switch_from_fair()
sched: Simplify the idle scheduling class
softirqs: Account ksoftirqd time as cpustat softirq
...
Linus Torvalds [Wed, 16 Mar 2011 01:31:30 +0000 (18:31 -0700)]
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (184 commits)
perf probe: Clean up probe_point_lazy_walker() return value
tracing: Fix irqoff selftest expanding max buffer
tracing: Align 4 byte ints together in struct tracer
tracing: Export trace_set_clr_event()
tracing: Explain about unstable clock on resume with ring buffer warning
ftrace/graph: Trace function entry before updating index
ftrace: Add .ref.text as one of the safe areas to trace
tracing: Adjust conditional expression latency formatting.
tracing: Fix event alignment: skb:kfree_skb
tracing: Fix event alignment: mce:mce_record
tracing: Fix event alignment: kvm:kvm_hv_hypercall
tracing: Fix event alignment: module:module_request
tracing: Fix event alignment: ftrace:context_switch and ftrace:wakeup
tracing: Remove lock_depth from event entry
perf header: Stop using 'self'
perf session: Use evlist/evsel for managing perf.data attributes
perf top: Don't let events to eat up whole header line
perf top: Fix events overflow in top command
ring-buffer: Remove unused #include <linux/trace_irq.h>
tracing: Add an 'overwrite' trace_option.
...
Linus Torvalds [Wed, 16 Mar 2011 01:23:52 +0000 (18:23 -0700)]
Merge branch 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
arm: Remove bogus comment in futex_atomic_cmpxchg_inatomic()
futex: Deobfuscate handle_futex_death()
plist: Add priority list test
plist: Shrink struct plist_head
futex,plist: Remove debug lock assignment from plist_node
futex,plist: Pass the real head of the priority list to plist_del()
futex: Sanitize futex ops argument types
futex: Sanitize cmpxchg_futex_value_locked API
futex: Remove redundant pagefault_disable in futex_atomic_cmpxchg_inatomic()
futex: Avoid redudant evaluation of task_pid_vnr()
futex: Update futex_wait_setup comments about locking
Linus Torvalds [Wed, 16 Mar 2011 01:23:25 +0000 (18:23 -0700)]
Merge branch 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
debugobjects: Add hint for better object identification
Linus Torvalds [Tue, 15 Mar 2011 22:48:13 +0000 (15:48 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (57 commits)
tidy the trailing symlinks traversal up
Turn resolution of trailing symlinks iterative everywhere
simplify link_path_walk() tail
Make trailing symlink resolution in path_lookupat() iterative
update nd->inode in __do_follow_link() instead of after do_follow_link()
pull handling of one pathname component into a helper
fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH
Allow passing O_PATH descriptors via SCM_RIGHTS datagrams
readlinkat(), fchownat() and fstatat() with empty relative pathnames
Allow O_PATH for symlinks
New kind of open files - "location only".
ext4: Copy fs UUID to superblock
ext3: Copy fs UUID to superblock.
vfs: Export file system uuid via /proc/<pid>/mountinfo
unistd.h: Add new syscalls numbers to asm-generic
x86: Add new syscalls for x86_64
x86: Add new syscalls for x86_32
fs: Remove i_nlink check from file system link callback
fs: Don't allow to create hardlink for deleted file
vfs: Add open by file handle support
...
Trond Myklebust [Tue, 15 Mar 2011 17:36:43 +0000 (13:36 -0400)]
VFS: Fix the nfs sillyrename regression in kernel 2.6.38
The new vfs locking scheme introduced in 2.6.38 breaks NFS sillyrename
because the latter relies on being able to determine the parent
directory of the dentry in the ->iput() callback in order to send the
appropriate unlink rpc call.
Looking at the code that cares about races with dput(), there doesn't
seem to be anything that specifically uses d_parent as a test for
whether or not there is a race:
- __d_lookup_rcu(), __d_lookup() all test for d_hashed() after d_parent
- shrink_dcache_for_umount() is safe since nothing else can rearrange
the dentries in that super block.
- have_submount(), select_parent() and d_genocide() can test for a
deletion if we set the DCACHE_DISCONNECTED flag when the dentry
is removed from the parent's d_subdirs list.
Linus Torvalds [Tue, 15 Mar 2011 22:29:21 +0000 (15:29 -0700)]
dcache.c: create helper function for duplicated functionality
This creates a helper function for he "try to ascend into the parent
directory" case, which was written out in triplicate before. With all
the locking and subtle sequence number stuff, we really don't want to
duplicate that kind of code.
Al Viro [Tue, 15 Mar 2011 02:20:34 +0000 (22:20 -0400)]
tidy the trailing symlinks traversal up
* pull the handling of current->total_link_count into
__do_follow_link()
* put the common "do ->put_link() if needed and path_put() the link"
stuff into a helper (put_link(nd, link, cookie))
* rename __do_follow_link() to follow_link(), while we are at it
Al Viro [Tue, 15 Mar 2011 01:54:55 +0000 (21:54 -0400)]
Turn resolution of trailing symlinks iterative everywhere
The last remaining place (resolution of nested symlink) converted
to the loop of the same kind we have in path_lookupat() and
path_openat().
Note that we still *do* have a recursion in pathname resolution;
can't avoid it, really. However, it's strictly for nested symlinks
now - i.e. ones in the middle of a pathname.
link_path_walk() has lost the tail now - it always walks everything
except the last component.
do_follow_link() renamed to nested_symlink() and moved down.
Al Viro [Tue, 15 Mar 2011 01:28:04 +0000 (21:28 -0400)]
simplify link_path_walk() tail
Now that link_path_walk() is called without LOOKUP_PARENT
only from do_follow_link(), we can simplify the checks in
last component handling. First of all, checking if we'd
arrived to a directory is not needed - the caller will check
it anyway. And LOOKUP_FOLLOW is guaranteed to be there,
since we only get to that place with nd->depth > 0.
Al Viro [Sun, 13 Mar 2011 23:58:58 +0000 (19:58 -0400)]
pull handling of one pathname component into a helper
new helper: walk_component(). Handles everything except symlinks;
returns negative on error, 0 on success and 1 on symlinks we decided
to follow. Drops out of RCU mode on such symlinks.
link_path_walk() and do_last() switched to using that.
fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH
We don't want to allow creation of private hardlinks by different application
using the fd passed to them via SCM_RIGHTS. So limit the null relative name
usage in linkat syscall to CAP_DAC_READ_SEARCH
Ingo Molnar [Tue, 15 Mar 2011 19:51:09 +0000 (20:51 +0100)]
perf probe: Clean up probe_point_lazy_walker() return value
Newer compilers (gcc 4.6) complains about:
return ret < 0 ?: 0;
For the following reason:
util/probe-finder.c: In function ‘probe_point_lazy_walker’:
util/probe-finder.c:1331:18: error: the omitted middle operand in ?: will always be ‘true’, suggest explicit middle operand [-Werror=parentheses]
And indeed the return value is a somewhat obscure (but correct) value
of 'true', so return 'ret' instead - this is cleaner and unconfuses
GCC as well.
Linus Torvalds [Tue, 15 Mar 2011 17:59:09 +0000 (10:59 -0700)]
Merge branch 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm
* 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm:
xen: suspend: remove xen_hvm_suspend
xen: suspend: pull pre/post suspend hooks out into suspend_info
xen: suspend: move arch specific pre/post suspend hooks into generic hooks
xen: suspend: refactor non-arch specific pre/post suspend hooks
xen: suspend: add "arch" to pre/post suspend hooks
xen: suspend: pass extra hypercall argument via suspend_info struct
xen: suspend: refactor cancellation flag into a structure
xen: suspend: use HYPERVISOR_suspend for PVHVM case instead of open coding
xen: switch to new schedop hypercall by default.
xen: use new schedop interface for suspend
xen: do not respond to unknown xenstore control requests
xen: fix compile issue if XEN is enabled but XEN_PVHVM is disabled
xen: PV on HVM: support PV spinlocks and IPIs
xen: make the ballon driver work for hvm domains
xen-blkfront: handle Xen major numbers other than XENVBD
xen: do not use xen_info on HVM, set pv_info name to "Xen HVM"
xen: no need to delay xen_setup_shutdown_event for hvm guests anymore
Linus Torvalds [Tue, 15 Mar 2011 17:49:16 +0000 (10:49 -0700)]
Merge branches 'stable/ia64', 'stable/blkfront-cleanup' and 'stable/cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/ia64' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: ia64 build broken due to "xen: switch to new schedop hypercall by default."
* 'stable/blkfront-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: Union the blkif_request request specific fields
* 'stable/cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: annotate functions which only call into __init at start of day
xen p2m: annotate variable which appears unused
xen: events: mark cpu_evtchn_mask_p as __refdata
Linus Torvalds [Tue, 15 Mar 2011 17:47:56 +0000 (10:47 -0700)]
Merge branch 'stable/irq.cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/irq.cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: events: remove dom0 specific xen_create_msi_irq
xen: events: use xen_bind_pirq_msi_to_irq from xen_create_msi_irq
xen: events: push set_irq_msi down into xen_create_msi_irq
xen: events: update pirq_to_irq in xen_create_msi_irq
xen: events: refactor xen_create_msi_irq slightly
xen: events: separate MSI PIRQ allocation from PIRQ binding to IRQ
xen: events: assume PHYSDEVOP_get_free_pirq exists
xen: pci: collapse apic_register_gsi_xen_hvm and xen_hvm_register_pirq
xen: events: return irq from xen_allocate_pirq_msi
xen: events: drop XEN_ALLOC_IRQ flag to xen_allocate_pirq_msi
xen: events: do not leak IRQ from xen_allocate_pirq_msi when no pirq available.
xen: pci: only define xen_initdom_setup_msi_irqs if CONFIG_XEN_DOM0
Linus Torvalds [Tue, 15 Mar 2011 17:47:16 +0000 (10:47 -0700)]
Merge branches 'stable/irq.rework' and 'stable/pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/irq.rework' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/irq: Cleanup up the pirq_to_irq for DomU PV PCI passthrough guests as well.
xen: Use IRQF_FORCE_RESUME
xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend.
xen: Fix compile error introduced by "switch to new irq_chip functions"
xen: Switch to new irq_chip functions
xen: Remove stale irq_chip.end
xen: events: do not free legacy IRQs
xen: events: allocate GSIs and dynamic IRQs from separate IRQ ranges.
xen: events: add xen_allocate_irq_{dynamic, gsi} and xen_free_irq
xen:events: move find_unbound_irq inside CONFIG_PCI_MSI
xen: handled remapped IRQs when enabling a pcifront PCI device.
genirq: Add IRQF_FORCE_RESUME
* 'stable/pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
pci/xen: When free-ing MSI-X/MSI irq->desc also use generic code.
pci/xen: Cleanup: convert int** to int[]
pci/xen: Use xen_allocate_pirq_msi instead of xen_allocate_pirq
xen-pcifront: Sanity check the MSI/MSI-X values
xen-pcifront: don't use flush_scheduled_work()
Linus Torvalds [Tue, 15 Mar 2011 17:32:15 +0000 (10:32 -0700)]
Merge branches 'stable/p2m-identity.v4.9.1' and 'stable/e820' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/p2m-identity.v4.9.1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set..
xen/m2p: No need to catch exceptions when we know that there is no RAM
xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set.
xen/debugfs: Add 'p2m' file for printing out the P2M layout.
xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.
xen/mmu: WARN_ON when racing to swap middle leaf.
xen/mmu: Set _PAGE_IOMAP if PFN is an identity PFN.
xen/mmu: Add the notion of identity (1-1) mapping.
xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY.
* 'stable/e820' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/e820: Don't mark balloon memory as E820_UNUSABLE when running as guest and fix overflow.
xen/setup: Inhibit resource API from using System RAM E820 gaps as PCI mem gaps.
x86: stop_machine_text_poke() should issue sync_core()
Intel Archiecture Software Developer's Manual section 7.1.3 specifies that a
core serializing instruction such as "cpuid" should be executed on _each_ core
before the new instruction is made visible.
Failure to do so can lead to unspecified behavior (Intel XMC erratas include
General Protection Fault in the list), so we should avoid this at all cost.
This problem can affect modified code executed by interrupt handlers after
interrupt are re-enabled at the end of stop_machine, because no core serializing
instruction is executed between the code modification and the moment interrupts
are reenabled.
Because stop_machine_text_poke performs the text modification from the first CPU
decrementing stop_machine_first, modified code executed in thread context is
also affected by this problem. To explain why, we have to split the CPUs in two
categories: the CPU that initiates the text modification (calls text_poke_smp)
and all the others. The scheduler, executed on all other CPUs after
stop_machine, issues an "iret" core serializing instruction, and therefore
handles core serialization for all these CPUs. However, the text modification
initiator can continue its execution on the same thread and access the modified
text without any scheduler call. Given that the CPU that initiates the code
modification is not guaranteed to be the one actually performing the code
modification, it falls into the XMC errata.
Q: Isn't this executed from an IPI handler, which will return with IRET (a
serializing instruction) anyway?
A: No, now stop_machine uses per-cpu workqueue, so that handler will be
executed from worker threads. There is no iret anymore.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
LKML-Reference: <20110303160137.GB1590@Krystal> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: <stable@kernel.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Al Viro [Sun, 13 Mar 2011 19:56:26 +0000 (15:56 -0400)]
readlinkat(), fchownat() and fstatat() with empty relative pathnames
For readlinkat() we simply allow empty pathname; it will fail unless
we have dfd equal to O_PATH-opened symlink, so we are outside of
POSIX scope here. For fchownat() and fstatat() we allow AT_EMPTY_PATH;
let the caller explicitly ask for such behaviour.
Al Viro [Sun, 13 Mar 2011 20:42:14 +0000 (16:42 -0400)]
Allow O_PATH for symlinks
At that point we can't do almost nothing with them. They can be opened
with O_PATH, we can manipulate such descriptors with dup(), etc. and
we can see them in /proc/*/{fd,fdinfo}/*.
We can't (and won't be able to) follow /proc/*/fd/* symlinks for those;
there's simply not enough information for pathname resolution to go on
from such point - to resolve a symlink we need to know which directory
does it live in.
We will be able to do useful things with them after the next commit, though -
readlinkat() and fchownat() will be possible to use with dfd being an
O_PATH-opened symlink and empty relative pathname. Combined with
open_by_handle() it'll give us a way to do realink-by-handle and
lchown-by-handle without messing with more redundant syscalls.
Al Viro [Sun, 13 Mar 2011 07:51:11 +0000 (03:51 -0400)]
New kind of open files - "location only".
New flag for open(2) - O_PATH. Semantics:
* pathname is resolved, but the file itself is _NOT_ opened
as far as filesystem is concerned.
* almost all operations on the resulting descriptors shall
fail with -EBADF. Exceptions are:
1) operations on descriptors themselves (i.e.
close(), dup(), dup2(), dup3(), fcntl(fd, F_DUPFD),
fcntl(fd, F_DUPFD_CLOEXEC, ...), fcntl(fd, F_GETFD),
fcntl(fd, F_SETFD, ...))
2) fcntl(fd, F_GETFL), for a common non-destructive way to
check if descriptor is open
3) "dfd" arguments of ...at(2) syscalls, i.e. the starting
points of pathname resolution
* closing such descriptor does *NOT* affect dnotify or
posix locks.
* permissions are checked as usual along the way to file;
no permission checks are applied to the file itself. Of course,
giving such thing to syscall will result in permission checks (at
the moment it means checking that starting point of ....at() is
a directory and caller has exec permissions on it).
fget() and fget_light() return NULL on such descriptors; use of
fget_raw() and fget_raw_light() is needed to get them. That protects
existing code from dealing with those things.
There are two things still missing (they come in the next commits):
one is handling of symlinks (right now we refuse to open them that
way; see the next commit for semantics related to those) and another
is descriptor passing via SCM_RIGHTS datagrams.
fs: Don't allow to create hardlink for deleted file
Add inode->i_nlink == 0 check in VFS. Some of the file systems
do this internally. A followup patch will remove those instance.
This is needed to ensure that with link by handle we don't allow
to create hardlink of an unlinked file. The check also prevent a race
between unlink and link
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 14 Mar 2011 22:56:51 +0000 (18:56 -0400)]
New AT_... flag: AT_EMPTY_PATH
For name_to_handle_at(2) we'll want both ...at()-style syscall that
would be usable for non-directory descriptors (with empty relative
pathname). Introduce new flag (AT_EMPTY_PATH) to deal with that and
corresponding LOOKUP_EMPTY; teach user_path_at() and path_init() to
deal with the latter.
Linus Torvalds [Mon, 14 Mar 2011 22:20:39 +0000 (15:20 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-mn10300
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-mn10300:
MN10300: atomic_read() should ensure it emits a load
MN10300: The SMP_ICACHE_INV_FLUSH_RANGE IPI command does not exist
MN10300: Proper use of macros get_user() in the case of incremented pointers
Linus Torvalds [Mon, 14 Mar 2011 22:20:12 +0000 (15:20 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus: (26 commits)
MIPS: Alchemy: Fix reset for MTX-1 and XXS1500
MIPS: MTX-1: Make au1000_eth probe all PHY addresses
MIPS: Jz4740: Add HAVE_CLK
MIPS: Move idle task creation to work queue
MIPS, Perf-events: Use unsigned delta for right shift in event update
MIPS, Perf-events: Work with the new callchain interface
MIPS, Perf-events: Fix event check in validate_event()
MIPS, Perf-events: Work with the new PMU interface
MIPS, Perf-events: Work with irq_work
MIPS: Fix always CONFIG_LOONGSON_UART_BASE=y
MIPS: Loongson: Fix potentially wrong string handling
MIPS: Fix GCC-4.6 'set but not used' warning in arch/mips/mm/init.c
MIPS: Fix GCC-4.6 'set but not used' warning in ieee754int.h
MIPS: Remove unused code from arch/mips/kernel/syscall.c
MIPS: Fix GCC-4.6 'set but not used' warning in signal*.c
MIPS: MSP: Fix MSP71xx bpci interrupt handler return value
MIPS: Select R4K timer lib for all MSP platforms
MIPS: Loongson: Remove ad-hoc cmdline default
MIPS: Clear the correct flag in sysmips(MIPS_FIXADE, ...).
MIPS: Add an unreachable return statement to satisfy buggy GCCs.
...
Linus Torvalds [Mon, 14 Mar 2011 22:19:09 +0000 (15:19 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: ce4100: Set pci ops via callback instead of module init
x86/mm: Fix pgd_lock deadlock
x86/mm: Handle mm_fault_error() in kernel space
x86: Don't check for BIOS corruption in first 64K when there's no need to
Linus Torvalds [Mon, 14 Mar 2011 22:17:07 +0000 (15:17 -0700)]
Revert "oom: oom_kill_process: fix the child_points logic"
This reverts the parent commit. I hate doing that, but it's generating
some discussion ("half of it is right"), and since I am planning on
doing the 2.6.38 release later today we can punt it to stable if
required. Let's not rock the boat right now.
Oleg Nesterov [Mon, 14 Mar 2011 19:05:30 +0000 (20:05 +0100)]
oom: oom_kill_process: fix the child_points logic
oom_kill_process() starts with victim_points == 0. This means that
(most likely) any child has more points and can be killed erroneously.
Also, "children has a different mm" doesn't match the reality, we should
check child->mm != t->mm. This check is not exactly correct if t->mm ==
NULL but this doesn't really matter, oom_kill_task() will kill them
anyway.
Note: "Kill all processes sharing p->mm" in oom_kill_task() is wrong
too.
Thomas Gleixner [Mon, 14 Mar 2011 19:00:30 +0000 (20:00 +0100)]
arm: Remove bogus comment in futex_atomic_cmpxchg_inatomic()
commit 522d7dec(futex: Remove redundant pagefault_disable in
futex_atomic_cmpxchg_inatomic()) added a bogus comment.
/* Note that preemption is disabled by futex_atomic_cmpxchg_inatomic
* call sites. */
Bogus in two aspects:
1) pagefault_disable != preempt_disable even if the mechanism we use
is the same
2) we have a call site which deliberately does not disable pagefaults
as it wants the possible fault to be handled - though that has been
changed for consistency reasons now.
Sigh. I really should have seen that when committing the above. :(
Catched-by-and-rightfully-ranted-at-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <alpine.LFD.2.00.1103141126590.2787@localhost6.localdomain6> Cc: Michel Lespinasse <walken@google.com> Cc: Darren Hart <darren@dvhart.com>
Thomas Gleixner [Mon, 14 Mar 2011 09:34:35 +0000 (10:34 +0100)]
futex: Deobfuscate handle_futex_death()
handle_futex_death() uses futex_atomic_cmpxchg_inatomic() without
disabling page faults. That's ok, but totally non obvious.
We don't hold locks so we actually can and want to fault here, because
the get_user() before futex_atomic_cmpxchg_inatomic() does not
guarantee a R/W mapping.
We could just add a big fat comment to explain this, but actually
changing the code so that the functionality is entirely clear is
better.
Use the helper function which disables page faults around the
futex_atomic_cmpxchg_inatomic() and handle a fault with a call to
fault_in_user_writeable() as all other places in the futex code do as
well.
Pointed-out-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Darren Hart <darren@dvhart.com> Cc: Michel Lespinasse <walken@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com>
LKML-Reference: <alpine.LFD.2.00.1103141126590.2787@localhost6.localdomain6> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Florian Fainelli [Mon, 21 Feb 2011 13:28:02 +0000 (14:28 +0100)]
MIPS: Alchemy: Fix reset for MTX-1 and XXS1500
Since commit 32fd6901 (MIPS: Alchemy: get rid of common/reset.c)
Alchemy-based boards use their own reset function. For MTX-1 and XXS1500,
the reset function pokes at the BCSR.SYSTEM_RESET register, but this does
not work. According to Bruno Randolf, this was not tested when written.
Previously, the generic au1000_restart() routine called the board specific
reset function, which for MTX-1 and XXS1500 did not work, but finally made
a jump to the reset vector, which really triggers a system restart. Fix
reboot for both targets by jumping to the reset vector.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2093/ Acked-by: Bruno Randolf <br1@einfach.org> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Florian Fainelli [Sun, 27 Feb 2011 18:53:53 +0000 (19:53 +0100)]
MIPS: MTX-1: Make au1000_eth probe all PHY addresses
When au1000_eth probes the MII bus for PHY address, if we do not set
au1000_eth platform data's phy_search_highest_address, the MII probing
logic will exit early and will assume a valid PHY is found at address 0.
For MTX-1, the PHY is at address 31, and without this patch, the link
detection/speed/duplex would not work correctly.
Maksim Rayskiy [Sat, 12 Feb 2011 18:21:32 +0000 (10:21 -0800)]
MIPS: Move idle task creation to work queue
To avoid forking usermode thread when creating an idle task, move fork_idle
to a work queue.
If kernel starts with maxcpus= option which does not bring all available
cpus online at boot time, idle tasks for offline cpus are not created. If
later offline cpus are hotplugged through sysfs, __cpu_up is called in
the context of the user task, and fork_idle copies its non-zero mm
pointer. This causes BUG() in per_cpu_trap_init.
This also avoids issues with resource limits of the CPU writing to sysfs,
containers, maybe others.
Signed-off-by: Maksim Rayskiy <mrayskiy@broadcom.com>
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2070/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Hardware performance counters on ARM are 32-bits wide but atomic64_t
variables are used to represent counter data in the hw_perf_event structure.
The armpmu_event_update function right-shifts a signed 64-bit delta variable
and adds the result to the event count. This can lead to shifting in sign-bits
if the MSB of the 32-bit counter value is set. This results in perf output
such as:
- Most archs use one callchain buffer per cpu, except x86 that needs
to deal with NMIs. Provide a default perf_callchain_buffer()
implementation that x86 overrides.
- Centralize all the kernel/user regs handling and invoke new arch
handlers from there: perf_callchain_user() / perf_callchain_kernel()
That avoid all the user_mode(), current->mm checks and so...
- Invert some parameters in perf_callchain_*() helpers: entry to the
left, regs to the right, following the traditional (dst, src).
Drop the TASK_RUNNING test on user tasks for callchains as
this check doesn't seem to make any sense.
Also remove the tests for !current that is not supposed to
happen and current->pid as this should be handled at the
generic level, with exclude_idle attribute.
The validate_event function in the ARM perf events backend has the
following problems:
1.) Events that are disabled count towards the cost.
2.) Events associated with other PMUs [for example, software events or
breakpoints] do not count towards the cost, but do fail validation,
causing the group to fail.
This patch changes validate_event so that it ignores events in the
PERF_EVENT_STATE_OFF state or that are scheduled for other PMUs.
Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
To: a.p.zijlstra@chello.nl
To: fweisbec@gmail.com
To: will.deacon@arm.com Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Cc: wuzhangjin@gmail.com Cc: paulus@samba.org Cc: mingo@elte.hu Cc: acme@redhat.com Cc: dengcheng.zhu@gmail.com Cc: matt@console-pimps.org Cc: sshtylyov@mvista.com Cc: ddaney@caviumnetworks.com
Patchwork: http://patchwork.linux-mips.org/patch/2013/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Replace pmu::{enable,disable,start,stop,unthrottle} with
pmu::{add,del,start,stop}, all of which take a flags argument.
The new interface extends the capability to stop a counter while
keeping it scheduled on the PMU. We replace the throttled state with
the generic stopped state.
This also allows us to efficiently stop/start counters over certain
code paths (like IRQ handlers).
It also allows scheduling a counter without it starting, allowing for
a generic frozen state (useful for rotating stopped counters).
The stopped state is implemented in two different ways, depending on
how the architecture implemented the throttled state:
1) We disable the counter:
a) the pmu has per-counter enable bits, we flip that
b) we program a NOP event, preserving the counter state
2) We store the counter state and ignore all read/overflow events
For MIPSXX, the stopped state is implemented in the way of 1.b as above.
Provide a mechanism that allows running code in IRQ context. It is
most useful for NMI code that needs to interact with the rest of the
system -- like wakeup a task to drain buffers.
Perf currently has such a mechanism, so extract that and provide it as
a generic feature, independent of perf so that others may also
benefit.
The IRQ context callback is generated through self-IPIs where
possible, or on architectures like powerpc the decrementer (the
built-in timer facility) is set to generate an interrupt immediately.
Architectures that don't have anything like this get to do with a
callback from the timer tick. These architectures can call
irq_work_run() at the tail of any IRQ handlers that might enqueue such
work (like the perf IRQ handler) to avoid undue latencies in
processing the work.
For MIPSXX, we need to call irq_work_run() at the tail of the perf IRQ
handler as described above.
Reported-by: Wu Zhangjin <wuzhangjin@gmail.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
To: fweisbec@gmail.com
To: will.deacon@arm.com Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Cc: paulus@samba.org Cc: mingo@elte.hu Cc: acme@redhat.com Cc: matt@console-pimps.org Cc: sshtylyov@mvista.com,
Patchwork: http://patchwork.linux-mips.org/patch/2011/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Mon, 24 Jan 2011 22:51:36 +0000 (14:51 -0800)]
MIPS: Fix GCC-4.6 'set but not used' warning in ieee754int.h
GCC-4.6 can find more unused code than previous versions could.
In the case of arch/mips/math-emu/ieee754int.h, the COMPXSP and
COMPXDP macros are used in several places, but a couple of them leave
xs unused. The easiest thing to do is mark it as __maybe_unused to
quiet the warning.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2032/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Mon, 24 Jan 2011 22:51:34 +0000 (14:51 -0800)]
MIPS: Fix GCC-4.6 'set but not used' warning in signal*.c
GCC-4.6 can find more unused code than previous versions could.
In the case of protected_restore_fp_context{,32}, the variable tmp is
really used. Its use is tricky in that we really care about the side
effects of the __put_user() calls. So we must mark tmp with
__maybe_unused to quiet the warning.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2035/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Anoop P A [Thu, 18 Nov 2010 10:32:50 +0000 (16:02 +0530)]
MIPS: MSP: Fix MSP71xx bpci interrupt handler return value
Signed-off-by: Anoop P A <anoop.pa@gmail.com>
To: Ben Hutchings <ben@decadent.org.uk>
To: linux-mips@linux-mips.org
To: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/1804/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Anoop P A [Thu, 18 Nov 2010 08:12:28 +0000 (13:42 +0530)]
MIPS: Select R4K timer lib for all MSP platforms
Signed-off-by: Anoop P A <anoop.pa@gmail.com>
To: linux-mips@linux-mips.org
To: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/1803/ Tested-by: Shane McDonald <mcdonald.shane@gmail.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Robert Millan [Sun, 7 Nov 2010 12:38:29 +0000 (13:38 +0100)]
MIPS: Loongson: Remove ad-hoc cmdline default
Loongson builds have an ad-hoc cmdline default of "console=ttyS0,115200
root=/dev/hda1". These settings come from a vendor; I remember builds
from Lemote branch requiring a "console=tty" override in order to get a
working console.
At least on Yeeloong, they're particularly useless: there's no external
serial port, and the IDE drive is now recognised as /dev/sda.
Signed-off-by: Robert Millan <rmh@gnu.org>
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/1759/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Wu Zhangjin [Fri, 21 Jan 2011 18:01:53 +0000 (02:01 +0800)]
MIPS, Tracing: Fix set_graph_function of function graph tracer
trace.func should be set to the recorded ip of the mcount calling site
in the __mcount_loc section to filter the function entries configured
through the tracing/set_graph_function interface, but before, this is
set to the self_ra(the return address of mcount), which has made
set_graph_function not work as expected.
This fixes it via calculating the right recorded ip in the __mcount_loc
section and assign it to trace.func.