git.karo-electronics.de Git - mv-sheeva.git/log

KVM: add cast within kvm_clear_guest_page to fix warning

Fixes this:

CC arch/s390/kvm/../../../virt/kvm/kvm_main.o
arch/s390/kvm/../../../virt/kvm/kvm_main.c: In function 'kvm_clear_guest_page':
arch/s390/kvm/../../../virt/kvm/kvm_main.c:1224:2: warning: passing argument 3 of 'kvm_write_guest_page' makes pointer from integer without a cast
arch/s390/kvm/../../../virt/kvm/kvm_main.c:1185:5: note: expected 'const void *' but argument is of type 'long unsigned int'

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: use kmalloc() for small dirty bitmaps

Currently we are using vmalloc() for all dirty bitmaps even if
they are small enough, say less than K bytes.

We use kmalloc() if dirty bitmap size is less than or equal to
PAGE_SIZE so that we can avoid vmalloc area usage for VGA.

This will also make the logging start/stop faster.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: pre-allocate one more dirty bitmap to avoid vmalloc()

Currently x86's kvm_vm_ioctl_get_dirty_log() needs to allocate a bitmap by
vmalloc() which will be used in the next logging and this has been causing
bad effect to VGA and live-migration: vmalloc() consumes extra systime,
triggers tlb flush, etc.

This patch resolves this issue by pre-allocating one more bitmap and switching
between two bitmaps during dirty logging.

Performance improvement:
  I measured performance for the case of VGA update by trace-cmd.
  The result was 1.5 times faster than the original one.

  In the case of live migration, the improvement ratio depends on the workload
  and the guest memory size. In general, the larger the memory size is the more
  benefits we get.

Note:
  This does not change other architectures's logic but the allocation size
  becomes twice. This will increase the actual memory consumption only when
  the new size changes the number of pages allocated by vmalloc().

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: introduce wrapper functions for creating/destroying dirty bitmaps

This makes it easy to change the way of allocating/freeing dirty bitmaps.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: trace "exit to userspace" event

Add tracepoint for userspace exit.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: propagate fault r/w information to gup(), allow read-only memory

As suggested by Andrea, pass r/w error code to gup(), upgrading read fault
to writable if host pte allows it.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: flush TLBs on writable -> read-only spte overwrite

This can happen in the following scenario:

vcpu0 vcpu1
read fault
gup(.write=0)
gup(.write=1)
reuse swap cache, no COW
set writable spte
use writable spte
set read-only spte

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: remove kvm_mmu_set_base_ptes

Unused.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: remove setting of shadow_base_ptes for EPT

The EPT present/writable bits use the same position as normal
pagetable bits.

Since direct_map passes ACC_ALL to mmu_set_spte, thus always setting
the writable bit on sptes, use the generic PT_PRESENT shadow_base_pte.

Also pass present/writable error code information from EPT violation
to generic pagefault handler.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Avoid double interrupt injection with vapic

After an interrupt injection, the PPR changes, and we have to reflect that
into the vapic. This causes a KVM_REQ_EVENT to be set, which causes the
whole interrupt injection routine to be run again (harmlessly).

Optimize by only setting KVM_REQ_EVENT if the ppr was lowered; otherwise
there is no chance that a new injection is needed.

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Fold save_host_msrs() and load_host_msrs() into their callers

This abstraction only serves to obfuscate. Remove.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: SVM: Move fs/gs/ldt save/restore to heavyweight exit path

ldt is never used in the kernel context; same goes for fs (x86_64) and gs
(i386). So save/restore them in the heavyweight exit path instead
of the lightweight path.

By itself, this doesn't buy us much, but it paves the way for moving vmload
and vmsave to the heavyweight exit path, since they modify the same registers.

[jan: fix copy/pase mistake on i386]

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: SVM: Move svm->host_gs_base into a separate structure

More members will join it soon.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: SVM: Move guest register save out of interrupts disabled section

Saving guest registers is just a memory copy, and does not need to be in the
critical section. Move outside the critical section to improve latency a
bit.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: Add missing inline tag to kvm_read_and_reset_pf_reason

May otherwise generates build warnings about unused
kvm_read_and_reset_pf_reason if included without CONFIG_KVM_GUEST
enabled.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Move KVM context switch into own function

gcc 4.5 with some special options is able to duplicate the VMX
context switch asm in vmx_vcpu_run(). This results in a compile error
because the inline asm sequence uses an on local label. The non local
label is needed because other code wants to set up the return address.

This patch moves the asm code into an own function and marks
that explicitely noinline to avoid this problem.

Better would be probably to just move it into an .S file.

The diff looks worse than the change really is, it's all just
code movement and no logic change.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: Mark kvm_arch_setup_async_pf static

It has no user outside mmu.c and also no prototype.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: improve hva_to_pfn() readability

Improve vma handling code readability in hva_to_pfn() and fix
async pf handling code to properly check vma returned by find_vma().

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Send async PF when guest is not in userspace too.

If guest indicates that it can handle async pf in kernel mode too send
it, but only if interrupts are enabled.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Let host know whether the guest can handle async PF in non-userspace context.

If guest can detect that it runs in non-preemptable context it can
handle async PFs at any time, so let host know that it can send async
PF even if guest cpu is not in userspace.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM paravirt: Handle async PF in non preemptable context

If async page fault is received by idle task or when preemp_count is
not zero guest cannot reschedule, so do sti; hlt and wait for page to be
ready. vcpu can still process interrupts while it waits for the page to
be ready.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Inject asynchronous page fault into a PV guest if page is swapped out.

Send async page fault to a PV guest if it accesses swapped out memory.
Guest will choose another task to run upon receiving the fault.

Allow async page fault injection only when guest is in user mode since
otherwise guest may be in non-sleepable context and will not be able
to reschedule.

Vcpu will be halted if guest will fault on the same page again or if
vcpu executes kernel code.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Handle async PF in a guest.

When async PF capability is detected hook up special page fault handler
that will handle async page fault events and bypass other page faults to
regular page fault handler. Also add async PF handling to nested SVM
emulation. Async PF always generates exit to L1 where vcpu thread will
be scheduled out until page is available.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM paravirt: Add async PF initialization to PV guest.

Enable async PF in a guest if async PF capability is discovered.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Add PV MSR to enable asynchronous page faults delivery.

Guest enables async PF vcpu functionality using this MSR.

Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM paravirt: Move kvm_smp_prepare_boot_cpu() from kvmclock.c to kvm.c.

Async PF also needs to hook into smp_prepare_boot_cpu so move the hook
into generic code.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Add memory slot versioning and use it to provide fast guest write interface

Keep track of memslots changes by keeping generation number in memslots
structure. Provide kvm_write_guest_cached() function that skips
gfn_to_hva() translation if memslots was not changed since previous
invocation.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Retry fault before vmentry

When page is swapped in it is mapped into guest memory only after guest
tries to access it again and generate another fault. To save this fault
we can map it immediately since we know that guest is going to access
the page. Do it only when tdp is enabled for now. Shadow paging case is
more complicated. CR[034] and EFER registers should be switched before
doing mapping and then switched back.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Halt vcpu if page it tries to access is swapped out

If a guest accesses swapped out memory do not swap it in from vcpu thread
context. Schedule work to do swapping and put vcpu into halted state
instead.

Interrupts will still be delivered to the guest and if interrupt will
cause reschedule guest will continue to run another task.

[avi: remove call to get_user_pages_noio(), nacked by Linus; this
makes everything synchrnous again]

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Don't reset mmu context unnecessarily when updating EFER

The only bit of EFER that affects the mmu is NX, and this is already
accounted for (LME only takes effect when changing cr0).

Based on a patch by Hillf Danton.

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: i8259: initialize isr_ack

isr_ack is never initialized. So, until the first PIC reset, interrupts
may fail to be injected. This can cause Windows XP to fail to boot, as
reported in the fallout from the fix to
https://bugzilla.kernel.org/show_bug.cgi?id=21962.

Reported-and-tested-by: Nicolas Prochazka <prochazka.nicolas@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Fix incorrect direct gfn for unpaged mode shadow

We use the physical address instead of the base gfn for the four
PAE page directories we use in unpaged mode. When the guest accesses
an address above 1GB that is backed by a large host page, a BUG_ON()
in kvm_mmu_set_gfn() triggers.

Resolves: https://bugzilla.kernel.org/show_bug.cgi?id=21962
Reported-and-tested-by: Nicolas Prochazka <prochazka.nicolas@gmail.com>
KVM-Stable-Tag.
Signed-off-by: Avi Kivity <avi@redhat.com>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile

* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
arch/tile: handle rt_sigreturn() more cleanly
arch/tile: handle CLONE_SETTLS in copy_thread(), not user space

Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus

* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus:
MIPS: Fix build errors in sc-mips.c

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  x86: avoid high BIOS area when allocating address space
  x86: avoid E820 regions when allocating address space
  x86: avoid low BIOS area when allocating address space
  resources: add arch hook for preventing allocation in reserved areas
  Revert "resources: support allocating space within a region from the top down"
  Revert "PCI: allocate bus resources from the top down"
  Revert "x86/PCI: allocate space from the end of a region, not the beginning"
  Revert "x86: allocate space within a region top-down"
  Revert "PCI: fix pci_bus_alloc_resource() hang, prefer positive decode"
  PCI: Update MCP55 quirk to not affect non HyperTransport variants

arch/tile: handle rt_sigreturn() more cleanly

The current tile rt_sigreturn() syscall pattern uses the common idiom
of loading up pt_regs with all the saved registers from the time of
the signal, then anticipating the fact that we will clobber the ABI
"return value" register (r0) as we return from the syscall by setting
the rt_sigreturn return value to whatever random value was in the pt_regs
for r0.

However, this breaks in our 64-bit kernel when running "compat" tasks,
since we always sign-extend the "return value" register to properly
handle returned pointers that are in the upper 2GB of the 32-bit compat
address space.  Doing this to the sigreturn path then causes occasional
random corruption of the 64-bit r0 register.

Instead, we stop doing the crazy "load the return-value register"
hack in sigreturn.  We already have some sigreturn-specific assembly
code that we use to pass the pt_regs pointer to C code.  We extend that
code to also set the link register to point to a spot a few instructions
after the usual syscall return address so we don't clobber the saved r0.
Now it no longer matters what the rt_sigreturn syscall returns, and the
pt_regs structure can be cleanly and completely reloaded.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>

arch/tile: handle CLONE_SETTLS in copy_thread(), not user space

Previously we were just setting up the "tp" register in the
new task as started by clone() in libc.  However, this is not
quite right, since in principle a signal might be delivered to
the new task before it had its TLS set up.  (Of course, this race
window still exists for resetting the libc getpid() cached value
in the new task, in principle.  But in any case, we are now doing
this exactly the way all other architectures do it.)

This change is important for 2.6.37 since the tile glibc we will
be submitting upstream will not set TLS in user space any more,
so it will only work on a kernel that has this fix.  It should
also be taken for 2.6.36.x in the stable tree if possible.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: stable <stable@kernel.org>

MIPS: Fix build errors in sc-mips.c

Seen with malta_defconfig on Linus' tree:

CC arch/mips/mm/sc-mips.o
arch/mips/mm/sc-mips.c: In function 'mips_sc_is_activated':
arch/mips/mm/sc-mips.c:77: error: 'config2' undeclared (first use in this function)
arch/mips/mm/sc-mips.c:77: error: (Each undeclared identifier is reported only once
arch/mips/mm/sc-mips.c:77: error: for each function it appears in.)
arch/mips/mm/sc-mips.c:81: error: 'tmp' undeclared (first use in this function)
make[2]: *** [arch/mips/mm/sc-mips.o] Error 1
make[1]: *** [arch/mips/mm] Error 2
make: *** [arch/mips] Error 2

[Ralf: Cosmetic changes to minimize the number of arguments passed to
mips_sc_is_activated]

Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/1752/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

x86: avoid high BIOS area when allocating address space

This prevents allocation of the last 2MB before 4GB.

The experiment described here shows Windows 7 ignoring the last 1MB:
https://bugzilla.kernel.org/show_bug.cgi?id=23542#c27

This patch ignores the top 2MB instead of just 1MB because H. Peter Anvin
says "There will be ROM at the top of the 32-bit address space; it's a fact
of the architecture, and on at least older systems it was common to have a
shadow 1 MiB below."

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

x86: avoid E820 regions when allocating address space

When we allocate address space, e.g., to assign it to a PCI device, don't
allocate anything mentioned in the BIOS E820 memory map.

On recent machines (2008 and newer), we assign PCI resources from the
windows described by the ACPI PCI host bridge _CRS.  On many Dell
machines, these windows overlap some E820 reserved areas, e.g.,

    BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved)
    pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff]

If we put devices at 0xbff00000, they don't work, probably because
that's really RAM, not I/O memory.  This patch prevents that by removing
the 0xbfe4dc00-0xbfffffff area from the "available" resource.

I'm not very happy with this solution because Windows solves the problem
differently (it seems to ignore E820 reserved areas and it allocates
top-down instead of bottom-up; details at comment 45 of the bugzilla
below).  That means we're vulnerable to BIOS defects that Windows would not
trip over.  For example, if BIOS described a device in ACPI but didn't
mention it in E820, Windows would work fine but Linux would fail.

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

x86: avoid low BIOS area when allocating address space

This implements arch_remove_reservations() so allocate_resource() can
avoid any arch-specific reserved areas. This currently just avoids the
BIOS area (the first 1MB), but could be used for E820 reserved areas if
that turns out to be necessary.

We previously avoided this area in pcibios_align_resource(). This patch
moves the test from that PCI-specific path to a generic path, so *all*
resource allocations will avoid this area.

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

resources: add arch hook for preventing allocation in reserved areas

This adds arch_remove_reservations(), which an arch can implement if it
needs to protect part of the address space from allocation.

Sometimes that can be done by just putting a region in the resource tree,
but there are cases where that doesn't work well. For example, x86 BIOS
E820 reservations are not related to devices, so they may overlap part of,
all of, or more than a device resource, so they may not end up at the
correct spot in the resource tree.

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

Revert "resources: support allocating space within a region from the top down"

This reverts commit e7f8567db9a7f6b3151b0b275e245c1cef0d9c70.

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

Revert "PCI: allocate bus resources from the top down"

This reverts commit b126b4703afa4010b161784a43650337676dd03b.

We're going back to the old behavior of allocating from bus resources
in _CRS order.

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

Revert "x86/PCI: allocate space from the end of a region, not the beginning"

This reverts commit dc9887dc02e37bcf83f4e792aa14b07782ef54cf.

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

Revert "x86: allocate space within a region top-down"

This reverts commit 1af3c2e45e7a641e774bbb84fa428f2f0bf2d9c9.

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

Revert "PCI: fix pci_bus_alloc_resource() hang, prefer positive decode"

This reverts commit 82e3e767c21fef2b1b38868e20eb4e470a1e38e3.

We're going back to considering bus resources in the order we found
them (in _CRS order, when we're using _CRS), so we don't need to
define any ordering.

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

Merge branch 'for_linus' of git://github.com/at91linux/linux-2.6-at91

* 'for_linus' of git://github.com/at91linux/linux-2.6-at91:
at91: Refactor Stamp9G20 and PControl G20 board file
at91: Fix uhpck clock rate in upll case

Merge branch 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm

* 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: Fix preemption counter leak in kvm_timer_init()
  KVM: enlarge number of possible CPUID leaves
  KVM: SVM: Do not report xsave in supported cpuid
  KVM: Fix OSXSAVE after migration

Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6

* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM / Runtime: Fix pm_runtime_suspended()
  PM / Hibernate: Restore old swap signature to avoid user space breakage
  PM / Hibernate: Fix PM_POST_* notification with user-space suspend

Merge branch 'bkl_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6

* 'bkl_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6:
  [media] uvcvideo: Convert to unlocked_ioctl
  [media] uvcvideo: Lock stream mutex when accessing format-related information
  [media] uvcvideo: Move mmap() handler to uvc_queue.c
  [media] uvcvideo: Move mutex lock/unlock inside uvc_free_buffers
  [media] uvcvideo: Lock controls mutex when querying menus
  [media] v4l2-dev: fix race condition
  [media] V4L: improve the BKL replacement heuristic
  [media] v4l2-dev: use mutex_lock_interruptible instead of plain mutex_lock
  [media] cx18: convert to unlocked_ioctl
  [media] radio-timb: convert to unlocked_ioctl
  [media] sh_vou: convert to unlocked_ioctl
  [media] cafe_ccic: replace ioctl by unlocked_ioctl
  [media] et61x251_core: trivial conversion to unlocked_ioctl
  [media] sn9c102: convert to unlocked_ioctl
  [media] BKL: trivial ioctl -> unlocked_ioctl video driver conversions
  [media] typhoon: convert to unlocked_ioctl
  [media] si4713: convert to unlocked_ioctl
  [media] tea5764: convert to unlocked_ioctl
  [media] cadet: use unlocked_ioctl
  [media] BKL: trivial BKL removal from V4L2 radio drivers

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - Fix conflict of Mic Boot controls
  ALSA: HDA: Enable subwoofer on Asus G73Jw
  ALSA: HDA: Fix auto-mute on Lenovo Edge 14
  ASoC: Fix bias power down of non-DAPM codec
  ASoC: WM8580: Fix R8 initial value
  ASoC: fix deemphasis control in wm8904/55/60 codecs

Merge branch 'fix/asoc' into for-linus

Merge branch 'fix/hda' into for-linus

ALSA: hda - Fix conflict of Mic Boot controls

Due to the recent change for multiple mics assignment, we need to handle
the index of each Mic Boost control respectively. Otherwise the driver
gets the control element conflicts, and gives the unsable state.

Reference: kernel bug 25002
https://bugzilla.kernel.org/show_bug.cgi?id=25002

Reported-and-tested-by: Adam Williamson <awilliam@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

at91: Refactor Stamp9G20 and PControl G20 board file

As PControl G20 is a carrier board for the Stamp9G20 SoM, some code can
be shared. Therefore board-stamp9g20.c is refactored to allow reusing the
SoM initialization and board-pcontrol-g20.c is modified to use it.

Signed-off-by: Christian Glindkamp <christian.glindkamp@taskit.de>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>

at91: Fix uhpck clock rate in upll case

The uhpck clock should be divided from the utmi clock, not its parent
(main). This change is mostly cosmetic as the uhpck rate value is not
used anywhere except for the debugfs clock output.

Signed-off-by: Ryan Mallon <ryan@bluewatersys.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>

Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify

* 'for-linus' of git://git.infradead.org/users/eparis/notify:
  fanotify: fill in the metadata_len field on struct fanotify_event_metadata
  fanotify: split version into version and metadata_len
  fanotify: Dont try to open a file descriptor for the overflow event
  fanotify: Introduce FAN_NOFD
  fanotify: do not leak user reference on allocation failure
  inotify: stop kernel memory leak on file creation failure
  fanotify: on group destroy allow all waiters to bypass permission check
  fanotify: Dont allow a mask of 0 if setting or removing a mark
  fanotify: correct broken ref counting in case adding a mark failed
  fanotify: if set by user unset FMODE_NONOTIFY before fsnotify_perm() is called
  fanotify: remove packed from access response message
  fanotify: deny permissions when no event was sent

Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus

* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus: (28 commits)
  MIPS: Add a CONFIG_FORCE_MAX_ZONEORDER Kconfig option.
  MIPS: LD/SD o32 macro GAS fix update
  MIPS: Alchemy: fix build with SERIAL_8250=n
  MIPS: Rename mips_dma_cache_sync back to dma_cache_sync
  MIPS: MT: Fix typo in comment.
  SSB: Fix nvram_get on BCM47xx platform
  MIPS: BCM47xx: Swap serial console if ttyS1 was specified.
  MIPS: BCM47xx: Use sscanf for parsing mac address
  MIPS: BCM47xx: Fill values for b43 into SSB sprom
  MIPS: BCM47xx: Do not read config from CFE
  MIPS: FDT size is a be32
  MIPS: Fix CP0 COUNTER clockevent race
  MIPS: Fix regression on BCM4710 processor detection
  MIPS: JZ4740: Fix pcm device name
  MIPS: Separate two consecutive loads in memset.S
  MIPS: Send proper signal and siginfo on FP emulator faults.
  MIPS: AR7: Fix loops per jiffies on TNETD7200 devices
  MIPS: AR7: Fix double ar7_gpio_init declaration
  MIPS: Rework GENERIC_HARDIRQS Kconfig.
  MIPS: Alchemy: Add return value check for strict_strtoul()
  ...

PCI: Update MCP55 quirk to not affect non HyperTransport variants

I wrote this quirk awhile ago to properly setup MCP55 chips on hypertransport
busses so that interrupts reached whatever cpu happend to boot the kdump kernel.
while that works well, it was recently shown to me that a a non-hypertransport
variant of the MCP55 exists, and on those system the register that this quirk
manipulates causes hangs if you write to it. Since the quirk was only meant to
handle errors found on MCP55 chips that have a HT interface, this patch adds a
filter to make sure the chip is an HT capable before making the needed register
adjustment. This lets the broken MCP55s work with kdump while not breaking the
non-HT variants.

Resolves https://bugzilla.kernel.org/show_bug.cgi?id=23952

Tested successfully by the reporter and myself.

Cc: stable@kernel.org
Reported-by: Mathieu Bérard <mathieu@mberard.eu>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

MIPS: Add a CONFIG_FORCE_MAX_ZONEORDER Kconfig option.

For huge page support with base page size of 16K or 32K, we have to
increase the MAX_ORDER so that huge pages can be allocated.

[Ralf: I don't think a user should have to configure obscure constants like
this but for the time being this will have to suffice.]

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/1685/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: LD/SD o32 macro GAS fix update

I am about to commit:

http://sourceware.org/ml/binutils/2010-10/msg00033.html

that fixes a problem with the LD/SD macro currently implemented by GAS for
the o32 ABI in an inconsistent way.  This is best illustrated with a
simple program, which I'm copying here from the message above for easier
reference:

$ cat ld.s
ld $5,32767($4)
ld $5,32768($4)

This gets assebled into the following output:

$ mips-linux-as -32 -mips3 -o ld.o ld.s
$ mips-linux-objdump -d ld.o

ld.o:     file format elf32-tradbigmips

Disassembly of section .text:

00000000 <.text>:
   0: dc857fff ld a1,32767(a0)
   4: 3c010001 lui at,0x1
   8: 00810821 addu at,a0,at
   c: 8c258000 lw a1,-32768(at)
  10: 8c268004 lw a2,-32764(at)
...

Oops!

The GAS fix makes the macro behave in a consistent way and pairs of LW/SW
instructions to be output as appropriate regardless of the size of the
offset associated with the address used.  The machine instruction is still
available, but to reach it macros have to be disabled first.  This has a
side effect of requiring the use of a machine-addressable memory operand.

As some platforms require 64-bit operations for accesses to some I/O
registers LD/SD instructions are used in a couple of places in Linux
regardless of the ABI selected.  Here's a fix for some pieces of code
affected I've been able to track down.  The fix should be backwards
compatible with all supported binutils releases in existence and can be
used as a reference for any other places or off-tree code.  The use of the
"R" constraint guarantees a machine-addressable operand.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/1680/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Alchemy: fix build with SERIAL_8250=n

In commit 7d172bfe ("Alchemy: Add UART PM methods") I introduced
platform PM methods which call a function of the 8250 driver;
this patch works around link failures when the kernel is built
without 8250 support.

Signed-off-by: Manuel Lauss <manuel.lauss@googlemail.com>
To: Linux-MIPS <linux-mips@linux-mips.org>
Patchwork: https://patchwork.linux-mips.org/patch/1737/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Rename mips_dma_cache_sync back to dma_cache_sync

This fixes IP22 and IP28 build errors.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: MT: Fix typo in comment.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

SSB: Fix nvram_get on BCM47xx platform

The nvram_get function was never in the mainline kernel, it only existed in
an external OpenWrt patch. Use nvram_getenv function, which is in mainline
and use an include instead of an extra function declaration. et0macaddr
contains the mac address in text from like 00:11:22:33:44:55. We have to
parse it before adding it into macaddr.

nvram_parse_macaddr will be merged into asm/mach-bcm47xx/nvram.h through
the MIPS git tree and will be available soon. It will not build now without
nvram_parse_macaddr, but it hasn't before either.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
To: linux-mips@linux-mips.org
Cc: mb@bu3sch.de
Cc: netdev@vger.kernel.org
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Acked-by: Michael Buesch <mb@bu3sch.de>
Patchwork: https://patchwork.linux-mips.org/patch/1849/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: BCM47xx: Swap serial console if ttyS1 was specified.

Some devices like the Netgear WGT634U are using ttyS1 for default console
output. We should switch to that console if it was given in the kernel_args
parameters.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
To: linux-mips@linux-mips.org
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Patchwork: https://patchwork.linux-mips.org/patch/1848/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: BCM47xx: Use sscanf for parsing mac address

Instead of writing own function for parsing the mac address we now
use sscanf.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
To: linux-mips@linux-mips.org
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Patchwork: https://patchwork.linux-mips.org/patch/1847/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: BCM47xx: Fill values for b43 into SSB sprom

Fill the sprom with all available values from the nvram. Most of these
new values are needed for the b43 or b43legacy driver.

Parts of this patch have been in OpenWRT for a long time and were written
by Michael Buesch.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
To: linux-mips@linux-mips.org
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Patchwork: https://patchwork.linux-mips.org/patch/1846/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: BCM47xx: Do not read config from CFE

The config options read out here are not stored in CFE but only in NVRAM on
the devices. Remove reading from CFE and only access the NVRAM. Reading out
CFE does not harm but is useless here.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
To: linux-mips@linux-mips.org
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Patchwork: https://patchwork.linux-mips.org/patch/1845/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: FDT size is a be32

The totalsize field was be32. And the reserve bootmem would cause failure.

Signed-off-by: Thomas Chou <thomas@wytron.com.tw>
To: devicetree-discuss@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: grant.likely@secretlab.ca
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Dezhong Diao <dediao@cisco.com>
Patchwork: https://patchwork.linux-mips.org/patch/1838/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Fix CP0 COUNTER clockevent race

Consider the following test case:

write_c0_compare(read_c0_count());

Even if the counter doesn't increment during execution, this might not
generate an interrupt until the counter wraps around. The CPU may
perform the comparison each time CP0 COUNT increments, not when CP0
COMPARE is written.

If mips_next_event() is called with a very small delta, and CP0 COUNT
increments during the calculation of "cnt += delta", it is possible
that CP0 COMPARE will be written with the current value of CP0 COUNT.
If this is detected, the function should return -ETIME, to indicate
that the interrupt might not have actually gotten scheduled.

Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/1836/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Fix regression on BCM4710 processor detection

BCM4710 uses the BMIPS32 core (like BCM6345), not the MIPS 4Kc core as
was previously believed.

Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Tested-by: Alexandros C. Couloumbis <alex@ozo.com>
Patchwork: https://patchwork.linux-mips.org/patch/1837/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: JZ4740: Fix pcm device name

As part the ASoC multi-component patch (commit f0fba2ad) the jz4740 pcm
driver was renamed to 'jz4740-pcm-audio'. Adjust the device name
accordingly.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/1770/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Separate two consecutive loads in memset.S

partial_fixup is used in noreorder block.

Separating two consecutive loads can save one cycle on processors with
GPR intrelock and can fix load-use on processors that need a load delay slot.

Also do so for fwd_fixup.

[Ralf: Only R2000/R3000 class processors are lacking the the load-user
interlock and even some of those got it retrofitted. With R2000/R3000
being fairly uncommon these days the impact of this bug should be minor.]

Signed-off-by: Tony Wu <tung7970@gmail.com>
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/1768/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Send proper signal and siginfo on FP emulator faults.

We were unconditionally sending SIGBUS with an empty siginfo on FP
emulator faults. This differs from what happens when real floating
point hardware would get a fault.

For most faults we need to send SIGSEGV with the faulting address
filled in in the struct siginfo.

Reported-by: Camm Maguire <camm@maguirefamily.org>
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
To: linux-mips@linux-mips.org
Cc: Camm Maguire <camm@maguirefamily.org>
Patchwork: https://patchwork.linux-mips.org/patch/1727/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: AR7: Fix loops per jiffies on TNETD7200 devices

TNETD7200 run their CPU clock faster than the default CPU clock we assume.
In order to have the correct loops per jiffies settings, initialize clocks right
before setting mips_hpt_frequency. As a side effect, we can no longer use
msleep in clocks.c which requires other parts of the kernel to be initialized,
so replace these with mdelay.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/1749/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: AR7: Fix double ar7_gpio_init declaration

Signed-off-by: Florian Fainelli <florian@openwrt.org>
To: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/1748/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Rework GENERIC_HARDIRQS Kconfig.

Recent changes to CONFIG_GENERIC_HARDIRQS have caused us to start getting:

warning: (SMP && SYS_SUPPORTS_SMP) selects IRQ_PER_CPU which has unmet direct dependencies (HAVE_GENERIC_HARDIRQS)

Rearranging our Kconfig quiets the message.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
To: linux-mips@linux-mips.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Patchwork: https://patchwork.linux-mips.org/patch/1757/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Alchemy: Add return value check for strict_strtoul()

arch/mips/alchemy/devboards/prom.c: In function 'prom_init':
arch/mips/alchemy/devboards/prom.c:60: error: ignoring return value of
'strict_strtoul', declared with attribute warn_unused_result

Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Cc: linux-mips <linux-mips@linux-mips.org>
Patchwork: https://patchwork.linux-mips.org/patch/1761/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Loongson: Add return value check for strict_strtoul()

cc1: warnings being treated as errors
arch/mips/loongson/common/env.c: In function 'prom_init_env':
arch/mips/loongson/common/env.c:49: error: ignoring return value of 'strict_strtol', declared with attribute warn_unused_result
arch/mips/loongson/common/env.c:50: error: ignoring return value of 'strict_strtol', declared with attribute warn_unused_result
arch/mips/loongson/common/env.c:51: error: ignoring return value of 'strict_strtol', declared with attribute warn_unused_result
arch/mips/loongson/common/env.c:52: error: ignoring return value of 'strict_strtol', declared with attribute warn_unused_result

Signed-off-by: Wu Zhangjin <wuzhangjin@gmail.com>
Cc: linux-mips <linux-mips@linux-mips.org>
Patchwork: https://patchwork.linux-mips.org/patch/1762/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: VPE loader: Check vmalloc return value in vpe_open

The return value of the vmalloc() call in arch/mips/kernel/vpe.c::vpe_open()
is not checked, so we potentially store a null pointer in v->pbuffer.  Add
a check for a null return and then return -ENOMEM in that case.

[Ralf: The check added by Jesper's original patch is where it logically
should be.  Adding it eleminated the need for the checks in a few other
places, so I removed them.  There still is a zillion of other things that
need to be fixed in this file / API.]

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/1747/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: compat: Don't clobber personality bits in 32-bit sys_personality().

If PER_LINUX32 has been set on a 32-bit kernel, only twiddle with the
low-order personality bits, let the upper bits pass through.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
To: linux-mips@linux-mips.org
Cc: Camm Maguire <camm@maguirefamily.org>
Patchwork: https://patchwork.linux-mips.org/patch/1751/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Don't clobber personality high bits.

The high bits of current->personality carry settings that we don't want to
clobber on each exec. Only clobber them if the lower bits that indicate
either PER_LINUX or PER_LINUX32 are invalid.

The clobbering prevents us from using useful bits like ADDR_NO_RANDOMIZE.

Reported-by: Camm Maguire <camm@maguirefamily.org>
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: Camm Maguire <camm@maguirefamily.org>
Patchwork: https://patchwork.linux-mips.org/patch/1750/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: jz4740: Fix section mismatch in prom.c

This patch fixes the following section mismatch:

WARNING: arch/mips/built-in.o(.text+0xc): Section mismatch in reference from the
function jz4740_init_cmdline() to the variable .init.data:arcs_cmdline

While were at it, make jz4740_init_cmdline static as well.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/1755/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: jz4740: qi_lb60: Fix gpio for the 6th row of the keyboard matrix

This patch fixes the gpio number for the 6th row of the keyboard matrix.

(And fixes a typo in my name...)

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: linux-mips@linux-mips.org
Cc: stable@kernel.org
Signed-off-by: https://patchwork.linux-mips.org/patch/1754/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Don't stomp on caller's ->regs[2] in copy_thread()

We never needed that (->regs[2] is overwritten on return from syscall paths
with return value of syscall, so storing it there early made no sense) and
with new restart logics since d27240bf7e61d2656de18e158ec910a902030847 it
has become really bad - we lose the original syscall number before the
place where we decide that we might need a syscall restart.

Note that for child we do need the assignment to regs[2] - it won't go
through the normal return from syscall path.

[Ralf: Issue found and reported by Lluís; initial investigations by me;
bug finally found and patch by Al; testing by me and Lluís.]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Lluís Batlle i Rossell <viriketo@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Swarm: Fix typo in symbol name: RTC_M4LT81 -> RTC_M41T81

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus

* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  lguest: populate initial_page_table
  lguest: restore boot speed
  lguest: fix crash lguest_time_init

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
nilfs2: fix regression of garbage collection ioctl

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: define separate EVIOCGKEYCODE_V2/EVIOCSKEYCODE_V2
Input: wacom - add another Bamboo Pen ID (0xd4)

PM / Runtime: Fix pm_runtime_suspended()

There are some situations (e.g. in __pm_generic_call()), where
pm_runtime_suspended() is used to decide whether or not to execute
a device's (system) ->suspend() callback. The callback is not
executed if pm_runtime_suspended() returns true, but it does so
for devices that don't even support runtime PM, because the
power.disable_depth device field is ignored by it. This leads to
problems (i.e. devices are not suspened when they should), so rework
pm_runtime_suspended() so that it returns false if the device's
power.disable_depth field is different from zero.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@kernel.org

PM / Hibernate: Restore old swap signature to avoid user space breakage

Commit 3624eb0 (PM / Hibernate: Modify signature used to mark swap)
attempted to modify hibernate signature used to mark swap partitions
containing hibernation images, so that old kernels don't try to
handle compressed images. However, this change broke resume from
hibernation on Fedora 14 that apparently doesn't pass the resume=
argument to the kernel and tries to trigger resume from early user
space. This doesn't work, because the signature is now different,
so the old signature has to be restored to avoid the problem.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=22732 .

Reported-by: Dr. David Alan Gilbert <linux@treblig.org>
Reported-by: Zhang Rui <rui.zhang@intel.com>
Reported-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM / Hibernate: Fix PM_POST_* notification with user-space suspend

The user-space hibernation sends a wrong notification after the image
restoration because of thinko for the file flag check. RDONLY
corresponds to hibernation and WRONLY to restoration, confusingly.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@kernel.org

KVM: Fix preemption counter leak in kvm_timer_init()

Based on a patch from Thomas Meyer.

Signed-off-by: Avi Kivity <avi@redhat.com>

lguest: populate initial_page_table

Two x86 patches broke lguest:
1) v2.6.35-492-g72d7c3b, which changed x86 to use the memblock allocator.

In lguest, the host places linear page tables at the top of mem, which
used to be enough to get us up to the swapper_pg_dir page tables. With
the first patch, the direct mapping tables used that memory:

Before: kernel direct mapping tables up to 4000000 @ 7000-1a000
After: kernel direct mapping tables up to 4000000 @ 3fed000-4000000

I initially fixed this by lying about the amount of memory we had, so
the kernel wouldn't blatt the lguest boot pagetables (yuk!), but then...

2) v2.6.36-rc8-54-gb40827f, which made x86 boot use initial_page_table.

This was initialized in a part of head_32.S which isn't executed by
lguest; it is then copied into swapper_pg_dir. So we have to initialize
it; and anyway we switch to it before we blatt the old tables, so that
fixes the previous damage as well.

For the moment, I cut & pasted the code into lguest's boot code, but
next merge window I will merge them.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: x86@kernel.org

lguest: restore boot speed

lguest is dumb and drops *all* the pagetables for set_pte (which is
only used for kernel mapping manipulation, so it's OK without highmem).

But it's used a lot in boot, too. As a guest optimization, we
suppressed this flushing until the first page switch. Now we have
initial_page_table, that happens much earlier, so extend the heuristic
to wait until we switch to something other than the swapper_pg_dir or
initial_page_table.

As measured on my laptop under kvm, this dropped the time-to-mount-root
from 48 seconds to 4.3 seconds.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

lguest: fix crash lguest_time_init

fe25c7fc2e "x86: lguest: Convert to new irq chip functions" converted
enable_lguest_irq() to take a struct irq_data *, but didn't fix the one
internal caller.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To: x86@kernel.org

nilfs2: fix regression of garbage collection ioctl

On 2.6.37-rc1, garbage collection ioctl of nilfs was broken due to the
commit 263d90cefc7d82a0 ("nilfs2: remove own inode hash used for GC"),
and leading to filesystem corruption.

The patch doesn't queue gc-inodes for log writer if they are reused
through the vfs inode cache.  Here, gc-inode is the inode which
buffers blocks to be relocated on GC.  That patch queues gc-inodes in
nilfs_init_gcinode() function, but this function is not called when
they don't have I_NEW flag.  Thus, some of live blocks are wrongly
overrode without being moved to new logs.

This resolves the problem by moving the gc-inode queueing to an outer
function to ensure it's done right.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

Linux 2.6.37-rc6