Jan Kiszka [Tue, 23 Feb 2010 16:47:58 +0000 (17:47 +0100)]
KVM: x86: Drop RF manipulation for guest single-stepping
RF is not required for injecting TF as the latter will trigger only
after an instruction execution anyway. So do not touch RF when arming or
disarming guest single-step mode.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Tue, 23 Feb 2010 16:47:56 +0000 (17:47 +0100)]
KVM: SVM: Emulate nRIP feature when reinjecting INT3
When in guest debugging mode, we have to reinject those #BP software
exceptions that are caused by guest-injected INT3. As older AMD
processors do not support the required nRIP VMCB field, try to emulate
it by moving RIP past the instruction on exception injection. Fix it up
again in case the injection failed and we were able to catch this. This
does not work for unintercepted faults, but it is better than doing
nothing.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
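The advance-and-roll-back idea described above can be sketched roughly as follows. This is a simplified illustration, not the SVM code: the struct, its fields, and the function names are hypothetical, and the sketch assumes the one-byte INT3 encoding (0xCC).

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_INT3_LEN 1u /* assumption: one-byte INT3 opcode (0xCC) */

/* Hypothetical, reduced vcpu state; not the kernel's structure. */
struct toy_svm_vcpu {
    uint64_t rip;
    bool int3_injected; /* an emulated-nRIP injection is in flight */
    uint64_t int3_rip;  /* RIP value after the advance */
};

/* Emulate nRIP: move RIP past INT3 before queueing the #BP. */
static void toy_queue_bp(struct toy_svm_vcpu *v)
{
    v->rip += TOY_INT3_LEN;
    v->int3_rip = v->rip;
    v->int3_injected = true;
}

/* Injection failed and we caught it: undo the advance if RIP is unchanged. */
static void toy_injection_failed(struct toy_svm_vcpu *v)
{
    if (v->int3_injected && v->rip == v->int3_rip)
        v->rip -= TOY_INT3_LEN;
    v->int3_injected = false;
}
```

The roll-back only helps when the failure is observed, which matches the caveat about unintercepted faults.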
Jan Kiszka [Tue, 23 Feb 2010 16:47:55 +0000 (17:47 +0100)]
KVM: x86: Add kvm_is_linear_rip
Based on Gleb's suggestion: Add a helper kvm_is_linear_rip that matches
a given linear RIP against the current one. Use this for guest
single-stepping, more users will follow.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
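As a rough illustration of what such a helper compares, the sketch below computes a linear RIP as CS base plus RIP and matches it against a given value. The struct and the toy_ names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the vcpu state. */
struct toy_vcpu {
    uint64_t cs_base; /* base address of the code segment */
    uint64_t rip;     /* instruction pointer, relative to cs_base */
};

/* Linear RIP = CS base + RIP. */
static uint64_t toy_linear_rip(const struct toy_vcpu *vcpu)
{
    return vcpu->cs_base + vcpu->rip;
}

/* True when the vcpu currently sits at the given linear address. */
static bool toy_is_linear_rip(const struct toy_vcpu *vcpu, uint64_t linear_rip)
{
    return toy_linear_rip(vcpu) == linear_rip;
}
```

Comparing linear addresses rather than raw RIP values keeps the check meaningful when the code segment base is non-zero.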
Alexander Graf [Mon, 22 Feb 2010 15:52:14 +0000 (16:52 +0100)]
KVM: PPC: Destroy timer on vcpu destruction
When we destroy a vcpu, we should also make sure to kill all pending
timers that could still be up. When not doing this, hrtimers might
dereference null pointers trying to call our code.
This patch fixes spontaneous kernel panics seen after closing VMs.
Signed-off-by: Alexander Graf <alex@csgraf.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Fri, 19 Feb 2010 18:38:07 +0000 (19:38 +0100)]
KVM: x86: Save&restore interrupt shadow mask
The interrupt shadow created by STI or MOV-SS-like operations is part of
the VCPU state and must be preserved across migration. Transfer it in
the spare padding field of kvm_vcpu_events.interrupt.
As a side effect we now have to make vmx_set_interrupt_shadow robust
against both shadow types being set. Give MOV SS a higher priority and
skip STI in that case so that VMX does not throw a fault on the next entry.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
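The priority rule described above could look roughly like this. The bit values and names are assumptions made for this sketch only; the point is that a mask with both shadow types set is reduced to MOV SS alone.

```c
#include <stdint.h>

/* Illustrative bit values; assumptions for this sketch. */
#define TOY_SHADOW_INT_STI    0x01u
#define TOY_SHADOW_INT_MOV_SS 0x02u

/*
 * Reduce a requested shadow mask to at most one shadow type.
 * MOV SS wins over STI; any other bits are ignored.
 */
static uint32_t toy_pick_interrupt_shadow(uint32_t mask)
{
    if (mask & TOY_SHADOW_INT_MOV_SS)
        return TOY_SHADOW_INT_MOV_SS;
    if (mask & TOY_SHADOW_INT_STI)
        return TOY_SHADOW_INT_STI;
    return 0;
}
```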
Jan Kiszka [Mon, 15 Feb 2010 09:45:41 +0000 (10:45 +0100)]
KVM: x86: Do not return soft events in vcpu_events
To keep user space from migrating a pending software exception or
interrupt, mask them out on KVM_GET_VCPU_EVENTS. Without this, user
space would try to reinject them, and we would have to reconstruct the
proper instruction length for VMX event injection. Now the pending event
will be reinjected via executing the triggering instruction again.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
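A minimal sketch of the masking, under the assumption that "soft" exceptions are the ones raised by INT3 and INTO (#BP, vector 3, and #OF, vector 4). The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of a queued exception. */
struct toy_pending_exception {
    bool pending;
    uint8_t nr; /* exception vector */
};

/* Assumption: #BP (3) and #OF (4) count as software exceptions. */
static bool toy_exception_is_soft(uint8_t nr)
{
    return nr == 3 || nr == 4;
}

/* What gets reported to user space: soft events are hidden. */
static bool toy_report_exception(const struct toy_pending_exception *e)
{
    return e->pending && !toy_exception_is_soft(e->nr);
}
```

A hidden soft event is not lost: the guest re-executes the triggering instruction after migration, which raises it again.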
Joerg Roedel [Fri, 19 Feb 2010 15:23:01 +0000 (16:23 +0100)]
KVM: SVM: Fix wrong interrupt injection in enable_irq_windows
The nested_svm_intr() function does not execute the vmexit
anymore. Therefore we may still be in the nested state after
that function ran. This patch changes the nested_svm_intr()
function to return whether the irq window could be enabled.
Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 11:24:33 +0000 (12:24 +0100)]
KVM: PPC: Allocate vcpu struct using vmalloc
We used to use get_free_pages to allocate our vcpu struct. Unfortunately
that call failed on me several times after my machine had a big enough
uptime, as memory became too fragmented by then.
Fortunately, we don't need it to be page aligned any more! We can just
vmalloc it and everything's great.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:47 +0000 (11:00 +0100)]
KVM: PPC: Simplify kvmppc_load_up_(FPU|VMX|VSX)
We don't need as complex code. I had some thinkos while writing it, figuring
I needed to support PPC32 paths on PPC64 which would have required DR=0, but
everything just runs fine with DR=1.
So let's make the functions simple C call wrappers that reserve some space on
the stack for the respective functions to clobber.
Fixes out-of-RMA-access (and thus guest FPU loading) on the PS3.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:46 +0000 (11:00 +0100)]
KVM: PPC: Enable use of secondary htab bucket
We had code to make use of the secondary htab buckets, but kept that
disabled because it was unstable when I put it in.
I checked again if that's still the case and apparently it was only
exposing some instability that was there anyways before. I haven't
seen any badness related to usage of secondary htab entries so far.
This should speed up guest memory allocations by quite a bit, because
we now have more space to put PTEs in.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:44 +0000 (11:00 +0100)]
KVM: PPC: Implement Paired Single emulation
The one big thing about the Gekko is paired singles.
Paired singles are an extension to the instruction set that adds 32 single
precision floating point registers (QPRs), some SPRs to modify the behavior
of paired single operations, and instructions to deal with QPRs.
Unfortunately, it also changes semantics of existing operations that affect
single values in FPRs. In most cases they get mirrored to the corresponding
QPR.
Thanks to that we need to emulate all FPU operations and all the new paired
single operations too.
In order to achieve that, we use the just introduced FPU call helpers to
call the real FPU whenever the guest wants to modify an FPR. Additionally
we also fix up the QPR values along the way.
That way we can execute paired single FPU operations without implementing a
soft fpu.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:43 +0000 (11:00 +0100)]
KVM: PPC: Enable program interrupt to do MMIO
When we get a program interrupt we usually don't expect it to perform an
MMIO operation. But why not? When we emulate paired singles, we can end
up loading or storing to an MMIO address - and the handling of those
happens in the program interrupt handler.
So let's teach the program interrupt handler how to deal with EMULATE_MMIO.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:42 +0000 (11:00 +0100)]
KVM: PPC: Add helpers to modify ppc fields
The PowerPC specification always lists bits from MSB to LSB. That is
really confusing when you're trying to write C code, because it fits
in pretty badly with the normal (1 << xx) schemes.
So I came up with some nice wrappers that allow getting and setting fields
in a u64 with bit numbers exactly as given in the spec. That makes the
code in KVM easier to compare with the spec.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
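To make the numbering issue concrete, here is a small sketch of field accessors that take bit positions the way the spec writes them, with bit 0 as the most significant bit of a 64-bit value. The function names and signatures are hypothetical and not necessarily those introduced by the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering bits msb..lsb, numbered spec-style (bit 0 = MSB of 64). */
static uint64_t toy_field_mask(unsigned msb, unsigned lsb)
{
    assert(msb <= lsb && lsb < 64);
    unsigned width = lsb - msb + 1;
    uint64_t ones = (width == 64) ? ~UINT64_C(0)
                                  : ((UINT64_C(1) << width) - 1);
    /* Spec bit n sits at conventional shift (63 - n). */
    return ones << (63 - lsb);
}

/* Extract the field msb..lsb, right-aligned. */
static uint64_t toy_get_field(uint64_t reg, unsigned msb, unsigned lsb)
{
    return (reg & toy_field_mask(msb, lsb)) >> (63 - lsb);
}

/* Return reg with the field msb..lsb replaced by value. */
static uint64_t toy_set_field(uint64_t reg, unsigned msb, unsigned lsb,
                              uint64_t value)
{
    uint64_t mask = toy_field_mask(msb, lsb);
    return (reg & ~mask) | ((value << (63 - lsb)) & mask);
}
```

With these, spec bit 0 is what C code would usually write as (1ULL << 63), and spec bit 63 is (1ULL << 0).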
Alexander Graf [Fri, 19 Feb 2010 10:00:41 +0000 (11:00 +0100)]
KVM: PPC: Fix error in BAT assignment
BATs didn't work. Well, they did, but only up to BAT3. As soon as we
came to BAT4 the offset calculation was screwed up and we ended up
overwriting BAT0-3.
Fortunately, Linux hasn't been using BAT4+. It's still a good
idea to write correct code though.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
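The commit message does not show the faulty expression, so the following is only a guess at the kind of offset bug described: if BAT4 to BAT7 live in a second SPR range, their array index needs a base of 4 added. The SPR numbers below are placeholders chosen for this sketch.

```c
#include <stdint.h>

/* Placeholder SPR numbers; assumptions for this sketch only. */
#define TOY_SPRN_BAT0U 0x210u /* BAT0..BAT3: upper/lower pairs start here */
#define TOY_SPRN_BAT4U 0x230u /* BAT4..BAT7: a second, separate range */

/* Map an SPR number to a BAT array index (0..7), or -1 if it is no BAT. */
static int toy_bat_index(uint32_t sprn)
{
    if (sprn >= TOY_SPRN_BAT0U && sprn < TOY_SPRN_BAT0U + 8)
        return (int)((sprn - TOY_SPRN_BAT0U) / 2);
    if (sprn >= TOY_SPRN_BAT4U && sprn < TOY_SPRN_BAT4U + 8)
        /* Without the "4 +", BAT4..7 would alias BAT0..3. */
        return 4 + (int)((sprn - TOY_SPRN_BAT4U) / 2);
    return -1;
}
```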
Alexander Graf [Fri, 19 Feb 2010 10:00:40 +0000 (11:00 +0100)]
KVM: PPC: Add helpers to call FPU instructions
To emulate paired single instructions, we need to be able to call FPU
operations from within the kernel. Since we don't want gcc to spill
arbitrary FPU code everywhere, we tell it to use a soft fpu.
Since we know we can really call the FPU in safe areas, let's also add
some calls that we can later use to actually execute real world FPU
operations on the host's FPU.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:38 +0000 (11:00 +0100)]
KVM: PPC: Make software load/store return eaddr
The Book3S KVM implementation contains some helper functions to load and store
data from and to virtual addresses.
Unfortunately, this helper used to keep the physical address it so nicely
found out for us to itself. So let's change that and make it return the
physical address it resolved.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:37 +0000 (11:00 +0100)]
KVM: PPC: Implement mtsr instruction emulation
The Book3S_32 specification allows for two instructions to modify segment
registers: mtsrin and mtsr.
Most normal operating systems use mtsrin, because it allows defining which
segment it wants to change using a register. But since I was trying to run
an embedded guest, it turned out to be using mtsr with hardcoded values.
So let's also emulate mtsr. It's a valid instruction after all.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:36 +0000 (11:00 +0100)]
KVM: PPC: Fix typo in book3s_32 debug code
There's a typo in the debug ifdef of the book3s_32 mmu emulation. While trying
to debug something I stumbled across that and wanted to save anyone after me
(or myself later) from having to debug that again.
So let's fix the ifdef.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:35 +0000 (11:00 +0100)]
KVM: PPC: Preload FPU when possible
There are some situations when we're pretty sure the guest will use the
FPU soon. So we can save the churn of going into the guest, finding out
it does want to use the FPU and going out again.
This patch adds preloading of the FPU when it's reasonable.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:34 +0000 (11:00 +0100)]
KVM: PPC: Combine extension interrupt handlers
When we for example get an Altivec interrupt, but our guest doesn't support
altivec, we need to inject a program interrupt, not an altivec interrupt.
The same goes for paired singles. When an altivec interrupt arrives, we're
pretty sure we need to emulate the instruction because it's a paired single
operation.
So let's make all the ext handlers aware that they need to jump to the
program interrupt handler when an extension interrupt arrives that
was not supposed to arrive for the guest CPU.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:32 +0000 (11:00 +0100)]
KVM: PPC: Add hidden flag for paired singles
The Gekko implements an extension called paired singles. When the guest wants
to use that extension, we need to make sure we're not running the host FPU,
because all FPU instructions need to get emulated to accommodate additional
operations that occur.
This patch adds an hflag to track if we're in paired single mode or not.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:31 +0000 (11:00 +0100)]
KVM: PPC: Add AGAIN type for emulation return
Emulation of an instruction can have different outcomes. It can succeed,
fail, require MMIO, do funky BookE stuff - or it can just realize something's
odd and will be fixed the next time around.
Exactly that is what EMULATE_AGAIN means. Using that flag we can now tell
the caller that nothing happened, but we still want to go back to the
guest and see what happens next time we come around.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
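One way to picture the new return type is a caller that maps each emulation outcome to a next step. The enum values and the mapping below are illustrative assumptions, not the kernel's definitions; the case of interest is AGAIN, which resumes the guest without advancing past the instruction.

```c
/* Hypothetical emulation outcomes, modeled on the description above. */
enum toy_emulation_result {
    TOY_EMULATE_DONE,
    TOY_EMULATE_FAIL,
    TOY_EMULATE_DO_MMIO,
    TOY_EMULATE_AGAIN,
};

/* Hypothetical follow-up actions for the caller. */
enum toy_next_step {
    TOY_RESUME_GUEST_ADVANCE,     /* emulated; continue after the instruction */
    TOY_RESUME_GUEST_RETRY,       /* nothing happened; run the same instruction */
    TOY_EXIT_TO_USERSPACE,        /* let user space complete the MMIO access */
    TOY_INJECT_PROGRAM_INTERRUPT, /* emulation failed; report it to the guest */
};

static enum toy_next_step toy_handle_emulation(enum toy_emulation_result r)
{
    switch (r) {
    case TOY_EMULATE_DONE:
        return TOY_RESUME_GUEST_ADVANCE;
    case TOY_EMULATE_AGAIN:
        return TOY_RESUME_GUEST_RETRY;
    case TOY_EMULATE_DO_MMIO:
        return TOY_EXIT_TO_USERSPACE;
    case TOY_EMULATE_FAIL:
        break;
    }
    return TOY_INJECT_PROGRAM_INTERRUPT;
}
```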
Alexander Graf [Fri, 19 Feb 2010 10:00:27 +0000 (11:00 +0100)]
KVM: PPC: Add QPR registers
The Gekko has GPRs, SPRs and FPRs like normal PowerPC cores, but
it also has QPRs which are basically single precision only FPU registers
that get used when in paired single mode.
The following patches depend on them being around, so let's add the
definitions early.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Fri, 19 Feb 2010 15:23:08 +0000 (16:23 +0100)]
KVM: SVM: Make lazy FPU switching work with nested svm
The new lazy fpu switching code may disable cr0 intercepts
when running nested. This is a bug because the nested
hypervisor may still want to intercept cr0 which will break
in this situation. This patch fixes this issue and makes
lazy fpu switching work with nested svm.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Fri, 19 Feb 2010 15:23:07 +0000 (16:23 +0100)]
KVM: SVM: Activate nested state only when guest state is complete
Certain functions called during the emulated world switch
behave differently when the vcpu is running nested. This is
not the expected behavior during a world switch emulation.
This patch ensures that the nested state is activated only
if the vcpu is completely in nested state.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Fri, 19 Feb 2010 15:23:06 +0000 (16:23 +0100)]
KVM: SVM: Don't sync nested cr8 to lapic and back
This patch makes syncing of the guest tpr to the lapic
conditional on !nested. Otherwise a nested guest using the
TPR could freeze the guest.
Another important change this patch introduces is that the
cr8 intercept bits are no longer ORed at vmrun emulation if
the guest sets VINTR_MASKING in its VMCB. The reason is that
nested cr8 accesses always need to be handled by the nested
hypervisor because they change the shadow version of the
tpr.
Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Fri, 19 Feb 2010 15:23:05 +0000 (16:23 +0100)]
KVM: SVM: Fix nested msr intercept handling
The nested_svm_exit_handled_msr() function maps only one
page of the guest's msr permission bitmap. This patch changes
the code to use kvm_read_guest to fix the bug.
Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
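Why one page is not enough can be seen from the bitmap layout. The sketch below assumes the layout as I recall it from the AMD architecture manual: three MSR ranges of 0x2000 MSRs each, two bits per MSR, so each range takes 2 KiB. Treat the constants as assumptions; the function name is hypothetical.

```c
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/*
 * Byte offset inside the MSR permission bitmap that holds the two
 * intercept bits for msr, or -1 if the MSR is outside the covered ranges.
 */
static long toy_msrpm_byte_offset(uint32_t msr)
{
    /* Assumed range bases; each range covers 0x2000 MSRs. */
    static const uint32_t range_base[3] = {
        0x00000000u, 0xc0000000u, 0xc0010000u
    };

    for (unsigned i = 0; i < 3; i++) {
        if (msr >= range_base[i] && msr - range_base[i] < 0x2000u) {
            /* 2 bits per MSR = 4 MSRs per byte; 2 KiB per range. */
            return (long)(i * 2048u + (msr - range_base[i]) / 4);
        }
    }
    return -1;
}
```

Under these assumptions the third range starts at byte 4096, that is, in the second page, so a lookup that maps only the first page cannot answer for those MSRs.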
Joerg Roedel [Fri, 19 Feb 2010 15:23:03 +0000 (16:23 +0100)]
KVM: SVM: Sync all control registers on nested vmexit
Currently the vmexit emulation does not sync control
registers where the access is typically intercepted by the
nested hypervisor. But we cannot count on those intercepts,
so sync these registers too and make the code
architecturally more correct.
Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Fri, 19 Feb 2010 15:23:00 +0000 (16:23 +0100)]
KVM: SVM: Don't use kmap_atomic in nested_svm_map
Use of kmap_atomic disables preemption but if we run in
shadow-shadow mode the vmrun emulation executes kvm_set_cr3
which might sleep or fault. So use kmap instead for
nested_svm_map.
Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Marcelo Tosatti [Sat, 13 Feb 2010 18:10:26 +0000 (16:10 -0200)]
KVM: add doc note about PIO/MMIO completion API
Document that partially emulated instructions leave the guest state
inconsistent, and that the kernel will complete operations before
checking for pending signals.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68knommu: allow 4 coldfire serial ports
m68knommu: fix coldfire tcdrain
m68knommu: remove a duplicate vector setting line for 68360
Fix m68k-uclinux's rt_sigreturn trampoline
m68knommu: correct the CC flags for Coldfire M5272 targets
uclinux: error message when FLAT reloc symbol is invalid, v2
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6:
mc13783-regulator: fix a memory leak in mc13783_regulator_remove
regulator: Let drivers know when they use the stub API
Merge branch 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Fix TSS size check for 16-bit tasks
KVM: Add missing srcu_read_lock() for kvm_mmu_notifier_release()
KVM: Increase NR_IOBUS_DEVS limit to 200
KVM: fix the handling of dirty bitmaps to avoid overflows
KVM: MMU: fix kvm_mmu_zap_page() and its calling path
KVM: VMX: Save/restore rflags.vm correctly in real mode
KVM: allow bit 10 to be cleared in MSR_IA32_MC4_CTL
KVM: Don't spam kernel log when injecting exceptions due to bad cr writes
KVM: SVM: Fix memory leaks that happen when svm_create_vcpu() fails
KVM: take srcu lock before call to complete_pio()
David Howells [Wed, 21 Apr 2010 11:01:23 +0000 (12:01 +0100)]
AFS: Don't pass error value to page_cache_release() in error handling
In the error handling in afs_mntpt_do_automount(), we pass an error
pointer to page_cache_release() if read_mapping_page() failed. Instead,
we should extend the gotos around the error handling we don't need.
Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix tcdrain on coldfire uarts.
Currently with coldfire uarts tcdrain returns without waiting for txempty,
because (tx)fifosize is 0. Fix that and call uart_update_timeout when
setting the baud rate, otherwise tcdrain will wait for half an hour :)
Also constify mcf_uart_ops.
Signed-off-by: Philippe De Muyter <phdm@macqel.be> Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Jun Sun [Fri, 1 Jan 2010 01:28:52 +0000 (17:28 -0800)]
uclinux: error message when FLAT reloc symbol is invalid, v2
This patch fixes a cosmetic error in printk. Text segment and data/bss
segment are allocated from two different areas. It is not meaningful to
give the diff between them in the error reporting messages.
Signed-off-by: Jun Sun <jsun@junsun.net> Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Joern Engel [Tue, 20 Apr 2010 19:44:10 +0000 (21:44 +0200)]
[LogFS] Split large truncated into smaller chunks
Truncate would do an almost limitless amount of work without invoking
the garbage collector in between. Split it up into more manageable,
though still large, chunks.
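The chunking idea can be sketched as a loop that truncates a bounded amount and then gives the garbage collector a turn. Everything here is hypothetical: the struct, the 8 MiB chunk size, and the counter standing in for a real GC pass.

```c
#include <stdint.h>

#define TOY_TRUNCATE_CHUNK (UINT64_C(8) << 20) /* 8 MiB per pass; arbitrary */

/* Hypothetical filesystem state for the sketch. */
struct toy_fs {
    uint64_t file_size;
    unsigned gc_runs; /* counts garbage collector invocations */
};

/* Stand-in for a garbage collection pass. */
static void toy_run_gc(struct toy_fs *fs)
{
    fs->gc_runs++;
}

/* Shrink the file to target in bounded steps, running GC between steps. */
static void toy_truncate(struct toy_fs *fs, uint64_t target)
{
    while (fs->file_size > target) {
        uint64_t remaining = fs->file_size - target;
        uint64_t step = remaining < TOY_TRUNCATE_CHUNK ? remaining
                                                       : TOY_TRUNCATE_CHUNK;

        fs->file_size -= step; /* truncate one bounded chunk */
        toy_run_gc(fs);        /* let the garbage collector catch up */
    }
}
```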
Jan Kara [Mon, 19 Apr 2010 14:47:20 +0000 (16:47 +0200)]
quota: Convert __DQUOT_PARANOIA symbol to standard config option
Make __DQUOT_PARANOIA define from the old days a standard config option
and turn it off by default.
This gets rid of a quota warning about writes before quota is turned on
for systems with ext4 root filesystem. Currently there's no way to legally
solve this because /etc/mtab has to be written before quota is turned on
on most systems.
Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6
* 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6:
pcmcia: fix error handling in cm4000_cs.c
drivers/pcmcia: Add missing local_irq_restore
serial_cs: MD55x support (PCMCIA GPRS/EDGE modem) (kernel 2.6.33)
pcmcia: avoid late calls to pccard_validate_cis
pcmcia: fix ioport size calculation in rsrc_nonstatic
pcmcia: re-start on MFC override
pcmcia: fix io_probe due to parent (PCI) resources
pcmcia: use previously assigned IRQ for all card functions
Before commit e28cbf22933d0c0ccaf3c4c27a1a263b41f73859 ("improve
sys_newuname() for compat architectures") 64-bit x86 had a private
implementation of sys_uname which was just called sys_uname, which other
architectures used for the old uname.
Due to some merge issues with the uname refactoring patches we ended up
calling the old uname version for both the old and new system call
slots, which led to the domainname field never being set, which caused
failures with libnss_nis.
Reported-and-tested-by: Andy Isaacson <adi@hexapodia.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch increases the current hardcoded limit of NR_IOBUS_DEVS
from 6 to 200. We are hitting this limit when creating a guest with more
than 1 virtio-net device using vhost-net backend. Each virtio-net
device requires 2 such devices to service notifications from rx/tx queues.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
KVM: MMU: fix kvm_mmu_zap_page() and its calling path
This patch fixes:
- calculate zapped page number properly in mmu_zap_unsync_children()
- calculate freed page number properly in kvm_mmu_change_mmu_pages()
- restart hlist walking if a child page was zapped
Avi Kivity [Thu, 8 Apr 2010 15:19:35 +0000 (18:19 +0300)]
KVM: VMX: Save/restore rflags.vm correctly in real mode
Currently we set eflags.vm unconditionally when entering real mode emulation
through virtual-8086 mode, and clear it unconditionally when we enter protected
mode. This means that the following sequence
Ends up with rflags.vm clear due to KVM_SET_SREGS triggering enter_pmode().
Fix by shadowing rflags.vm (and rflags.iopl) correctly while in real mode:
reads and writes to those bits access a shadow register instead of the actual
register.
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Andre Przywara [Wed, 24 Mar 2010 16:46:42 +0000 (17:46 +0100)]
KVM: allow bit 10 to be cleared in MSR_IA32_MC4_CTL
There is a quirk for AMD K8 CPUs in many Linux kernels (see
arch/x86/kernel/cpu/mcheck/mce.c:__mcheck_cpu_apply_quirks()) that
clears bit 10 in that MCE related MSR. KVM can only cope with all
zeros or all ones, so it will inject a #GP into the guest, which
will make it panic.
So let's add a quirk to the quirk and ignore this single cleared bit.
This fixes -cpu kvm64 on all machines and -cpu host on K8 machines
with some guest Linux kernels.
Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
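The "quirk to the quirk" amounts to a slightly relaxed validity check on the written value. The following sketch shows one way to express it; the function name is hypothetical and the real check may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Accept a write to the control MSR if it is all zeros, all ones,
 * or all ones with only bit 10 cleared.
 */
static bool toy_mc_ctl_write_ok(uint64_t data)
{
    if (data == 0)
        return true; /* all zeros */

    /* Forcing bit 10 on before comparing ignores that single bit. */
    return (data | (UINT64_C(1) << 10)) == ~UINT64_C(0);
}
```

Any value rejected here would be the case in which the guest receives a #GP.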
Jesse Barnes [Fri, 26 Mar 2010 18:07:16 +0000 (11:07 -0700)]
drm: delay vblank cleanup until after driver unload
Drivers may use vblank calls now (e.g. drm_vblank_off) in their unload
paths, so don't clean up the vblank related structures until after
driver unload.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Signed-off-by: Dave Airlie <airlied@redhat.com>
rmap: add exclusively owned pages to the newest anon_vma
The recent anon_vma fixes cause many anonymous pages to end up
in the parent process anon_vma, even when the page is exclusively
owned by the current process.
Adding exclusively owned anonymous pages to the top anon_vma
reduces rmap scanning overhead, especially in workloads with
forking servers.
This patch adds a parameter to __page_set_anon_rmap that can
be used to indicate whether or not the added page is exclusively
owned by the current process.
Pages added through page_add_new_anon_rmap are exclusively
owned by the current process, and can be added to the top
anon_vma.
Pages added through page_add_anon_rmap can be either shared
or exclusively owned, so we do the conservative thing and
add it to the oldest anon_vma.
A next step would be to add the exclusive parameter to
page_add_anon_rmap, to be used from functions where we do
know for sure whether a page is exclusively owned.
Signed-off-by: Rik van Riel <riel@redhat.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Lightly-tested-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
[ Edited to look nicer - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
eCryptfs: Turn lower lookup error messages into debug messages
eCryptfs: Copy lower directory inode times and size on link
ecryptfs: fix use with tmpfs by removing d_drop from ecryptfs_destroy_inode
ecryptfs: fix error code for missing xattrs in lower fs
eCryptfs: Decrypt symlink target for stat size
eCryptfs: Strip metadata in xattr flag in encrypted view
eCryptfs: Clear buffer before reading in metadata xattr
eCryptfs: Rename ecryptfs_crypt_stat.num_header_bytes_at_front
eCryptfs: Fix metadata in xattr feature regression
Tyler Hicks [Thu, 25 Mar 2010 16:16:56 +0000 (11:16 -0500)]
eCryptfs: Turn lower lookup error messages into debug messages
Vague warnings about ENAMETOOLONG errors when looking up an encrypted
file name have caused many users to become concerned about their data.
Since this is a rather harmless condition, I'm moving this warning to
only be printed when the ecryptfs_verbosity module param is 1.
Tyler Hicks [Tue, 23 Mar 2010 23:09:02 +0000 (18:09 -0500)]
eCryptfs: Copy lower directory inode times and size on link
The timestamps and size of a lower inode involved in a link() call were
being copied to the upper parent inode. Instead, we should be
copying the lower parent inode's timestamps and size to the upper parent
inode. I discovered this bug using the POSIX test suite at Tuxera.
Jeff Mahoney [Fri, 19 Mar 2010 19:35:46 +0000 (15:35 -0400)]
ecryptfs: fix use with tmpfs by removing d_drop from ecryptfs_destroy_inode
Since tmpfs has no persistent storage, it pins all its dentries in memory
so they have d_count=1 when other file systems would have d_count=0.
->lookup is only used to create new dentries. If the caller doesn't
instantiate it, it's freed immediately at dput(). ->readdir reads
directly from the dcache and depends on the dentries being hashed.
When an ecryptfs mount is mounted, it associates the lower file and dentry
with the ecryptfs files as they're accessed. When it's umounted and
destroys all the in-memory ecryptfs inodes, it fput's the lower_files and
d_drop's the lower_dentries. Commit 4981e081 added this and a d_delete in
2008 and several months later commit caeeeecf removed the d_delete. I
believe the d_drop() needs to be removed as well.
The d_drop effectively hides any file that has been accessed via ecryptfs
from the underlying tmpfs since it depends on it being hashed for it to
be accessible. I've removed the d_drop on my development node and see no
ill effects with basic testing on both tmpfs and persistent storage.
As a side effect, after ecryptfs d_drops the dentries on tmpfs, tmpfs
BUGs on umount. This is due to the dentries being unhashed.
tmpfs->kill_sb is kill_litter_super which calls d_genocide to drop
the reference pinning the dentry. It skips unhashed and negative dentries,
but shrink_dcache_for_umount_subtree doesn't. Since those dentries
still have an elevated d_count, we get a BUG().
This patch removes the d_drop call and fixes both issues.
This issue was reported at:
https://bugzilla.novell.com/show_bug.cgi?id=567887
Reported-by: Árpád Bíró <biroa@demasz.hu> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Dustin Kirkland <kirkland@canonical.com> Cc: stable@kernel.org Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
ecryptfs: fix error code for missing xattrs in lower fs
If the lower file system driver has extended attributes disabled,
ecryptfs' own access functions return -ENOSYS instead of -EOPNOTSUPP.
This breaks execution of programs in the ecryptfs mount, since the
kernel expects the latter error when checking for security
capabilities in xattrs.
Signed-off-by: Christian Pulvermacher <pulvermacher@gmx.de> Cc: stable@kernel.org Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
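For a stacked filesystem, the fix boils down to which errno is returned when the lower layer has no xattr operation. A minimal user-space sketch, with a hypothetical operations table and function name:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical lower-filesystem operation table. */
struct toy_lower_ops {
    long (*getxattr)(const char *name, void *buf, size_t size);
};

/* Forward getxattr to the lower layer, or report "not supported". */
static long toy_stacked_getxattr(const struct toy_lower_ops *lower,
                                 const char *name, void *buf, size_t size)
{
    if (!lower->getxattr)
        return -EOPNOTSUPP; /* xattrs unsupported; not -ENOSYS */
    return lower->getxattr(name, buf, size);
}
```

Callers that probe for security capabilities treat -EOPNOTSUPP as "no such attribute support" and carry on, whereas an unexpected -ENOSYS is propagated as a failure.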
Tyler Hicks [Mon, 22 Mar 2010 05:41:35 +0000 (00:41 -0500)]
eCryptfs: Decrypt symlink target for stat size
Create a getattr handler for eCryptfs symlinks that is capable of
reading the lower target and decrypting its path. Prior to this patch,
a stat's st_size field would represent the strlen of the encrypted path,
while readlink() would return the strlen of the decrypted path. This
could lead to confusion in some userspace applications, since the two
values should be equal.
Commit b91ecb00 ("gigaset: include cleanup cleanup") removed an implicit
sched.h inclusion that came in via slab.h, and caused various compile
problems as a result.
Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: Make RCU lockdep check the lockdep_recursion variable
rcu: Update docs for rcu_access_pointer and rcu_dereference_protected
rcu: Better explain the condition parameter of rcu_dereference_check()
rcu: Add rcu_access_pointer and rcu_dereference_protected
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
gigaset: include cleanup cleanup
packet : remove init_net restriction
WAN: flush tx_queue in hdlc_ppp to prevent panic on rmmod hw_driver.
ip: Fix ip_dev_loopback_xmit()
net: dev_pick_tx() fix
fib: suppress lockdep-RCU false positive in FIB trie.
tun: orphan an skb on tx
forcedeth: fix tx limit2 flag check
iwlwifi: work around bogus active chains detection