git.karo-electronics.de Git - linux-beck.git/log
linux-beck.git
8 years ago f2fs: report error for f2fs_parent_dir
Jaegeuk Kim [Thu, 9 Jun 2016 21:57:19 +0000 (14:57 -0700)]
f2fs: report error for f2fs_parent_dir

If there is no dentry, we can report its error correctly.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: find parent dentry correctly
Sheng Yong [Sat, 4 Jun 2016 14:01:28 +0000 (22:01 +0800)]
f2fs: find parent dentry correctly

If the dotdot directory is corrupted, its slot may be occupied by another
file. In this case, dentry[1] is not the parent directory. Rename and
cross-rename will update the inode in dentry[1] incorrectly. This
patch finds the dotdot dentry by name.
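
A minimal userspace sketch of looking up ".." by name rather than trusting a fixed slot. This is not the actual patch: struct demo_dentry and demo_find_dotdot() are hypothetical names, and the layout is simplified compared to the on-disk f2fs dentry block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified directory entry; not the on-disk f2fs layout. */
struct demo_dentry {
    const char *name;   /* entry name, NULL for an empty slot */
    unsigned int ino;   /* inode number the entry points to */
};

/*
 * Return the entry named ".." or NULL if there is none.
 * The slot index is deliberately not trusted: a corrupted directory
 * may hold an unrelated file in slot 1.
 */
static struct demo_dentry *demo_find_dotdot(struct demo_dentry *d, size_t n)
{
    assert(d != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (d[i].name != NULL && strcmp(d[i].name, "..") == 0)
            return &d[i];
    }
    return NULL;
}
```

A caller would treat a NULL result as an error instead of updating whatever happens to sit in slot 1.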

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
[Jaegeuk Kim: remove wrong bug_on]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: fix deadlock in add_link failure
Jaegeuk Kim [Tue, 7 Jun 2016 21:34:22 +0000 (14:34 -0700)]
f2fs: fix deadlock in add_link failure

mkdir                        sync_dirty_inode
 - init_inode_metadata
   - lock_page(node)
   - make_empty_dir
                             - filemap_fdatawrite()
                              - do_writepages
                              - lock_page(data)
                              - write_page(data)
                               - lock_page(node)
   - f2fs_init_acl
    - error
   - truncate_inode_pages
    - lock_page(data)

So, we don't need to truncate data pages in this error case, which will
be done by f2fs_evict_inode.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: introduce mode=lfs mount option
Jaegeuk Kim [Sat, 4 Jun 2016 02:29:38 +0000 (19:29 -0700)]
f2fs: introduce mode=lfs mount option

This mount option forcibly enables the original log-structured filesystem behavior.
So, there should be no random writes for the main area.

In particular, this supports host-managed SMR devices.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: skip clean segment for gc
Jaegeuk Kim [Tue, 7 Jun 2016 01:49:54 +0000 (18:49 -0700)]
f2fs: skip clean segment for gc

If a segment in a section is clean or prefreed, we don't need to get its summary
and do gc.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: drop any block plugging
Jaegeuk Kim [Sat, 4 Jun 2016 21:25:24 +0000 (14:25 -0700)]
f2fs: drop any block plugging

In f2fs, we don't need to keep block plugging for NODE and DATA writes, since
we already merged bios as much as possible.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: avoid reverse IO order for NODE and DATA
Jaegeuk Kim [Sat, 4 Jun 2016 21:21:28 +0000 (14:21 -0700)]
f2fs: avoid reverse IO order for NODE and DATA

There is a data race between allocate_data_block() and f2fs_submit_page_mbio(),
which incurs unnecessary reversed bio submission.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: set mapping error for EIO
Jaegeuk Kim [Fri, 3 Jun 2016 19:28:26 +0000 (12:28 -0700)]
f2fs: set mapping error for EIO

If EIO occurred, we need to set the error on the mapping to avoid any further IOs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: control not to exceed # of cached nat entries
Jaegeuk Kim [Thu, 2 Jun 2016 22:24:24 +0000 (15:24 -0700)]
f2fs: control not to exceed # of cached nat entries

This is to avoid cache entry management overhead including radix tree.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: fix wrong percentage
Jaegeuk Kim [Thu, 2 Jun 2016 22:26:27 +0000 (15:26 -0700)]
f2fs: fix wrong percentage

This should be 1%, 10MB / 1GB.
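
As a quick check of the arithmetic, assuming binary units: 10MB out of 1GB is 10/1024, about 0.98%, which rounds to 1%. The helper below is a hypothetical illustration and is not taken from the patch.

```c
#include <assert.h>

/* Integer percentage of part in whole, rounded to the nearest whole percent. */
static unsigned int demo_percent(unsigned long long part, unsigned long long whole)
{
    assert(whole != 0);
    return (unsigned int)((part * 100 + whole / 2) / whole);
}
```

Calling demo_percent(10ULL << 20, 1ULL << 30) evaluates the 10MB / 1GB ratio from the message.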

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: avoid data race between FI_DIRTY_INODE flag and update_inode
Jaegeuk Kim [Thu, 2 Jun 2016 21:15:56 +0000 (14:15 -0700)]
f2fs: avoid data race between FI_DIRTY_INODE flag and update_inode

FI_DIRTY_INODE flag is not covered by inode page lock, so it can be unset
at any time like below.

Thread #1                        Thread #2
- lock_page(ipage)
- update i_fields
                                 - update i_size/i_blocks/and so on
 - set FI_DIRTY_INODE
- reset FI_DIRTY_INODE
- set_page_dirty(ipage)

In this case, we can lose the latest i_field information.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: remove obsolete parameter in f2fs_truncate
Jaegeuk Kim [Thu, 2 Jun 2016 20:49:38 +0000 (13:49 -0700)]
f2fs: remove obsolete parameter in f2fs_truncate

We don't need lock parameter, which is always true.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: avoid wrong count on dirty inodes
Jaegeuk Kim [Thu, 2 Jun 2016 18:08:56 +0000 (11:08 -0700)]
f2fs: avoid wrong count on dirty inodes

The number should be covered by spin_lock. Otherwise we can see wrong count
in f2fs_stat.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: remove deprecated parameter
Jaegeuk Kim [Thu, 2 Jun 2016 04:18:25 +0000 (21:18 -0700)]
f2fs: remove deprecated parameter

Remove deprecated parameter.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: handle writepage correctly
Jaegeuk Kim [Mon, 30 May 2016 04:18:23 +0000 (21:18 -0700)]
f2fs: handle writepage correctly

Previously, f2fs_write_data_pages() calls __f2fs_writepage() which calls
f2fs_write_data_page().
If f2fs_write_data_page() returns AOP_WRITEPAGE_ACTIVATE, __f2fs_writepage()
calls mapping_set_error(). But this should not happen every time, since
sometimes f2fs_write_data_page() tries to skip writing pages without error.
For example, volatile_write() gives EIO all the time, as Shuoran Liu pointed
out.

Reported-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: return error of f2fs_lookup
Jaegeuk Kim [Fri, 27 May 2016 17:10:41 +0000 (10:10 -0700)]
f2fs: return error of f2fs_lookup

Now we can report an error to f2fs_lookup given by f2fs_find_entry.

Suggested-by: He YunLei <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: return the errno to the caller to avoid using a wrong page
Yunlong Song [Thu, 26 May 2016 11:40:29 +0000 (19:40 +0800)]
f2fs: return the errno to the caller to avoid using a wrong page

Commit aaf9607516ed38825268515ef4d773289a44f429 ("f2fs: check node page
contents all the time") pointed out that "sometimes it was reported that
its contents was missing", so it checks the page's mapping and contents.
When "nid != nid_of_node(page)", ERR_PTR(-EIO) will be returned to the
caller. However, commit e1c51b9f1df2f9efc2ec11488717e40cd12015f9 ("f2fs:
clean up node page updating flow") moves "nid != nid_of_node(page)" test
to "f2fs_bug_on(sbi, nid != nid_of_node(page))", this will return a
wrong page to the caller when F2FS_CHECK_FS is off when "sometimes it
was reported that its contents was missing" happens.

This patch restores checking the node page contents all the time, and
returns the errno so that the caller knows something is wrong and avoids
using the page. This patch also moves f2fs_bug_on to its proper location.
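
The idea can be sketched in userspace as follows. This is a simplified model, not the kernel code: struct demo_node_page and demo_check_node_page() are hypothetical names, and the real function returns ERR_PTR(-EIO) rather than an int.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a node page; only the stored node ID matters here. */
struct demo_node_page {
    unsigned int nid;
};

/*
 * Return 0 if the page belongs to the requested node ID, or -EIO if the
 * contents do not match, so that the caller never uses a wrong page.
 * The check runs unconditionally instead of only in a debug assertion.
 */
static int demo_check_node_page(const struct demo_node_page *page, unsigned int nid)
{
    assert(page != NULL);
    if (page->nid != nid)
        return -EIO;
    return 0;
}
```

The point of the design is that a mismatch becomes a return value the caller must handle, rather than something that is only caught when a debug option is enabled.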

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: remove two steps to flush dirty data pages
Jaegeuk Kim [Thu, 26 May 2016 03:57:16 +0000 (20:57 -0700)]
f2fs: remove two steps to flush dirty data pages

If there is no cold page, we don't need to do a loop to flush dirty
data pages.

On /dev/pmem0,

1. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048 conv=fsync
 Before : 1.1 GB/s
 After  : 1.2 GB/s

2. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048
 Before : 2.2 GB/s
 After  : 2.3 GB/s

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: do not skip writing data pages
Jaegeuk Kim [Thu, 26 May 2016 00:17:56 +0000 (17:17 -0700)]
f2fs: do not skip writing data pages

For data pages, let's try to flush as much as possible in background.

On /dev/pmem0,

1. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048 conv=fsync
 Before : 800 MB/s
 After  : 1.1 GB/s

2. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048
 Before : 1.3 GB/s
 After  : 2.2 GB/s

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: inject to produce some orphan inodes
Jaegeuk Kim [Wed, 25 May 2016 22:24:18 +0000 (15:24 -0700)]
f2fs: inject to produce some orphan inodes

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: propagate error given by f2fs_find_entry
Jaegeuk Kim [Wed, 25 May 2016 21:29:11 +0000 (14:29 -0700)]
f2fs: propagate error given by f2fs_find_entry

If we get ENOMEM or EIO in f2fs_find_entry, we should stop right away.
Otherwise, for example, we can get duplicate directory entry by ->chash and
->clevel.
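
A rough sketch of the control flow, assuming a hypothetical per-level probe callback: a hard error from any level ends the search immediately instead of being treated as "entry not found". None of the demo_* names below exist in f2fs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical per-level probe: returns 1 if the entry is found,
 * 0 if it is absent at this level, or a negative errno on failure.
 */
typedef int (*demo_probe_fn)(unsigned int level, void *ctx);

/* Search level by level; propagate the first hard error right away. */
static int demo_find_entry(demo_probe_fn probe, unsigned int max_level, void *ctx)
{
    assert(probe != NULL);
    for (unsigned int level = 0; level < max_level; level++) {
        int ret = probe(level, ctx);
        if (ret < 0)
            return ret;     /* e.g. -ENOMEM or -EIO: stop, do not report "absent" */
        if (ret > 0)
            return 1;       /* found */
    }
    return 0;               /* genuinely absent */
}

/* Example probe: fails with -EIO at the level stored in *ctx, otherwise reports "absent". */
static int demo_probe_eio_at(unsigned int level, void *ctx)
{
    const unsigned int *bad_level = ctx;
    return (level == *bad_level) ? -EIO : 0;
}
```

If the error were swallowed and reported as "absent", a caller could go on to create an entry that already exists, which is the duplicate case the message warns about.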

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: remove writepages lock
Jaegeuk Kim [Sat, 21 May 2016 05:50:29 +0000 (22:50 -0700)]
f2fs: remove writepages lock

This patch removes writepages lock.
We can improve multi-threading performance.

tiobench, 32 threads, 4KB write per fsync on SSD
Before: 25.88 MB/s
After: 28.03 MB/s

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: set flush_merge by default
Jaegeuk Kim [Sat, 21 May 2016 05:39:20 +0000 (22:39 -0700)]
f2fs: set flush_merge by default

This patch sets flush_merge by default.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: detect congestion of flush command issues
Jaegeuk Kim [Mon, 23 May 2016 19:04:56 +0000 (12:04 -0700)]
f2fs: detect congestion of flush command issues

If flush commands do not incur any congestion, we don't need to throw that to
dispatching queue which causes unnecessary latency.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: add lazytime mount option
Jaegeuk Kim [Sat, 21 May 2016 04:47:24 +0000 (21:47 -0700)]
f2fs: add lazytime mount option

This patch adds lazytime support.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: avoid unnecessary updating inode during fsync
Jaegeuk Kim [Sat, 21 May 2016 03:42:37 +0000 (20:42 -0700)]
f2fs: avoid unnecessary updating inode during fsync

If roll-forward recovery can recover i_size, we don't need to update inode's
metadata during fsync.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: remove syncing inode page in all the cases
Jaegeuk Kim [Fri, 20 May 2016 23:32:49 +0000 (16:32 -0700)]
f2fs: remove syncing inode page in all the cases

This patch reduces calls to the following functions across the whole tree.
- sync_inode_page()
- update_inode_page()
- update_inode()
- f2fs_write_inode()

Instead, checkpoint will flush all the dirty inode metadata before syncing
node pages.
Note that this is doable, since we call mark_inode_dirty_sync() for every
inode field change that also needs to update the on-disk inode.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: flush inode metadata when checkpoint is doing
Jaegeuk Kim [Fri, 20 May 2016 18:10:10 +0000 (11:10 -0700)]
f2fs: flush inode metadata when checkpoint is doing

This patch registers all the inodes which have dirty metadata so that they are
synced when a checkpoint is in progress.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: call mark_inode_dirty_sync for i_field changes
Jaegeuk Kim [Fri, 20 May 2016 16:52:20 +0000 (09:52 -0700)]
f2fs: call mark_inode_dirty_sync for i_field changes

This patch calls mark_inode_dirty_sync() for the following on-disk inode
changes.

 -> largest
 -> ctime/mtime/atime
 -> i_current_depth
 -> i_xattr_nid
 -> i_pino
 -> i_advise
 -> i_flags
 -> i_mode

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: introduce f2fs_i_links_write with mark_inode_dirty_sync
Jaegeuk Kim [Fri, 20 May 2016 16:43:20 +0000 (09:43 -0700)]
f2fs: introduce f2fs_i_links_write with mark_inode_dirty_sync

This patch introduces f2fs_i_links_write() to call mark_inode_dirty_sync() when
changing inode->i_links.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: introduce f2fs_i_blocks_write with mark_inode_dirty_sync
Jaegeuk Kim [Fri, 20 May 2016 16:26:06 +0000 (09:26 -0700)]
f2fs: introduce f2fs_i_blocks_write with mark_inode_dirty_sync

This patch introduces f2fs_i_blocks_write() to call mark_inode_dirty_sync() when
changing inode->i_blocks.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: introduce f2fs_i_size_write with mark_inode_dirty_sync
Jaegeuk Kim [Fri, 20 May 2016 16:22:03 +0000 (09:22 -0700)]
f2fs: introduce f2fs_i_size_write with mark_inode_dirty_sync

This patch introduces f2fs_i_size_write() to call mark_inode_dirty_sync() with
i_size_write().
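
The wrapper pattern can be modeled in userspace roughly as below. This is only a sketch: struct demo_inode and the demo_* helpers are hypothetical, and skipping the update when the size is unchanged is an assumption rather than something stated in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory inode model; the kernel's struct inode is far larger. */
struct demo_inode {
    long long i_size;
    bool dirty;
};

/* Stand-in for mark_inode_dirty_sync(): just records that the inode is dirty. */
static void demo_mark_inode_dirty_sync(struct demo_inode *inode)
{
    inode->dirty = true;
}

/*
 * Update i_size and mark the inode dirty in one place, so that no caller
 * can change the field and forget the dirty marking.
 * Assumption: an unchanged size does not dirty the inode.
 */
static void demo_i_size_write(struct demo_inode *inode, long long i_size)
{
    assert(inode != NULL);
    if (inode->i_size == i_size)
        return;
    inode->i_size = i_size;
    demo_mark_inode_dirty_sync(inode);
}
```

Funnelling every size change through one helper is what lets a later checkpoint rely on the dirty state being accurate.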

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: use inode pointer for {set, clear}_inode_flag
Jaegeuk Kim [Fri, 20 May 2016 17:13:22 +0000 (10:13 -0700)]
f2fs: use inode pointer for {set, clear}_inode_flag

This patch refactors to use inode pointer for set_inode_flag and
clear_inode_flag.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago Revert "f2fs: no need inc dirty pages under inode lock"
Jaegeuk Kim [Thu, 2 Jun 2016 03:55:51 +0000 (20:55 -0700)]
Revert "f2fs: no need inc dirty pages under inode lock"

This reverts commit b951a4ec165af4973b2bd9c80fb5845fbd840435.

 Conflicts:
fs/f2fs/checkpoint.c

8 years ago Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Thu, 2 Jun 2016 22:08:06 +0000 (15:08 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
 "ARM:
   - two fixes for 4.6 vgic [Christoffer] (cc stable)

   - six fixes for 4.7 vgic [Marc]

  x86:
   - six fixes from syzkaller reports [Paolo] (two of them cc stable)

   - allow OS X to boot [Dmitry]

   - don't trust compilers [Nadav]"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS
  KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID
  KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsi
  KVM: fail KVM_SET_VCPU_EVENTS with invalid exception number
  KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID
  kvm: x86: avoid warning on repeated KVM_SET_TSS_ADDR
  KVM: Handle MSR_IA32_PERF_CTL
  KVM: x86: avoid write-tearing of TDP
  KVM: arm/arm64: vgic-new: Remove harmful BUG_ON
  arm64: KVM: vgic-v3: Relax synchronization when SRE==1
  arm64: KVM: vgic-v3: Prevent the guest from messing with ICC_SRE_EL1
  arm64: KVM: Make ICC_SRE_EL1 access return the configured SRE value
  KVM: arm/arm64: vgic-v3: Always resample level interrupts
  KVM: arm/arm64: vgic-v2: Always resample level interrupts
  KVM: arm/arm64: vgic-v3: Clear all dirty LRs
  KVM: arm/arm64: vgic-v2: Clear all dirty LRs

8 years ago KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS
Paolo Bonzini [Wed, 1 Jun 2016 12:09:23 +0000 (14:09 +0200)]
KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS

MOV to DR6 or DR7 causes a #GP if an attempt is made to write a 1 to
any of bits 63:32.  However, this is not detected at KVM_SET_DEBUGREGS
time, and the next KVM_RUN oopses:

   general protection fault: 0000 [#1] SMP
   CPU: 2 PID: 14987 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1
   Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
   [...]
   Call Trace:
    [<ffffffffa072c93d>] kvm_arch_vcpu_ioctl_run+0x141d/0x14e0 [kvm]
    [<ffffffffa071405d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm]
    [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480
    [<ffffffff812418a9>] SyS_ioctl+0x79/0x90
    [<ffffffff817a0f2e>] entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 55 83 ff 07 48 89 e5 77 27 89 ff ff 24 fd 90 87 80 81 0f 23 fe 5d c3 0f 23 c6 5d c3 0f 23 ce 5d c3 0f 23 d6 5d c3 0f 23 de 5d c3 <0f> 23 f6 5d c3 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
   RIP  [<ffffffff810639eb>] native_set_debugreg+0x2b/0x40
    RSP <ffff88005836bd50>

Testcase (beautified/reduced from syzkaller output):

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <stdint.h>
    #include <linux/kvm.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>

    long r[8];

    int main()
    {
        struct kvm_debugregs dr = { 0 };

        r[2] = open("/dev/kvm", O_RDONLY);
        r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
        r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);

        memcpy(&dr,
               "\x5d\x6a\x6b\xe8\x57\x3b\x4b\x7e\xcf\x0d\xa1\x72"
               "\xa3\x4a\x29\x0c\xfc\x6d\x44\x00\xa7\x52\xc7\xd8"
               "\x00\xdb\x89\x9d\x78\xb5\x54\x6b\x6b\x13\x1c\xe9"
               "\x5e\xd3\x0e\x40\x6f\xb4\x66\xf7\x5b\xe3\x36\xcb",
               48);
        r[7] = ioctl(r[4], KVM_SET_DEBUGREGS, &dr);
        r[6] = ioctl(r[4], KVM_RUN, 0);
    }
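
A minimal sketch of the kind of check that rejects such values at set time, so that a later run never sees them. struct demo_debugregs and demo_validate_debugregs() are hypothetical names and only model the DR6/DR7 part of the state; this is not the actual KVM code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the debug register state passed in from userspace. */
struct demo_debugregs {
    uint64_t dr6;
    uint64_t dr7;
};

/* Reject DR6/DR7 values with any of bits 63:32 set, which would cause #GP on MOV. */
static int demo_validate_debugregs(const struct demo_debugregs *regs)
{
    const uint64_t high_bits = 0xffffffff00000000ULL;   /* bits 63:32 */

    assert(regs != NULL);
    if ((regs->dr6 & high_bits) || (regs->dr7 & high_bits))
        return -EINVAL;
    return 0;
}
```

Validating when the state is set keeps the failure in the ioctl that supplied the bad value instead of deferring it to KVM_RUN.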

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
8 years ago KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID
Paolo Bonzini [Wed, 1 Jun 2016 12:09:22 +0000 (14:09 +0200)]
KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID

This causes an ugly dmesg splat.  Beautified syzkaller testcase:

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <sys/ioctl.h>
    #include <fcntl.h>
    #include <linux/kvm.h>

    long r[8];

    int main()
    {
        struct kvm_irq_routing ir = { 0 };
        r[2] = open("/dev/kvm", O_RDWR);
        r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
        r[4] = ioctl(r[3], KVM_SET_GSI_ROUTING, &ir);
        return 0;
    }

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
8 years ago KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsi
Paolo Bonzini [Wed, 1 Jun 2016 12:09:21 +0000 (14:09 +0200)]
KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsi

Found by syzkaller:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000120
    IP: [<ffffffffa0797202>] kvm_irq_map_gsi+0x12/0x90 [kvm]
    PGD 6f80b067 PUD b6535067 PMD 0
    Oops: 0000 [#1] SMP
    CPU: 3 PID: 4988 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1
    [...]
    Call Trace:
     [<ffffffffa0795f62>] irqfd_update+0x32/0xc0 [kvm]
     [<ffffffffa0796c7c>] kvm_irqfd+0x3dc/0x5b0 [kvm]
     [<ffffffffa07943f4>] kvm_vm_ioctl+0x164/0x6f0 [kvm]
     [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480
     [<ffffffff812418a9>] SyS_ioctl+0x79/0x90
     [<ffffffff817a1062>] tracesys_phase2+0x84/0x89
    Code: b5 71 a7 e0 5b 41 5c 41 5d 5d f3 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 8f 10 2e 00 00 31 c0 48 89 e5 <39> 91 20 01 00 00 76 6a 48 63 d2 48 8b 94 d1 28 01 00 00 48 85
    RIP  [<ffffffffa0797202>] kvm_irq_map_gsi+0x12/0x90 [kvm]
     RSP <ffff8800926cbca8>
    CR2: 0000000000000120

Testcase:

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <stdint.h>
    #include <linux/kvm.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>

    long r[26];

    int main()
    {
        memset(r, -1, sizeof(r));
        r[2] = open("/dev/kvm", 0);
        r[3] = ioctl(r[2], KVM_CREATE_VM, 0);

        struct kvm_irqfd ifd;
        ifd.fd = syscall(SYS_eventfd2, 5, 0);
        ifd.gsi = 3;
        ifd.flags = 2;
        ifd.resamplefd = ifd.fd;
        r[25] = ioctl(r[3], KVM_IRQFD, &ifd);
        return 0;
    }

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
8 years ago KVM: fail KVM_SET_VCPU_EVENTS with invalid exception number
Paolo Bonzini [Wed, 1 Jun 2016 12:09:20 +0000 (14:09 +0200)]
KVM: fail KVM_SET_VCPU_EVENTS with invalid exception number

This cannot be returned by KVM_GET_VCPU_EVENTS, so it is okay to return
EINVAL.  It causes a WARN from exception_type:

    WARNING: CPU: 3 PID: 16732 at arch/x86/kvm/x86.c:345 exception_type+0x49/0x50 [kvm]()
    CPU: 3 PID: 16732 Comm: a.out Tainted: G        W       4.4.6-300.fc23.x86_64 #1
    Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
     0000000000000286 000000006308a48b ffff8800bec7fcf8 ffffffff813b542e
     0000000000000000 ffffffffa0966496 ffff8800bec7fd30 ffffffff810a40f2
     ffff8800552a8000 0000000000000000 00000000002c267c 0000000000000001
    Call Trace:
     [<ffffffff813b542e>] dump_stack+0x63/0x85
     [<ffffffff810a40f2>] warn_slowpath_common+0x82/0xc0
     [<ffffffff810a423a>] warn_slowpath_null+0x1a/0x20
     [<ffffffffa0924809>] exception_type+0x49/0x50 [kvm]
     [<ffffffffa0934622>] kvm_arch_vcpu_ioctl_run+0x10a2/0x14e0 [kvm]
     [<ffffffffa091c04d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm]
     [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480
     [<ffffffff812414a9>] SyS_ioctl+0x79/0x90
     [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71
    ---[ end trace b1a0391266848f50 ]---

Testcase (beautified/reduced from syzkaller output):

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    long r[31];

    int main()
    {
        memset(r, -1, sizeof(r));
        r[2] = open("/dev/kvm", O_RDONLY);
        r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
        r[7] = ioctl(r[3], KVM_CREATE_VCPU, 0);

        struct kvm_vcpu_events ve = {
                .exception.injected = 1,
                .exception.nr = 0xd4
        };
        r[27] = ioctl(r[7], KVM_SET_VCPU_EVENTS, &ve);
        r[30] = ioctl(r[7], KVM_RUN, 0);
        return 0;
    }
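
A sketch of the validation, under the assumption that "invalid" means a vector outside the 32 architectural x86 exception vectors (0 to 31). The struct and function names are hypothetical, and the real check may cover more cases.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the vCPU event state supplied by userspace. */
struct demo_vcpu_events {
    uint8_t injected;   /* non-zero if an exception is pending injection */
    uint8_t nr;         /* exception vector number */
};

/* Fail with -EINVAL when an injected exception uses a vector above 31. */
static int demo_validate_vcpu_events(const struct demo_vcpu_events *ev)
{
    assert(ev != NULL);
    if (ev->injected && ev->nr > 31)
        return -EINVAL;
    return 0;
}
```

With this check, the vector 0xd4 used in the testcase above would be refused before it can reach exception_type().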

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
8 years ago KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID
Paolo Bonzini [Wed, 1 Jun 2016 12:09:19 +0000 (14:09 +0200)]
KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID

This causes an ugly dmesg splat.  Beautified syzkaller testcase:

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <sys/ioctl.h>
    #include <fcntl.h>
    #include <linux/kvm.h>

    long r[8];

    int main()
    {
        struct kvm_cpuid2 c = { 0 };
        r[2] = open("/dev/kvm", O_RDWR);
        r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
        r[4] = ioctl(r[3], KVM_CREATE_VCPU, 0x8);
        r[7] = ioctl(r[4], KVM_SET_CPUID, &c);
        return 0;
    }
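
One way to avoid a zero-sized allocation is to skip the allocator entirely when the entry count is zero. The userspace sketch below uses calloc() in place of vmalloc(); struct demo_entry and demo_alloc_entries() are hypothetical names, not KVM code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical table entry; the real CPUID entry has more fields. */
struct demo_entry {
    unsigned int function;
    unsigned int eax;
};

/*
 * Allocate nent entries. For nent == 0 the allocator is never called:
 * *out is set to NULL and the call succeeds.
 */
static int demo_alloc_entries(size_t nent, struct demo_entry **out)
{
    assert(out != NULL);
    *out = NULL;
    if (nent == 0)
        return 0;
    *out = calloc(nent, sizeof(**out));
    return *out ? 0 : -ENOMEM;
}
```

Callers then need to tolerate a NULL table when the count is zero, which is the usual price of this approach.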

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
8 years ago kvm: x86: avoid warning on repeated KVM_SET_TSS_ADDR
Paolo Bonzini [Wed, 1 Jun 2016 12:09:18 +0000 (14:09 +0200)]
kvm: x86: avoid warning on repeated KVM_SET_TSS_ADDR

Found by syzkaller:

    WARNING: CPU: 3 PID: 15175 at arch/x86/kvm/x86.c:7705 __x86_set_memory_region+0x1dc/0x1f0 [kvm]()
    CPU: 3 PID: 15175 Comm: a.out Tainted: G        W       4.4.6-300.fc23.x86_64 #1
    Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
     0000000000000286 00000000950899a7 ffff88011ab3fbf0 ffffffff813b542e
     0000000000000000 ffffffffa0966496 ffff88011ab3fc28 ffffffff810a40f2
     00000000000001fd 0000000000003000 ffff88014fc50000 0000000000000000
    Call Trace:
     [<ffffffff813b542e>] dump_stack+0x63/0x85
     [<ffffffff810a40f2>] warn_slowpath_common+0x82/0xc0
     [<ffffffff810a423a>] warn_slowpath_null+0x1a/0x20
     [<ffffffffa09251cc>] __x86_set_memory_region+0x1dc/0x1f0 [kvm]
     [<ffffffffa092521b>] x86_set_memory_region+0x3b/0x60 [kvm]
     [<ffffffffa09bb61c>] vmx_set_tss_addr+0x3c/0x150 [kvm_intel]
     [<ffffffffa092f4d4>] kvm_arch_vm_ioctl+0x654/0xbc0 [kvm]
     [<ffffffffa091d31a>] kvm_vm_ioctl+0x9a/0x6f0 [kvm]
     [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480
     [<ffffffff812414a9>] SyS_ioctl+0x79/0x90
     [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71

Testcase:

    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <fcntl.h>
    #include <string.h>
    #include <linux/kvm.h>

    long r[8];

    int main()
    {
        memset(r, -1, sizeof(r));
        r[2] = open("/dev/kvm", O_RDONLY|O_TRUNC);
        r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul);
        r[5] = ioctl(r[3], KVM_SET_TSS_ADDR, 0x20000000ul);
        r[7] = ioctl(r[3], KVM_SET_TSS_ADDR, 0x20000000ul);
        return 0;
    }

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
8 years ago KVM: Handle MSR_IA32_PERF_CTL
Dmitry Bilunov [Tue, 31 May 2016 14:38:24 +0000 (17:38 +0300)]
KVM: Handle MSR_IA32_PERF_CTL

Intel CPUs having Turbo Boost feature implement an MSR to provide a
control interface via rdmsr/wrmsr instructions. One could detect the
presence of this feature by issuing one of these instructions and
handling the #GP exception which is generated in case the referenced MSR
is not implemented by the CPU.

KVM's vCPU model behaves exactly as a real CPU in this case by injecting
a fault when MSR_IA32_PERF_CTL (which KVM does not support) is accessed.
However, some operating systems use this register during an early boot
stage in which their kernel is not capable of handling #GP correctly,
causing #DF and finally a triple fault, effectively resetting the vCPU.

This patch implements a dummy handler for MSR_IA32_PERF_CTL to avoid the
crashes.
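
A dummy handler could look roughly like the sketch below. It assumes that reads return zero and writes are silently ignored, which this message does not spell out; the demo_* names are hypothetical, and 0x199 is the architectural index of IA32_PERF_CTL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MSR_IA32_PERF_CTL 0x199u

/* Return 0 if the MSR read is handled, 1 if the caller should inject #GP. */
static int demo_get_msr(uint32_t index, uint64_t *data)
{
    assert(data != NULL);
    switch (index) {
    case DEMO_MSR_IA32_PERF_CTL:
        *data = 0;      /* dummy: reads as zero (assumption) */
        return 0;
    default:
        return 1;       /* unhandled MSR */
    }
}

/* Return 0 if the MSR write is accepted, 1 if the caller should inject #GP. */
static int demo_set_msr(uint32_t index, uint64_t data)
{
    (void)data;         /* the dummy handler discards the written value */
    switch (index) {
    case DEMO_MSR_IA32_PERF_CTL:
        return 0;
    default:
        return 1;
    }
}
```

A guest that touches the register during early boot then sees a harmless value instead of a fault it cannot handle yet.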

Signed-off-by: Dmitry Bilunov <kmeaw@yandex-team.ru>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
8 years ago KVM: x86: avoid write-tearing of TDP
Nadav Amit [Wed, 11 May 2016 15:04:29 +0000 (08:04 -0700)]
KVM: x86: avoid write-tearing of TDP

In theory, nothing prevents the compiler from write-tearing PTEs, or
splitting PTE writes. These partially-modified PTEs can be fetched by other
cores and cause mayhem. I have not really encountered such a case in
real life, but it does seem possible.

For example, the compiler may try to do something creative for
kvm_set_pte_rmapp() and perform multiple writes to the PTE.
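
The usual defence is to force the store through a single volatile access, similar in spirit to the kernel's WRITE_ONCE() macro. The helper below is a hypothetical userspace sketch, not the code from this patch; on targets without native 64-bit stores a volatile access alone would not be enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store a 64-bit value through a volatile lvalue so the compiler must emit
 * exactly one access of that width and cannot split or repeat the write.
 */
static inline void demo_write_once_u64(uint64_t *p, uint64_t val)
{
    assert(p != NULL);
    *(volatile uint64_t *)p = val;
}
```

A plain assignment gives the compiler more freedom, for example to build the new value in place with several smaller stores.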

Signed-off-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
8 years ago Merge tag 'kvm-arm-for-v4.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Radim Krčmář [Thu, 2 Jun 2016 15:28:04 +0000 (17:28 +0200)]
Merge tag 'kvm-arm-for-v4.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm

KVM/ARM Fixes for v4.7-rc2

Fixes for the vgic, 2 of the patches address a bug introduced in v4.6
while the rest are for the new vgic.

8 years ago KVM: arm/arm64: vgic-new: Remove harmful BUG_ON
Marc Zyngier [Thu, 2 Jun 2016 08:24:06 +0000 (09:24 +0100)]
KVM: arm/arm64: vgic-new: Remove harmful BUG_ON

When changing the active bit from an MMIO trap, we decide to
explode if the intid is that of a private interrupt.

This flawed logic comes from the fact that we were assuming that
kvm_vcpu_kick() as called by kvm_arm_halt_vcpu() would not return before
the called vcpu responded, but this is not the case, so we need to
perform this wait even for private interrupts.

Dropping the BUG_ON seems like the right thing to do.

 [ Commit message tweaked by Christoffer ]

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
8 years ago Merge tag 'pinctrl-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Wed, 1 Jun 2016 19:38:50 +0000 (12:38 -0700)]
Merge tag 'pinctrl-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "Here are three pin control fixes for v4.7.  Not much, and just driver
  fixes:

   - add device tree matches to MAINTAINERS

   - inversion bug in the Nomadik driver

   - dual edge handling bug in the mediatek driver"

* tag 'pinctrl-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: mediatek: fix dual-edge code defect
  MAINTAINERS: Add file patterns for pinctrl device tree bindings
  pinctrl: nomadik: fix inversion of gpio direction

8 years ago Merge tag 'dma-buf-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sumits...
Linus Torvalds [Wed, 1 Jun 2016 19:32:25 +0000 (12:32 -0700)]
Merge tag 'dma-buf-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf

Pull dma-buf updates from Sumit Semwal:

 - use of vma_pages instead of explicit computation

 - DocBook and headerdoc updates for dma-buf

* tag 'dma-buf-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf:
  dma-buf: use vma_pages()
  fence: add missing descriptions for fence
  doc: update/fixup dma-buf related DocBook
  reservation: add headerdoc comments
  dma-buf: headerdoc fixes

8 years ago Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Wed, 1 Jun 2016 05:28:28 +0000 (22:28 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Fix negative error code usage in ATM layer, from Stefan Hajnoczi.

 2) If CONFIG_SYSCTL is disabled, the default TTL is not initialized
    properly.  From Ezequiel Garcia.

 3) Missing spinlock init in mvneta driver, from Gregory CLEMENT.

 4) Missing unlocks in hwmb error paths, also from Gregory CLEMENT.

 5) Fix deadlock on team->lock when propagating features, from Ivan
    Vecera.

 6) Work around buffer offset hw bug in alx chips, from Feng Tang.

 7) Fix double listing of SCTP entries in sctp_diag dumps, from Xin
    Long.

 8) Various statistics bug fixes in mlx4 from Eric Dumazet.

 9) Fix some randconfig build errors wrt fou ipv6 from Arnd Bergmann.

10) All of l2tp was namespace aware, but the ipv6 support code was not
    doing so.  From Shmulik Ladkani.

11) Handle on-stack hrtimers properly in pktgen, from Guenter Roeck.

12) Propagate MAC changes properly through VLAN devices, from Mike
    Manning.

13) Fix memory leak in bnx2x_init_one(), from Vitaly Kuznetsov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
  sfc: Track RPS flow IDs per channel instead of per function
  usbnet: smsc95xx: fix link detection for disabled autonegotiation
  virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv
  bnx2x: avoid leaking memory on bnx2x_init_one() failures
  fou: fix IPv6 Kconfig options
  openvswitch: update checksum in {push,pop}_mpls
  sctp: sctp_diag should dump sctp socket type
  net: fec: update dirty_tx even if no skb
  vlan: Propagate MAC address to VLANs
  atm: iphase: off by one in rx_pkt()
  atm: firestream: add more reserved strings
  vxlan: Accept user specified MTU value when create new vxlan link
  net: pktgen: Call destroy_hrtimer_on_stack()
  timer: Export destroy_hrtimer_on_stack()
  net: l2tp: Make l2tp_ip6 namespace aware
  Documentation: ip-sysctl.txt: clarify secure_redirects
  sfc: use flow dissector helpers for aRFS
  ieee802154: fix logic error in ieee802154_llsec_parse_dev_addr
  net: nps_enet: Disable interrupts before napi reschedule
  net/lapb: use %*ph to dump buffers
  ...

8 years ago Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Linus Torvalds [Wed, 1 Jun 2016 05:20:56 +0000 (22:20 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc

Pull sparc fixes from David Miller:
 "sparc64 mmu context allocation and trap return bug fixes"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix return from trap window fill crashes.
  sparc: Harden signal return frame checks.
  sparc64: Take ctx_alloc_lock properly in hugetlb_setup().

8 years ago sfc: Track RPS flow IDs per channel instead of per function
Jon Cooper [Tue, 31 May 2016 18:12:32 +0000 (19:12 +0100)]
sfc: Track RPS flow IDs per channel instead of per function

Otherwise we get confused when two flows on different channels get the
same flow ID.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago usbnet: smsc95xx: fix link detection for disabled autonegotiation
Christoph Fritz [Thu, 26 May 2016 02:06:47 +0000 (04:06 +0200)]
usbnet: smsc95xx: fix link detection for disabled autonegotiation

To detect link status up/down for connections where autonegotiation is
explicitly disabled, we don't get an irq but need to poll the status
register for link up/down detection.
This patch adds a workqueue to poll for link status.

Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agovirtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv
wangyunjian [Tue, 31 May 2016 03:52:43 +0000 (11:52 +0800)]
virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv

In virtnet_open() and virtnet_probe(), try_fill_recv() may be executed
at the same time. The VQ in virtqueue_add() is not protected well, and
a BUG_ON is triggered when virtio_net.ko is being removed.

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobnx2x: avoid leaking memory on bnx2x_init_one() failures
Vitaly Kuznetsov [Mon, 30 May 2016 13:00:54 +0000 (15:00 +0200)]
bnx2x: avoid leaking memory on bnx2x_init_one() failures

bnx2x_init_bp() allocates memory with bnx2x_alloc_mem_bp() so if we
fail later in bnx2x_init_one() we need to free this memory
with bnx2x_free_mem_bp() to avoid leakages. E.g. I'm observing memory
leaks reported by kmemleak when a failure (unrelated) happens in
bnx2x_vfpf_acquire().
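
The unwind pattern described above can be sketched in plain userspace C.
This is only an illustration under stated assumptions: sketch_dev,
sketch_init_bp(), sketch_free_mem_bp() and sketch_init_one() are
hypothetical stand-ins, not the bnx2x functions, and the real error path
may look different.

```c
#include <stdlib.h>

/* Hypothetical device state: one allocation made during early init. */
struct sketch_dev {
    void *mem;
};

/* Stand-in for the early init step that allocates memory. */
static int sketch_init_bp(struct sketch_dev *d)
{
    d->mem = malloc(64);
    return d->mem ? 0 : -1;
}

/* Stand-in for the matching free helper. */
static void sketch_free_mem_bp(struct sketch_dev *d)
{
    free(d->mem);
    d->mem = NULL;
}

/* If a later step fails, undo the earlier allocation before returning. */
static int sketch_init_one(struct sketch_dev *d, int later_step_fails)
{
    int rc = sketch_init_bp(d);

    if (rc)
        return rc;

    if (later_step_fails) {
        rc = -1;
        goto err_free_mem;
    }
    return 0;

err_free_mem:
    sketch_free_mem_bp(d);
    return rc;
}
```

Calling sketch_init_one(&d, 1) returns an error and leaves d.mem NULL, so
nothing is left allocated on the failure path.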

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agofou: fix IPv6 Kconfig options
Arnd Bergmann [Tue, 31 May 2016 20:42:11 +0000 (22:42 +0200)]
fou: fix IPv6 Kconfig options

The Kconfig options I added to work around broken compilation ended
up screwing up things more, as I used the wrong symbol to control
compilation of the file, resulting in IPv6 fou support never being built
into the kernel.

Changing CONFIG_NET_FOU_IPV6_TUNNELS to CONFIG_IPV6_FOU fixes that
problem. I had renamed the symbol in one location but not the other,
and as the file is never being used by other kernel code, this did not
lead to a build failure that I would have caught.

After that fix, another issue with the same patch becomes obvious, as we
'select INET6_TUNNEL', which is related to IPV6_TUNNEL, but not the same,
and this can still cause the original build failure when IPV6_TUNNEL is
not built-in but IPV6_FOU is. The fix is equally trivial, we just need
to select the right symbol.

I have successfully built 350 randconfig kernels with this patch
and verified that the driver is now being built.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Valentin Rothberg <valentinrothberg@gmail.com>
Fixes: fabb13db448e ("fou: add Kconfig options for IPv6 support")
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoopenvswitch: update checksum in {push,pop}_mpls
Simon Horman [Mon, 30 May 2016 05:04:25 +0000 (14:04 +0900)]
openvswitch: update checksum in {push,pop}_mpls

In the case of CHECKSUM_COMPLETE the skb checksum should be updated in
{push,pop}_mpls() as they change the type in the ethernet header.

As suggested by Pravin Shelar.

Cc: Pravin Shelar <pshelar@nicira.com>
Fixes: 25cd9ba0abc0 ("openvswitch: Add basic MPLS support to kernel")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosctp: sctp_diag should dump sctp socket type
Xin Long [Sun, 29 May 2016 09:42:13 +0000 (17:42 +0800)]
sctp: sctp_diag should dump sctp socket type

Currently we cannot tell whether a sk is udp style or sctp style when
we use ss to dump sctp_info, so it is necessary to dump the type as well.

For sctp_diag, ss support is not officially available, thus there
are no official users of this yet, so we can add this field in the
middle of sctp_info without breaking user API.

v1->v2:
  - move 'sctpi_s_type' field to the end of struct sctp_info, so
    that it won't cause incompatibility with applications already
    built.
  - add __reserved3 in sctp_info to make sure sctp_info is 8-byte
    aligned.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: fec: update dirty_tx even if no skb
Troy Kisky [Fri, 27 May 2016 20:30:40 +0000 (13:30 -0700)]
net: fec: update dirty_tx even if no skb

If dirty_tx isn't updated, then dma_unmap_single
can be called twice.

This fixes a
[   58.420980] ------------[ cut here ]------------
[   58.425667] WARNING: CPU: 0 PID: 377 at /home/schurig/d/mkarm/linux-4.5/lib/dma-debug.c:1096 check_unmap+0x9d0/0xab8()
[   58.436405] fec 2188000.ethernet: DMA-API: device driver tries to free DMA memory it has not allocated [device address=0x0000000000000000] [size=66 bytes]

encountered by Holger

Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>
Tested-by: <holgerschurig@gmail.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agovlan: Propagate MAC address to VLANs
Mike Manning [Fri, 27 May 2016 16:45:07 +0000 (17:45 +0100)]
vlan: Propagate MAC address to VLANs

The MAC address of the physical interface is only copied to the VLAN
when it is first created. After the MAC address of the physical
interface changes, only newly created VLANs have an up-to-date MAC.

The VLANs should continue inheriting the MAC address of the physical
interface until the VLAN MAC address is explicitly set to any value.
This allows IPv6 EUI64 addresses for the VLAN to reflect any changes
to the MAC of the physical interface and thus for DAD to behave as
expected.

Signed-off-by: Mike Manning <mmanning@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoatm: iphase: off by one in rx_pkt()
Dan Carpenter [Fri, 27 May 2016 10:34:35 +0000 (13:34 +0300)]
atm: iphase: off by one in rx_pkt()

The iadev->rx_open[] array holds "iadev->num_vc" pointers (this code
assumes that pointers are 32 bits).  So the > here should be >= or else
we could end up reading a garbage pointer from one element beyond the
end of the array.
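
The bounds check described above can be sketched as follows. This is a
minimal userspace illustration, not the driver code: struct vc_table and
lookup_vc() are hypothetical names, and only the ">=" comparison against
the element count is taken from the text.

```c
#include <stddef.h>

/* Hypothetical stand-in for the driver state: a table with num_vc slots. */
struct vc_table {
    void **rx_open;      /* array of num_vc pointers */
    unsigned int num_vc; /* valid indices are 0 .. num_vc - 1 */
};

/* Return the entry for vci, or NULL when vci is out of range. */
static void *lookup_vc(const struct vc_table *t, unsigned int vci)
{
    /* ">=" rather than ">": index num_vc is one past the last valid slot. */
    if (vci >= t->num_vc)
        return NULL;
    return t->rx_open[vci];
}
```

With a two-slot table, lookup_vc() returns the stored pointers for
indices 0 and 1 and NULL for index 2, which a ">" check would have let
through.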

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoatm: firestream: add more reserved strings
Dan Carpenter [Fri, 27 May 2016 10:33:50 +0000 (13:33 +0300)]
atm: firestream: add more reserved strings

This bug has been there since the driver was first added back in the year 2000.
It causes a Smatch warning:

    drivers/atm/firestream.c:849 process_incoming()
    error: buffer overflow 'res_strings' 60 <= 63

There are supposed to be 64 entries in this array and the missing
strings are clearly in the 30 to 40 range.  I added them as reserved 37 to
reserved 40.  It's possible that strings are really supposed to be added
in the middle instead of at the end, but this approach is safe, in that
it fixes the bug and doesn't break anything that wasn't already broken.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agovxlan: Accept user specified MTU value when create new vxlan link
Chen Haiquan [Fri, 27 May 2016 02:49:11 +0000 (10:49 +0800)]
vxlan: Accept user specified MTU value when create new vxlan link

When creating a new vxlan link, for example:
  ip link add vtap mtu 1440 type vxlan vni 1 dev eth0

The argument "mtu" has no effect, because it is not set to conf->mtu. The
default value is used in vxlan_dev_configure function.

This problem was introduced by commit 0dfbdf4102b9 (vxlan: Factor out device
configuration).

Fixes: 0dfbdf4102b9 (vxlan: Factor out device configuration)
Signed-off-by: Chen Haiquan <oc@yunify.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: pktgen: Call destroy_hrtimer_on_stack()
Guenter Roeck [Fri, 27 May 2016 00:21:06 +0000 (17:21 -0700)]
net: pktgen: Call destroy_hrtimer_on_stack()

If CONFIG_DEBUG_OBJECTS_TIMERS=y, hrtimer_init_on_stack() requires
a matching call to destroy_hrtimer_on_stack() to clean up timer
debug objects.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotimer: Export destroy_hrtimer_on_stack()
Guenter Roeck [Fri, 27 May 2016 00:21:05 +0000 (17:21 -0700)]
timer: Export destroy_hrtimer_on_stack()

hrtimer_init_on_stack() needs a matching call to
destroy_hrtimer_on_stack(), so both need to be exported.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agodma-buf: use vma_pages()
Muhammad Falak R Wani [Mon, 23 May 2016 11:38:42 +0000 (17:08 +0530)]
dma-buf: use vma_pages()

Replace explicit computation of vma page count by a call to
vma_pages().
Also, include <linux/mm.h>
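
As a rough illustration of the computation that the helper replaces, here
is a userspace sketch. The struct and macro names are hypothetical, and
the 4 KiB page size is an assumption of this sketch only; in the kernel
the page shift depends on the architecture and configuration.

```c
/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-in carrying only the two fields the sketch needs. */
struct sketch_vma {
    unsigned long vm_start; /* first byte of the mapping */
    unsigned long vm_end;   /* first byte after the mapping */
};

/* Number of pages spanned by the mapping. */
static unsigned long sketch_vma_pages(const struct sketch_vma *vma)
{
    return (vma->vm_end - vma->vm_start) >> SKETCH_PAGE_SHIFT;
}
```

Wrapping the subtraction and shift in one helper keeps callers from
repeating the same expression by hand.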

Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
8 years agofence: add missing descriptions for fence
Luis de Bethencourt [Mon, 11 Apr 2016 11:48:55 +0000 (12:48 +0100)]
fence: add missing descriptions for fence

The members child_list and active_list were added to the fence struct
without descriptions for the Documentation. Adding these.

Fixes: b55b54b5db33 ("staging/android: remove struct sync_pt")
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com>
Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
8 years agodoc: update/fixup dma-buf related DocBook
Rob Clark [Thu, 31 Mar 2016 20:26:52 +0000 (16:26 -0400)]
doc: update/fixup dma-buf related DocBook

Split out dma-buf related parts into their own section, add missing
files, and write a bit of overview about how it all fits together.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Tue, 31 May 2016 16:43:24 +0000 (09:43 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:
 "Three bug fixes and an update for the default configuration"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: fix info leak in do_sigsegv
  s390/config: update default configuration
  s390/bpf: fix recache skb->data/hlen for skb_vlan_push/pop
  s390/bpf: reduce maximum program size to 64 KB

8 years agoreservation: add headerdoc comments
Rob Clark [Thu, 31 Mar 2016 20:26:51 +0000 (16:26 -0400)]
reservation: add headerdoc comments

Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
8 years agodma-buf: headerdoc fixes
Rob Clark [Thu, 31 Mar 2016 20:26:50 +0000 (16:26 -0400)]
dma-buf: headerdoc fixes

Apparently nobody noticed that dma-buf.h wasn't actually pulled into
the docbook build.  As a result, the headerdoc comments have bitrotted
a bit.  Add missing params/fields.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
8 years agoMerge tag 'gpio-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux...
Linus Torvalds [Tue, 31 May 2016 16:27:00 +0000 (09:27 -0700)]
Merge tag 'gpio-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
 "A bunch of GPIO fixes for the v4.7 series:

   - Drop the lock before reading out the GPIO direction setting in
     drivers supporting the .get_direction() callback: some of them may
     be slowpath.

   - Flush GPIO direction setting before locking a GPIO as an IRQ: some
     electronics or other poking around in the registers behind our back
     may have happened, so flush the direction status before trying to
     lock the line for use by IRQs.

   - Bail out silently when asked to perform operations on NULL GPIO
     descriptors.  That is what all the get_*_optional() is about: we
     get optional GPIO handles, if they are not there, we get NULL.

   - Handle compatible ioctl() correctly: we need to convert the ioctl()
     pointer using compat_ptr() here like everyone else.

   - Disable the broken .to_irq() on the LPC32xx platform.  The whole
     irqchip infrastructure was replaced in the last merge window, and a
     new implementation will be needed"

* tag 'gpio-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: drop lock before reading GPIO direction
  gpio: bail out silently on NULL descriptors
  gpio: handle compatible ioctl() pointers
  gpio: flush direction status in gpiochip_lock_as_irq()
  gpio: lpc32xx: disable broken to_irq support

8 years agoarm64: KVM: vgic-v3: Relax synchronization when SRE==1
Marc Zyngier [Wed, 25 May 2016 14:26:39 +0000 (15:26 +0100)]
arm64: KVM: vgic-v3: Relax synchronization when SRE==1

The GICv3 backend of the vgic is quite barrier heavy, in order
to ensure synchronization of the system registers and the
memory mapped view for a potential GICv2 guest.

But when the guest is using a GICv3 model, there is absolutely
no need to execute all these heavy barriers, and it is actually
beneficial to avoid them altogether.

This patch makes the synchronization conditional, and ensures
that we do not change the EL1 SRE settings if we do not need to.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
8 years agoarm64: KVM: vgic-v3: Prevent the guest from messing with ICC_SRE_EL1
Marc Zyngier [Wed, 25 May 2016 14:26:38 +0000 (15:26 +0100)]
arm64: KVM: vgic-v3: Prevent the guest from messing with ICC_SRE_EL1

Both our GIC emulations are "strict", in the sense that we either
emulate a GICv2 or a GICv3, and not a GICv3 with GICv2 legacy
support.

But when running on a GICv3 host, we still allow the guest to
tinker with the ICC_SRE_EL1 register during its time slice:
it can switch SRE off, observe that it is off, and yet on the
next world switch, find the SRE bit to be set again. Not very
nice.

An obvious solution is to always trap accesses to ICC_SRE_EL1
(by clearing ICC_SRE_EL2.Enable), and to let the handler return
the programmed value on a read, or ignore the write.

That way, the guest can always observe that our GICv3 is SRE==1
only.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
8 years agoarm64: KVM: Make ICC_SRE_EL1 access return the configured SRE value
Marc Zyngier [Wed, 25 May 2016 14:26:37 +0000 (15:26 +0100)]
arm64: KVM: Make ICC_SRE_EL1 access return the configured SRE value

When we trap ICC_SRE_EL1, we handle it as RAZ/WI. It would be
more correct to actually make it RO, and return the configured
value when read.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
8 years agoKVM: arm/arm64: vgic-v3: Always resample level interrupts
Marc Zyngier [Wed, 25 May 2016 14:26:36 +0000 (15:26 +0100)]
KVM: arm/arm64: vgic-v3: Always resample level interrupts

When reading back from the list registers, we need to perform
two actions for level interrupts:
1) clear the soft-pending bit if the interrupt is not pending
   anymore *in the list register*
2) resample the line level and propagate it to the pending state

But these two actions shouldn't be linked, and we should *always*
resample the line level, no matter what state is in the list
register. Otherwise, we may end up injecting spurious interrupts
that have been already retired.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
8 years agoKVM: arm/arm64: vgic-v2: Always resample level interrupts
Marc Zyngier [Wed, 25 May 2016 14:26:35 +0000 (15:26 +0100)]
KVM: arm/arm64: vgic-v2: Always resample level interrupts

When reading back from the list registers, we need to perform
two actions for level interrupts:
1) clear the soft-pending bit if the interrupt is not pending
   anymore *in the list register*
2) resample the line level and propagate it to the pending state

But these two actions shouldn't be linked, and we should *always*
resample the line level, no matter what state is in the list
register. Otherwise, we may end up injecting spurious interrupts
that have been already retired.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
8 years agoKVM: arm/arm64: vgic-v3: Clear all dirty LRs
Christoffer Dall [Wed, 25 May 2016 14:26:34 +0000 (15:26 +0100)]
KVM: arm/arm64: vgic-v3: Clear all dirty LRs

When saving the state of the list registers, it is critical to
reset them to zero, as we could otherwise leave unexpected EOI
interrupts pending for virtual level interrupts.

Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
8 years agoKVM: arm/arm64: vgic-v2: Clear all dirty LRs
Christoffer Dall [Wed, 25 May 2016 14:26:33 +0000 (15:26 +0100)]
KVM: arm/arm64: vgic-v2: Clear all dirty LRs

When saving the state of the list registers, it is critical to
reset them to zero, as we could otherwise leave unexpected EOI
interrupts pending for virtual level interrupts.

Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
8 years agopinctrl: mediatek: fix dual-edge code defect
hongkun.cao [Sat, 21 May 2016 07:23:39 +0000 (15:23 +0800)]
pinctrl: mediatek: fix dual-edge code defect

When a dual-edge irq is triggered, an incorrect irq will be reported if
the external signal is not stable and this incorrect irq has been
registered.
Correct the register offset.

Cc: stable@vger.kernel.org
Signed-off-by: Hongkun Cao <hongkun.cao@mediatek.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
8 years agoMerge branch 'uuid' (lib/uuid fixes from Andy)
Linus Torvalds [Mon, 30 May 2016 22:27:07 +0000 (15:27 -0700)]
Merge branch 'uuid' (lib/uuid fixes from Andy)

Merge lib/uuid fixes from Andy Shevchenko.

* emailed patches from Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
  lib/uuid.c: use correct offset in uuid parser
  lib/uuid: add a test module

8 years agolib/uuid.c: use correct offset in uuid parser
Bjørn Mork [Mon, 30 May 2016 14:40:42 +0000 (17:40 +0300)]
lib/uuid.c: use correct offset in uuid parser

Use '+ 0' and '+ 1' as offsets, like they were intended, instead of
adding to the result.
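
One reading of the distinction above, sketched in userspace C: the offset
belongs inside the index expression (s[pos + 1]), not added to the
character that was fetched (s[pos] + 1). The helper names hex_digit() and
parse_hex_byte() are hypothetical and are not the lib/uuid.c functions.

```c
/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Parse the two hex digits at s[pos] and s[pos + 1]; 0..255 or -1. */
static int parse_hex_byte(const char *s, int pos)
{
    int hi = hex_digit(s[pos + 0]); /* offset applied to the index */
    int lo = hex_digit(s[pos + 1]); /* not s[pos] + 1 */

    if (hi < 0 || lo < 0)
        return -1;
    return (hi << 4) | lo;
}
```

Writing s[pos] + 1 instead would convert the character that follows the
first digit in the character set, not the second digit of the input.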

Fixes: 2b1b0d66704a ("lib/uuid.c: introduce a few more generic helpers")
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agolib/uuid: add a test module
Andy Shevchenko [Mon, 30 May 2016 14:40:41 +0000 (17:40 +0300)]
lib/uuid: add a test module

It appears that somehow I missed a test of the latest UUID rework which
landed in the kernel.  Present a small test module to avoid such cases
in the future.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Mon, 30 May 2016 22:20:18 +0000 (15:20 -0700)]
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
 "This fixes the following issues:

   - missing selection in public_key that may result in a build failure

   - Potential crash in error path in omap-sham

   - ccp AES XTS bug that affects requests larger than 4096"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: ccp - Fix AES XTS error for request sizes above 4096
  crypto: public_key: select CRYPTO_AKCIPHER
  crypto: omap-sham - potential Oops on error in probe

8 years agogpio: drop lock before reading GPIO direction
Linus Walleij [Mon, 30 May 2016 15:11:59 +0000 (17:11 +0200)]
gpio: drop lock before reading GPIO direction

When adding the gpiochip, the GPIO HW drivers' callback get_direction()
could get called in atomic context. Some of the GPIO HW drivers may
sleep when accessing the register.

Move the lock before initializing the descriptors.

Reported-by: Laxman Dewangan <ldewangan@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
8 years agogpio: bail out silently on NULL descriptors
Linus Walleij [Mon, 30 May 2016 14:48:39 +0000 (16:48 +0200)]
gpio: bail out silently on NULL descriptors

Commit fdeb8e1547cb9dd39d5d7223b33f3565cf86c28e
("gpio: reflect base and ngpio into gpio_device")
assumed that GPIO descriptors are either valid or error
pointers, but gpiod_get_[index_]optional() can actually return
NULL descriptors, and then all subsequent calls should just
bail out.

Cc: stable@vger.kernel.org
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Fixes: fdeb8e1547cb ("gpio: reflect base and ngpio into gpio_device")
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
8 years agogpio: handle compatible ioctl() pointers
Linus Walleij [Fri, 27 May 2016 12:24:04 +0000 (14:24 +0200)]
gpio: handle compatible ioctl() pointers

If we're using the compatible ioctl() we need to handle the
argument pointer in a special way or there will be trouble.

Fixes: 3c702e9987e2 ("gpio: add a userspace chardev ABI for GPIOs")
Reported-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
8 years agoMAINTAINERS: Add file patterns for pinctrl device tree bindings
Geert Uytterhoeven [Sun, 22 May 2016 09:06:13 +0000 (11:06 +0200)]
MAINTAINERS: Add file patterns for pinctrl device tree bindings

Submitters of device tree binding documentation may forget to CC
the subsystem maintainer if this is missing.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-gpio@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
8 years agopinctrl: nomadik: fix inversion of gpio direction
Linus Walleij [Tue, 24 May 2016 12:39:47 +0000 (14:39 +0200)]
pinctrl: nomadik: fix inversion of gpio direction

The input/output directions were inverted in the GPIO direction
read function. Lose a ! and it is correct.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
8 years agogpio: flush direction status in gpiochip_lock_as_irq()
Linus Walleij [Wed, 25 May 2016 08:56:03 +0000 (10:56 +0200)]
gpio: flush direction status in gpiochip_lock_as_irq()

As irqchip and gpiochip functions are orthogonal, the IRQ
set-up or something else can have changed the direction of
the GPIO line from what the GPIO descriptor knows when we
get into gpiochip_lock_as_irq(). Make sure to re-read the
direction setting if we have the .get_direction() callback
enabled for the chip.

Else we get problems like this:

iio iio:device2: interrupts on the rising edge
gpio gpiochip2: (8012e080.gpio): gpiochip_lock_as_irq:
  tried to flag a GPIO set as output for IRQ
gpio gpiochip2: (8012e080.gpio): unable to lock HW IRQ 0 for IRQ
genirq: Failed to request resources for l3g4200d-trigger
  (irq 111) on irqchip nmk1-32-63
iio iio:device2: failed to request trigger IRQ.
st-gyro-i2c: probe of 2-0068 failed with error -22

Fixes: 72d320006177 ("gpio: set up initial state from .get_direction()")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
8 years agogpio: lpc32xx: disable broken to_irq support
Sylvain Lemieux [Wed, 11 May 2016 17:40:00 +0000 (13:40 -0400)]
gpio: lpc32xx: disable broken to_irq support

The "to_irq" functionality is broken inside this driver since commit
76ba59f8366f ("genirq: Add irq_domain-aware core IRQ handler").

The addition of the new lpc32xx irqchip driver in 4.7 fixed the
lpc32xx platform interrupt issue.

When switching to the new lpc32xx irqchip driver, a warning appears
in the lpc32xx gpio driver: warning: "NR_IRQS" redefined.

To remove this warning (temporary solution), this patch
disables the broken "to_irq" mapping functionality support.

Signed-off-by: Sylvain Lemieux <slemieux@tycoint.com>
Acked-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
8 years agonet: l2tp: Make l2tp_ip6 namespace aware
Shmulik Ladkani [Thu, 26 May 2016 17:16:36 +0000 (20:16 +0300)]
net: l2tp: Make l2tp_ip6 namespace aware

l2tp_ip6 tunnel and session lookups were still using init_net, although
the l2tp core infrastructure already supports lookups keyed by 'net'.

As a result, l2tp_ip6_recv discarded packets for tunnels/sessions
created in namespaces other than the init_net.

Fix, by using dev_net(skb->dev) or sock_net(sk) where appropriate.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoDocumentation: ip-sysctl.txt: clarify secure_redirects
Eric Garver [Thu, 26 May 2016 16:28:05 +0000 (12:28 -0400)]
Documentation: ip-sysctl.txt: clarify secure_redirects

Clarify how secure_redirects works. Mention that RFC1122 always applies.

Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosfc: use flow dissector helpers for aRFS
Edward Cree [Thu, 26 May 2016 20:46:05 +0000 (21:46 +0100)]
sfc: use flow dissector helpers for aRFS

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoieee802154: fix logic error in ieee802154_llsec_parse_dev_addr
Baozeng Ding [Thu, 26 May 2016 13:07:42 +0000 (21:07 +0800)]
ieee802154: fix logic error in ieee802154_llsec_parse_dev_addr

Fix a logic error to avoid potential null pointer dereference.

Signed-off-by: Baozeng Ding <sploving1@gmail.com>
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: nps_enet: Disable interrupts before napi reschedule
Elad Kanfi [Thu, 26 May 2016 12:00:06 +0000 (15:00 +0300)]
net: nps_enet: Disable interrupts before napi reschedule

Since NAPI works by shutting down event interrupts when there's
work and turning them on when there's none, the net driver must
make sure that interrupts are disabled when it reschedules polling.
By calling napi_reschedule, the driver switches to polling mode,
therefore there should be no interrupt interference.
Any received packets will be handled in nps_enet_poll by polling the HW
indication of received packet until all packets are handled.

Signed-off-by: Elad Kanfi <eladkan@mellanox.com>
Acked-by: Noam Camus <noamca@mellanox.com>
Tested-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/lapb: tuse %*ph to dump buffers
Andy Shevchenko [Thu, 26 May 2016 11:43:52 +0000 (14:43 +0300)]
net/lapb: tuse %*ph to dump buffers

Use the %*ph specifier to dump small buffers in hex format instead of
doing this byte-by-byte.
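
For readers outside the kernel, the effect can be approximated in
userspace C. This is only a sketch: format_hex() is a hypothetical
helper, standard printf() has no %*ph, and the space-separated two-digit
layout is an assumption about the intended output.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Format len bytes as "xx xx xx" into out.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if out is too small.
 */
static int format_hex(char *out, size_t out_size,
                      const unsigned char *buf, size_t len)
{
    size_t i;

    /* Each byte needs two digits plus a separator or the final NUL. */
    if (out_size < (len ? len * 3 : 1))
        return -1;

    out[0] = '\0';
    for (i = 0; i < len; i++)
        snprintf(out + i * 3, out_size - i * 3, "%02x%s",
                 (unsigned int)buf[i], i + 1 < len ? " " : "");

    return len ? (int)(len * 3 - 1) : 0;
}
```

A single call replaces a hand-written loop at every call site, which is
the same simplification the patch makes with the kernel specifier.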

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoptp: oops in ptp_ioctl()
Dan Carpenter [Thu, 26 May 2016 06:46:22 +0000 (09:46 +0300)]
ptp: oops in ptp_ioctl()

If we pass ERR_PTR(-EFAULT) to kfree() then it's going to oops.
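
The hazard can be sketched in userspace C. All names below are
hypothetical stand-ins written for this sketch, modelled on the idea of
encoding a negative errno value in a pointer; they are not the kernel
helpers and not the ptp code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *sketch_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True if p encodes an error rather than pointing at memory. */
static int sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Recover the errno value from an encoded pointer. */
static long sketch_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* Hypothetical duplicating allocator that reports failure in-band. */
static void *sketch_dup(const void *src, size_t len, int fail)
{
    void *p;

    if (fail)
        return sketch_err_ptr(-EFAULT);
    p = malloc(len);
    if (!p)
        return sketch_err_ptr(-ENOMEM);
    memcpy(p, src, len);
    return p;
}

static long use_buffer(const void *src, size_t len, int fail)
{
    void *buf = sketch_dup(src, len, fail);

    if (sketch_is_err(buf)) {
        /* Do not free buf: it encodes an error code, not an allocation. */
        return sketch_ptr_err(buf);
    }

    /* ... use buf ... */
    free(buf);
    return 0;
}
```

The error branch returns without freeing, so the encoded pointer never
reaches the allocator.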

Fixes: 2ece068e1b1d ('ptp: use memdup_user().')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agofou: add Kconfig options for IPv6 support
Arnd Bergmann [Wed, 25 May 2016 14:50:46 +0000 (16:50 +0200)]
fou: add Kconfig options for IPv6 support

A previous patch added the fou6.ko module, but that failed to link
in a couple of configurations:

net/built-in.o: In function `ip6_tnl_encap_add_fou_ops':
net/ipv6/fou6.c:88: undefined reference to `ip6_tnl_encap_add_ops'
net/ipv6/fou6.c:94: undefined reference to `ip6_tnl_encap_add_ops'
net/ipv6/fou6.c:97: undefined reference to `ip6_tnl_encap_del_ops'
net/built-in.o: In function `ip6_tnl_encap_del_fou_ops':
net/ipv6/fou6.c:106: undefined reference to `ip6_tnl_encap_del_ops'
net/ipv6/fou6.c:107: undefined reference to `ip6_tnl_encap_del_ops'

If CONFIG_IPV6=m, ip6_tnl_encap_add_ops/ip6_tnl_encap_del_ops
are in a module, but fou6.c can still be built-in, and that
obviously fails to link.

Also, if CONFIG_IPV6=y, but CONFIG_IPV6_TUNNEL=m or
CONFIG_IPV6_TUNNEL=n, the same problem happens for a different
reason.

This adds two new silent Kconfig symbols to work around both
problems:

- CONFIG_IPV6_FOU is now always set to 'm' if either CONFIG_NET_FOU=m
  or CONFIG_IPV6=m
- CONFIG_IPV6_FOU_TUNNEL is set implicitly when IPV6_FOU is enabled
  and NET_FOU_IP_TUNNELS is also turned on, and it will ensure
  that CONFIG_IPV6_TUNNEL is also available.

The options could be made user-visible as well, to give additional
room for configuration, but it seems easier not to bother users
with more choice here.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: aa3463d65e7b ("fou: Add encap ops for IPv6 tunnels")
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoipv6: hide ip6_encap_hlen/ip6_tnl_encap definitions
Arnd Bergmann [Wed, 25 May 2016 14:50:45 +0000 (16:50 +0200)]
ipv6: hide ip6_encap_hlen/ip6_tnl_encap definitions

A recent cleanup moved MAX_IPTUN_ENCAP_OPS along with some other
definitions, but it is now invisible when CONFIG_INET is
not defined, but still referenced from ip6_tunnel.h:

In file included from net/xfrm/xfrm_input.c:17:0:
include/net/ip6_tunnel.h:67:17: error: 'MAX_IPTUN_ENCAP_OPS' undeclared here (not in a function)
   ip6tun_encaps[MAX_IPTUN_ENCAP_OPS];
                 ^~~~~~~~~~~~~~~~~~~

This hides the ip6_encap_hlen and ip6_tnl_encap functions inside
of CONFIG_INET so we don't run into the problem.

Alternatively we could move the macro out of the #ifdef again to
restore the previous behavior.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 55c2bc143224 ("net: Cleanup encap items in ip_tunnels.h")
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosparc64: Fix return from trap window fill crashes.
David S. Miller [Sun, 29 May 2016 03:41:12 +0000 (20:41 -0700)]
sparc64: Fix return from trap window fill crashes.

We must handle data access exception as well as memory address unaligned
exceptions from return from trap window fill faults, not just normal
TLB misses.

Otherwise we can get an OOPS that looks like this:

ld-linux.so.2(36808): Kernel bad sw trap 5 [#1]
CPU: 1 PID: 36808 Comm: ld-linux.so.2 Not tainted 4.6.0 #34
task: fff8000303be5c60 ti: fff8000301344000 task.ti: fff8000301344000
TSTATE: 0000004410001601 TPC: 0000000000a1a784 TNPC: 0000000000a1a788 Y: 00000002    Not tainted
TPC: <do_sparc64_fault+0x5c4/0x700>
g0: fff8000024fc8248 g1: 0000000000db04dc g2: 0000000000000000 g3: 0000000000000001
g4: fff8000303be5c60 g5: fff800030e672000 g6: fff8000301344000 g7: 0000000000000001
o0: 0000000000b95ee8 o1: 000000000000012b o2: 0000000000000000 o3: 0000000200b9b358
o4: 0000000000000000 o5: fff8000301344040 sp: fff80003013475c1 ret_pc: 0000000000a1a77c
RPC: <do_sparc64_fault+0x5bc/0x700>
l0: 00000000000007ff l1: 0000000000000000 l2: 000000000000005f l3: 0000000000000000
l4: fff8000301347e98 l5: fff8000024ff3060 l6: 0000000000000000 l7: 0000000000000000
i0: fff8000301347f60 i1: 0000000000102400 i2: 0000000000000000 i3: 0000000000000000
i4: 0000000000000000 i5: 0000000000000000 i6: fff80003013476a1 i7: 0000000000404d4c
I7: <user_rtt_fill_fixup+0x6c/0x7c>
Call Trace:
 [0000000000404d4c] user_rtt_fill_fixup+0x6c/0x7c

The window trap handlers are slightly clever: the trap table entries for them
are composed of two pieces of code.  First comes the code that actually performs
the window fill or spill trap handling, and then there are three instructions at
the end which are for exception processing.

The userland register window fill handler is:

add %sp, STACK_BIAS + 0x00, %g1; \
ldxa [%g1 + %g0] ASI, %l0; \
mov 0x08, %g2; \
mov 0x10, %g3; \
ldxa [%g1 + %g2] ASI, %l1; \
mov 0x18, %g5; \
ldxa [%g1 + %g3] ASI, %l2; \
ldxa [%g1 + %g5] ASI, %l3; \
add %g1, 0x20, %g1; \
ldxa [%g1 + %g0] ASI, %l4; \
ldxa [%g1 + %g2] ASI, %l5; \
ldxa [%g1 + %g3] ASI, %l6; \
ldxa [%g1 + %g5] ASI, %l7; \
add %g1, 0x20, %g1; \
ldxa [%g1 + %g0] ASI, %i0; \
ldxa [%g1 + %g2] ASI, %i1; \
ldxa [%g1 + %g3] ASI, %i2; \
ldxa [%g1 + %g5] ASI, %i3; \
add %g1, 0x20, %g1; \
ldxa [%g1 + %g0] ASI, %i4; \
ldxa [%g1 + %g2] ASI, %i5; \
ldxa [%g1 + %g3] ASI, %i6; \
ldxa [%g1 + %g5] ASI, %i7; \
restored; \
retry; nop; nop; nop; nop; \
b,a,pt %xcc, fill_fixup_dax; \
b,a,pt %xcc, fill_fixup_mna; \
b,a,pt %xcc, fill_fixup;

And the way this works is that if any of those memory accesses
generate an exception, the exception handler can revector to one of
those final three branch instructions depending upon which kind of
exception the memory access took.  In this way, the fault handler
doesn't have to know if it was a spill or a fill that it's handling
the fault for.  It just always branches to the last instruction in
the parent trap's handler.

For example, for a regular fault, the code goes:

winfix_trampoline:
rdpr %tpc, %g3
or %g3, 0x7c, %g3
wrpr %g3, %tnpc
done

All window trap handlers are 0x80 aligned, so if we "or" 0x7c into the
trap time program counter, we'll get that final instruction in the
trap handler.
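
The address arithmetic above can be sketched in C.  This is only an
illustration, not kernel code: fixup_target and the FIXUP_* names are
hypothetical, and the 0x74 and 0x78 offsets are inferred from the handler
layout shown earlier (32 four-byte instructions, with the three branches
last), not quoted from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

/* Each window trap handler fills one 0x80-byte, 0x80-aligned slot:
 * 32 instructions of 4 bytes each. */
#define WINDOW_HANDLER_SIZE 0x80u

/* Hypothetical names.  The values are the byte offsets of the final
 * three branch instructions inside a handler slot, assuming the
 * dax/mna/fault order of the fill handler listing. */
enum fixup_kind {
    FIXUP_DAX   = 0x74, /* data access exception */
    FIXUP_MNA   = 0x78, /* memory address unaligned */
    FIXUP_FAULT = 0x7c  /* regular fault, the last instruction */
};

/* Given the trap-time PC of a faulting access inside a window handler,
 * return the address of the branch that handles this kind of exception. */
uint64_t fixup_target(uint64_t tpc, enum fixup_kind kind)
{
    /* Instructions are 4-byte aligned. */
    assert((tpc & 3) == 0);

    /* Round down to the start of the 0x80-aligned handler slot. */
    uint64_t base = tpc & ~(uint64_t)(WINDOW_HANDLER_SIZE - 1);

    return base | (uint64_t)kind;
}
```

For the regular fault case a bare "or" with 0x7c is enough, because it sets
every offset bit that a 4-byte-aligned address inside the slot can have.  The
other two offsets need the low bits cleared first, which is why the sketch
masks before combining.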

On return from trap, we have to pull the register window in but we do
this by hand instead of just executing a "restore" instruction for
several reasons, the largest being that from Niagara onward we
simply don't have enough levels in the trap stack to fully resolve all
possible exception cases of a window fault when we are already at
trap level 1 (which we enter to get ready to return from the original
trap).

This is executed inline via the FILL_*_RTRAP handlers.  rtrap_64.S's
code branches directly to these to do the window fill by hand if
necessary.  Now if we look at them, we'll see at the end:

    ba,a,pt    %xcc, user_rtt_fill_fixup;
    ba,a,pt    %xcc, user_rtt_fill_fixup;
    ba,a,pt    %xcc, user_rtt_fill_fixup;

And oops, all three cases are handled like a fault.

This doesn't work because each of these trap types (data access
exception, memory address unaligned, and faults) stores its auxiliary
info in different registers to pass on to the C handler which does the
real work.

So in the case where the stack was unaligned, the unaligned trap
handler sets up the arg registers one way, and then we branch to
the fault handler, which expects them set up another way.

So the FAULT_TYPE_* value ends up basically being garbage, and
would randomly generate the backtrace seen above.

Reported-by: Nick Alcock <nix@esperi.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years ago Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sun, 29 May 2016 20:28:39 +0000 (13:28 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is a set of four fixes noticed in the merge window.  The aacraid
  one is an optimisation, the mpt3sas one fixes a spurious printk, the
  sd_check_events one fixes a theoretical race and the failed zero
  length commands fixes a bug in our completion/retry routines that has
  been causing problems in the field"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  aacraid: do not activate events on non-SRC adapters
  mpt3sas: add missing curly braces
  sd: get disk reference in sd_check_events()
  scsi_lib: correctly retry failed zero length REQ_TYPE_FS commands