That was true of the implementation of d_splice_alias, but this is
really a problem with d_splice_alias: at a minimum it should be able to
return -ELOOP in the case where inserting the given dentry would cause a
directory loop.
Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
GFS2: Don't use ENOBUFS when ENOMEM is the correct error code
Al Viro has tactfully pointed out that we are using the incorrect
error code in some cases. This patch fixes that, and also removes
the (unused) return value for glock dumping.
> * gfs2_iget() - ENOBUFS instead of ENOMEM. ENOBUFS is
> "No buffer space available (POSIX.1 (XSI STREAMS option))" and since
> we don't support STREAMS it's probably fair game, but... what the hell?
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk>
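The error-code choice described above can be sketched in userspace C. This is an illustration only, not the GFS2 code: alloc_object() and failing_alloc() are hypothetical names, and the allocator is passed in as a parameter purely so that the failure path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator that always fails, used to exercise the error path. */
void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/*
 * Allocate an object through the supplied allocator.
 * A failed memory allocation is reported as -ENOMEM, not -ENOBUFS.
 */
int alloc_object(void *(*alloc)(size_t), size_t size, void **out)
{
    void *p;

    assert(alloc != NULL && out != NULL);
    p = alloc(size);
    if (p == NULL)
        return -ENOMEM;
    *out = p;
    return 0;
}
```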
fs/gfs2/quota.c: In function 'gfs2_quota_init':
>> fs/gfs2/quota.c:1246:3: error: implicit declaration of function '__vmalloc' [-Werror=implicit-function-declaration]
sdp->sd_quota_bitmap = __vmalloc(bm_size, GFP_NOFS, PAGE_KERNEL);
^
>> fs/gfs2/quota.c:1246:24: warning: assignment makes pointer from integer without a cast [enabled by default]
sdp->sd_quota_bitmap = __vmalloc(bm_size, GFP_NOFS, PAGE_KERNEL);
^
fs/gfs2/quota.c: In function 'gfs2_quota_cleanup':
>> fs/gfs2/quota.c:1361:4: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
vfree(sdp->sd_quota_bitmap);
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
GFS2: Move quota bitmap operations under their own lock
Gradually, the global qd_lock is being used for less and less.
After this patch it will only be used for the per super block
list whose purpose is to allow syncing of changes back to the
master quota file from the local quota changes file. Fixing
up that process to make it more efficient will be the subject
of a later patch, however this patch removes another barrier
to doing that.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
Quota slot allocation has historically used a vector of pages
and a set of homegrown find/test/set/clear bit functions. Since
the size of the bitmap is likely to be based on the default
qc file size, that's a couple of pages at most. So we ought
to be able to allocate that as a single chunk, with a vmalloc
fallback, just in case of memory fragmentation.
We are then able to use the kernel's own find/test/set/clear
bit functions, rather than rolling our own.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
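As a rough userspace illustration of the idea above (one contiguous bitmap plus generic find/test/set/clear helpers), the following sketch may help. All names here (slot_bitmap, slot_alloc and so on) are hypothetical stand-ins, calloc() replaces the kernel allocation, and the vmalloc fallback is not modelled.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the quota slot bitmap. */
struct slot_bitmap {
    unsigned long *words;   /* a single contiguous allocation */
    size_t nbits;           /* number of slots tracked */
};

int slot_bitmap_init(struct slot_bitmap *bm, size_t nbits)
{
    size_t nwords = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;

    assert(nbits > 0);
    bm->words = calloc(nwords, sizeof(unsigned long));
    if (bm->words == NULL)
        return -1;
    bm->nbits = nbits;
    return 0;
}

void slot_bitmap_destroy(struct slot_bitmap *bm)
{
    free(bm->words);
    bm->words = NULL;
    bm->nbits = 0;
}

int slot_test(const struct slot_bitmap *bm, size_t bit)
{
    return (int)((bm->words[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL);
}

void slot_set(struct slot_bitmap *bm, size_t bit)
{
    bm->words[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

void slot_clear(struct slot_bitmap *bm, size_t bit)
{
    bm->words[bit / BITS_PER_WORD] &= ~(1UL << (bit % BITS_PER_WORD));
}

/* Claim the first free slot; returns bm->nbits when every slot is in use. */
size_t slot_alloc(struct slot_bitmap *bm)
{
    for (size_t bit = 0; bit < bm->nbits; bit++) {
        if (!slot_test(bm, bit)) {
            slot_set(bm, bit);
            return bit;
        }
    }
    return bm->nbits;
}
```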
GFS2: Only run logd and quota when mounted read/write
While investigating a rather strange bit of code in the quota
clean up function, I spotted that the reason for its existence
was that when remounting read only, we were not stopping the
quotad thread, and thus it was possible for it to still have
a reference to some of the quotas in that case.
This patch moves the logd and quota thread start and stop into
the make_fs_rw/ro functions, so that we now stop those threads
when mounted read only.
This means that quotad will always be stopped before we call
the quota clean up function, and we can thus dispose of the
(rather hackish) code that waits for it to give up its
reference on the quotas.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
Prior to this patch, GFS2 kept all the quotas for each
super block in a single linked list. This is rather slow
when there are large numbers of quotas.
This patch introduces a hlist_bl based hash table, similar
to the one used for glocks. The initial look up of the quota
is now lockless in the case where it is already cached,
although we still have to take the per quota spinlock in
order to bump the ref count. Either way though, this is a
big improvement on what was there before.
The qd_lock and the per super block list is preserved, for
the time being. However it is intended that since this is no
longer used for its original role, it should be possible to
shrink the number of items on that list in due course and
remove the requirement to take qd_lock in qd_get.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
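To show why a hash table helps when there are many quotas, here is a minimal single-threaded sketch of a bucketed lookup that bumps a reference count on a hit. It is not the hlist_bl implementation: the names (quota_entry, quota_table, quota_get), the table size and the hash function are all assumptions, and the lockless lookup and per-quota spinlock are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QD_HASH_SHIFT 4
#define QD_HASH_SIZE  (1u << QD_HASH_SHIFT)

/* Hypothetical cached quota object. */
struct quota_entry {
    uint32_t id;
    unsigned int refcount;
    struct quota_entry *next;   /* next entry in the same bucket */
};

struct quota_table {
    struct quota_entry *buckets[QD_HASH_SIZE];
};

/* Multiplicative hash; the constant is an arbitrary choice for this sketch. */
unsigned int quota_hash(uint32_t id)
{
    return (uint32_t)(id * 2654435761u) >> (32 - QD_HASH_SHIFT);
}

void quota_insert(struct quota_table *t, struct quota_entry *qd)
{
    unsigned int h;

    assert(t != NULL && qd != NULL);
    h = quota_hash(qd->id);
    qd->next = t->buckets[h];
    t->buckets[h] = qd;
}

/* Only one bucket is walked, rather than every quota on the super block. */
struct quota_entry *quota_get(struct quota_table *t, uint32_t id)
{
    struct quota_entry *qd;

    for (qd = t->buckets[quota_hash(id)]; qd != NULL; qd = qd->next) {
        if (qd->id == id) {
            qd->refcount++;   /* real code would need per-object locking here */
            return qd;
        }
    }
    return NULL;
}
```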
We recently fixed the writeback of pages prior to performing
direct i/o, however the initial fix was perhaps a bit heavy
handed. There is no need to invalidate pages if the direct i/o
is only a read, since they will be identical to what has been
flushed to disk anyway.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch adds four new fields to directory leaf blocks.
The intent is not to use them in the kernel itself, although
perhaps we may be able to use them as hints at some later date,
but instead to provide more information for debug/fsck use.
One new field adds a pointer to the inode to which the leaf
belongs. This can be useful if the pointer to the leaf block
has become corrupt, as it will allow us to know which inode
this block should be associated with. This field is set when
the leaf is created and never changed over its lifetime.
The second field is a "distance from the hash table" field.
The meaning is as follows:
0 = An old leaf in which this value has not been set
1 = This leaf is pointed to directly from the hash table
2+ = This leaf is part of a chain, pointed to by another leaf
block, the value gives the position in the chain.
The third and fourth fields combine to give a time stamp of
the most recent directory insertion or deletion from this
leaf block. The time stamp is not updated when a new leaf
block is chained from the current one. The code is currently
written such that the timestamp on the dir inode will match
that of the leaf block for the most recent insertion/deletion.
For backwards compatibility, any of these new fields which is
zero should be considered to be "unknown".
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
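A debugging tool might interpret the "distance from the hash table" value along these lines. This is a sketch under assumptions: the enum and function names are invented for illustration, and the on-disk field layout is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of the "distance from the hash table" value. */
enum leaf_origin {
    LEAF_UNKNOWN,           /* 0: old leaf, value never set */
    LEAF_FROM_HASH_TABLE,   /* 1: pointed to directly from the hash table */
    LEAF_CHAINED            /* 2+: reached through another leaf block */
};

enum leaf_origin classify_leaf(uint32_t dist)
{
    if (dist == 0)
        return LEAF_UNKNOWN;
    return dist == 1 ? LEAF_FROM_HASH_TABLE : LEAF_CHAINED;
}

/* Number of leaf blocks ahead of this one in its chain, or -1 if unknown. */
int leaf_chain_predecessors(uint32_t dist)
{
    assert(dist <= INT32_MAX);
    return dist == 0 ? -1 : (int)(dist - 1);
}
```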
GFS2: For exhash conversion, only one block is needed
For most cases, only a single new block is needed when we reach
the point of converting from stuffed to exhash directory. The
exception being when the file name is so long that it will not
fit within the new leaf block.
So this patch adds a simple test for that situation so that we
do not need to request the full reservation size in this case.
Potentially we could calculate more accurately the value to use
in other cases too, but that is much more complicated to do and
it is doubtful that the benefit would outweigh the extra cost
in code complexity.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Mon, 6 Jan 2014 22:16:01 +0000 (17:16 -0500)]
GFS2: Increase i_writecount during gfs2_setattr_chown
This patch calls get_write_access in function gfs2_setattr_chown,
which merely increases inode->i_writecount for the duration of the
function. That will ensure that any file closes won't delete the
inode's multi-block reservation while the function is running.
It also ensures that a multi-block reservation exists when needed
for quota change operations during the chown.
Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
When we look to see if there is enough space to add a dir
entry without allocation, we have then been repeating the
same search later when we do the actual insertion. This
patch caches the details of the location in the gfs2_diradd
structure, so that we do not have to repeat the search.
This will provide a performance improvement which will be
greater as the size of the directory increases.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
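The caching idea can be sketched as follows: the space check records where the free slot was found, and the insertion reuses that location instead of searching again. Everything here is a toy model; the dir and diradd structures and their fields are hypothetical and far simpler than the real directory code.

```c
#include <assert.h>
#include <stddef.h>

#define DIR_SLOTS 8

/* Toy directory: a fixed array of slots plus a counter of searches made. */
struct dir {
    int used[DIR_SLOTS];
    int value[DIR_SLOTS];
    unsigned int searches;
};

/* Hypothetical analogue of the "dir add" bookkeeping structure. */
struct diradd {
    int have_slot;   /* non-zero if 'slot' holds a cached search result */
    size_t slot;
};

int dir_find_free(struct dir *d, size_t *slot)
{
    d->searches++;
    for (size_t i = 0; i < DIR_SLOTS; i++) {
        if (!d->used[i]) {
            *slot = i;
            return 1;
        }
    }
    return 0;
}

/* Check for space and remember where it was found. */
int dir_check_space(struct dir *d, struct diradd *da)
{
    da->have_slot = dir_find_free(d, &da->slot);
    return da->have_slot;
}

/* Insert using the cached location; search only if nothing was cached. */
int dir_insert(struct dir *d, struct diradd *da, int value)
{
    if (!da->have_slot && !dir_find_free(d, &da->slot))
        return -1;
    assert(da->slot < DIR_SLOTS);
    d->used[da->slot] = 1;
    d->value[da->slot] = value;
    da->have_slot = 0;   /* the cached location is consumed */
    return 0;
}
```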
GFS2: Consolidate transaction blocks calculation for dir add
There are three cases where we need to calculate the number of
blocks to reserve in a transaction involving linking an inode
into a directory. The one in rename is a bit more complicated,
but the basis of it is the same as for link and create. So it
makes sense to move this calculation into a single function
rather than repeating it three times.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The intent is that this structure will hold the information
required when adding entries to a directory (linking). To
start with, it will contain only the number of blocks which
are required to link the new entry into the directory. The
current calculation returns either 0 or the maximum number of
blocks that can ever be requested by such a transaction.
The intent is that in a later patch, we can update the dir
code to calculate this value more accurately. In addition
further patches will also add further fields to the new
structure to increase its utility.
In addition this patch fixes a bug where the link used during
inode creation was requesting too many blocks in
some cases. This is harmless unless the fs is close to being
full.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Prior to this patch, GFS2 had one address space for each rgrp,
stored in the glock. This patch changes them to use a single
address space in the super block. This therefore saves
(sizeof(struct address_space) * nr_of_rgrps) bytes of memory
and for large filesystems, that can be significant.
It would be nice to be able to do something similar and merge
the inode metadata address space into the same global
address space. However, that is rather more complicated as the
on-disk location doesn't have a 1:1 mapping with the inodes in
general. So while it could be done, it will be a more complicated
operation as it requires changing a lot more code paths.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
GFS2: Use range based functions for rgrp sync/invalidation
Each rgrp header is represented as a single extent on disk, so we
can calculate the position within the address space, since we are
using address spaces mapped 1:1 to the disk. This means that it
is possible to use the range based versions of filemap_fdatawrite/wait
and for invalidating the page cache.
Our eventual intent is to then be able to merge the address spaces
used for rgrps into a single address space, rather than to have
one for each glock, saving memory and reducing complexity.
Since during umount, the rgrp structures are disposed of before
the glocks, we need to store the extent information in the glock
so that it is available for a final invalidation. This patch uses
a field which is otherwise unused in rgrp glocks to do that, so
that we do not have to expand the size of a glock.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
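With an address space mapped 1:1 to the disk, turning an on-disk extent into a byte range is simple arithmetic. The sketch below assumes an extent described by a starting block number and a length in blocks; the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte range within the address space; 'end' is inclusive. */
struct byte_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Convert an extent (first block number, length in blocks) into the byte
 * range it occupies, assuming byte offset == block number * block size.
 */
struct byte_range extent_to_range(uint64_t first_block, uint32_t nblocks,
                                  uint32_t block_size)
{
    struct byte_range r;

    assert(nblocks > 0 && block_size > 0);
    r.start = first_block * block_size;
    r.end = r.start + (uint64_t)nblocks * block_size - 1;
    return r;
}
```

A range computed this way is the kind of value that can be stored once and reused for a final invalidation after the structure describing the extent has gone away.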
Since gfs2_inplace_reserve() is always called with a valid
alloc parms structure, there is no need to test for this
within the function itself - and in any case, after we've
already dereferenced it anyway.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
There is only one place this is used, when reading in the quota
changes at mount time. It is not really required and much
simpler to just convert the fields from the on-disk structure
as required.
There should be no functional change as a result of this patch.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
For historical reasons, we drop and retake the log lock in ->releasepage()
however, since there is no reason why we cannot hold the log lock over
the whole function, this allows some simplification. In particular,
pinning a buffer is only ever done under the log lock, so it is possible
here to remove the test for pinned buffers in the second loop, since it
is impossible for that to happen (it is also tested in the first loop).
As a result, two tests made later in the second loop become constants
and can also be reduced to the only possible branch. So the net result
is to remove various bits of unreachable code and make this more
readable.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Mon, 25 Nov 2013 11:16:25 +0000 (11:16 +0000)]
GFS2: Implement a "rgrp has no extents longer than X" scheme
With the preceding patch, we started accepting block reservations
smaller than the ideal size, which requires a lot more parsing of the
bitmaps. To reduce the amount of bitmap searching, this patch
implements a scheme whereby each rgrp keeps track of the point
at which multi-block reservations will fail.
Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Wed, 6 Nov 2013 15:58:00 +0000 (10:58 -0500)]
GFS2: Drop inadequate rgrps from the reservation tree
This is just basically a resend of a patch I posted earlier.
It didn't change from its original, except in diff offsets, etc:
This patch fixes a bug in the GFS2 block allocation code. The problem
starts if a process already has a multi-block reservation, but for
some reason, another process disqualifies it from further allocations.
For example, the other process might set the GFS2_RDF_ERROR bit.
The process holding the reservation jumps to label skip_rgrp, but
that label comes after the code that removes the reservation from the
tree. Therefore, the no longer usable reservation is not removed from
the rgrp's reservations tree; it's lost. Eventually, the lost reservation
causes the count of reserved blocks to get off, and eventually that
causes a BUG_ON(rs->rs_rbm.rgd->rd_reserved < rs->rs_free) to trigger.
This patch moves the call to after label skip_rgrp so that the
disqualified reservation is properly removed from the tree, thus keeping
the rgrp rd_reserved count sane.
Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bob Peterson [Wed, 6 Nov 2013 15:55:52 +0000 (10:55 -0500)]
GFS2: If requested is too large, use the largest extent in the rgrp
Here is a second try at a patch I posted earlier, which also implements
suggestions Steve made:
Before this patch, GFS2 would keep searching through all the rgrps
until it found one that had a chunk of free blocks big enough to
satisfy the size hint, which is based on the file write size,
regardless of whether the chunk was big enough to perform the write.
However, when doing big writes there may not be a large enough
chunk of free blocks in any rgrp, due to file system fragmentation.
The largest chunk may be big enough to satisfy the write request,
but it may not meet the ideal reservation size from the "size hint".
The writes would slow to a crawl because every write would search
every rgrp, then finally give up and default to a single-block write.
In my case, performance would drop from 425MB/s to 18KB/s, or 24000
times slower.
This patch basically makes it so that if we can't find a contiguous
chunk of blocks big enough to satisfy the sizehint, we'll use the
largest chunk of blocks we found that will still contain the write.
It does so by keeping track of the largest run of blocks within the
rgrp.
Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
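The fallback described above can be illustrated with a toy search over a byte-per-block map: stop early if a run of the wanted size is found, otherwise remember the largest run seen and accept it if it still covers the minimum needed. This is a sketch only; find_extent() and struct extent are hypothetical, and the real code works on packed rgrp bitmaps rather than one byte per block.

```c
#include <assert.h>
#include <stddef.h>

struct extent {
    size_t start;
    size_t len;
};

/*
 * used[i] != 0 means block i is allocated.
 * 'wanted' is the ideal run length (the size hint); 'minimum' is the
 * smallest run that still satisfies the request.
 * Returns 1 and fills *out on success, 0 if no run of 'minimum' blocks exists.
 */
int find_extent(const unsigned char *used, size_t nblocks,
                size_t wanted, size_t minimum, struct extent *out)
{
    struct extent best = { 0, 0 };
    size_t i = 0;

    assert(minimum > 0 && minimum <= wanted);
    while (i < nblocks) {
        size_t start, len;

        if (used[i]) {
            i++;
            continue;
        }
        start = i;
        while (i < nblocks && !used[i] && i - start < wanted)
            i++;
        len = i - start;
        if (len == wanted) {          /* ideal size found, stop searching */
            out->start = start;
            out->len = len;
            return 1;
        }
        if (len > best.len) {         /* track the largest run seen so far */
            best.start = start;
            best.len = len;
        }
    }
    if (best.len >= minimum) {        /* fall back to the largest run */
        *out = best;
        return 1;
    }
    return 0;
}
```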
Linus Torvalds [Thu, 2 Jan 2014 22:40:38 +0000 (14:40 -0800)]
Merge branch 'akpm' (incoming from Andrew)
Merge patches from Andrew Morton:
"Ten fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
epoll: do not take the nested ep->mtx on EPOLL_CTL_DEL
sh: add EXPORT_SYMBOL(min_low_pfn) and EXPORT_SYMBOL(max_low_pfn) to sh_ksyms_32.c
drivers/dma/ioat/dma.c: check DMA mapping error in ioat_dma_self_test()
mm/memory-failure.c: transfer page count from head page to tail page after split thp
MAINTAINERS: set up proper record for Xilinx Zynq
mm: remove bogus warning in copy_huge_pmd()
memcg: fix memcg_size() calculation
mm: fix use-after-free in sys_remap_file_pages
mm: munlock: fix deadlock in __munlock_pagevec()
mm: munlock: fix a bug where THP tail page is encountered
Jason Baron [Thu, 2 Jan 2014 20:58:54 +0000 (12:58 -0800)]
epoll: do not take the nested ep->mtx on EPOLL_CTL_DEL
The EPOLL_CTL_DEL path of epoll contains a classic, ab-ba deadlock.
That is, epoll_ctl(a, EPOLL_CTL_DEL, b, x), will deadlock with
epoll_ctl(b, EPOLL_CTL_DEL, a, x). The deadlock was introduced with
commit 67347fe4e632 ("epoll: do not take global 'epmutex' for simple
topologies").
The acquisition of the ep->mtx for the destination 'ep' was added such
that a concurrent EPOLL_CTL_ADD operation would see the correct state of
the ep (specifically, the check for '!list_empty(&f.file->f_ep_links)').
However, by simply not acquiring the lock, we do not serialize behind
the ep->mtx from the add path, and thus may perform a full path check
when if we had waited a little longer it may not have been necessary.
However, this is a transient state, and performing the full loop
checking in this case is not harmful.
The important point is that we wouldn't miss doing the full loop
checking when required, since EPOLL_CTL_ADD always locks any 'ep's that
it is operating upon. The reason we don't need to do lock ordering in the
add path is that we are already holding the global 'epmutex'
whenever we do the double lock. Further, the original posting of this
patch, which was tested for the intended performance gains, did not
perform this additional locking.
Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: Nathan Zimmer <nzimmer@sgi.com> Cc: Eric Wong <normalperson@yhbt.net> Cc: Nelson Elhage <nelhage@nelhage.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sh: add EXPORT_SYMBOL(min_low_pfn) and EXPORT_SYMBOL(max_low_pfn) to sh_ksyms_32.c
min_low_pfn and max_low_pfn are used in the pfn_valid macro when
CONFIG_FLATMEM is defined. When functions that use pfn_valid are used in a
driver module, max_low_pfn and min_low_pfn are undefined, and the build
fails.
Naoya Horiguchi [Thu, 2 Jan 2014 20:58:51 +0000 (12:58 -0800)]
mm/memory-failure.c: transfer page count from head page to tail page after split thp
Memory failures on thp tail pages cause a kernel panic like the one below:
mce: [Hardware Error]: Machine check events logged
MCE exception done on CPU 7
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
IP: [<ffffffff811b7cd1>] dequeue_hwpoisoned_huge_page+0x131/0x1e0
PGD bae42067 PUD ba47d067 PMD 0
Oops: 0000 [#1] SMP
...
CPU: 7 PID: 128 Comm: kworker/7:2 Tainted: G M O 3.13.0-rc4-131217-1558-00003-g83b7df08e462 #25
...
Call Trace:
me_huge_page+0x3e/0x50
memory_failure+0x4bb/0xc20
mce_process_work+0x3e/0x70
process_one_work+0x171/0x420
worker_thread+0x11b/0x3a0
? manage_workers.isra.25+0x2b0/0x2b0
kthread+0xe4/0x100
? kthread_create_on_node+0x190/0x190
ret_from_fork+0x7c/0xb0
? kthread_create_on_node+0x190/0x190
...
RIP dequeue_hwpoisoned_huge_page+0x131/0x1e0
CR2: 0000000000000058
The cause of this problem is as follows:
- when we have a memory error on a thp tail page, the memory error
handler grabs a refcount of the head page to keep the thp under us.
- Before unmapping the error page from processes, we split the thp,
where page refcounts of both of head/tail pages don't change.
- Then we call try_to_unmap() over the error page (which was a tail
page before). Because we didn't pin the error page to handle the memory error,
this error page is freed and removed from LRU list.
- We never have the error page on LRU list, so the first page state
check returns "unknown page," then we move to the second check
with the saved page flag.
- The saved page flag has PG_tail set, so the second page state check
returns "hugepage."
- We call me_huge_page() for freed error page, then we hit the above panic.
The root cause is that we didn't move refcount from the head page to the
tail page after split thp. This patch does so.
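The refcount hand-over can be sketched with a toy page structure. This is an assumption-laden illustration, not the memory-failure code: page_stub and the helper names are hypothetical, and the only point shown is that the pin taken on the head page is moved to the page that actually has the error.

```c
#include <assert.h>

/* Toy page with just a reference count. */
struct page_stub {
    int refcount;
};

void get_page_stub(struct page_stub *p)
{
    p->refcount++;
}

void put_page_stub(struct page_stub *p)
{
    assert(p->refcount > 0);
    p->refcount--;
}

/*
 * After the huge page has been split, move the handler's pin from the
 * former head page to the error page. The new reference is taken before
 * the old one is dropped, so the error page is pinned throughout.
 */
void move_pin_after_split(struct page_stub *head, struct page_stub *error_page)
{
    if (head == error_page)
        return;   /* the error was on the head page; nothing to move */
    get_page_stub(error_page);
    put_page_stub(head);
}
```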
This panic was introduced by commit 524fca1e73 ("HWPOISON: fix
misjudgement of page_action() for errors on mlocked pages"). Note that we
did have the same refcount problem before this commit, but it was just
ignored because we had only first page state check which returned "unknown
page." The commit changed the refcount problem from "doesn't work" to
"kernel panic."
This warning was introduced by "mm: numa: Avoid unnecessary disruption
of NUMA hinting during migration" for paranoia reasons but the warning
is bogus. I was thinking of parallel races between NUMA hinting faults
and forks but this warning would also be triggered by a parallel reclaim
splitting a THP during a fork. Remove the bogus warning.
Signed-off-by: Mel Gorman <mgorman@suse.de> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rik van Riel [Thu, 2 Jan 2014 20:58:46 +0000 (12:58 -0800)]
mm: fix use-after-free in sys_remap_file_pages
remap_file_pages calls mmap_region, which may merge the VMA with other
existing VMAs, and free "vma". This can lead to a use-after-free bug.
Avoid the bug by remembering vm_flags before calling mmap_region, and
not trying to dereference vma later.
Signed-off-by: Rik van Riel <riel@redhat.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: PaX Team <pageexec@freemail.hu> Cc: Kees Cook <keescook@chromium.org> Cc: Michel Lespinasse <walken@google.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
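The general pattern behind this fix is to copy the field you need before calling something that may free the object. A minimal userspace sketch follows; struct region, merge_region() and remap() are hypothetical stand-ins, with free() playing the role of the merge that releases the structure.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a mapping descriptor. */
struct region {
    unsigned long flags;
};

/* Hypothetical call that may merge the region away and free it. */
void merge_region(struct region *r, int do_merge)
{
    if (do_merge)
        free(r);
}

/*
 * Remember the flags before the call that may free 'r', and use only the
 * saved copy afterwards.
 */
unsigned long remap(struct region *r, int do_merge)
{
    unsigned long flags;

    assert(r != NULL);
    flags = r->flags;          /* saved while 'r' is known to be valid */
    merge_region(r, do_merge);
    /* 'r' must not be dereferenced from here on */
    return flags;
}
```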
Vlastimil Babka [Thu, 2 Jan 2014 20:58:44 +0000 (12:58 -0800)]
mm: munlock: fix deadlock in __munlock_pagevec()
Commit 7225522bb429 ("mm: munlock: batch non-THP page isolation and
munlock+putback using pagevec") introduced __munlock_pagevec() to speed
up munlock by holding lru_lock over multiple isolated pages. Pages that
fail to be isolated are put_page()d immediately, also within the lock.
This can lead to deadlock when __munlock_pagevec() becomes the holder of
the last page pin and put_page() leads to __page_cache_release() which
also locks lru_lock. The deadlock has been observed by Sasha Levin
using trinity.
This patch avoids the deadlock by deferring put_page() operations until
lru_lock is released. Another pagevec (which is also used by later
phases of the function) is reused to gather the pages for put_page()
operation.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: Michel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
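The deferral pattern can be modelled in userspace with a non-recursive "lock" that asserts if it is taken twice. This is a simplified sketch: the lock is a plain flag rather than a spinlock, and page_stub, put_page_stub() and munlock_batch() are hypothetical names. Calling put_page_stub() inside the locked loop would trip the assertion, which stands in for the deadlock.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH 4

static int lru_locked;   /* models a non-recursive lru_lock */
static int released;     /* number of pages whose last pin was dropped */

struct page_stub {
    int pins;
    int isolatable;
};

void take_lru_lock(void)
{
    assert(!lru_locked);   /* taking it twice would be a deadlock */
    lru_locked = 1;
}

void drop_lru_lock(void)
{
    assert(lru_locked);
    lru_locked = 0;
}

/* Dropping the last pin takes the lock internally. */
void put_page_stub(struct page_stub *p)
{
    assert(p->pins > 0);
    if (--p->pins == 0) {
        take_lru_lock();
        released++;
        drop_lru_lock();
    }
}

/* Gather pages that failed isolation and put them after unlocking. */
void munlock_batch(struct page_stub **pages, size_t n)
{
    struct page_stub *deferred[BATCH];
    size_t ndeferred = 0;

    assert(n <= BATCH);
    take_lru_lock();
    for (size_t i = 0; i < n; i++) {
        if (!pages[i]->isolatable)
            deferred[ndeferred++] = pages[i];   /* no put while locked */
    }
    drop_lru_lock();

    for (size_t i = 0; i < ndeferred; i++)
        put_page_stub(deferred[i]);
}
```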
Vlastimil Babka [Thu, 2 Jan 2014 20:58:43 +0000 (12:58 -0800)]
mm: munlock: fix a bug where THP tail page is encountered
Since commit ff6a6da60b89 ("mm: accelerate munlock() treatment of THP
pages") munlock skips tail pages of a munlocked THP page. However, when
the head page already has PageMlocked unset, it will not skip the tail
pages.
Commit 7225522bb429 ("mm: munlock: batch non-THP page isolation and
munlock+putback using pagevec") has added a PageTransHuge() check which
contains VM_BUG_ON(PageTail(page)). Sasha Levin found this triggered
using trinity, on the first tail page of a THP page without PageMlocked
flag.
This patch fixes the issue by skipping tail pages also in the case when
PageMlocked flag is unset. There is still a possibility of race with
THP page split between clearing PageMlocked and determining how many
pages to skip. The race might result in former tail pages not being
skipped, which is however no longer a bug, as during the skip the
PageTail flags are cleared.
However this race also affects correctness of NR_MLOCK accounting, which
is to be fixed in a separate patch.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: Michel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 2 Jan 2014 20:45:47 +0000 (12:45 -0800)]
Merge tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes
Pull GFS2 fixes from Steven Whitehouse:
"Here is a set of small fixes for GFS2. There is a fix to drop
s_umount which is copied in from the core vfs, two patches relate to a
hard to hit "use after free" and memory leak. Two patches related to
using DIO and buffered I/O on the same file to ensure correct
operation in relation to glock state changes. The final patch adds an
RCU read lock to ensure correct locking on an error path"
* tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
GFS2: Fix unsafe dereference in dump_holder()
GFS2: Wait for async DIO in glock state changes
GFS2: Fix incorrect invalidation for DIO/buffered I/O
GFS2: Fix slab memory leak in gfs2_bufdata
GFS2: Fix use-after-free race when calling gfs2_remove_from_ail
GFS2: don't hold s_umount over blkdev_put
Linus Torvalds [Thu, 2 Jan 2014 20:45:07 +0000 (12:45 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Two small bug fixes and a follow-up to the CONFIG_NR_CPUS change.
A kernel compiled with CONFIG_NR_CPUS=256 will waste quite a bit of
memory for the per-cpu arrays. Under z/VM the maximum number of CPUs
is 64, the code now limits the possible cpu mask to 64 if running
under z/VM"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/pci: obtain function handle in hotplug notifier
s390/3270: fix allocation of tty3270_screen structure
s390/smp: improve setup of possible cpu mask
Jan Kiszka [Sat, 28 Dec 2013 15:31:52 +0000 (16:31 +0100)]
KVM: nVMX: Unconditionally uninit the MMU on nested vmexit
Three reasons for doing this: 1. arch.walk_mmu points to arch.mmu anyway
in case nested EPT wasn't in use. 2. this aligns VMX with SVM. But 3. is
most important: nested_cpu_has_ept(vmcs12) queries the VMCS page, and if
one guest VCPU manipulates the page of another VCPU in L2, we may be
fooled into skipping nested_ept_uninit_mmu_context, leaving mmu in
nested state. That can crash the host later on if nested_ept_get_cr3 is
invoked while L1 already left vmxon and nested.current_vmcs12 became
NULL therefore.
Cc: stable@kernel.org Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Linus Torvalds [Wed, 1 Jan 2014 19:36:16 +0000 (11:36 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull radeon drm fixes from Dave Airlie:
"Just piping a bunch of fixes from pre-xmas from Alex for radeon, all
either fix bad hw setup issues or regressions"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: Bump version for CIK DCE tiling fix
drm/radeon: set correct number of banks for CIK chips in DCE
drm/radeon: set correct pipe config for Hawaii in DCE
drm/radeon: expose render backend mask to the userspace
drm/radeon: fix render backend setup for SI and CIK
drm/radeon: 0x9649 is SUMO2 not SUMO
drm/radeon: fix UVD 256MB check
Dave Airlie [Wed, 1 Jan 2014 10:32:19 +0000 (20:32 +1000)]
Merge branch 'drm-fixes-3.13' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Radeon fixes, Christmas eve edition. Fix incorrect family for 0x9649
which led to bogus rendering, tiling and RB fixes for SI and CIK,
and a UVD fix.
* 'drm-fixes-3.13' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: Bump version for CIK DCE tiling fix
drm/radeon: set correct number of banks for CIK chips in DCE
drm/radeon: set correct pipe config for Hawaii in DCE
drm/radeon: expose render backend mask to the userspace
drm/radeon: fix render backend setup for SI and CIK
drm/radeon: 0x9649 is SUMO2 not SUMO
drm/radeon: fix UVD 256MB check
drivers/crypto/ixp4xx_crypto.c: In function 'ixp_module_init':
drivers/crypto/ixp4xx_crypto.c:1419:2: error: 'dev' undeclared (first use in this function)
Now builds. Not tested on real hw.
Signed-off-by: Krzysztof Hałasa <khalasa@piap.pl> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Linus Torvalds [Tue, 31 Dec 2013 20:19:30 +0000 (12:19 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
"A fix for a panic in gpio-keys driver when set up with absolute
events, a fixup to the new zforce driver and a new keycode definition"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: allocate absinfo data when setting ABS capability
Input: define KEY_WWAN for Wireless WAN
Input: zforce - fix possible driver hang during suspend
Linus Torvalds [Tue, 31 Dec 2013 20:17:14 +0000 (12:17 -0800)]
Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"A few small cifs fixes including two for stable, and fixing a
regression introduced by the VFS change to file create"
* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
cifs: set FILE_CREATED
cifs: We do not drop reference to tlink in CIFSCheckMFSymlink()
Add missing end of line termination to some cifs messages
Dmitry Torokhov [Fri, 27 Dec 2013 01:44:29 +0000 (17:44 -0800)]
Input: allocate absinfo data when setting ABS capability
We need to make sure we allocate absinfo data when we are setting one of
EV_ABS/ABS_XXX capabilities, otherwise we may bomb when we try to emit this
event.
Tested-by: Paul Cercueil <pcercuei@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Jan Kiszka [Sun, 29 Dec 2013 01:29:30 +0000 (02:29 +0100)]
KVM: x86: Fix APIC map calculation after re-enabling
Update arch.apic_base before triggering recalculate_apic_map. Otherwise
the recalculation will work against the previous state of the APIC and
will fail to build the correct map when an APIC is hardware-enabled
again.
Linus Torvalds [Mon, 30 Dec 2013 18:22:57 +0000 (10:22 -0800)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
"A bit more endian problems found during testing of 3.13 and a few
other simple fixes and regressions fixes"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Fix alignment of secondary cpu spin vars
powerpc: Align p_end
powernv/eeh: Add buffer for P7IOC hub error data
powernv/eeh: Fix possible buffer overrun in ioda_eeh_phb_diag()
powerpc: Make 64-bit non-VMX __copy_tofrom_user bi-endian
powerpc: Make unaligned accesses endian-safe for powerpc
powerpc: Fix bad stack check in exception entry
powerpc/512x: dts: disable MPC5125 usb module
powerpc/512x: dts: remove misplaced IRQ spec from 'soc' node (5125)
Pull networking fixes from David Miller:
"Some holiday bug fixes for 3.13... There is still one bug I'd like to
get fixed before 3.13-final.
The vlan code erroneously assigns the header ops of the underlying
real device to the VLAN device above it when the real device can
hardware offload VLAN handling. That's completely bogus because
header ops are tied to the device type, so they only expect to see a
'dev' argument compatible with their ops.
The fix is to have the VLAN code use a special set of header ops that
does the pass-thru correctly, by calling the underlying real device's
header ops but _also_ passing in the real device instead of the VLAN
device.
That fix is currently waiting some testing.
Anyways, of note here:
1) Fix bitmap edge case in radiotap, from Johannes Berg.
2) Fix oops on driver unload in rtlwifi, from Larry Finger.
3) Bonding doesn't do locking correctly during speed/duplex/link
changes, from Ding Tianhong.
4) Fix header parsing in GRE code, this bug has been around for a few
releases. From Timo Teräs.
5) SIT tunnel driver MTU check needs to take GSO into account, from
Eric Dumazet.
6) Minor info leak in inet_diag, from Daniel Borkmann.
7) Info leak in YAM hamradio driver, from Salva Peiró.
8) Fix route expiration state handling in ipv6 routing code, from Li
RongQing.
9) DCCP probe module does not check request_module()'s return value,
from Wang Weidong.
10) cpsw driver passes NULL device names to request_irq(), from
Mugunthan V N.
11) Prevent a NULL splat in RDS binding code, from Sasha Levin.
12) Fix 4G overflow test in tg3 driver, from Nithin Sujir.
13) Cure use after free in arc_emac and fec driver's software
timestamp handling, from Eric Dumazet.
14) SIT driver can fail to release the route when
iptunnel_handle_offloads() throws an error. From Li RongQing.
15) Several batman-adv fixes from Simon Wunderlich and Antonio
Quartulli.
16) Fix deadlock during TIPC socket release, from Ying Xue.
17) Fix regression in ROSE protocol recvmsg() msg_name handling, from
Florian Westphal.
18) stmmac PTP support releases wrong spinlock, from Vince Bridgers"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
stmmac: Fix incorrect spinlock release and PTP cap detection.
phy: IRQ cannot be shared
net: rose: restore old recvmsg behavior
xen-netback: fix guest-receive-side array sizes
fec: Do not assume that PHY reset is active low
tipc: fix deadlock during socket release
netfilter: nf_tables: fix wrong datatype in nft_validate_data_load()
batman-adv: fix vlan header access
batman-adv: clean nf state when removing protocol header
batman-adv: fix alignment for batadv_tvlv_tt_change
batman-adv: fix size of batadv_bla_claim_dst
batman-adv: fix size of batadv_icmp_header
batman-adv: fix header alignment by unrolling batadv_header
batman-adv: fix alignment for batadv_coded_packet
netfilter: nf_tables: fix oops when updating table with user chains
netfilter: nf_tables: fix dumping with large number of sets
ipv6: release dst properly in ipip6_tunnel_xmit
netxen: Correct off-by-one errors in bounds checks
net: Add some clarification to skb_tx_timestamp() comment.
arc_emac: fix potential use after free
...
This patch adjusts the refcount in the walk of the interrupt tree.
When a match is found, there is no need to increase the refcount
on 'out_irq->np' as 'newpar' is already holding a ref. The refcount
balance between 'ipar' and 'newpar' is maintained in the skiplevel:
goto label.
This patch also removes the usage of the device_node variable 'old'
which seems useless after the latest changes.
Nikita Yushchenko reports:
While trying to make freescale p2020ds and mpc8572ds boards working
with mainline kernel, I faced that commit e38c0a1f (Handle
Both these boards have uli1575 chip.
Corresponding part in device tree is something like
With commit e38c0a1f reverted, devices under uli1575 are registered
correctly, e.g. for rtc
OF: ** translation for device /pcie@ffe09000/pcie@0/uli1575@0/isa@1e/rtc@70 **
OF: bus is isa (na=2, ns=1) on /pcie@ffe09000/pcie@0/uli1575@0/isa@1e
OF: translating address: 0000000100000070
OF: parent bus is default (na=3, ns=2) on /pcie@ffe09000/pcie@0/uli1575@0
OF: walking ranges...
OF: ISA map, cp=0, s=1000, da=70
OF: parent translation for: 010000000000000000000000
OF: with offset: 70
OF: one level translation: 000000000000000000000070
OF: parent bus is pci (na=3, ns=2) on /pcie@ffe09000/pcie@0
OF: walking ranges...
OF: default map, cp=a0000000, s=20000000, da=70
OF: default map, cp=0, s=10000, da=70
OF: parent translation for: 010000000000000000000000
OF: with offset: 70
OF: one level translation: 010000000000000000000070
OF: parent bus is pci (na=3, ns=2) on /pcie@ffe09000
OF: walking ranges...
OF: PCI map, cp=0, s=10000, da=70
OF: parent translation for: 010000000000000000000000
OF: with offset: 70
OF: one level translation: 010000000000000000000070
OF: parent bus is default (na=2, ns=2) on /
OF: walking ranges...
OF: PCI map, cp=0, s=10000, da=70
OF: parent translation for: 00000000ffc10000
OF: with offset: 70
OF: one level translation: 00000000ffc10070
OF: reached root node
With commit e38c0a1f in place, address translation fails:
OF: ** translation for device /pcie@ffe09000/pcie@0/uli1575@0/isa@1e/rtc@70 **
OF: bus is isa (na=2, ns=1) on /pcie@ffe09000/pcie@0/uli1575@0/isa@1e
OF: translating address: 0000000100000070
OF: parent bus is default (na=3, ns=2) on /pcie@ffe09000/pcie@0/uli1575@0
OF: walking ranges...
OF: ISA map, cp=0, s=1000, da=70
OF: parent translation for: 010000000000000000000000
OF: with offset: 70
OF: one level translation: 000000000000000000000070
OF: parent bus is pci (na=3, ns=2) on /pcie@ffe09000/pcie@0
OF: walking ranges...
OF: default map, cp=a0000000, s=20000000, da=70
OF: default map, cp=0, s=10000, da=70
OF: not found !
Thierry Reding confirmed this commit was not needed after all:
"We ended up merging a different address representation for Tegra PCIe
and I've confirmed that reverting this commit doesn't cause any obvious
regressions. I think all other drivers in drivers/pci/host ended up
copying what we did on Tegra, so I wouldn't expect any other breakage
either."
There doesn't appear to be a simple way to support both behaviours, so
reverting this as nothing should be depending on the new behaviour.
Cc: stable@vger.kernel.org # v3.7+ Signed-off-by: Rob Herring <robh@kernel.org>
Sebastian Ott [Wed, 18 Dec 2013 15:46:02 +0000 (16:46 +0100)]
s390/pci: obtain function handle in hotplug notifier
When using the CLP interface to enable or disable a pci device a
valid function handle needs to be delivered. So far our assumption
was that we always have an up-to-date version of the function handle
(since it doesn't change when the device is in use). This assumption
is incorrect if the pci device is enabled or disabled outside of our
control. When we are notified about such a change we already receive
the new function handle. Just use it.
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Vince Bridgers [Fri, 20 Dec 2013 17:19:34 +0000 (11:19 -0600)]
stmmac: Fix incorrect spinlock release and PTP cap detection.
This patch corrects a problem in stmmac_ptp.c, functions
stmmac_adjust_time and stmmac_adjust_freq where the incorrect spinlocks
were released. This patch also addresses a problem in stmmac_main,
function stmmac_init_ptp where the capability detection for
advanced timestamping was masked by message masking.
This patch was touch tested using linuxptp, and runs without the previously
observed instabilities. More extensive testing is ongoing.
Vince
Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Fri, 20 Dec 2013 19:09:04 +0000 (22:09 +0300)]
phy: IRQ cannot be shared
With the way the PHY IRQ handler is implemented (all real handling being pushed to
the workqueue and returning IRQ_HANDLED all the time the PHY is active), we cannot
really claim that the PHY IRQ can be shared when calling request_irq().
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
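The reasoning above can be sketched with a small userspace model. This is
not kernel code: every name below (model_dev, model_handler, count_claims)
is hypothetical, and the enum only mimics the IRQ_NONE / IRQ_HANDLED return
convention. The point it tries to show is that a handler which cannot tell
whether its own device fired, and therefore always claims the interrupt,
hides spurious interrupts from the other users of the line.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: these names are illustrative, not kernel API. */
enum irq_ret { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

struct model_dev {
	bool pending;        /* did this device raise the line? */
	bool always_claims;  /* models a handler that defers all real work */
};

/* A well-behaved shared handler claims the IRQ only if its device fired. */
static enum irq_ret model_handler(struct model_dev *dev)
{
	if (dev->always_claims)
		return MODEL_IRQ_HANDLED;  /* cannot tell, so it claims everything */
	if (!dev->pending)
		return MODEL_IRQ_NONE;
	dev->pending = false;
	return MODEL_IRQ_HANDLED;
}

/* Count how many handlers on one shared line claimed a single interrupt. */
static int count_claims(struct model_dev *devs, size_t n)
{
	int claims = 0;

	for (size_t i = 0; i < n; i++)
		if (model_handler(&devs[i]) == MODEL_IRQ_HANDLED)
			claims++;
	return claims;
}
```

With two well-behaved handlers and one pending device, exactly one handler
claims the interrupt. With an always-claiming handler on the line, an
interrupt that no device raised is still reported as handled.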
Florian Westphal [Sun, 22 Dec 2013 23:32:31 +0000 (00:32 +0100)]
net: rose: restore old recvmsg behavior
The recvmsg handler in net/rose/af_rose.c performs a size check on ->msg_namelen.
After commit f3d3342602f8bcbf37d7c46641cb9bca7618eb1c
(net: rework recvmsg handler msg_name and msg_namelen logic), we now
always take the else branch due to namelen being initialized to 0.
Digging in netdev-vger-cvs git repo shows that msg_namelen was
initialized with a fixed-size since at least 1995, so the else branch
was never taken.
Compile tested only.
Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Durrant [Mon, 23 Dec 2013 09:27:17 +0000 (09:27 +0000)]
xen-netback: fix guest-receive-side array sizes
The sizes chosen for the metadata and grant_copy_op arrays on the guest
receive side are wrong:
- The meta array is needlessly twice the ring size, when we only ever
consume a single array element per RX ring slot
- The grant_copy_op array is way too small. It's sized based on a bogus
assumption: that at most two copy ops will be used per ring slot. This
may have been true at some point in the past but it's clear from looking
at start_new_rx_buffer() that a new ring slot is only consumed if a frag
would overflow the current slot (plus some other conditions) so the actual
limit is MAX_SKB_FRAGS grant_copy_ops per ring slot.
This patch fixes those two sizing issues and, because grant_copy_ops grows
so much, it pulls it out into a separate chunk of vmalloc()ed memory.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
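The sizing argument above comes down to simple arithmetic, sketched here.
The two constants are assumed values chosen for illustration only (the real
ring size and MAX_SKB_FRAGS come from the kernel headers and configuration),
and the function names are hypothetical.

```c
#include <stddef.h>

/* Assumed values for illustration; the real constants come from kernel headers. */
#define MODEL_RX_RING_SIZE  256u
#define MODEL_MAX_SKB_FRAGS 17u

/* One metadata element is consumed per RX ring slot. */
static size_t meta_entries(void)
{
	return MODEL_RX_RING_SIZE;
}

/* Up to MAX_SKB_FRAGS copy operations may target a single ring slot. */
static size_t grant_copy_entries(void)
{
	return (size_t)MODEL_MAX_SKB_FRAGS * MODEL_RX_RING_SIZE;
}

/* The old assumption: at most two copy operations per ring slot. */
static size_t old_grant_copy_entries(void)
{
	return 2u * MODEL_RX_RING_SIZE;
}
```

Under these assumed values the copy-op array grows from 512 to 4352
entries, which is consistent with the decision to move it into separately
allocated memory.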
Fabio Estevam [Tue, 24 Dec 2013 14:16:01 +0000 (12:16 -0200)]
fec: Do not assume that PHY reset is active low
We should not assume that the PHY reset is always active low.
Retrieve this information from the device tree instead, so that the PHY reset
can work on both cases.
Reported-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
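A minimal sketch of the idea, assuming the polarity has already been read
from the device tree into a boolean. The helper name and its parameters are
hypothetical and are not taken from the fec driver.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: map a logical "reset asserted" request to the
 * physical GPIO level, given the polarity read from the device tree.
 */
static int reset_gpio_level(bool assert_reset, bool active_high)
{
	/* Active high: asserted means 1. Active low: asserted means 0. */
	return assert_reset == active_high ? 1 : 0;
}
```

Callers then express intent ("assert reset", "release reset") and never
hard-code a level, so the same sequence works for both polarities.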
Taking the port lock and the node lock in opposite orders on the two
paths above may result in a deadlock. If the socket lock, instead of the
port lock, is used to protect the port instance in tipc_withdraw(), the
reversed lock ordering is eliminated, and with it the deadlock.
Reported-by: Lars Everbrand <lars.everbrand@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Olof Johansson [Sat, 28 Dec 2013 21:01:47 +0000 (13:01 -0800)]
powerpc: Fix alignment of secondary cpu spin vars
Commit 5c0484e25ec0 ('powerpc: Endian safe trampoline') resulted in
losing proper alignment of the spinlock variables used when booting
secondary CPUs, causing some quite odd issues with failing to boot on
PA Semi-based systems.
This showed itself on ppc64_defconfig, but not on pasemi_defconfig,
so it had gone unnoticed when I initially tested the LE patch set.
Fix is to add explicit alignment instead of relying on good luck. :)
[ It appears that there is a different issue with PA Semi systems
however this fix is definitely correct so applying anyway -- BenH
]
Fixes: 5c0484e25ec0 ('powerpc: Endian safe trampoline') Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=67811 Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Mon, 23 Dec 2013 01:19:51 +0000 (12:19 +1100)]
powerpc: Align p_end
p_end is an 8 byte value embedded in the text section. This means it
is only 4 byte aligned when it should be 8 byte aligned. Fix this
by adding an explicit alignment.
This fixes an issue where POWER7 little endian builds with
CONFIG_RELOCATABLE=y fail to boot.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
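The alignment issue can be illustrated in C, as a sketch only: the actual
fix is an alignment directive in assembly, and the names below
(align_up, model_p_end, is_aligned) are hypothetical. An address that
follows 4-byte instructions is only guaranteed to be a multiple of 4;
explicit alignment rounds it up to the next multiple of 8.

```c
#include <stdint.h>
#include <stdalign.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/* Hypothetical stand-in for an 8-byte value that requires 8-byte alignment. */
static alignas(8) uint64_t model_p_end;

/* Check whether a pointer is a multiple of align (a power of two). */
static int is_aligned(const void *p, uintptr_t align)
{
	return ((uintptr_t)p & (align - 1)) == 0;
}
```

For example, a value that would otherwise land at 0x1004 is moved to
0x1008, while one already at 0x1008 stays where it is.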
Brian W Hart [Fri, 20 Dec 2013 19:06:01 +0000 (13:06 -0600)]
powernv/eeh: Add buffer for P7IOC hub error data
Prevent ioda_eeh_hub_diag() from clobbering itself when called by supplying
a per-PHB buffer for P7IOC hub diagnostic data. Take care to inform OPAL of
the correct size for the buffer.
[Small style change to the use of sizeof -- BenH]
Signed-off-by: Brian W Hart <hartb@linux.vnet.ibm.com> Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Brian W Hart [Thu, 19 Dec 2013 23:14:07 +0000 (17:14 -0600)]
powernv/eeh: Fix possible buffer overrun in ioda_eeh_phb_diag()
PHB diagnostic buffer may be smaller than PAGE_SIZE, especially when
PAGE_SIZE > 4KB.
Signed-off-by: Brian W Hart <hartb@linux.vnet.ibm.com> Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
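The general pattern behind this kind of fix is to bound every write by the
real size of the destination buffer rather than by an unrelated constant
such as the page size. The following is a generic sketch with a
hypothetical helper name; it is not the OPAL call used by the driver.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy diagnostic data into dst without overrunning it: the length is
 * bounded by the destination's real size, not by an unrelated constant
 * such as the page size.
 */
static size_t copy_diag(void *dst, size_t dst_size, const void *src, size_t src_len)
{
	size_t n = src_len < dst_size ? src_len : dst_size;

	memcpy(dst, src, n);
	return n;
}
```

Passing sizeof(buffer) for dst_size keeps the bound correct even when the
page size differs from the buffer size.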
Paul E. McKenney [Tue, 17 Dec 2013 22:29:57 +0000 (09:29 +1100)]
powerpc: Make 64-bit non-VMX __copy_tofrom_user bi-endian
The powerpc 64-bit __copy_tofrom_user() function uses shifts to handle
unaligned invocations. However, these shifts were designed for
big-endian systems: On little-endian systems, they must shift in the
opposite direction.
This commit relies on the C preprocessor to insert the correct shifts
into the assembly code.
[ This is a rare but nasty LE issue. Most of the time we use the POWER7
optimised __copy_tofrom_user_power7 loop, but when it hits an exception
we fall back to the base __copy_tofrom_user loop. - Anton ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
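The shift-direction problem can be shown in plain C. This is a sketch of
the technique, not the powerpc assembly: the macro and function names are
hypothetical, and the fallback branch assumes a little-endian host when the
compiler does not define __BYTE_ORDER__. Two aligned 64-bit words are
merged into the unaligned word that starts partway into the first one, and
the preprocessor picks the shift directions.

```c
#include <stdint.h>
#include <string.h>

/* Select shift directions at compile time, as the preprocessor does for the assembly. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define SH_FIRST(w, n)  ((w) << (n))
#define SH_SECOND(w, n) ((w) >> (n))
#else /* assumption: little-endian host */
#define SH_FIRST(w, n)  ((w) >> (n))
#define SH_SECOND(w, n) ((w) << (n))
#endif

/*
 * Build the 64-bit value that starts `off` bytes (1..7) into the aligned
 * word w0, using the following aligned word w1 for the remaining bytes.
 */
static uint64_t merge_unaligned(uint64_t w0, uint64_t w1, unsigned int off)
{
	unsigned int bits = off * 8;

	return SH_FIRST(w0, bits) | SH_SECOND(w1, 64 - bits);
}

/* Reference result: let memcpy perform the unaligned load. */
static uint64_t load_ref(const unsigned char *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

Comparing merge_unaligned() against load_ref() for every offset from 1 to 7
is a quick way to confirm that the chosen directions match the host's byte
order.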
powerpc: Make unaligned accesses endian-safe for powerpc
The generic put_unaligned/get_unaligned macros were made endian-safe by
calling the appropriate endian dependent macros based on the endian type
of the powerpc processor.
Signed-off-by: Rajesh B Prathipati <rprathip@linux.vnet.ibm.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Neuling [Mon, 16 Dec 2013 04:12:43 +0000 (15:12 +1100)]
powerpc: Fix bad stack check in exception entry
In EXCEPTION_PROLOG_COMMON() we check to see if the stack pointer (r1)
is valid when coming from the kernel. If it's not valid, we die but
with a nice oops message.
Currently we allocate a stack frame (subtract INT_FRAME_SIZE) before we
check to see if the stack pointer is negative. Unfortunately, this
won't detect a bad stack where r1 is less than INT_FRAME_SIZE.
This patch fixes the check to compare the modified r1 with
-INT_FRAME_SIZE. With this, bad kernel stack pointers (including NULL
pointers) are correctly detected again.
Kudos to Paulus for finding this.
Signed-off-by: Michael Neuling <mikey@neuling.org>
cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
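The difference between the two checks can be modelled in C. This is a
sketch, not the exception-entry assembly: the frame size is an assumed
value, the function names are hypothetical, and the model relies on 64-bit
kernel addresses having the top bit set, so that they are negative when
viewed as signed integers.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed frame size for illustration; the real value is arch-defined. */
#define MODEL_INT_FRAME_SIZE 400

/* Old check: allocate the frame, then test whether the result is negative. */
static bool old_stack_ok(uint64_t r1)
{
	int64_t new_r1 = (int64_t)(r1 - MODEL_INT_FRAME_SIZE);

	return new_r1 < 0;
}

/* Fixed check: compare the modified r1 against -INT_FRAME_SIZE. */
static bool new_stack_ok(uint64_t r1)
{
	int64_t new_r1 = (int64_t)(r1 - MODEL_INT_FRAME_SIZE);

	return new_r1 < -(int64_t)MODEL_INT_FRAME_SIZE;
}
```

A NULL stack pointer wraps to a small negative value after the subtraction,
so the old check accepts it; the fixed check rejects every r1 below the
frame size while still accepting genuine kernel addresses.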
Linus Torvalds [Sun, 29 Dec 2013 21:49:51 +0000 (13:49 -0800)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Another smallish batch of fixes, it's been quiet due to the holidays.
Nothing controversial here, a handful of things across the board"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: pxa: fix USB gadget driver compilation regression
ARM: OMAP2+: Fix LCD panel backlight regression for LDP legacy booting
ARM: OMAP2+: hwmod_data: fix missing OMAP_INTC_START in irq data
ARM: DRA7: hwmod: Fix boot crash with DEBUG_LL
ARM: shmobile: r8a7790: fix shdi resource sizes
ARM: shmobile: bockw: fixup DMA mask
ARM: shmobile: armadillo: Add PWM backlight power supply
Linus Torvalds [Sun, 29 Dec 2013 21:35:04 +0000 (13:35 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
"There is a small EFI fix and a big power regression fix in this batch.
My queue also had a fix for downing a CPU when there is an insufficient
number of IRQ vectors available, but I'm holding that one for now due
to recent bug reports"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Don't select EFI from certain special ACPI drivers
x86 idle: Repair large-server 50-watt idle-power regression
Linus Torvalds [Sun, 29 Dec 2013 21:27:51 +0000 (13:27 -0800)]
Merge tag 'pm+acpi-3.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes and new device IDs from Rafael Wysocki:
- Fix for a cpufreq regression causing stale sysfs files to be left
behind during system resume if cpufreq_add_dev() fails for one or
more CPUs from Viresh Kumar.
- Fix for a bug in cpufreq causing CONFIG_CPU_FREQ_DEFAULT_* to be
ignored when the intel_pstate driver is used from Jason Baron.
- System suspend fix for a memory leak in pm_vt_switch_unregister()
that forgot to release objects after removing them from
pm_vt_switch_list. From Masami Ichikawa.
- Intel Valley View device ID and energy unit encoding update for the
(recently added) Intel RAPL (Running Average Power Limit) driver from
Jacob Pan.
- Intel Bay Trail SoC GPIO and ACPI device IDs for the Low Power
Subsystem (LPSS) ACPI driver from Paul Drews.
* tag 'pm+acpi-3.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
powercap / RAPL: add support for ValleyView Soc
PM / sleep: Fix memory leak in pm_vt_switch_unregister().
cpufreq: Use CONFIG_CPU_FREQ_DEFAULT_* to set initial policy for setpolicy drivers
cpufreq: remove sysfs files for CPUs which failed to come back after resume
ACPI: Add BayTrail SoC GPIO and LPSS ACPI IDs
David S. Miller [Sun, 29 Dec 2013 05:30:59 +0000 (00:30 -0500)]
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Included changes:
- reset netfilter-bridge state when removing the batman-adv
header from an incoming packet. This prevents netfilter
bridge from being fooled when the same packet enters a
bridge twice (or more): the first time within the
batman-adv header and the second time without.
- adjust the packet layout to prevent any architecture from
adding padding bytes. All the structs sent over the wire
now have a size that is a multiple of 4 bytes (unless pack(2) is used).
- fix access to the inner vlan_eth header when reading the
VID in the rx path.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 29 Dec 2013 05:24:28 +0000 (00:24 -0500)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
This patchset contains four nf_tables fixes, one IPVS fix due to
missing updates in the interaction with the new seqadj conntrack
extension that was added to support the netfilter synproxy code,
and a couple of one-liners to fix netnamespace netfilter issues.
More specifically, they are:
* Fix ipv6_find_hdr() call without offset being explicitly initialized
in nft_exthdr, as required by that function, from Daniel Borkmann.
* Fix oops in nfnetlink_log when using netns and unloading the kernel
module, from Gao feng.
* Fix BUG_ON in nf_ct_timestamp extension after netns is destroyed,
from Helmut Schaa.
* Fix crash in IPVS due to missing sequence adjustment extension being
allocated in the conntrack, from Jesper Dangaard Brouer.
* Add bugtrap to spot a warning in case you dereference sequence adjustment
conntrack area when not available, this should help to catch similar
invalid dereferences in the Netfilter tree, also from Jesper.
* Fix incomplete dumping of sets in nf_tables when retrieving by family,
from me.
* Fix oops when updating the table state (dormant <-> active) and having
user (not base) chains, from me.
* Fix wrong validation in set element data that results in returning
-EINVAL when using the nf_tables dictionary feature with mappings,
also from me.
We don't usually have this amount of fixes by this time (as we're already
in -rc5 of the development cycle), although half of them are related to
nf_tables which is a relatively new thing, and I also believe that holidays
have also delayed the flight of bugfixes to mainstream a bit.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Olof Johansson [Sat, 28 Dec 2013 23:38:32 +0000 (15:38 -0800)]
Merge tag 'omap-for-v3.13/intc-ldp-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes
From Tony Lindgren:
Fix a regression for wrong interrupt numbers for some devices after
the sparse IRQ conversion, fix DRA7 console output for earlyprintk,
and fix the LDP LCD backlight when DSS is built into the kernel and
not as a loadable module.
* tag 'omap-for-v3.13/intc-ldp-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP2+: Fix LCD panel backlight regression for LDP legacy booting
ARM: OMAP2+: hwmod_data: fix missing OMAP_INTC_START in irq data
ARM: DRA7: hwmod: Fix boot crash with DEBUG_LL
+ v3.13-rc5
Olof Johansson [Sat, 28 Dec 2013 23:20:35 +0000 (15:20 -0800)]
Merge tag 'renesas-fixes2-for-v3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into fixes
From Simon Horman:
Second Round of Renesas ARM based SoC Fixes for v3.13
* r8a7790 (R-Car H2) based Lager board
- Correct SDHI resource sizes
This bug has been present since sdhi resources were added to the r8a7790
by 8c9b1aa41853272a ("ARM: shmobile: r8a7790: add MMCIF and SDHI DT
templates") in v3.11-rc2.
* r8a7778 (R-Car M1) based Bock-W board
- Correct DMA mask
This resolves a regression introduced by 4dcfa60071b3d23f
("ARM: DMA-API: better handing of DMA masks for coherent allocations")
in v3.12-rc1.
* r8a7740 (R-Mobile A1) based Armadillo board
- Add PWM backlight power supply
This resolves a regression introduced by 22ceeee16eb8f0d0
("pwm-backlight: Add power supply support") in v3.12.
* tag 'renesas-fixes2-for-v3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
ARM: shmobile: r8a7790: fix shdi resource sizes
ARM: shmobile: bockw: fixup DMA mask
ARM: shmobile: armadillo: Add PWM backlight power supply
Signed-off-by: Olof Johansson <olof@lixom.net>
Linus Walleij [Wed, 11 Dec 2013 08:48:58 +0000 (09:48 +0100)]
ARM: pxa: fix USB gadget driver compilation regression
After commit 88f718e3fa4d67f3a8dbe79a2f97d722323e4051
"ARM: pxa: delete the custom GPIO header" a compilation
error was introduced in the PXA25x gadget driver.
An attempt to fix the problem was made in
commit b144e4ab1ef130e8bf30bcd3e529b7f35112c503
"usb: gadget: fix pxa25x compilation problems"
by explicitly stating the driver needs the <mach/hardware.h>
header, which solved the compilation for a few boards,
such as the pxa255-idp and its defconfig.
However the Lubbock board has this special clause in
drivers/usb/gadget/pxa25x_udc.c:
This include file has an implicit dependency on
<mach/irqs.h> having been included before <mach/lubbock.h>
was included.
Before commit 88f718e3fa4d67f3a8dbe79a2f97d722323e4051
"ARM: pxa: delete the custom GPIO header" this implicit
dependency for the pxa25x_udc compile on the Lubbock was
satisfied by <linux/gpio.h> implicitly including
<mach/gpio.h> which was in turn including <mach/irqs.h>,
apart from the earlier added <mach/hardware.h>.
Fix this by having the PXA25x <mach/lubbock.h> explicitly
include <mach/irqs.h>.
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Greg Kroah-Hartmann <gregkh@linuxfoundation.org> Cc: Felipe Balbi <balbi@ti.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com> Signed-off-by: Olof Johansson <olof@lixom.net>
netfilter: nf_tables: fix wrong datatype in nft_validate_data_load()
This patch fixes dictionary mappings, eg.
add rule ip filter input meta dnat set tcp dport map { 22 => 1.1.1.1, 23 => 2.2.2.2 }
The kernel was returning -EINVAL in nft_validate_data_load() since
the type of the set element data that is passed was the real userspace
datatype instead of NFT_DATA_VALUE.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
batman-adv: clean nf state when removing protocol header
If an interface enslaved into batman-adv is a bridge (or a
virtual interface built on top of a bridge) the nf_bridge
member of the skbs reaching the soft-interface is filled
with the state about "netfilter bridge" operations.
Then, if one of these skbs is delivered locally, the nf_bridge
member should be cleaned up so that the old state cannot
interfere with other "netfilter bridge" operations when
entering a second bridge.
This is needed because batman-adv is an encapsulation
protocol.
However at the moment skb->nf_bridge is not released at all
leading to bogus "netfilter bridge" behaviours.
Fix this by cleaning the netfilter state of the skb before
it gets delivered to the upper layer in interface_rx().
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Since this is a mac address and always 48 bit, and we can assume that
it is always aligned to 2-byte boundaries, add a pack(2) pragma.
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
struct batadv_icmp_header currently has a size of 17, which
will be padded to 20 on some architectures. Fix this by
unrolling the header into the parent structures.
Moreover keep the ICMP parsing functions as generic as they
are now by using a stub icmp_header struct during packet
parsing.
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
batman-adv: fix header alignment by unrolling batadv_header
The size of the batadv_header of 3 is problematic on some architectures
which automatically pad all structures to a 32 bit boundary. To not lose
performance by packing this struct, better embed it into the various
host structures.
Reported-by: Russell King <linux@arm.linux.org.uk> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
The compiler may decide to pad the structure, and then it does not
have the expected size of 46 bytes. Fix this by moving it into the
pragma pack(2) part of the code.
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
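The effect of the pack(2) pragma on structure size can be seen in a small
example. The two structs below are hypothetical and are not the batman-adv
definitions; the 8-byte figure for the unpacked variant is what common ABIs
produce, not a guarantee.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical wire structs; not the actual batman-adv definitions. */
struct unpacked_hdr {
	uint32_t seqno;
	uint16_t flags;
};                      /* typically padded to 8 bytes */

#pragma pack(2)
struct packed_hdr {
	uint32_t seqno;
	uint16_t flags;
};                      /* 6 bytes: member alignment is capped at 2 */
#pragma pack()

/* Fail the build if the on-wire size ever changes. */
_Static_assert(sizeof(struct packed_hdr) == 6, "wire size must be 6 bytes");
```

A compile-time size assertion of this kind is one way to catch unexpected
padding before it reaches the wire.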
netfilter: nf_tables: fix dumping with large number of sets
If no table name is specified, the dumping of the existing sets
may be incomplete with a sufficiently large number of sets and
tables. This patch fixes missing reset of the cursors after
finding the location of the last object that has been included
in the previous multi-part message.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
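The cursor problem described above is a common pitfall in resumable
two-level dumps, and can be modelled as follows. This is a hypothetical
model, not the nf_tables code: the struct, the function names and the
reset_cursor switch exist only to compare the behaviour with and without
the reset.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical model of a resumable two-level dump; not the nf_tables code. */
struct dump_cursor {
	size_t table;  /* table to resume from */
	size_t set;    /* set within that table to resume from */
};

/*
 * Emit up to `budget` sets, resuming from *cur. sets_per_table[i] is the
 * number of sets in table i. Returns the number of sets emitted and
 * leaves *cur pointing at the next set to dump.
 */
static size_t dump_sets(const size_t *sets_per_table, size_t ntables,
			struct dump_cursor *cur, size_t budget, bool reset_cursor)
{
	size_t emitted = 0;
	size_t start = cur->set;

	for (size_t t = cur->table; t < ntables; t++) {
		for (size_t s = start; s < sets_per_table[t]; s++) {
			if (emitted == budget) {
				cur->table = t;
				cur->set = s;
				return emitted;
			}
			emitted++;
		}
		/* The saved set position applies only to the table we resumed in. */
		if (reset_cursor)
			start = 0;
	}
	cur->table = ntables;
	cur->set = 0;
	return emitted;
}

/* Run the dump to completion in messages of `budget` sets each. */
static size_t dump_all(const size_t *sets_per_table, size_t ntables,
		       size_t budget, bool reset_cursor)
{
	struct dump_cursor cur = { 0, 0 };
	size_t total = 0;

	while (cur.table < ntables)
		total += dump_sets(sets_per_table, ntables, &cur, budget, reset_cursor);
	return total;
}
```

With two tables of three sets each and a budget of two sets per message,
the version that resets the cursor emits all six sets, while the version
that does not skips the first two sets of the second table.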
Signed-off-by: Shirish Pargaonkar <spargaonkar@suse.com> Acked-by: Jeff Layton <jlayton@redhat.com> CC: Stable <stable@kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
Sachin Prabhu [Mon, 25 Nov 2013 17:09:48 +0000 (17:09 +0000)]
cifs: We do not drop reference to tlink in CIFSCheckMFSymlink()
When we obtain tcon from cifs_sb, we use cifs_sb_tlink() to first obtain
tlink which also grabs a reference to it. We do not drop this reference
to tlink once we are done with the call.
The patch fixes this issue by instead passing tcon as a parameter and
avoids having to obtain a reference to the tlink. A lookup for the tcon
is already made in the calling functions and this way we avoid having to
re-run the lookup. This is also consistent with the argument list for
other similar calls for M-F symlinks.
We should also return an ENOSYS when we do not find a protocol specific
function to lookup the MF Symlink data.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> CC: Stable <stable@kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
Li RongQing [Fri, 20 Dec 2013 09:20:12 +0000 (17:20 +0800)]
ipv6: release dst properly in ipip6_tunnel_xmit
If a dst is not attached anywhere, it should be released before
exiting ipip6_tunnel_xmit(); otherwise the dst is leaked.
Fixes: 61c1db7fae21 ("ipv6: sit: add GSO/TSO support") Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Gibson [Fri, 20 Dec 2013 04:10:44 +0000 (15:10 +1100)]
netxen: Correct off-by-one errors in bounds checks
netxen_process_lro() contains two bounds checks. One for the ring number
against the number of rings, and one for the Rx buffer ID against the
array of receive buffers.
Both of these have off-by-one errors, using > instead of >=. The correct
versions are used in netxen_process_rcv(), they're just wrong in
netxen_process_lro().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
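The off-by-one is easy to see in isolation. The helper names below are
hypothetical and the code is a generic sketch rather than the netxen
functions themselves.

```c
#include <stdbool.h>
#include <stddef.h>

/* Valid indices into an array of `count` elements are 0 .. count - 1. */

/* Off-by-one: index == count slips through. */
static bool index_bad_loose(size_t index, size_t count)
{
	return index > count;
}

/* Correct: reject index == count as well. */
static bool index_bad(size_t index, size_t count)
{
	return index >= count;
}
```

With eight rings, an index of 8 passes the loose check and would read one
element past the end of the array; the strict check rejects it.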
David S. Miller [Fri, 27 Dec 2013 18:04:33 +0000 (13:04 -0500)]
net: Add some clarification to skb_tx_timestamp() comment.
We've seen so many instances of people invoking skb_tx_timestamp()
after the device already has been given the packet, that it's worth
being a little bit more verbose and explicit in this comment.
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 20 Dec 2013 02:10:40 +0000 (18:10 -0800)]
arc_emac: fix potential use after free
Signed-off-by: Eric Dumazet <edumazet@google.com>
skb_tx_timestamp(skb) should be called _before_ TX completion
has a chance to trigger, otherwise it is too late and we access
freed memory.
Fixes: e4f2379db6c6 ("ethernet/arc/arc_emac - Add new driver")
From: Eric Dumazet <edumazet@google.com> Cc: Alexey Brodkin <Alexey.Brodkin@synopsys.com> Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nithin Sujir [Fri, 20 Dec 2013 01:44:11 +0000 (17:44 -0800)]
tg3: Expand 4g_overflow_test workaround to skb fragments of any size.
The current driver assumes that an skb fragment can only be up to jumbo
size. Presumably this was a fast-path optimization. This assumption is
no longer true as fragments can be up to 32k.
v2: Remove unnecessary parentheses per Eric Dumazet.
Cc: stable@vger.kernel.org Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
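A size-independent version of such a boundary test might look like the
sketch below. It is a generic check with a hypothetical name, not the
driver's exact test, and it ignores any hardware-specific slack the driver
may add.

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * Generic check, not the driver's exact test: does the buffer
 * [addr, addr + len) straddle a 4 GiB boundary? len must be non-zero.
 */
static bool crosses_4g(uint64_t addr, uint32_t len)
{
	uint64_t last = addr + len - 1;

	return (addr >> 32) != (last >> 32);
}
```

Because the check compares the upper 32 bits of the first and last byte
addresses, it makes no assumption about the maximum fragment size and
behaves the same for a 32k fragment as for a small one.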
Tony Lindgren [Fri, 27 Dec 2013 17:51:25 +0000 (09:51 -0800)]
Merge tag 'for-v3.13-rc/hwmod-fixes-b' of git://git.kernel.org/pub/scm/linux/kernel/git/pjw/omap-pending into debug-ll-and-ldp-backlight-fix
A few OMAP hwmod fixes for v3.13-rc. One patch fixes some IRQ
problems with GPMC, RNG, and ISP/IVA MMUs on OMAP2/3. The other fixes
some problems with DEBUG_LL on DRA7xx.
Basic build, boot, and PM test logs are available here:
Jamal Hadi Salim [Mon, 23 Dec 2013 13:02:11 +0000 (08:02 -0500)]
net_sched: act: Dont increment refcnt on replace
This is a bug fix. The existing code tries to kill many
birds with one stone: Handling binding of actions to
filters, new actions and replacing of action
attributes. A simple test case to illustrate:
XXXX
moja@fe1:~$ sudo tc actions add action drop index 12
moja@fe1:~$ actions get action gact index 12
action order 1: gact action drop
random type none pass val 0
index 12 ref 1 bind 0
moja@fe1:~$ sudo tc actions replace action ok index 12
moja@fe1:~$ actions get action gact index 12
action order 1: gact action drop
random type none pass val 0
index 12 ref 2 bind 0
XXXX
The above shows the refcount being wrongly incremented on replace.
There are more complex scenarios with binding of actions to filters
that I am leaving out that didn't work as well...
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
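The intended behaviour can be stated as a small model. The struct and
function names are hypothetical and do not reflect the kernel's action
data structures; the sketch only captures the rule that adding an action
takes the initial reference while replacing its attributes does not take
another.

```c
/* Hypothetical model of an action with a reference count; not the kernel's struct. */
struct model_action {
	int verdict;
	int refcnt;
};

/* Creating a new action takes the initial reference. */
static void action_add(struct model_action *a, int verdict)
{
	a->verdict = verdict;
	a->refcnt = 1;
}

/* Replacing attributes of an existing action must leave refcnt alone. */
static void action_replace(struct model_action *a, int verdict)
{
	a->verdict = verdict;
}
```

After an add followed by a replace, the model reports a reference count of
1, which is the value the second "get" in the test case above would be
expected to show.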