Kees Cook [Tue, 5 Nov 2013 06:07:01 +0000 (17:07 +1100)]
vsprintf: ignore %n again
This ignores %n in printf again, as was originally documented.
Implementing %n poses a greater security risk than utility, so it should
stay ignored. To help anyone attempting to use %n, a warning will be
emitted if it is encountered.
Based on an earlier patch by Joe Perches.
Because %n was designed to write to pointers on the stack, it has been
frequently used as an attack vector when bugs are found that leak
user-controlled strings into functions that ultimately process format
strings. While this class of bug can still be turned into an information
leak, removing %n eliminates the common method of elevating such a bug
into an arbitrary kernel memory writing primitive, significantly reducing
the danger of this class of bug.
For seq_file users that need to know the length of a written string for
padding, please see seq_setwidth() and seq_pad() instead.
Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Joe Perches <joe@perches.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
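For readers unfamiliar with %n, here is a minimal userspace sketch of what the specifier does in standard C. It uses libc snprintf(), not the kernel's vsnprintf(), and chars_before_marker() is a hypothetical helper name used only for illustration. Some hardened libcs reject %n, so treat this as a sketch for a typical glibc setup.

```c
#include <stdio.h>

/*
 * Userspace illustration only: standard C "%n" stores the number of
 * characters produced so far through a caller-supplied int pointer.
 * That write is what makes a user-controlled format string so dangerous.
 */
static int chars_before_marker(void)
{
	char buf[32];
	int written = -1;

	/* "%n" consumes an int * argument and stores 5 (for "hello") in it. */
	snprintf(buf, sizeof(buf), "hello%n world", &written);
	return written;
}
```

With %n ignored in the kernel, the same format string would leave the pointed-to variable untouched.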
Tetsuo Handa [Tue, 5 Nov 2013 06:07:00 +0000 (17:07 +1100)]
seq_file: remove "%n" usage from seq_file users
All seq_printf() users that use "%n" do so to calculate padding size.
Convert them to use the seq_setwidth() / seq_pad() pair.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Joe Perches <joe@perches.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tetsuo Handa [Tue, 5 Nov 2013 06:07:00 +0000 (17:07 +1100)]
seq_file: introduce seq_setwidth() and seq_pad()
There are several users who want to know the number of bytes written by
seq_*() for alignment purposes. Currently they use the %n format to find
out, because seq_*() returns 0 on success.
This patch introduces seq_setwidth() and seq_pad() for allowing them to
align without using %n format.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Joe Perches <joe@perches.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
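The padding idea can be modelled in userspace roughly as follows. This is a sketch, not the kernel code: struct demo_seq and the demo_*() helpers are hypothetical stand-ins for struct seq_file, seq_puts(), seq_setwidth() and seq_pad(), and assume a fixed-size buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct seq_file. */
struct demo_seq {
	char buf[128];
	size_t count;		/* bytes written so far */
	size_t pad_until;	/* column recorded by demo_setwidth() */
};

/* Append a string if it fits (stand-in for seq_puts()). */
static void demo_puts(struct demo_seq *m, const char *s)
{
	size_t len = strlen(s);

	if (m->count + len < sizeof(m->buf)) {
		memcpy(m->buf + m->count, s, len);
		m->count += len;
		m->buf[m->count] = '\0';
	}
}

/* Remember where the padded field has to end. */
static void demo_setwidth(struct demo_seq *m, size_t width)
{
	m->pad_until = m->count + width;
}

/* Fill with spaces up to the recorded column, then emit c if non-zero. */
static void demo_pad(struct demo_seq *m, char c)
{
	while (m->count < m->pad_until && m->count + 1 < sizeof(m->buf))
		m->buf[m->count++] = ' ';
	if (c && m->count + 1 < sizeof(m->buf))
		m->buf[m->count++] = c;
	m->buf[m->count] = '\0';
}
```

A caller records the width first, writes the variable-length part, and then pads, so the number of bytes written never has to be reported back through the format string.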
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
prep_new_page() initializes page->private (and therefore page->ptl) with 0.
Make sure nobody has put it to use between allocation of the page and the
page table constructor.
This can happen if an arch tries to use slab for page table allocation: slab
code uses page->slab_cache and page->first_page (for tail pages), which share
storage with page->ptl.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: dynamically allocate page->ptl if it cannot be embedded to struct page
If split page table lock is in use, we embed the lock into struct page of
table's page. We have to disable split lock if spinlock_t is too big
to be embedded, as when DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC is enabled.
This patch adds support for dynamic allocation of split page table lock if
we can't embed it in struct page.
page->ptl is unsigned long now and we use it as spinlock_t if
sizeof(spinlock_t) <= sizeof(long), otherwise it's a pointer to spinlock_t.
The spinlock_t is allocated in pgtable_page_ctor() for PTE tables and in
pgtable_pmd_page_ctor() for PMD tables. All other helpers are converted to
support dynamically allocated page->ptl.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Howells <dhowells@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Mikael Starvik <starvik@axis.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rob Herring <rob.herring@calxeda.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon 
<will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
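The embed-or-allocate decision described above can be sketched in userspace like this. All names (demo_lock_t, struct demo_page, the demo_ptlock_*() helpers, DEMO_LOCK_BYTES) are hypothetical, calloc() stands in for the kernel allocator, and the sketch assumes, as the kernel does, that a pointer fits in an unsigned long.

```c
#include <stdbool.h>
#include <stdlib.h>

#ifndef DEMO_LOCK_BYTES
#define DEMO_LOCK_BYTES 64	/* pretend a debug option bloated the lock */
#endif

/* Hypothetical lock type standing in for spinlock_t. */
typedef struct { unsigned char state[DEMO_LOCK_BYTES]; } demo_lock_t;

/* Hypothetical stand-in for the relevant part of struct page. */
struct demo_page {
	unsigned long ptl;	/* the lock itself, or a pointer to it */
};

#define DEMO_LOCK_EMBEDDED (sizeof(demo_lock_t) <= sizeof(unsigned long))

/* Constructor step: may fail only when the lock must be allocated. */
static bool demo_ptlock_alloc(struct demo_page *page)
{
	demo_lock_t *lock;

	if (DEMO_LOCK_EMBEDDED) {
		page->ptl = 0;	/* zeroed storage models an initialized lock */
		return true;
	}
	lock = calloc(1, sizeof(*lock));
	if (!lock)
		return false;
	page->ptl = (unsigned long)lock;
	return true;
}

/* Accessor: hides whether the lock is embedded or behind a pointer. */
static demo_lock_t *demo_ptlock_ptr(struct demo_page *page)
{
	if (DEMO_LOCK_EMBEDDED)
		return (demo_lock_t *)&page->ptl;
	return (demo_lock_t *)page->ptl;
}

/* Destructor step: only the dynamically allocated lock is freed. */
static void demo_ptlock_free(struct demo_page *page)
{
	if (!DEMO_LOCK_EMBEDDED)
		free((void *)page->ptl);
}
```

Because the constructor can now fail, every caller has to check its return value, which is why the series touches every architecture.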
At the moment xtensa uses the slab allocator for PTE tables. That doesn't
work when split page table lock is enabled: slab uses page->slab_cache and
page->first_page for its pages. These fields share storage with
page->ptl.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Chris Zankel <chris@zankel.net> Acked-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: James Hogan <james.hogan@imgtec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Richard Kuo <rkuo@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Russell King <linux@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
It will fix NR_PAGETABLE accounting. It's also required if the arch is
ever going to support split ptl.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Jonas Bonn <jonas@southpole.se> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
It will fix NR_PAGETABLE accounting. It's also required if the arch is
ever going to support split ptl.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
x86: add missed pgtable_pmd_page_ctor/dtor calls for preallocated pmds
In the split page table lock case, we embed spinlock_t into struct page. For
obvious reasons, we don't want to increase the size of struct page if
spinlock_t is too big, as with DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC or on an
-rt kernel. So we disable split page table lock if spinlock_t is too big.
This patchset allows the lock to be allocated dynamically if spinlock_t is
big. In this case page->ptl is used to store a pointer to the spinlock
instead of the spinlock itself. It costs an additional cache line for the
indirect access, but fixes page fault scalability for multi-threaded
applications.
LOCK_STAT depends on DEBUG_SPINLOCK, so on current kernel enabling
LOCK_STAT to analyse scalability issues breaks scalability. ;)
The patchset mostly fixes this. Results for ./thp_memscale -c 80 -b 512M
on 4-socket machine:
baseline, no CONFIG_LOCK_STAT: 9.115460703 seconds time elapsed
baseline, CONFIG_LOCK_STAT=y: 53.890567123 seconds time elapsed
patched, no CONFIG_LOCK_STAT: 8.852250368 seconds time elapsed
patched, CONFIG_LOCK_STAT=y: 11.069770759 seconds time elapsed
Patch count is scary, but most of them are trivial. Overview:
 Patches 1-4    A few bug fixes with no dependencies on other patches.
                They should probably be applied as soon as possible.
 Patch 5        Changes the signature of pgtable_page_ctor(). We will use it
                for dynamic lock allocation, so it can fail.
 Patches 6-8    Add missing constructor/destructor calls on a few archs.
                This fixes NR_PAGETABLE accounting and prepares for the use
                of split ptl.
 Patches 9-33   Add pgtable_page_ctor() failure handling to all archs.
 Patch 34       Finally adds support for dynamically allocated page->ptl.
                Also contains documentation for split page table lock.
This patch (of 34):
I missed that we preallocate a few pmds in pgd_alloc() if X86_PAE is
enabled. Let's add the missing constructor/destructor calls.
I didn't notice it during testing since prep_new_page() clears
page->mapping and therefore page->ptl. That is effectively equivalent to
spin_lock_init(&page->ptl).
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Chris Zankel <chris@zankel.net> Cc: Christoph Lameter <cl@linux.com> Cc: David Howells <dhowells@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Mikael Starvik <starvik@axis.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rob Herring <rob.herring@calxeda.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: 
Andrew Morton <akpm@linux-foundation.org>
The basic idea is the same as with PTE level: the lock is embedded into
struct page of table's page.
We can't use mm->pmd_huge_pte to store pgtables for THP, since we don't
take mm->page_table_lock anymore. Let's reuse page->lru of table's page
for that.
pgtable_pmd_page_ctor() returns true if initialization is successful and
false otherwise. The current implementation never fails, but assuming that
the constructor can fail will help to port it to -rt, where spinlock_t is
rather huge and cannot be embedded into struct page -- dynamic allocation
is required.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Alex Thorlton <athorlton@sgi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Robin Holt <robinmholt@gmail.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently mm->pmd_huge_pte is protected by the page table lock. That will
not work with split lock. We need a per-pmd pmd_huge_pte for proper access
serialization.
For now, let's just introduce a wrapper to access mm->pmd_huge_pte.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Alex Thorlton <athorlton@sgi.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Robin Holt <robinmholt@gmail.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm: avoid increase sizeof(struct page) due to split page table lock
Alex Thorlton noticed that some massively threaded workloads perform poorly
if THP is enabled. This patchset fixes this by introducing split page table
lock for PMD tables. hugetlbfs is not covered yet.
This patchset is based on work by Naoya Horiguchi.
: akpm result summary:
:
: THP off, v3.12-rc2: 18.059261877 seconds time elapsed
: THP off, patched: 16.768027318 seconds time elapsed
:
: THP on, v3.12-rc2: 42.162306788 seconds time elapsed
: THP on, patched: 8.397885779 seconds time elapsed
:
: HUGETLB, v3.12-rc2: 47.574936948 seconds time elapsed
: HUGETLB, patched: 19.447481153 seconds time elapsed
CONFIG_GENERIC_LOCKBREAK increases sizeof(spinlock_t) to 8 bytes. This
increases sizeof(struct page) by 4 bytes on 32-bit systems if split
page table lock is in use, since page->ptl shares space in a union with
longs and pointers.
Let's disable split page table lock on 32-bit systems with
GENERIC_LOCKBREAK enabled.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Robin Holt <robinmholt@gmail.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fix mm-drop-actor-argument-of-do_generic_file_read for linux-next changes
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There's only one caller of do_generic_file_read() and the only actor is
file_read_actor(). No reason to have a callback parameter.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Tue, 5 Nov 2013 06:06:29 +0000 (17:06 +1100)]
net/netfilter/ipset/ip_set_hash_netportnet.c: fix build with older gccs
net/netfilter/ipset/ip_set_hash_netportnet.c: In function 'hash_netportnet4_kadt':
net/netfilter/ipset/ip_set_hash_netportnet.c:151: error: unknown field 'cidr' specified in initializer
net/netfilter/ipset/ip_set_hash_netportnet.c:151: warning: missing braces around initializer
net/netfilter/ipset/ip_set_hash_netportnet.c:151: warning: (near initialization for 'e.<anonymous>')
etc
gcc-4.4.4 doesn't like that anonymous union initializer and I couldn't
find a way of tricking it into doing the right thing, so open-code it.
Cc: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
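The kind of open-coding meant here can be sketched as follows. struct demo_elem is a hypothetical type loosely modelled on the failing pattern (a struct with an anonymous union), not a copy of the ipset definition.

```c
#include <stdint.h>

/* Hypothetical element type containing an anonymous union. */
struct demo_elem {
	union {
		uint8_t cidr[2];
		uint16_t ccmp;
	};
	uint16_t port;
};

static struct demo_elem demo_make_elem(uint8_t cidr0, uint8_t cidr1)
{
	/*
	 * Newer compilers accept a designated initializer that names a
	 * member of the anonymous union (".cidr[0] = cidr0, ..."), but
	 * gcc-4.4.4 reports "unknown field" for it. Initialize only the
	 * named members and assign the union members afterwards.
	 */
	struct demo_elem e = { .port = 0 };

	e.cidr[0] = cidr0;
	e.cidr[1] = cidr1;
	return e;
}
```

The generated code should be equivalent; only the initializer syntax that the old compiler cannot parse is avoided.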
Andrew Morton [Tue, 5 Nov 2013 06:06:29 +0000 (17:06 +1100)]
net/netfilter/ipset/ip_set_hash_netnet.c: fix build with older gcc
net/netfilter/ipset/ip_set_hash_netnet.c: In function 'hash_netnet4_kadt':
net/netfilter/ipset/ip_set_hash_netnet.c:141: error: unknown field 'cidr' specified in initializer
net/netfilter/ipset/ip_set_hash_netnet.c:141: warning: missing braces around initializer
net/netfilter/ipset/ip_set_hash_netnet.c:141: warning: (near initialization for 'e.<anonymous>')
etc.
gcc-4.4.4 doesn't like that anonymous union initializer and I couldn't
find a way of tricking it into doing the right thing, so open-code it.
Cc: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Josh Triplett [Tue, 5 Nov 2013 05:57:53 +0000 (16:57 +1100)]
scripts/bloat-o-meter: use .startswith rather than fragile slicing
str.startswith has existed since at least Python 2.0, in 2000; use it
rather than a fragile comparison against an initial slice of a string,
which requires hard-coding the length of the string to compare against.
Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Josh Triplett [Tue, 5 Nov 2013 05:57:52 +0000 (16:57 +1100)]
scripts/bloat-o-meter: ignore changes in the size of linux_banner
linux_banner can change size due to changes in the compiler, build number,
or the user@host the system was compiled on; ignore size changes in
linux_banner entirely.
Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ipc/msgutil.c: In function 'alloc_msg':
ipc/msgutil.c:54: warning: comparison of distinct pointer types lacks a cast
ipc/msgutil.c:66: warning: comparison of distinct pointer types lacks a cast
ipc/msgutil.c: In function 'load_msg':
ipc/msgutil.c:94: warning: comparison of distinct pointer types lacks a cast
ipc/msgutil.c:101: warning: comparison of distinct pointer types lacks a cast
ipc/msgutil.c: In function 'copy_msg':
ipc/msgutil.c:127: warning: comparison of distinct pointer types lacks a cast
ipc/msgutil.c:135: warning: comparison of distinct pointer types lacks a cast
ipc/msgutil.c: In function 'store_msg':
ipc/msgutil.c:155: warning: comparison of distinct pointer types lacks a cast
ipc/msgutil.c:162: warning: comparison of distinct pointer types lacks a cast
Mathias Krause [Tue, 5 Nov 2013 05:57:51 +0000 (16:57 +1100)]
ipc, msg: fix message length check for negative values
On 64 bit systems the test for negative message sizes is bogus as the
size, which may be positive when evaluated as a long, will get truncated
to an int when passed to load_msg(). So a long might very well contain a
positive value but when truncated to an int it would become negative.
That in combination with a small negative value of msg_ctlmax (which will
be promoted to an unsigned type for the comparison against msgsz, making
it a big positive value and therefore make it pass the check) will lead to
two problems: 1/ The kmalloc() call in alloc_msg() will allocate a too
small buffer as the addition of alen is effectively a subtraction. 2/ The
copy_from_user() call in load_msg() will first overflow the buffer with
userland data and then, when the userland access generates an access
violation, the fixup handler copy_user_handle_tail() will try to fill the
remainder with zeros -- roughly 4GB. That almost instantly results in a
system crash or reset.
,-[ Reproducer (needs to be run as root) ]--
| #include <sys/stat.h>
| #include <sys/msg.h>
| #include <unistd.h>
| #include <fcntl.h>
|
| int main(void) {
| long msg = 1;
| int fd;
|
| fd = open("/proc/sys/kernel/msgmax", O_WRONLY);
| write(fd, "-1", 2);
| close(fd);
|
| msgsnd(0, &msg, 0xfffffff0, IPC_NOWAIT);
|
| return 0;
| }
'---
Fix the issue by preventing msgsz from getting truncated by consistently
using size_t for the message length. This way the size checks in
do_msgsnd() could still be passed with a negative value for msg_ctlmax but
we would fail on the buffer allocation in that case and error out.
Also change the type of m_ts from int to size_t to avoid similar nastiness
in other code paths -- it is used in similar constructs, i.e. signed vs.
unsigned checks. It should never become negative under normal
circumstances, though.
Setting msg_ctlmax to a negative value is an odd configuration and should
be prevented. As that might break existing userland, it will be handled
in a separate commit so it could easily be reverted and reworked without
reintroducing the above described bug.
Hardening mechanisms for user copy operations would have caught that bug
early -- e.g. checking slab object sizes on user copy operations as the
usercopy feature of the PaX patch does. Or, for that matter, detect the
long vs. int sign change due to truncation, as the size overflow plugin
of the very same patch does.
Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Pax Team <pageexec@freemail.hu> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Brad Spengler <spender@grsecurity.net> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> [ v2.3.27+ -- yes, that old ;) ] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
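The two conversions at the heart of the bug can be reproduced harmlessly in userspace. This is a sketch with hypothetical helper names; the size_t to int conversion is implementation-defined in C, and the sketch assumes the wrapping behaviour that gcc and clang provide.

```c
#include <stdbool.h>
#include <stddef.h>

/* Old call chain: the caller validates a size_t, the callee takes an int. */
static int demo_len_seen_by_callee(size_t msgsz)
{
	return (int)msgsz;	/* implementation-defined; wraps on gcc/clang */
}

/*
 * A negative limit is converted to an unsigned type for the comparison
 * and becomes a huge positive value, so nearly every size passes.
 */
static bool demo_passes_limit_check(size_t msgsz, int msg_ctlmax)
{
	return !(msgsz > (size_t)msg_ctlmax);
}
```

For the reproducer's values, the size 0xfffffff0 passes the check against a limit of -1, yet the callee sees -16, which is how the addition of alen turns into a subtraction.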
Ilija Hadzic [Tue, 5 Nov 2013 05:57:50 +0000 (16:57 +1100)]
devpts: plug the memory leak in kill_sb
When devpts is unmounted, there may be a no-longer-used IDR tree hanging
off the superblock we are about to kill. This needs to be cleaned up
before destroying the SB.
The leak is usually not a big deal because unmounting devpts is typically
done when shutting down the whole machine. However, shutting down an LXC
container instead of a physical machine exposes the problem (the garbage
is detectable with kmemleak).
Stefani Seibold [Tue, 5 Nov 2013 05:57:49 +0000 (16:57 +1100)]
kfifo API type safety
This patch enhances the type safety for the kfifo API. It is now safe to
put const data into a non const FIFO and the API will now generate a
compiler warning when reading from the fifo where the destination address
is pointing to a const variable.
As a side effect, kfifo_put() now expects the value of an element
instead of a pointer to the element. This was suggested by Russell King. It
makes kfifo_put() easier to use, since there is no need to create
a helper variable for getting the address of a pointer or to pass integers
of different sizes.
IMHO the API break is okay, since there are currently only six users of
kfifo_put().
The code is also cleaner by kicking out the "if (0)" expressions.
Signed-off-by: Stefani Seibold <stefani@seibold.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
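The put-by-value calling convention can be illustrated with a toy userspace FIFO. This is not the kfifo implementation: struct demo_int_fifo, demo_fifo_put() and demo_fifo_get() are hypothetical names, and the element type and size are fixed to keep the sketch short.

```c
#define DEMO_FIFO_SIZE 4	/* must be a power of two */

/* Hypothetical fixed-size FIFO of ints. */
struct demo_int_fifo {
	int data[DEMO_FIFO_SIZE];
	unsigned int in, out;
};

/*
 * Takes the element by value, so constants and const variables can be
 * passed directly. Evaluates to 1 if the value was stored, 0 if full.
 */
#define demo_fifo_put(fifo, val) \
	(((fifo)->in - (fifo)->out) == DEMO_FIFO_SIZE ? 0 : \
	 ((fifo)->data[(fifo)->in++ & (DEMO_FIFO_SIZE - 1)] = (val), 1))

/*
 * The destination is a non-const pointer, so passing the address of a
 * const variable draws a compiler warning. Returns 1 on success, 0 if
 * the FIFO is empty.
 */
static int demo_fifo_get(struct demo_int_fifo *fifo, int *dst)
{
	if (fifo->in == fifo->out)
		return 0;
	*dst = fifo->data[fifo->out++ & (DEMO_FIFO_SIZE - 1)];
	return 1;
}
```

A caller can write demo_fifo_put(&f, 42) without first storing 42 in a temporary variable, which is the convenience the commit describes.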
"make menuconfig" allows one to choose the compression format of an initial
ramdisk image, but this choice does not result in a correspondingly
compressed ramdisk image, because "make install" does not pass the selected
compression choice on to the dracut(8) tool, which creates the initramfs
file. dracut(8) generates the image with the default compression, i.e.
gzip(1).
This patch exports the selected compression option to a sub-shell
environment, so that it can be used by the dracut(8) tool to generate
appropriately compressed initramfs images.
There isn't a straightforward way to pass options to dracut(8) via
positional parameters, because it is invoked indirectly at the end of a
"make install" sequence.
init/Kconfig: add option to disable kernel compression
Some ARC users say they can boot faster without kernel compression.
This probably depends on things like the FLASH chip they use etc.
Until now, kernel compression can only be disabled by removing "select
HAVE_<compression>" lines from the architecture Kconfig. So add the
Kconfig logic to permit disabling of kernel compression.
Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
drivers: w1: make w1_slave::flags long to avoid memory corruption
On architectures where long is more than 32 bits, modifying a 32-bit field
with set_bit (and other atomic bit operations) may cause bytes following
the field to be modified.
Because the endianness of the bits within a field is the native endianness
of the CPU[1], on big-endian machines, bit number zero is in the last byte
of the field.
Therefore, `set_bit(0, ptr)' on a 64-bit big-endian machine is roughly
equivalent to `((char *)ptr)[7] |= 1', and since w1 driver uses a 32-bit
field for holding the flags, this causes bytes beyond the field to be
modified.
[1] From Documentation/atomic_ops.txt:
Native atomic bit operations are defined to operate on objects
aligned to the size of an "unsigned long" C data type, and are
least of that size. The endianness of the bits within each
"unsigned long" are the native endianness of the cpu.
Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Cc: Evgeniy Polyakov <zbr@ioremap.net> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
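The effect can be modelled on any host with a small userspace sketch. struct demo_slave is a hypothetical layout, and demo_set_bit_be64() is a non-atomic model of how set_bit() addresses bits on a 64-bit big-endian CPU, not the kernel implementation.

```c
#include <stdint.h>

/* Hypothetical layout: a 32-bit flags field followed by unrelated data. */
struct demo_slave {
	uint32_t flags;
	uint32_t neighbour;
};

/*
 * Non-atomic model of set_bit(nr, addr) on a 64-bit big-endian CPU, for
 * nr < 64: bit 0 is the least significant bit of the 8-byte word, and
 * that bit lives in the word's last byte.
 */
static void demo_set_bit_be64(unsigned int nr, unsigned char *word)
{
	word[7 - nr / 8] |= (unsigned char)(1u << (nr % 8));
}
```

Applying demo_set_bit_be64(0, ...) to the start of a struct demo_slave leaves flags at zero and sets a bit in neighbour, which is the corruption the commit describes.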
Jingoo Han [Tue, 5 Nov 2013 05:57:46 +0000 (16:57 +1100)]
drivers/w1/masters/ds1wm.c: use dev_get_platdata()
Use the wrapper function for retrieving the platform data instead of
accessing dev->platform_data directly. This is a cosmetic change to make
the code simpler and enhance the readability.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The attrs field of the attribute_group structure is a pointer to a pointer
(as in an array of pointers) rather than a pointer to an attribute struct
(as in an array of structures), so when allocating, the size of the pointer
should be used instead of the size of the structure it is pointing to.
While at it, also change the call to use kcalloc rather than kzalloc.
Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Cc: Tejun Heo <tj@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alex Dubov <oakad@yahoo.com> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
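The sizing rule can be sketched in userspace as follows. The demo_* types are hypothetical simplifications of struct attribute and struct attribute_group, and calloc() stands in for kcalloc().

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct attribute. */
struct demo_attribute {
	const char *name;
	unsigned short mode;
};

/* Hypothetical, simplified stand-in for struct attribute_group. */
struct demo_attribute_group {
	struct demo_attribute **attrs;	/* NULL-terminated array of pointers */
};

/* Allocate room for count attribute pointers plus the NULL terminator. */
static int demo_alloc_attrs(struct demo_attribute_group *grp, size_t count)
{
	/* The element size is sizeof(*grp->attrs): one pointer, not one struct. */
	grp->attrs = calloc(count + 1, sizeof(*grp->attrs));
	return grp->attrs ? 0 : -1;
}
```

Writing the element size as sizeof(*grp->attrs) keeps it tied to the field's type, and the two-argument form lets the allocator check the multiplication for overflow.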
Paul Chavent [Tue, 5 Nov 2013 05:57:45 +0000 (16:57 +1100)]
pps: add non-blocking option to PPS_FETCH ioctl
The PPS_FETCH ioctl blocks until a PPS event is received. But in some
cases one may need the date of the last event immediately. This patch
allows the result of PPS_FETCH to be fetched without blocking if the device
has the O_NONBLOCK flag set.
Signed-off-by: Paul Chavent <paul.chavent@onera.fr> Acked-by: Rodolfo Giometti <giometti@enneenne.com> Cc: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andy Shevchenko [Tue, 5 Nov 2013 05:57:43 +0000 (16:57 +1100)]
gcov: reuse kbasename helper
To get the name of the file from a pathname, use the kbasename() helper.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Jingoo Han <jg1.han@samsung.com> Cc: Peter Oberparleiter <peter.oberparleiter@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ERROR: that open brace { should be on the previous line
#227: FILE: kernel/gcov/gcc_4_7.c:179:
+ for (fi_idx = 0; fi_idx < info->n_functions; fi_idx++)
+ {
ERROR: that open brace { should be on the previous line
#269: FILE: kernel/gcov/gcc_4_7.c:221:
+ for (fi_idx = 0; fi_idx < src->n_functions; fi_idx++)
+ {
total: 2 errors, 0 warnings, 574 lines checked
./patches/gcov-add-support-for-gcc-47-gcov-format.patch has style problems, please review.
If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.
Please run checkpatch prior to sending patches
Cc: Frantisek Hrbata <fhrbata@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The gcov in-memory format changed in gcc 4.7. The biggest change, which
requires this special implementation, is that gcov_info no longer contains
an array of counters for each counter type for all functions, and
gcov_fn_info is no longer used for mapping a function's counters to these
arrays (by offset). Now each gcov_fn_info contains its own counters, which
makes things a little bit easier.
This is heavily based on the previous gcc_3_4.c implementation and patches
provided by Peter Oberparleiter, especially the gcda buffer implementation
for the iterator.
Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com> Cc: Jan Stancek <jstancek@redhat.com> Cc: Kees Cook <keescook@chromium.org> Reviewed-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andy Gospodarek <agospoda@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
gcov: move gcov structs definitions to a gcc version specific file
Since the gcov structures (gcov_info, gcov_fn_info, gcov_ctr_info) can
also change between gcc releases, as shown by gcc 4.7, they cannot be
defined in a common header and need to be moved to a gcc-version-specific
implementation file. This also requires making the gcov_info structure
opaque to the common code and introducing simple helpers for accessing
data inside gcov_info.
Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com> Cc: Jan Stancek <jstancek@redhat.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andy Gospodarek <agospoda@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
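The opaque-structure pattern described above can be sketched in plain C as follows. All names (demo_info and its accessors) and the two-field layout are hypothetical and are not the kernel's actual gcov definitions; in a real tree the two halves would live in separate files.

```c
#include <stddef.h>

/* --- What the common code would see: a forward declaration plus
 *     accessor prototypes, with no knowledge of the layout. --- */
struct demo_info;
const char *demo_info_filename(const struct demo_info *info);
unsigned int demo_info_version(const struct demo_info *info);

/* --- What a compiler-version-specific file would contain: the real
 *     layout (hypothetical here) and the accessor bodies. --- */
struct demo_info {
	unsigned int version;
	const char *filename;
};

const char *demo_info_filename(const struct demo_info *info)
{
	return info->filename;
}

unsigned int demo_info_version(const struct demo_info *info)
{
	return info->version;
}
```

Because callers only go through the accessors, the layout can differ per compiler version without touching the common code.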
Chen Gang [Tue, 5 Nov 2013 05:57:38 +0000 (16:57 +1100)]
kernel/taskstats.c: return -ENOMEM when alloc memory fails in add_del_listener()
When registering in add_del_listener(), if kmalloc_node() fails, we need
to return -ENOMEM instead of a success code, because
cmd_attr_register_cpumask() wants to know about the failure.
After the modification, a simple common test was run: "build -> boot up ->
kernel/controllers/cgroup/getdelays by LTP tools".
Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
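A minimal user-space sketch of the error-propagation idea in the taskstats fix above. The listener structure, register_listeners(), and the injectable allocator are all assumptions made for illustration; the kernel code uses kmalloc_node() and its own list handling.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical per-CPU listener record. */
struct listener {
	int cpu;
	struct listener *next;
};

/* The allocator is a parameter so that a failure can be simulated. */
typedef void *(*alloc_fn)(size_t);

/*
 * Register one listener per CPU. If any allocation fails, undo the
 * partial registration and return -ENOMEM rather than reporting success.
 */
static int register_listeners(struct listener **head, int ncpus, alloc_fn alloc)
{
	int ret = 0;
	int cpu;

	for (cpu = 0; cpu < ncpus; cpu++) {
		struct listener *s = alloc(sizeof(*s));

		if (!s) {
			ret = -ENOMEM;	/* the caller must see the failure */
			break;
		}
		s->cpu = cpu;
		s->next = *head;
		*head = s;
	}

	if (ret) {
		/* Roll back whatever was registered before the failure. */
		while (*head) {
			struct listener *s = *head;

			*head = s->next;
			free(s);
		}
	}
	return ret;
}

/* Allocator that always fails, for exercising the error path. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```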
Chen Gang [Tue, 5 Nov 2013 05:57:37 +0000 (16:57 +1100)]
kernel/sysctl_binary.c: use scnprintf() instead of snprintf()
snprintf() returns the 'ideal' length, which may be larger than the real
buffer length. If we only want the length actually written, we need to use
scnprintf() instead.
Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
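The difference between the two return values can be illustrated in user space. bounded_snprintf() below is a hypothetical approximation of what scnprintf() does in the kernel (return the number of characters actually stored, excluding the trailing NUL); it is a sketch, not the kernel implementation.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format into buf and return the number of characters actually stored,
 * not counting the terminating NUL. Unlike snprintf(), the result is
 * never larger than size - 1, so it is safe to use as an offset.
 */
static int bounded_snprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0) {
		/* Formatting error: leave an empty string behind. */
		buf[0] = '\0';
		return 0;
	}
	/* vsnprintf() reports the 'ideal' length; clamp it to what fits. */
	return (size_t)n < size ? n : (int)(size - 1);
}
```

With an 8-byte buffer and a 10-character input, snprintf() would report 10 while this helper reports 7, the length of the truncated string.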
Chen Gang [Tue, 5 Nov 2013 05:57:36 +0000 (16:57 +1100)]
kernel/sysctl.c: check return value after call proc_put_char() in __do_proc_doulongvec_minmax()
Need to check the return value of proc_put_char() in
__do_proc_doulongvec_minmax(), as was done in __do_proc_dointvec().
Signed-off-by: Chen Gang <gang.chen@asianux.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Chen Gang [Tue, 5 Nov 2013 05:57:36 +0000 (16:57 +1100)]
kernel/kexec.c: use vscnprintf() instead of vsnprintf() in vmcoreinfo_append_str()
vsnprintf() may return an 'r' larger than sizeof(buf). In this case, if 'r'
is also less than "vmcoreinfo_max_size - vmcoreinfo_size" (the space left
in the destination buffer), the following memcpy() will read past the end
of buf.
Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
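The over-read described above comes from formatting into a small scratch buffer and then copying 'r' bytes out of it. A user-space sketch of the corrected append pattern, with made-up names (note_buf, note_append) and sizes, could look like this:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SCRATCH_SIZE 16

/* Hypothetical destination buffer with a fill level. */
struct note_buf {
	char data[64];
	size_t used;
};

/*
 * Format into a small scratch buffer, then append to nb. The copy length
 * is clamped twice: to what the scratch buffer really holds (the step
 * that a plain vsnprintf() return value skips) and to the room left in
 * the destination.
 */
static void note_append(struct note_buf *nb, const char *fmt, ...)
{
	char scratch[SCRATCH_SIZE];
	size_t room = sizeof(nb->data) - nb->used;
	va_list args;
	size_t n;
	int r;

	va_start(args, fmt);
	r = vsnprintf(scratch, sizeof(scratch), fmt, args);
	va_end(args);
	if (r < 0)
		return;

	/* Clamp the 'ideal' length to the bytes actually in scratch. */
	n = (size_t)r < sizeof(scratch) ? (size_t)r : sizeof(scratch) - 1;
	if (n > room)
		n = room;

	memcpy(nb->data + nb->used, scratch, n);
	nb->used += n;
}
```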
Kees Cook [Tue, 5 Nov 2013 05:57:35 +0000 (16:57 +1100)]
exec/ptrace: fix get_dumpable() incorrect tests
The get_dumpable() return value is not boolean. Most users of the
function actually want to be testing for non-SUID_DUMP_USER(1) rather than
SUID_DUMP_DISABLE(0). SUID_DUMP_ROOT(2) is also considered a protected
state. Almost all places did this correctly, except the two places fixed
in this patch.
Wrong logic:
if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ }
or
if (dumpable == 0) { /* be protective */ }
or
if (!dumpable) { /* be protective */ }
Correct logic:
if (dumpable != SUID_DUMP_USER) { /* be protective */ }
or
if (dumpable != 1) { /* be protective */ }
Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a
user was able to ptrace attach to processes that had dropped privileges to
that user. (This may have been partially mitigated if Yama was enabled.)
The macros have been moved into the file that declares get/set_dumpable(),
which means things like the ia64 code can see them too.
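To make the difference between the two tests concrete, here is a small stand-alone sketch. The macro values are the ones given in the commit message; the two predicate functions are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* Values as stated in the commit message above. */
#define SUID_DUMP_DISABLE	0
#define SUID_DUMP_USER		1
#define SUID_DUMP_ROOT		2

/* Wrong: treats the value as boolean, so SUID_DUMP_ROOT is unprotected. */
static bool needs_protection_wrong(int dumpable)
{
	return dumpable == SUID_DUMP_DISABLE;
}

/* Correct: everything other than SUID_DUMP_USER is a protected state. */
static bool needs_protection(int dumpable)
{
	return dumpable != SUID_DUMP_USER;
}
```

The two predicates agree for values 0 and 1 and differ only for SUID_DUMP_ROOT, which is exactly the fs/suid_dumpable=2 case mentioned above.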
Josh Triplett [Tue, 5 Nov 2013 05:57:34 +0000 (16:57 +1100)]
Documentation/ABI: document the non-ABI status of Kconfig and symbols
Discussion at Kernel Summit made it clear that the presence or absence of
specific Kconfig symbols is not considered ABI, and that no userspace (or
bootloader, etc) should rely on them.
In addition, kernel-internal symbols are well established as non-ABI, per
Documentation/stable_api_nonsense.txt.
Document both of these in Documentation/ABI/README, in a new section for
notable bits of non-ABI.
Signed-off-by: Josh Triplett <josh@joshtriplett.org> Cc: Rob Landley <rob@landley.net> Cc: Tao Ma <boyu.mt@taobao.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Richard Weinberger <richard.weinberger@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Johannes Berg [Tue, 5 Nov 2013 05:57:33 +0000 (16:57 +1100)]
kernel-doc: improve "no structured comments found" error
When using '!Ffile function' in a docbook template, and the function no
longer exists, you get a "no structured comments found" error from the
kernel-doc processing script. It's useful to know which functions it was
looking for, so print them out in this case. Also do the same for '!Pfile
doc-section'.
The same error also happens when using '!Efile' when some exported
functions aren't documented (in the same file.) There's a very large
number of such functions though, so don't print the message in this case
-- right now it would give ~850 messages.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Cc: Rob Landley <rob@landley.net> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stefan Raspl [Tue, 5 Nov 2013 05:57:33 +0000 (16:57 +1100)]
Documentation/trace/tracepoints.txt: add links to TRACE_EVENT documentation
Existing tracepoint documentation doesn't mention the popular TRACE_EVENT
macro. Since an excellent series of articles on proper usage already
exists, respective links are added to the existing documentation.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com> Cc: Rob Landley <rob@landley.net> Cc: Jiri Kosina <jkosina@suse.cz> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Zoltan Kiss <zoltan.kiss@citrix.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Namjae Jeon [Tue, 5 Nov 2013 05:57:32 +0000 (16:57 +1100)]
fat: fallback to buffered write in case of fallocated region on direct IO
In the normal direct IO write case, seeking to a location beyond the file
size makes the write fall back to a buffered write to fill that region.
Similarly, for a write into a fallocated region, make it fall back to a
buffered write.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Namjae Jeon [Tue, 5 Nov 2013 05:57:31 +0000 (16:57 +1100)]
fat: zero out seek range on _fat_get_block
For normal buffered write operations, if we try to write to an offset
greater than the file size, a cont_expand_zero is done up to that offset.
In the case of fallocated regions, the blocks are already allocated, so
zero out the buffers for those blocks up to the seeked offset.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>