Axel Lin [Tue, 26 Jul 2011 00:13:16 +0000 (17:13 -0700)]
drivers/leds/leds-netxbig: make LEDS_NETXBIG depend on LEDS_CLASS
We call led_classdev_register/led_classdev_unregister in
create_netxbig_led/delete_netxbig_led, thus make LEDS_NETXBIG depend on
LEDS_CLASS.
This patch fixes below build error if LEDS_CLASS is not configured.
LD .tmp_vmlinux1
drivers/built-in.o: In function `create_netxbig_led':
drivers/leds/leds-netxbig.c:350: undefined reference to `led_classdev_register'
drivers/leds/leds-netxbig.c:361: undefined reference to `led_classdev_unregister'
drivers/built-in.o: In function `delete_netxbig_led':
drivers/leds/leds-netxbig.c:313: undefined reference to `led_classdev_unregister'
Signed-off-by: Axel Lin <axel.lin@gmail.com> Cc: Richard Purdie <rpurdie@rpsys.net> Acked-by: Simon Guinot <sguinot@lacie.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- return -ENOMEM if kzalloc fails, rather than the current -EINVAL
- fix a memory leak in the case of goto out_unregister_led_cdevs
Signed-off-by: Axel Lin <axel.lin@gmail.com> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 26 Jul 2011 00:13:13 +0000 (17:13 -0700)]
get_maintainers.pl: improve .mailmap parsing
Entries that used formats other than "Proper Name <commit@email.xx>"
were not parsed properly.
Try to improve the parsing so that the entries in the forms of:
Proper Name <proper@email.xx> <commit@email.xx>
and
Proper Name <proper@email.xx> Commit Name <commit@email.xx>
are transformed correctly.
Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Florian Mickler <florian@mickler.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Boyd [Tue, 26 Jul 2011 00:13:12 +0000 (17:13 -0700)]
kernel/configs.c: include MODULE_*() when CONFIG_IKCONFIG_PROC=n
If CONFIG_IKCONFIG=m but CONFIG_IKCONFIG_PROC=n we get a module that has
no MODULE_LICENSE definition. Move the MODULE_*() definitions outside the
CONFIG_IKCONFIG_PROC #ifdef to prevent this configuration from tainting
the kernel.
Signed-off-by: Stephen Boyd <bebarino@gmail.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Amerigo Wang [Tue, 26 Jul 2011 00:13:12 +0000 (17:13 -0700)]
notifiers: vt: move vt notifiers into vt.h
It is not necessary to share the same notifier.h.
Signed-off-by: WANG Cong <amwang@redhat.com> Cc: David Miller <davem@davemloft.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Amerigo Wang [Tue, 26 Jul 2011 00:13:11 +0000 (17:13 -0700)]
notifiers: pm: move pm notifiers into suspend.h
It is not necessary to share the same notifier.h.
Signed-off-by: WANG Cong <amwang@redhat.com> Cc: David Miller <davem@davemloft.net> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Amerigo Wang [Tue, 26 Jul 2011 00:13:10 +0000 (17:13 -0700)]
notifiers: sys: move reboot notifiers into reboot.h
It is not necessary to share the same notifier.h.
This patch already moves register_reboot_notifier() and
unregister_reboot_notifier() from kernel/notifier.c to kernel/sys.c.
[amwang@redhat.com: make allyesconfig succeed on ppc64] Signed-off-by: WANG Cong <amwang@redhat.com> Cc: David Miller <davem@davemloft.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Amerigo Wang [Tue, 26 Jul 2011 00:13:09 +0000 (17:13 -0700)]
notifiers: net: move netdevice notifiers into netdevice.h
It is not necessary to share the same notifier.h.
Signed-off-by: WANG Cong <amwang@redhat.com> Acked-by: David Miller <davem@davemloft.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Amerigo Wang [Tue, 26 Jul 2011 00:13:08 +0000 (17:13 -0700)]
notifiers: cpu: move cpu notifiers into cpu.h
We presently define all kinds of notifiers in notifier.h. This is not
necessary at all, since different subsystems use different notifiers, they
are almost non-related with each other.
This can also save much build time. Suppose I add a new netdevice event,
really I don't have to recompile all the source, just network related.
Without this patch, all the source will be recompiled.
I move the notify events near to their subsystem notifier registers, so
that they can be found more easily.
This patch:
It is not necessary to share the same notifier.h.
Signed-off-by: WANG Cong <amwang@redhat.com> Cc: David Miller <davem@davemloft.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Donggeun Kim [Tue, 26 Jul 2011 00:13:07 +0000 (17:13 -0700)]
drivers/misc: add support the FSA9480 USB Switch
The FSA9480 is a USB port accessory detector and switch. This patch adds
support the FSA9480 USB Switch.
[akpm@linux-foundation.org: make a couple of things static] Signed-off-by: Donggeun Kim <dg77.kim@samsung.com> Signed-off-by: Minkyu Kang <mk7.kang@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As stated in drivers/mfd/cs5535-mfd.c, the mfd driver exposes the BARs
which then make the GPIO, MFGPT, ACPI, etc. all visible to the system.
So the dependencies of the MFGPT stuff have changed, and most people
expect Kconfig to bring in the necessary dependencies. Without them, the
module fails to load and most people don't understand why because the
details of the rewrite aren't captured anywhere most people who know to
look.
This dependency needs to be reflected in Kconfig.
Signed-off-by: Philip A. Prindeville <philipp@redfish-solutions.com> Acked-by: Alexandros C. Couloumbis <alex@ozo.com> Acked-by: Andres Salomon <dilinger@queued.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
WARNING: vmlinux.o(.data+0x15d3ac): Section mismatch in reference from the variable pci_eisa_driver to the function .init.text:pci_eisa_init()
The variable pci_eisa_driver references the function __init pci_eisa_init()
If the reference is valid then annotate the variable with __init* or __refdata (see linux/init.h) or name the variable:
*_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console
WANG Cong [Tue, 26 Jul 2011 00:13:02 +0000 (17:13 -0700)]
include/linux/kernel.h: fix a headers_check warning
Fix the warning:
usr/include/linux/kernel.h:65: userspace cannot reference function or variable defined in the kernel
As Michal noted, BUILD_BUG_ON stuffs should be moved
under #ifdef __KERNEL__.
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
include/linux/ioport.h: new helper to define common struct resource constructs
Resource definitions that just define start, end and flags =
IORESOURCE_MEM or IORESOURCE_IRQ (with start=end) are quite common. So
introduce a shortcut for them. For completeness add macros for
IORESOURCE_DMA and IORESOURCE_IO, too and also make available a set of
macros to specify named resources of all types which are less common.
Maxin B John [Tue, 26 Jul 2011 00:12:59 +0000 (17:12 -0700)]
devres: fix possible use after free
devres uses the pointer value as key after it's freed, which is safe but
triggers spurious use-after-free warnings on some static analysis tools.
Rearrange code to avoid such warnings.
Signed-off-by: Maxin B. John <maxin.john@gmail.com> Reviewed-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Frysinger [Tue, 26 Jul 2011 00:12:58 +0000 (17:12 -0700)]
asm-generic/system.h: drop useless __KERNEL__
This header isn't exported to user-space, and even if it was, the
__KERNEL__ check covers the entire file, so we'd get a useless stub in the
first place. So punt it.
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Until now UML had no x86_64 vDSO. So glibc always used the vsyscall page
for gettimeday() and friends. Calls to gettimeday() returned falsely the
host time and confused some programs.
This patch adds a vDSO which turns all __vdso_* calls into a system call
so that UML can trap them.
As glibc still uses the vsyscall page for static binaries this patch
improves the situation only for dynamic binaries.
Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When UML is unable to reuse the host's vDSO FIXADDR_USER_START is zero.
To handle this special case correclty we have to implement custom gate
area helper methods.
Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/um/os-Linux/helper.c: In function `helper_child':
arch/um/os-Linux/helper.c:38:7: warning: ignoring return value of `write', declared with attribute warn_unused_result
[richard@nod.at: happens only with -D_FORTIFY_SOURCE=2] Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/um/drivers/cow_user.c: In function `absolutize':
arch/um/drivers/cow_user.c:189:7: warning: ignoring return value of `chdir', declared with attribute warn_unused_result
[richard@nod.at: happens only with -D_FORTIFY_SOURCE=2] Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 0954828fcbf3 ("kconfig: replace KERNELVERSION usage by the
mainmenu's prompt") made the kernel version disappear from the generated
.config file when configuring for UML. As UML's Kconfig doesn't have a
mainmenu, kconfig falls back to the default string "Linux Kernel
Configuration".
Add a suitable mainmenu to the main UML Kconfig file to fix this.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
um: fix _FORTIFY_SOURCE=2 support for kernel modules
When UML is compiled with _FORTIFY_SOURCE we have to export all _chk()
functions which are used in modules. For now it's only the case for
__sprintf_chk().
There is no need to define VM_{STACK,DATA}_DEFAULT_FLAGS as variable.
It's also useless to test for TIF_IA32 as 64bit UML has no IA32 emulation.
Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
writeback: account NR_WRITTEN at IO completion time
NR_WRITTEN is now accounted at block IO enqueue time, which is not very
accurate as to common understanding. This moves NR_WRITTEN accounting to
the IO completion time and makes it more consistent with BDI_WRITTEN,
which is used for bandwidth estimation.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Michael Rubin <mrubin@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
shmem_unuse_inode() and shmem_writepage() contain a little code to cope
with pages inserted independently into the filecache, probably by a
filesystem stacked on top of tmpfs, then fed to its ->readpage() or
->writepage().
Unionfs was indeed experimenting with working in that way three years ago,
but I find no current examples: nowadays the stacking filesystems use vfs
interfaces to the lower filesystem.
It's now illegal: remove most of that code, adding some WARN_ON_ONCEs.
We can now simplify shmem_getpage_gfp(): there is no longer a dilemma of
filepage passed in via shmem_readpage(), then swappage found, which must
then be copied over to it.
Although at first it's tempting to replace the **pagep arg by returning
struct page *, that makes a mess of IS_ERR_OR_NULL(page)s in all the
callers, so leave as is.
Insert BUG_ON(!PageUptodate) when we find and lock page: some of the
complication came from uninitialized pages inserted into filecache prior
to readpage; but now we're in control, and only release pagelock on
filecache once it's uptodate (if an error occurs in reading back from
swap, the page remains in swapcache, never moved to filecache).
The prealloc_page handling in shmem_getpage_gfp() is unnecessarily
complicated: first simplify that before going on to filepage/swappage.
That's right, don't report ENOMEM when the preallocation fails: we may or
may not need the page. But simply report ENOMEM once we find we do need
it, instead of dropping lock, repeating allocation, unwinding on failure
etc. And leave the out label on the fast path, don't goto.
Fix something that looks like a bug but turns out not to be: set
PageSwapBacked on prealloc_page before its mem_cgroup_cache_charge(), as
the removed case was doing. That's important before adding to LRU
(determines which LRU the page goes on), and does affect which path it
takes through memcontrol.c, but in the end MEM_CGROUP_CHANGE_TYPE_ SHMEM
is handled no differently from CACHE.
Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Shaohua Li <shaohua.li@intel.com> Cc: "Zhang, Yanmin" <yanmin.zhang@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove that pernicious shmem_readpage() at last: the things we needed it
for (splice, loop, sendfile, i915 GEM) are now fully taken care of by
shmem_file_splice_read() and shmem_read_mapping_page_gfp().
This removal clears the way for a simpler shmem_getpage_gfp(), since page
is never passed in; but leave most of that cleanup until after.
sys_readahead() and sys_fadvise(POSIX_FADV_WILLNEED) will now EINVAL,
instead of unexpectedly trying to read ahead on tmpfs: if that proves to
be an issue for someone, then we can either arrange for them to return
success instead, or try to implement async readahead on tmpfs.
Make shmem_getpage() a wrapper, passing mapping_gfp_mask() down to
shmem_getpage_gfp(), which in turn passes gfp down to shmem_swp_alloc().
Change shmem_read_mapping_page_gfp() to use shmem_getpage_gfp() in the
CONFIG_SHMEM case; but leave tiny !SHMEM using read_cache_page_gfp().
Add a BUG_ON() in case anyone happens to call this on a non-shmem mapping;
though we might later want to let that case route to read_cache_page_gfp().
It annoys me to have these two almost-redundant args, gfp and fault_type:
I can't find a better way; but initialize fault_type only in shmem_fault().
Note that before, read_cache_page_gfp() was allocating i915_gem's pages
with __GFP_NORETRY as intended; but the corresponding swap vector pages
got allocated without it, leaving a small possibility of OOM.
Copy __generic_file_splice_read() and generic_file_splice_read() from
fs/splice.c to shmem_file_splice_read() in mm/shmem.c. Make
page_cache_pipe_buf_ops and spd_release_page() accessible to it.
mm/futex: fix futex writes on archs with SW tracking of dirty & young
I haven't reproduced it myself but the fail scenario is that on such
machines (notably ARM and some embedded powerpc), if you manage to hit
that futex path on a writable page whose dirty bit has gone from the PTE,
you'll livelock inside the kernel from what I can tell.
It will go in a loop of trying the atomic access, failing, trying gup to
"fix it up", getting succcess from gup, go back to the atomic access,
failing again because dirty wasn't fixed etc...
So I think you essentially hang in the kernel.
The scenario is probably rare'ish because affected architecture are
embedded and tend to not swap much (if at all) so we probably rarely hit
the case where dirty is missing or young is missing, but I think Shan has
a piece of SW that can reliably reproduce it using a shared writable
mapping & fork or something like that.
On archs who use SW tracking of dirty & young, a page without dirty is
effectively mapped read-only and a page without young unaccessible in the
PTE.
Additionally, some architectures might lazily flush the TLB when relaxing
write protection (by doing only a local flush), and expect a fault to
invalidate the stale entry if it's still present on another processor.
The futex code assumes that if the "in_atomic()" access -EFAULT's, it can
"fix it up" by causing get_user_pages() which would then be equivalent to
taking the fault.
However that isn't the case. get_user_pages() will not call
handle_mm_fault() in the case where the PTE seems to have the right
permissions, regardless of the dirty and young state. It will eventually
update those bits ... in the struct page, but not in the PTE.
Additionally, it will not handle the lazy TLB flushing that can be
required by some architectures in the fault case.
Basically, gup is the wrong interface for the job. The patch provides a
more appropriate one which boils down to just calling handle_mm_fault()
since what we are trying to do is simulate a real page fault.
The futex code currently attempts to write to user memory within a
pagefault disabled section, and if that fails, tries to fix it up using
get_user_pages().
This doesn't work on archs where the dirty and young bits are maintained
by software, since they will gate access permission in the TLB, and will
not be updated by gup().
In addition, there's an expectation on some archs that a spurious write
fault triggers a local TLB flush, and that is missing from the picture as
well.
I decided that adding those "features" to gup() would be too much for this
already too complex function, and instead added a new simpler
fixup_user_fault() which is essentially a wrapper around handle_mm_fault()
which the futex code can call.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix some nits Darren saw, fiddle comment layout] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reported-by: Shan Hai <haishan.bai@gmail.com> Tested-by: Shan Hai <haishan.bai@gmail.com> Cc: David Laight <David.Laight@ACULAB.COM> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Darren Hart <darren.hart@intel.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm: page allocator: reconsider zones for allocation after direct reclaim
With zone_reclaim_mode enabled, it's possible for zones to be considered
full in the zonelist_cache so they are skipped in the future. If the
process enters direct reclaim, the ZLC may still consider zones to be full
even after reclaiming pages. Reconsider all zones for allocation if
direct reclaim returns successfully.
Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm: page allocator: initialise ZLC for first zone eligible for zone_reclaim
There have been a small number of complaints about significant stalls
while copying large amounts of data on NUMA machines reported on a
distribution bugzilla. In these cases, zone_reclaim was enabled by
default due to large NUMA distances. In general, the complaints have not
been about the workload itself unless it was a file server (in which case
the recommendation was disable zone_reclaim).
The stalls are mostly due to significant amounts of time spent scanning
the preferred zone for pages to free. After a failure, it might fallback
to another node (as zonelists are often node-ordered rather than
zone-ordered) but stall quickly again when the next allocation attempt
occurs. In bad cases, each page allocated results in a full scan of the
preferred zone.
Patch 1 checks the preferred zone for recent allocation failure
which is particularly important if zone_reclaim has failed
recently. This avoids rescanning the zone in the near future and
instead falling back to another node. This may hurt node locality
in some cases but a failure to zone_reclaim is more expensive than
a remote access.
Patch 2 clears the zlc information after direct reclaim.
Otherwise, zone_reclaim can mark zones full, direct reclaim can
reclaim enough pages but the zone is still not considered for
allocation.
This was tested on a 24-thread 2-node x86_64 machine. The tests were
focused on large amounts of IO. All tests were bound to the CPUs on
node-0 to avoid disturbances due to processes being scheduled on different
nodes. The kernels tested are
MMTests Statistics: vmstat
Page Ins 46268 63840 66008
Page Outs 908215969067112888043732
Swap Ins 0 0 0
Swap Outs 0 0 0
Direct pages scanned 1309169789668638971790
Kswapd pages scanned 0 18300111831116
Kswapd pages reclaimed 0 18290681829930
Direct pages reclaimed 1303777789568288648314
Kswapd efficiency 100% 99% 99%
Kswapd velocity 0.000 810.643 826.346
Direct efficiency 99% 99% 96%
Direct velocity 5340.128 3972.068 4048.788
Percentage direct scans 100% 83% 83%
Page writes by reclaim 0 3 0
Slabs scanned 796672 720640 720256
Direct inode steals 742266771600127088638
Kswapd inode steals 0 17368402021238
Test completes far faster with a large increase in the number of files
created per second. Standard deviation is high as a small number of
iterations were much higher than the mean. The number of pages scanned by
zone_reclaim is reduced and kswapd is used for more work.
LARGE DD
3.0-rc6 3.0-rc6 3.0-rc6
vanilla zlcfirst zlcreconsider
download tar 59 ( 0.00%) 59 ( 0.00%) 55 ( 7.27%)
dd source files 527 ( 0.00%) 296 (78.04%) 320 (64.69%)
delete source 36 ( 0.00%) 19 (89.47%) 20 (80.00%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 125.03 118.98 122.01
Total Elapsed Time (seconds) 624.56 375.02 398.06
This test downloads a large tarfile and copies it with dd a number of
times - similar to the most recent bug report I've dealt with. Time to
completion is reduced. The number of pages scanned directly is still
disturbingly high with a low efficiency but this is likely due to the
number of dirty pages encountered. The figures could probably be improved
with more work around how kswapd is used and how dirty pages are handled
but that is separate work and this result is significant on its own.
Streaming Mapped Writer
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 124.47 111.67 112.64
Total Elapsed Time (seconds) 2138.14 1816.30 1867.56
MMTests Statistics: vmstat
Page Ins 90760 89124 89516
Page Outs 121028340120199524120736696
Swap Ins 0 86 55
Swap Outs 0 0 0
Direct pages scanned 1149893639646143996330619
Kswapd pages scanned 564309485696576357075875
Kswapd pages reclaimed 277432192775204427766606
Direct pages reclaimed 49777 46884 36655
Kswapd efficiency 49% 48% 48%
Kswapd velocity 26392.541 31363.631 30561.736
Direct efficiency 0% 0% 0%
Direct velocity 53780.091 53108.759 51581.004
Percentage direct scans 67% 62% 62%
Page writes by reclaim 385 122 1513
Slabs scanned 43008 39040 42112
Direct inode steals 0 10 8
Kswapd inode steals 733 534 477
This test just creates a large file mapping and writes to it linearly.
Time to completion is again reduced.
The gains are mostly down to two things. In many cases, there is less
scanning as zone_reclaim simply gives up faster due to recent failures.
The second reason is that memory is used more efficiently. Instead of
scanning the preferred zone every time, the allocator falls back to
another zone and uses it instead improving overall memory utilisation.
This patch: initialise ZLC for first zone eligible for zone_reclaim.
The zonelist cache (ZLC) is used among other things to record if
zone_reclaim() failed for a particular zone recently. The intention is to
avoid a high cost scanning extremely long zonelists or scanning within the
zone uselessly.
Currently the zonelist cache is setup only after the first zone has been
considered and zone_reclaim() has been called. The objective was to avoid
a costly setup but zone_reclaim is itself quite expensive. If it is
failing regularly such as the first eligible zone having mostly mapped
pages, the cost in scanning and allocation stalls is far higher than the
ZLC initialisation step.
This patch initialises ZLC before the first eligible zone calls
zone_reclaim(). Once initialised, it is checked whether the zone failed
zone_reclaim recently. If it has, the zone is skipped. As the first zone
is now being checked, additional care has to be taken about zones marked
full. A zone can be marked "full" because it should not have enough
unmapped pages for zone_reclaim but this is excessive as direct reclaim or
kswapd may succeed where zone_reclaim fails. Only mark zones "full" after
zone_reclaim fails if it failed to reclaim enough pages after scanning.
Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm: preallocate page before lock_page() at filemap COW
Currently we are keeping faulted page locked throughout whole __do_fault
call (except for page_mkwrite code path) after calling file system's fault
code. If we do early COW, we allocate a new page which has to be charged
for a memcg (mem_cgroup_newpage_charge).
This function, however, might block for unbounded amount of time if memcg
oom killer is disabled or fork-bomb is running because the only way out of
the OOM situation is either an external event or OOM-situation fix.
In the end we are keeping the faulted page locked and blocking other
processes from faulting it in which is not good at all because we are
basically punishing potentially an unrelated process for OOM condition in
a different group (I have seen stuck system because of ld-2.11.1.so being
locked).
We can do test easily.
% cgcreate -g memory:A
% cgset -r memory.limit_in_bytes=64M A
% cgset -r memory.memsw.limit_in_bytes=64M A
% cd kernel_dir; cgexec -g memory:A make -j
Then, the whole system will live-locked until you kill 'make -j'
by hands (or push reboot...) This is because some important page in a
a shared library are locked.
Considering again, the new page is not necessary to be allocated
with lock_page() held. And usual page allocation may dive into
long memory reclaim loop with holding lock_page() and can cause
very long latency.
There are 3 ways.
1. do allocation/charge before lock_page()
Pros. - simple and can handle page allocation in the same manner.
This will reduce holding time of lock_page() in general.
Cons. - we do page allocation even if ->fault() returns error.
2. do charge after unlock_page(). Even if charge fails, it's just OOM.
Pros. - no impact to non-memcg path.
Cons. - implemenation requires special cares of LRU and we need to modify
page_add_new_anon_rmap()...
3. do unlock->charge->lock again method.
Pros. - no impact to non-memcg path.
Cons. - This may kill LOCK_PAGE_RETRY optimization. We need to release
lock and get it again...
This patch moves "charge" and memory allocation for COW page
before lock_page(). Then, we can avoid scanning LRU with holding
a lock on a page and latency under lock_page() will be reduced.
Then, above livelock disappears.
[akpm@linux-foundation.org: fix code layout] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reported-by: Lutz Vieweg <lvml@5t9.de> Original-idea-by: Michal Hocko <mhocko@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Ying Han <yinghan@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2.6.36's 7e496299d4d2 ("tmpfs: make tmpfs scalable with percpu_counter for
used blocks") to make tmpfs scalable with percpu_counter used
inode->i_lock in place of sbinfo->stat_lock around i_blocks updates; but
that was adverse to scalability, and unnecessary, since info->lock is
already held there in the fast paths.
Remove those uses of i_lock, and add info->lock in the three error paths
where it's then needed across shmem_free_blocks(). It's not actually
needed across shmem_unacct_blocks(), but they're so often paired that it
looks wrong to split them apart.
Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
truncate_inode_pages_range()'s final loop has a nice pincer property,
bringing start and end together, squeezing out the last pages. But the
range handling missed out on that, just sliding up the range, perhaps
letting pages come in behind it. Add one more test to give it the same
pincer effect.
Make the pagevec_lookup loops in truncate_inode_pages_range(),
invalidate_mapping_pages() and invalidate_inode_pages2_range() more
consistent with each other.
They were relying upon page->index of an unlocked page, but apologizing
for it: accept it, embrace it, add comments and WARN_ONs, and simplify the
index handling.
invalidate_inode_pages2_range() had special handling for a wrapped
page->index + 1 = 0 case; but MAX_LFS_FILESIZE doesn't let us anywhere
near there, and a corrupt page->index in the radix_tree could cause more
trouble than that would catch. Remove that wrapped handling.
invalidate_inode_pages2_range() uses min() to limit the pagevec_lookup
when near the end of the range: copy that into the other two, although
it's less useful than you might think (it limits the use of the buffer,
rather than the indices looked up).
Use consistent variable names in truncate_pagecache(), truncate_setsize(),
vmtruncate() and vmtruncate_range().
unmap_mapping_range() and vmtruncate_range() have mismatched interfaces:
don't change either, but make the vmtruncates more precise about what they
expect unmap_mapping_range() to do.
vmtruncate_range() is currently called only with page-aligned start and
end+1: can handle unaligned start, but unaligned end+1 would hit BUG_ON in
truncate_inode_pages_range() (lacks partial clearing of the end page).
The often-NULL data arg to read_cache_page() and read_mapping_page()
functions is misdescribed as "destination for read data": no, it's the
first arg to the filler function, often struct file * to ->readpage().
Satisfy checkpatch.pl on those filler prototypes, and tidy up the
declarations in linux/pagemap.h.
David S. Miller [Tue, 26 Jul 2011 00:12:22 +0000 (17:12 -0700)]
sparc64: implement get_user_pages_fast()
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David S. Miller [Tue, 26 Jul 2011 00:12:21 +0000 (17:12 -0700)]
sparc64: add support for _PAGE_SPECIAL
Luckily there are still a few software PTE bits remaining and they even
match up in both the sun4u and sun4v pte layouts.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David S. Miller [Tue, 26 Jul 2011 00:12:21 +0000 (17:12 -0700)]
sparc64: use RCU page table freeing
Make use of the generic RCU page table freeing on Sparc64, doing so allows
for race-free software page-table walkers like gup_fast().
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David S. Miller [Tue, 26 Jul 2011 00:12:20 +0000 (17:12 -0700)]
sparc64: kill page table quicklists
With the recent mmu_gather changes that included generic RCU freeing of
page-tables, it is now quite straightforward to implement gup_fast() on
sparc64.
This patch:
Remove the page table quicklists. They are pointless and make it harder
to use RCU page table freeing and share code with other architectures.
BTW, this is the second time this has happened, see commit 3c936465249f
("[SPARC64]: Kill pgtable quicklists and use SLAB.")
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- shmem pages are not immediately available, but they are not
potentially available either, even if we swap them out, they will just
relocate from memory into swap, total amount of immediate and
potentially available memory is not going to be affected, so we
shouldn't count them as potentially free in the first place.
- nr_free_pages() is not an expensive operation anymore, there is no
need to split the decision making in two halves and repeat code.
Signed-off-by: Dmitry Fink <dmitry.fink@palm.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Tue, 26 Jul 2011 00:12:18 +0000 (17:12 -0700)]
mm/memblock.c: avoid abuse of RED_INACTIVE
RED_INACTIVE is a slab thing, and reusing it for memblock was
inappropriate, because memblock is dealing with phys_addr_t's which have a
Kconfigurable sizeof().
Create a new poison type for this application. Fixes the sparse warning
David Rientjes [Tue, 26 Jul 2011 00:12:18 +0000 (17:12 -0700)]
oom: make deprecated use of oom_adj more verbose
/proc/pid/oom_adj is deprecated and scheduled for removal in August 2012
according to Documentation/feature-removal-schedule.txt.
This patch makes the warning more verbose by making it appear as a more
serious problem (the presence of a stack trace and being multiline should
attract more attention) so that applications still using the old interface
can get fixed.
Very popular users of the old interface have been converted since the oom
killer rewrite has been introduced. udevd switched to the
/proc/pid/oom_score_adj interface for v162, kde switched in 4.6.1, and
opensshd switched in 5.7p1.
At the start of 2012, this should be changed into a WARN() to emit all
such incidents and then finally remove the tunable in August 2012 as
scheduled.
Signed-off-by: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Tue, 26 Jul 2011 00:12:17 +0000 (17:12 -0700)]
oom: remove references to old badness() function
The badness() function in the oom killer was renamed to oom_badness() in a63d83f427fb ("oom: badness heuristic rewrite") since it is a globally
exported function for clarity.
The prototype for the old function still existed in linux/oom.h, so remove
it. There are no existing users.
Also fixes documentation and comment references to badness() and adjusts
them accordingly.
Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Forbes [Tue, 26 Jul 2011 00:12:14 +0000 (17:12 -0700)]
mm: hugetlb: fix coding style issues
Fix coding style issues flagged by checkpatch.pl
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Eric B Munson <emunson@mgebm.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Wright [Tue, 26 Jul 2011 00:12:14 +0000 (17:12 -0700)]
mm/huge_memory.c: minor lock simplification in __khugepaged_exit
The lock is released first thing in all three branches. Simplify this by
unconditionally releasing lock and remove else clause which was only there
to be sure lock was released.
Signed-off-by: Chris Wright <chrisw@sous-sol.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Kiper [Tue, 26 Jul 2011 00:12:13 +0000 (17:12 -0700)]
mm/page_cgroup.c: simplify code by using SECTION_ALIGN_UP() and SECTION_ALIGN_DOWN() macros
Commit a539f3533b78e3 ("mm: add SECTION_ALIGN_UP() and
SECTION_ALIGN_DOWN() macro") introduced the SECTION_ALIGN_UP() and
SECTION_ALIGN_DOWN() macros. Use those macros to increase code
readability.
Signed-off-by: Daniel Kiper <dkiper@net-space.pl> Acked-by: David Rientjes <rientjes@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
WANG Cong [Tue, 26 Jul 2011 00:12:12 +0000 (17:12 -0700)]
mm: remove the leftovers of noswapaccount
In commit a2c8990aed5ab ("memsw: remove noswapaccount kernel parameter"),
Michal forgot to remove some left pieces of noswapaccount in the tree,
this patch removes them all.
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Acked-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Originally, walk_hugetlb_range() didn't require a caller take any lock.
But commit d33b9f45bd ("mm: hugetlb: fix hugepage memory leak in
walk_page_range") changed its rule. Because it added find_vma() call in
walk_hugetlb_range().
Any locking-rule change commit should write a doc too.
pagewalk: don't look up vma if walk->hugetlb_entry is unused
Currently, walk_page_range() calls find_vma() every page table for walk
iteration. but it's completely unnecessary if walk->hugetlb_entry is
unused. And we don't have to assume find_vma() is a lightweight
operation. So this patch checks the walk->hugetlb_entry and avoids the
find_vma() call if possible.
This patch also makes some cleanups. 1) remove ugly uninitialized_var()
and 2) #ifdef in function body.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hiroyuki Kamezawa <kamezawa.hiroyuki@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pagewalk: fix walk_page_range() don't check find_vma() result properly
The doc of find_vma() says,
/* Look up the first VMA which satisfies addr < vm_end, NULL if none. */
struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
{
(snip)
Thus, caller should confirm whether the returned vma matches a desired one.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hiroyuki Kamezawa <kamezawa.hiroyuki@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Kiper [Tue, 26 Jul 2011 00:12:06 +0000 (17:12 -0700)]
xen/balloon: memory hotplug support for Xen balloon driver
Memory hotplug support for Xen balloon driver. It should be mentioned
that hotplugged memory is not onlined automatically. It should be onlined
by user through standard sysfs interface.
Memory could be hotplugged in following steps:
1) dom0: xl mem-max <domU> <maxmem>
where <maxmem> is >= requested memory size,
2) dom0: xl mem-set <domU> <memory>
where <memory> is requested memory size; alternatively memory
could be added by writing proper value to
/sys/devices/system/xen_memory/xen_memory0/target or
/sys/devices/system/xen_memory/xen_memory0/target_kb on dumU,
3) domU: for i in /sys/devices/system/memory/memory*/state; do \
[ "`cat "$i"`" = offline ] && echo online > "$i"; done
Memory could be onlined automatically on domU by adding following line to
udev rules:
Signed-off-by: Daniel Kiper <dkiper@net-space.pl> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Kiper [Tue, 26 Jul 2011 00:12:05 +0000 (17:12 -0700)]
mm: extend memory hotplug API to allow memory hotplug in virtual machines
This patch contains online_page_callback and apropriate functions for
registering/unregistering online page callbacks. It allows to do some
machine specific tasks during online page stage which is required to
implement memory hotplug in virtual machines. Currently this patch is
required by latest memory hotplug support for Xen balloon driver patch
which will be posted soon.
Additionally, originial online_page() function was splited into
following functions doing "atomic" operations:
- __online_page_set_limits() - set new limits for memory management code,
- __online_page_increment_counters() - increment totalram_pages and totalhigh_pages,
- __online_page_free() - free page to allocator.
It was done to:
- not duplicate existing code,
- ease hotplug code devolpment by usage of well defined interface,
- avoid stupid bugs which are unavoidable when the same code
(by design) is developed in many places.
[akpm@linux-foundation.org: use explicit indirect-call syntax] Signed-off-by: Daniel Kiper <dkiper@net-space.pl> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Axel Lin [Tue, 26 Jul 2011 00:12:01 +0000 (17:12 -0700)]
backlight: set backlight type and max_brightness before backlights are registered
Since commit a19a6ee "backlight: Allow properties to be passed at
registration" and commit bb7ca74 "backlight: add backlight type", we can
set backlight type and max_brightness before backlights are registered.
Some newly added drivers did not set it properly, let's fix it.
Signed-off-by: Axel Lin <axel.lin@gmail.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Jingoo Han <jg1.han@samsung.com> Cc: Donghwa Lee <dh09.lee@samsung.com> Cc: InKi Dae <inki.dae@samsung.com> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jingoo Han [Tue, 26 Jul 2011 00:12:00 +0000 (17:12 -0700)]
backlight: add ams369fg06 amoled driver
Add the ams369fg06 amoled panel driver. The ams369fg06 amoled panel (480
x 800) driver uses 3-wired SPI inteface. The brightness can be controlled
by gamma setting of amoled panel.
[sfr@canb.auug.org.au: fix build error]
[axel.lin@gmail.com: unregister backlight device when unloading the module]
[axel.lin@gmail.com: staticize ams369fg06_shutdown] Signed-off-by: Jingoo Han <jg1.han@samsung.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Inki Dae <inki.dae@samsung.com> Cc: anish singh <anish198519851985@gmail.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Axel Lin <axel.lin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Axel Lin [Tue, 26 Jul 2011 00:11:59 +0000 (17:11 -0700)]
drivers/video/backlight/adp8860_bl.c: remove a redundant assignment for max_brightness
We have set props.max_brightness before registering backlight device.
Signed-off-by: Axel Lin <axel.lin@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Axel Lin [Tue, 26 Jul 2011 00:11:58 +0000 (17:11 -0700)]
drivers/video/backlight/ld9040.c: small fixes
- Fix checking of wrong return value for backlight_device_register()
- Properly free allocated resources in ld9040_probe() error path and
ld9040_remove().
Signed-off-by: Axel Lin <axel.lin@gmail.com> Cc: Donghwa Lee <dh09.lee@samsung.com> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Zijlstra [Tue, 26 Jul 2011 00:11:57 +0000 (17:11 -0700)]
mm/backing-dev.c: reset bdi min_ratio in bdi_unregister()
Vito said:
: The system has many usb disks coming and going day to day, with their
: respective bdi's having min_ratio set to 1 when inserted. It works for
: some time until eventually min_ratio can no longer be set, even when the
: active set of bdi's seen in /sys/class/bdi/*/min_ratio doesn't add up to
: anywhere near 100.
:
: This then leads to an unrelated starvation problem caused by write-heavy
: fuse mounts being used atop the usb disks, a problem the min_ratio setting
: at the underlying devices bdi effectively prevents.
Fix this leakage by resetting the bdi min_ratio when unregistering the
BDI.
WANG Cong [Tue, 26 Jul 2011 00:11:54 +0000 (17:11 -0700)]
xtensa: fix a build error in arch/xtensa/include/asm/uaccess.h
Fix the following build error:
arch/xtensa/include/asm/uaccess.h:403: error: implicit declaration of function 'prefetch'
arch/xtensa/include/asm/uaccess.h:412: error: implicit declaration of function 'prefetchw'
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
is incorrect on 32-bit. It causes us to & the pgoff with something that
looks like this (for a 4m hugepage): 0xfff003ff. The mask should be
flipped and *then* shifted, to give you 0x0000_03fff.
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ipc/sem.c: fix race with concurrent semtimedop() timeouts and IPC_RMID
If a semaphore array is removed and in parallel a sleeping task is woken
up (signal or timeout, does not matter), then the woken up task does not
wait until wake_up_sem_queue_do() is completed. This will cause crashes,
because wake_up_sem_queue_do() will read from a stale pointer.
The fix is simple: Regardless of anything, always call get_queue_result().
This function waits until wake_up_sem_queue_do() has finished it's task.
Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] s5pv210: make needlessly global symbols static
[CPUFREQ] exynos4210: make needlessly global symbols static
[CPUFREQ] S3C6410: Add some lower frequencies for 800MHz base clock operation
[CPUFREQ] S5PV210: Add reboot notifier to prevent system hang
[CPUFREQ] S5PV210: Adjust udelay prior to voltage scaling down
[CPUFREQ] S5PV210: Lock a mutex while changing the cpu frequency
[CPUFREQ] S5PV210: Add pm_notifier to prevent system unstable
[CPUFREQ] S5PV210: Add arm/int voltage control support
[CPUFREQ] S5PV210: Add additional symantics for "relation" in cpufreq with pm
[CPUFREQ] S3C64xx: Notify transition complete as soon as frequency changed
[CPUFREQ] S3C6410: Support 800MHz operation in cpufreq
[CPUFREQ] s5pv210-cpufreq.c: Add missing clk_put
[CPUFREQ] Move compile for S3C64XX cpufreq to /drivers/cpufreq
[CPUFREQ] Remove some vi noise that escaped into the Makefile.
[CPUFREQ] Move ARM Samsung cpufreq drivers to drivers/cpufreq/
[CPUFREQ/S3C64xx] Move S3C64xx CPUfreq driver into drivers/cpufreq
[CPUFREQ] Handle CPUs with different capabilities in acpi-cpufreq
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (145 commits)
bnx2x: use pci_pcie_cap()
bnx2x: fix bnx2x_stop_on_error flow in bnx2x_sp_rtnl_task
bnx2x: enable internal target-read for 57712 and up only
bnx2x: count statistic ramrods on EQ to prevent MC assert
bnx2x: fix loopback for non 10G link
bnx2x: dcb - send all unmapped priorities to same COS as L2
iwlwifi: Fix build with CONFIG_PM disabled.
gre: fix improper error handling
ipv4: use RT_TOS after some rt_tos conversions
via-velocity: remove duplicated #include
qlge: remove duplicated #include
igb: remove duplicated #include
can: c_can: remove duplicated #include
bnad: remove duplicated #include
net: allow netif_carrier to be called safely from IRQ
bna: Header File Consolidation
bna: HW Error Counter Fix
bna: Add HW Semaphore Unlock Logic
bna: IOC Event Name Change
bna: Mboxq Flush When IOC Disabled
...