Johannes Weiner [Fri, 16 Dec 2011 04:50:33 +0000 (15:50 +1100)]
mm: collect LRU list heads into struct lruvec
Having a unified structure with an LRU list set for both global zones and
per-memcg zones keeps the code that deals with LRU lists simple, because
it does not need to care about the container itself.
Once the per-memcg LRU lists directly link struct pages, the isolation
function and all other list manipulations are shared between the memcg
case and the global LRU case.
Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ying Han <yinghan@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Johannes Weiner [Fri, 16 Dec 2011 04:50:32 +0000 (15:50 +1100)]
mm: vmscan: convert global reclaim to per-memcg LRU lists
The global per-zone LRU lists are about to go away on memcg-enabled
kernels, so global reclaim must be able to find its pages on the per-memcg
LRU lists.
Since the LRU pages of a zone are distributed over all existing memory
cgroups, a scan target for a zone is complete when all memory cgroups are
scanned for their proportional share of a zone's memory.
The forced scanning of small scan targets from kswapd is limited to zones
marked unreclaimable; otherwise kswapd can quickly overreclaim by
force-scanning the LRU lists of multiple memory cgroups.
Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ying Han <yinghan@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
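The proportional-share rule described above can be illustrated with a small model. This is a Python sketch, not kernel code. It assumes the per-memcg scan count is the LRU size shifted right by the reclaim priority, and the function name, memcg names and page counts are all hypothetical.

```python
def memcg_scan_counts(lru_pages_by_memcg, priority):
    """Model: each memcg is scanned for (its LRU pages >> priority) pages.

    Because every memcg uses the same shift, the scan counts stay
    proportional to each memcg's share of the zone's LRU pages.
    """
    return {name: pages >> priority
            for name, pages in lru_pages_by_memcg.items()}


# Hypothetical zone whose LRU pages are spread over three memcgs.
zone_lru = {"root": 40960, "A": 20480, "B": 1000}
counts = memcg_scan_counts(zone_lru, priority=12)
# "B" ends up with a scan target of 0: this is the kind of small scan
# target that forced scanning is meant to deal with.
```

In this model the zone-wide scan target is complete once every memcg has been scanned for its own count.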
Johannes Weiner [Fri, 16 Dec 2011 04:50:32 +0000 (15:50 +1100)]
mm: memcg: remove optimization of keeping the root_mem_cgroup LRU lists empty
root_mem_cgroup, lacking a configurable limit, was never subject to limit
reclaim, so the pages charged to it could be kept off its LRU lists. They
would be found on the global per-zone LRU lists upon physical memory
pressure and it made sense to avoid uselessly linking them to both lists.
The global per-zone LRU lists are about to go away on memcg-enabled
kernels, with all pages being exclusively linked to their respective
per-memcg LRU lists. As a result, pages of the root_mem_cgroup must also
be linked to its LRU lists again. This is purely about the LRU list;
root_mem_cgroup is still not charged.
The overhead is temporary until the double-LRU scheme goes away
completely.
Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ying Han <yinghan@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Johannes Weiner [Fri, 16 Dec 2011 04:50:32 +0000 (15:50 +1100)]
mm: move memcg hierarchy reclaim to generic reclaim code
Memory cgroup limit reclaim and traditional global pressure reclaim will
soon share the same code to reclaim from a hierarchical tree of memory
cgroups.
In preparation for this, move the two right next to each other in
shrink_zone().
The mem_cgroup_hierarchical_reclaim() polymath is split into a soft limit
reclaim function, which still does hierarchy walking on its own, and a
limit (shrinking) reclaim function, which relies on generic reclaim code
to walk the hierarchy.
Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ying Han <yinghan@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Memory cgroup limit reclaim currently picks one memory cgroup out of the
target hierarchy, remembers it as the last scanned child, and reclaims all
zones in it with decreasing priority levels.
The new hierarchy reclaim code will pick memory cgroups from the same
hierarchy concurrently from different zones and priority levels, so it
becomes necessary that hierarchy roots not only remember the last scanned
child, but do so for each zone and priority level.
Until now, we reclaimed memcgs like this:
  mem = mem_cgroup_iter(root)
  for each priority level:
    for each zone in zonelist:
      reclaim(mem, zone)
But subsequent patches will move the memcg iteration inside the loop over
the zones:
  for each priority level:
    for each zone in zonelist:
      mem = mem_cgroup_iter(root)
      reclaim(mem, zone)
And to keep with the original scan order - memcg -> priority -> zone - the
last scanned memcg has to be remembered per zone and per priority level.
Furthermore, global reclaim will be switched to the hierarchy walk as
well. Unlike limit reclaim, which can just recheck the limit after some
reclaim progress, its target is to scan all memcgs for the desired zone
pages, proportional to the memcg size, so reliably detecting a full
hierarchy round-trip will become crucial.
Currently, the code relies on one reclaimer encountering the same memcg
twice, but that is error-prone with concurrent reclaimers. Instead, use a
generation counter that is increased every time the child with the highest
ID has been visited, so that reclaimers can stop when the generation
changes.
Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ying Han <yinghan@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
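The generation counter scheme described above can be modeled roughly as follows. This is a Python sketch under simplifying assumptions, not the kernel's iterator: the class and method names are hypothetical, the hierarchy is a flat list, and there is no locking or reference counting.

```python
class ReclaimIter:
    """Shared iteration state for one (zone, priority) pair of a hierarchy root."""

    def __init__(self):
        self.position = 0    # index of the next memcg to hand out
        self.generation = 0  # bumped once the last memcg has been handed out


class Hierarchy:
    def __init__(self, memcgs, zones, priorities):
        self.memcgs = list(memcgs)
        self.iters = {(z, p): ReclaimIter() for z in zones for p in priorities}

    def next_memcg(self, zone, priority, cookie):
        """Return the next memcg for this reclaimer, or None when the round-trip is done.

        `cookie` is the reclaimer's private state; it remembers the generation
        that was current when the reclaimer started its walk.
        """
        it = self.iters[(zone, priority)]
        if cookie.get("generation") is None:
            cookie["generation"] = it.generation
        elif cookie["generation"] != it.generation:
            return None  # someone completed the round-trip: stop
        memcg = self.memcgs[it.position]
        it.position += 1
        if it.position == len(self.memcgs):
            it.position = 0
            it.generation += 1
        return memcg


# Two concurrent reclaimers share the walk for the same zone and priority.
h = Hierarchy(["m0", "m1", "m2"], zones=["Normal"], priorities=[12])
a, b = {}, {}
visited = [h.next_memcg("Normal", 12, a),
           h.next_memcg("Normal", 12, b),
           h.next_memcg("Normal", 12, a)]
```

Neither reclaimer needs to see the same memcg twice to know the hierarchy has been covered; a changed generation is enough.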
Johannes Weiner [Fri, 16 Dec 2011 04:50:31 +0000 (15:50 +1100)]
mm: vmscan: distinguish between memcg triggering reclaim and memcg being scanned
Memory cgroup hierarchies are currently handled completely outside of the
traditional reclaim code, which is invoked with a single memory cgroup as
an argument for the whole call stack.
Subsequent patches will switch this code to do hierarchical reclaim, so
there needs to be a distinction between a) the memory cgroup that is
triggering reclaim due to hitting its limit and b) the memory cgroup that
is being scanned as a child of a).
This patch introduces a struct mem_cgroup_zone that contains the
combination of the memory cgroup and the zone being scanned, which is then
passed down the stack instead of the zone argument.
Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ying Han <yinghan@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Johannes Weiner [Fri, 16 Dec 2011 04:50:30 +0000 (15:50 +1100)]
mm: vmscan: distinguish global reclaim from global LRU scanning
The traditional zone reclaim code is scanning the per-zone LRU lists
during direct reclaim and kswapd, and the per-zone per-memory cgroup LRU
lists when reclaiming on behalf of a memory cgroup limit.
Subsequent patches will convert the traditional reclaim code to reclaim
exclusively from the per-memory cgroup LRU lists. As a result, using the
predicate for which LRU list is scanned will no longer be appropriate to
tell global reclaim from limit reclaim.
This patch adds a global_reclaim() predicate to tell direct/kswapd reclaim
from memory cgroup limit reclaim and substitutes it in all places where
currently scanning_global_lru() is used for that.
Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ying Han <yinghan@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Memory control groups are currently bolted onto the side of
traditional memory management in places where better integration would
be preferable. To reclaim memory, for example, memory control groups
maintain their own LRU list and reclaim strategy aside from the global
per-zone LRU list reclaim. But an extra list head for each existing
page frame is expensive and maintaining it requires additional code.
This patchset disables the global per-zone LRU lists on memory cgroup
configurations and converts all its users to operate on the per-memory
cgroup lists instead. As LRU pages are then exclusively on one list,
this saves two list pointers for each page frame in the system:
page_cgroup array size with 4G physical memory
vanilla: [ 0.000000] allocated 31457280 bytes of page_cgroup
patched: [ 0.000000] allocated 15728640 bytes of page_cgroup
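The two boot messages above can be checked with a little arithmetic. This is a Python sketch; the 8-byte pointer size (a 64-bit kernel) is an assumption that the messages themselves do not state.

```python
vanilla_bytes = 31457280  # from the "vanilla" boot message
patched_bytes = 15728640  # from the "patched" boot message

saved_bytes = vanilla_bytes - patched_bytes
saved_mib = saved_bytes / (1024 * 1024)

# Assumption: 64-bit kernel, so the two list pointers saved per page
# frame take 2 * 8 = 16 bytes.
POINTER_SIZE = 8
saved_per_page = 2 * POINTER_SIZE
page_frames = saved_bytes // saved_per_page
```

Under that assumption the saving is 15 MiB, half of the original array, spread over 983040 page frames.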
At the same time, system performance for various workloads is
unaffected:
100G sparse file cat, 4G physical memory, 10 runs, to test for code
bloat in the traditional LRU handling and kswapd & direct reclaim
paths, without/with the memory controller configured in
4 unlimited memcgs running kbuild -j32 each, 4G physical memory, 500M
swap on SSD, 10 runs, to test for regressions in kswapd & direct
reclaim using per-memcg LRU lists with multiple memcgs and multiple
allocators within each memcg
Mike Galbraith [Fri, 16 Dec 2011 04:50:30 +0000 (15:50 +1100)]
cpusets, cgroups: disallow attaching kthreadd
Allowing kthreadd to be moved to a non-root group makes no sense, it being
a global resource, and needlessly leads unsuspecting users toward trouble.
1. An RT workqueue worker thread spawned in a task group with no
rt_runtime allocated is not schedulable. Simple user error, but
harmful to the box.
2. A worker thread which acquires PF_THREAD_BOUND can never leave a
cpuset, rendering the cpuset immortal.
Save the user some unexpected trouble, just say no.
Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Paul Menage <paul@paulmenage.org> Cc: Tejun Heo <htejun@gmail.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Nothing requires that we lock the filesystem until the root inode is
provided.
Also iget5_locked() triggers a warning because we are holding the
filesystem lock while allocating the inode, which results in a lockdep
suspicion that we have a lock inversion against the reclaim path.
The deadlock shouldn't happen since we are doing that allocation in the
mount path, where the filesystem is not yet available for any reclaim.
Still the warning is annoying.
To solve this, acquire the lock later only where we need it, right before
calling reiserfs_read_locked_inode() that wants to lock to walk the tree.
Reported-by: Knut Petersen <Knut_Petersen@t-online.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
journal_init() doesn't need the lock since no operation on the filesystem
is involved there. journal_read() and get_list_bitmap() have yet to be
reviewed carefully though before removing the lock there. Just keep it
around these two calls for safety.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
reiserfs: delay reiserfs lock until journal initialization
In the mount path, transactions that are made before journal
initialization don't involve the filesystem. We can delay the reiserfs
lock until we play with the journal.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mark Brown [Fri, 16 Dec 2011 04:50:27 +0000 (15:50 +1100)]
drivers/rtc/rtc-wm831x.c: remove unused period IRQ handler
Due to changes in the RTC core the period interrupt is now unused so
delete the code managing it.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ChangeLog v2->v3:
- back to square 1. 0x80 is not allowed because the representation
is not two's complement but bit 7 is a sign bit, thus 0x80 is
just another way to say "zero". Sorry for the mess, clarified this
with a comment in the code.
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Signed-off-by: Mark Godfrey <mark.godfrey@stericsson.com> Signed-off-by: Linus Walleij <linus.walleij@stericsson.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
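The sign-bit representation described in the changelog above can be sketched as follows. This is a Python model, not the driver code; the function names and the range of -127 to 127 are assumptions derived from a 7-bit magnitude plus a sign bit in bit 7.

```python
SIGN_BIT = 0x80        # bit 7 is a sign bit, not part of a two's complement value
MAGNITUDE_MASK = 0x7F  # the low 7 bits hold the magnitude


def encode_calibration(value):
    """Encode a signed calibration value as sign bit plus magnitude."""
    if not -127 <= value <= 127:
        raise ValueError("calibration out of range")
    if value < 0:
        return SIGN_BIT | (-value)
    return value


def decode_calibration(reg):
    """Decode a register value; 0x80 decodes to 0 ("negative zero")."""
    magnitude = reg & MAGNITUDE_MASK
    return -magnitude if reg & SIGN_BIT else magnitude
```

In this model 0x80 and 0x00 both decode to zero, which is why 0x80 is never produced by the encoder.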
Cc: Alessandro Zummo <a.zummo@towertech.it>
WARNING: line over 80 characters
#48: FILE: drivers/rtc/rtc-ab8500.c:268:
+ * Check that the calibration value (which is in units of 0.5 parts-per-million)
ERROR: need consistent spacing around '-' (ctx:WxV)
#64: FILE: drivers/rtc/rtc-ab8500.c:284:
+ rtccal = ~(calibration -1) | 0x80;
^
total: 1 errors, 1 warnings, 139 lines checked
./patches/rtc-ab8500-add-calibration-attribute-to-ab8500-rtc.patch has style problems, please review.
If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.
Please run checkpatch prior to sending patches
Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Linus Walleij <linus.walleij@stericsson.com> Cc: Mark Godfrey <mark.godfrey@stericsson.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mark Godfrey [Fri, 16 Dec 2011 04:50:26 +0000 (15:50 +1100)]
rtc/ab8500: add calibration attribute to AB8500 RTC
The rtc_calibration attribute allows user-space to get and set the
AB8500's RtcCalibration register. The AB8500 will then use the value in
this register to compensate for RTC drift every 60 seconds.
Signed-off-by: Mark Godfrey <mark.godfrey@stericsson.com> Signed-off-by: Linus Walleij <linus.walleij@stericsson.com> Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Walleij [Fri, 16 Dec 2011 04:50:26 +0000 (15:50 +1100)]
drivers/rtc/rtc-ab8500.c: change msleep() to usleep_range()
The resolution of msleep is related to HZ, so with HZ set to 100 any
msleep of less than 10ms will become ~10ms. This is not what we want.
Use the hrtimer-based usleep_range() and allow for some slack in the
non-critical path so we have more control of what is happening here.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Cc: Jonas Aaberg <jonas.aberg@stericsson.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
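The HZ dependence described above can be shown with a rough model. This is a Python sketch that only rounds the request up to whole timer ticks; the function name is hypothetical, and a real msleep() may sleep longer than this lower bound.

```python
def msleep_effective_ms(requested_ms, hz):
    """Round a millisecond sleep up to whole timer ticks (jiffies).

    Model only: one tick lasts 1000 / hz milliseconds, and a tick-based
    sleep cannot be shorter than one tick.
    """
    jiffies = (requested_ms * hz + 999) // 1000  # integer ceiling
    return jiffies * 1000 // hz


short_sleep = msleep_effective_ms(1, hz=100)  # a 1 ms request at HZ=100
```

With HZ set to 100 every request below 10 ms becomes one full 10 ms tick in this model, which is the behaviour the patch avoids by using a timer with finer resolution.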
Yauhen Kharuzhy [Fri, 16 Dec 2011 04:50:24 +0000 (15:50 +1100)]
drivers/rtc/rtc-mxc.c: fix setting time for MX1 SoC
There is no way to track the year in the i.MX1 RTC: the Days Counter
register is only 9 bits wide. An attempt to save a date later than
1970-01-01 plus 512 days causes an endless loop in mxc_rtc_set_mmss().
Fix this by resetting the year to 1970.
Signed-off-by: Yauhen Kharuzhy <jekhor@gmail.com> Cc: Daniel Mack <daniel@caiaq.de> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
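The 9-bit limit can be illustrated with a short model. This is a Python sketch; the helper names are hypothetical and the code only mirrors the arithmetic in the commit message, not the driver.

```python
from datetime import date

DAY_COUNTER_BITS = 9
MAX_DAYS = 1 << DAY_COUNTER_BITS  # 512 distinct values, 0..511
EPOCH = date(1970, 1, 1)


def days_since_epoch(d):
    return (d - EPOCH).days


def fits_day_counter(d):
    """True if the day count for this date fits the 9-bit register."""
    return 0 <= days_since_epoch(d) < MAX_DAYS


commit_day = date(2011, 12, 16)
same_day_in_1970 = date(1970, commit_day.month, commit_day.day)
```

Any date whose year has been reset to 1970 is less than 366 days from the epoch, so it always fits the counter.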
YanHong [Fri, 16 Dec 2011 04:50:23 +0000 (15:50 +1100)]
init/do_mounts.c: create /root if it does not exist
If someone supplies an initramfs without /root in it, and we fail to
execute rdinit, we will try to mount the root device and fail, because the
mount point does not exist.
But the error message we get is "VFS: Cannot open root device", which is
confusing.
We can give a more detailed error message, or we can go further: if /root
does not exist, create it.
Signed-off-by: YanHong <tempname2@hotmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Woody Suwalski <terraluna977@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Daney [Fri, 16 Dec 2011 04:50:23 +0000 (15:50 +1100)]
MIPS: randomize PIE load address
... by selecting ARCH_BINFMT_ELF_RANDOMIZE_PIE
Signed-off-by: David Daney <david.daney@cavium.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Daney [Fri, 16 Dec 2011 04:50:22 +0000 (15:50 +1100)]
fs: binfmt_elf: create Kconfig variable for PIE randomization
Randomization of PIE load address is hard coded in binfmt_elf.c for X86
and ARM. Create a new Kconfig variable
(CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE) for this and use it instead. Thus
architecture specific policy is pushed out of the generic binfmt_elf.c and
into the architecture Kconfig files.
X86 and ARM Kconfigs are modified to select the new variable so there is
no change in behavior. A follow-on patch will select it for MIPS too.
Signed-off-by: David Daney <david.daney@cavium.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jason Baron [Fri, 16 Dec 2011 04:50:22 +0000 (15:50 +1100)]
epoll: limit paths
The current epoll code can be tickled to run basically indefinitely in
both the loop detection path check (in ep_insert()) and the wakeup paths.
The programs that tickle this behavior set up deeply linked networks of
epoll file descriptors that cause the epoll algorithms to traverse them
indefinitely. A couple of these sample programs have been previously
posted in this thread: https://lkml.org/lkml/2011/2/25/297.
To fix the loop detection path check algorithms, I simply keep track of
the epoll nodes that have been already visited. Thus, the loop detection
becomes proportional to the number of epoll file descriptors and links.
This dramatically decreases the run-time of the loop check algorithm. In
one diabolical case I tried it reduced the run-time from 15 minutes (all
in kernel time) to 0.3 seconds.
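The effect of remembering visited nodes can be sketched as follows. This is a Python model of a graph traversal, not the eventpoll.c code; `links` and `creates_loop` are hypothetical names, and the nesting depth limit is not modeled.

```python
def creates_loop(links, parent, child):
    """Return True if adding the link parent -> child would close a cycle.

    `links` maps each epoll node to the epoll nodes it monitors. Every
    node is expanded at most once, so the work is proportional to the
    number of nodes and links rather than to the number of paths.
    """
    visited = set()
    stack = [child]
    while stack:
        node = stack.pop()
        if node == parent:
            return True
        if node in visited:
            continue
        visited.add(node)
        stack.extend(links.get(node, ()))
    return False


# A lattice with two nodes per layer: the number of distinct paths doubles
# with every layer, but the visited set keeps the check linear.
lattice = {}
for layer in range(40):
    for side in "ab":
        lattice[f"{side}{layer}"] = [f"a{layer + 1}", f"b{layer + 1}"]
```

Without the visited set, a walk over this lattice would follow every one of the exponentially many paths.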
Fixing the wakeup paths could be done at wakeup time in a similar manner
by keeping track of nodes that have already been visited, but that is
more complex, since there can be multiple wakeups on different CPUs.
Thus, I've opted to limit the number of possible wakeup paths when the
paths are created.
This is accomplished by noting that the end file descriptor points that
are found during the loop detection pass (from the newly added link) are
actually the sources for wakeup events. I keep a list of these file
descriptors and limit the number and length of the paths that emanate
from these 'source file descriptors'. In the current implementation I
allow 1000 paths of length 1, 500 of length 2, 100 of length 3, 50 of
length 4 and 10 of length 5. Note that it is sufficient to check the
'source file descriptors' reachable from the newly added link, since no
other 'source file descriptors' will have newly added links. This allows
us to check only the wakeup paths that may have gotten too long, and not
re-check all possible wakeup paths on the system.
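One way to read the path limits above is sketched below. This is a Python model, not the kernel implementation: `watchers`, `count_paths` and `paths_allowed` are hypothetical names, the graph is assumed to be loop-free already, and a path is counted when it ends at an epoll descriptor that nothing else monitors.

```python
PATH_LIMITS = [1000, 500, 100, 50, 10]  # allowed paths of length 1..5


def count_paths(watchers, source):
    """Count wakeup paths from `source`, grouped by path length.

    `watchers` maps a file descriptor to the epoll descriptors that
    monitor it. Raises ValueError if a path is longer than 5 links.
    """
    counts = [0] * len(PATH_LIMITS)

    def walk(fd, length):
        for ep in watchers.get(fd, ()):
            if length > len(PATH_LIMITS):
                raise ValueError("wakeup path too long")
            if watchers.get(ep):
                walk(ep, length + 1)     # path continues upward
            else:
                counts[length - 1] += 1  # path ends at this epoll fd

    walk(source, 1)
    return counts


def paths_allowed(watchers, source):
    """True if every path length stays within its limit."""
    try:
        counts = count_paths(watchers, source)
    except ValueError:
        return False
    return all(c <= limit for c, limit in zip(counts, PATH_LIMITS))


# Common case: one epoll descriptor monitoring a socket.
simple = {"sock": ["ep"]}
```

In the common case each source descriptor contributes a single path of length 1, far below the first limit.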
In terms of the path limit selection, I think it's first worth noting that
the most common case for epoll is probably the model where you have 1
epoll file descriptor that is monitoring n 'source file descriptors'. In
this case, each 'source file descriptor' has 1 path of length 1. Thus, I
believe that the limits I'm proposing are quite reasonable and in fact
may be too generous. I'm hoping that the proposed limits will not cause
any workloads that currently work to fail.
In terms of locking, I have extended the use of the 'epmutex' to all
epoll_ctl add and remove operations. Currently it's only used in a subset
of the add paths. I need to hold the epmutex so that we can correctly
traverse a coherent graph to check the number of paths. I believe that
this additional locking is probably ok, since it's in the setup/teardown
paths and doesn't affect the running paths, but it certainly is going to
add some extra overhead. Also worth noting is that the epmutex was
recently added to the ep_ctl add operations in the initial path loop
detection code using the argument that it was not on a critical path.
Another thing to note here, is the length of epoll chains that is allowed.
Currently, eventpoll.c defines:
/* Maximum number of nesting allowed inside epoll sets */
#define EP_MAX_NESTS 4
This basically means that I am limited to a graph depth of 5 (EP_MAX_NESTS
+ 1). However, this limit is currently only enforced during the loop
check detection code, and only when the epoll file descriptors are added
in a certain order. Thus, this limit is currently easily bypassed. The
newly added check for wakeup paths strictly limits the wakeup paths to a
length of 5, regardless of the order in which ep's are linked together.
Thus, a side-effect of the new code is a more consistent enforcement of
the graph depth.
Thus far, I've tested this using the sample programs previously
mentioned, which now either return quickly or return -EINVAL. I've also
tested using the piptest.c epoll tester, which showed no difference in
performance. I've also created a number of different epoll networks and
tested that they behave as expected.
I believe this solves the original diabolical test cases, while still
preserving the sane epoll nesting.
Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Nelson Elhage <nelhage@ksplice.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joakim Tjernlund [Fri, 16 Dec 2011 04:50:22 +0000 (15:50 +1100)]
crc32: optimize inner loop
By taking a pointer reference to each row in the crc table matrix, one can
shorten the inner loop by a few instructions.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Cc: Bob Pearson <rpearson@systemfabricworks.com> Cc: Frank Zago <fzago@systemfabricworks.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
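The idea of taking one reference per table row can be illustrated with a table-driven CRC-32 that handles four bytes per step. This is a Python sketch of the general slice-by-4 technique for the common reflected polynomial 0xEDB88320, not the lib/crc32.c code; all names here are hypothetical.

```python
POLY = 0xEDB88320  # reflected CRC-32 polynomial


def make_tables(rows=4):
    """Build the table matrix: row 0 is the classic byte table, and each
    further row advances the previous row by one more zero byte."""
    t = [[0] * 256 for _ in range(rows)]
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
        t[0][i] = crc
    for i in range(256):
        for r in range(1, rows):
            prev = t[r - 1][i]
            t[r][i] = (prev >> 8) ^ t[0][prev & 0xFF]
    return t


TABLES = make_tables()


def crc32_slice4(data, crc=0):
    # One reference per row, taken once outside the inner loop, so the
    # loop body indexes a flat row instead of the two-dimensional matrix.
    t0, t1, t2, t3 = TABLES
    crc ^= 0xFFFFFFFF
    n = len(data) - len(data) % 4
    for i in range(0, n, 4):
        crc ^= int.from_bytes(data[i:i + 4], "little")
        crc = (t3[crc & 0xFF] ^ t2[(crc >> 8) & 0xFF]
               ^ t1[(crc >> 16) & 0xFF] ^ t0[crc >> 24])
    for b in data[n:]:  # leftover bytes, one at a time
        crc = t0[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF
```

In C the same hoisting turns each two-dimensional table lookup into a lookup through a row pointer, which is where the saved instructions come from.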
Andy Whitcroft [Fri, 16 Dec 2011 04:50:21 +0000 (15:50 +1100)]
checkpatch: catch all occurences of type and cast spacing errors per line
Fix up type and cast spacing checks such that all occurrences on a line are
examined and reported. For example the line below has a valid cast and a
bad type, but currently we check the cast first, which is good, and stop:
u16* bar = (u16 *)baz;
We will also only report one of the errors in this example:
u16* bar = (u16*)bad;
Move to iterating across all casts and all types, reporting any failure.
Signed-off-by: Andy Whitcroft <apw@canonical.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
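The difference between stopping at the first match and iterating over all of them can be sketched as follows. This is a simplified Python model, not checkpatch's Perl code: the type list and the rule that a type must be followed by exactly one space before `*` are assumptions made for illustration.

```python
import re

# Hypothetical, much-reduced list of type names.
TYPE = r"\b(?:u8|u16|u32|u64|char|int|long)\b"
POINTER_USE = re.compile(rf"({TYPE})(\s*)(\*+)")


def pointer_spacing_errors(line):
    """Report every 'type*' occurrence on the line, not just the first."""
    errors = []
    for match in POINTER_USE.finditer(line):
        if match.group(2) != " ":  # expect exactly one space before '*'
            errors.append(match.group(0))
    return errors


mixed = pointer_spacing_errors("u16* bar = (u16 *)baz;")
both_bad = pointer_spacing_errors("u16* bar = (u16*)bad;")
```

For the two example lines in the commit message this model reports one and two problems respectively, where a single-match check would report at most one.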
Andy Whitcroft [Fri, 16 Dec 2011 04:50:18 +0000 (15:50 +1100)]
checkpatch: only apply kconfig help checks for options which prompt
The intent of this check is to catch the options which the user will see
and ensure they are properly described. It is also common for internal
only options to have a brief description. Allow this form.
Reported-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andy Whitcroft <apw@canonical.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andy Whitcroft [Fri, 16 Dec 2011 04:50:18 +0000 (15:50 +1100)]
checkpatch: optimise statement scanner when mid-statement
In the middle of a long definition or similar, there is no possibility of
finding a smaller sub-statement. Optimise this case by skipping statement
acquisition where there are no starts of statement (open brace '{' or
semi-colon ';'). We are likely to scan slightly more than needed still
but this is safest.
Signed-off-by: Andy Whitcroft <apw@canonical.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andy Whitcroft [Fri, 16 Dec 2011 04:50:18 +0000 (15:50 +1100)]
checkpatch: ## is not a valid modifier
Inserting a # into the modifiers list will incorrectly add the null string
to the modifiers list, leading to an infinite loop. As neither of these
is a valid modifier form simply ignore them.
Signed-off-by: Andy Whitcroft <apw@canonical.com> Reported-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joe Perches [Fri, 16 Dec 2011 04:50:17 +0000 (15:50 +1100)]
checkpatch: improve memset and min/max with cast checking
Improve the checking of arguments to memset and min/max tests.
Move the checking of min/max to statement blocks instead of single line.
Change $Constant to allow any case type 0x initiator and trailing ul
specifier. Add $FuncArg type as any function argument with or without a
cast. Print the whole statement when showing memset or min/max messages.
Improve the memset with 0 as 3rd argument error message.
There are still weaknesses in the $FuncArg and $Constant code as arbitrary
parentheses and negative signs are not generically supported.
Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andy Whitcroft [Fri, 16 Dec 2011 04:50:17 +0000 (15:50 +1100)]
checkpatch: check for common memset parameter issues against statments
Move the memset checks over to work against the statement. Also add
checks for 0 and 1 used as lengths. Generally these indicate badly
ordered parameters.
Signed-off-by: Andy Whitcroft <apw@canonical.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andy Whitcroft [Fri, 16 Dec 2011 04:50:16 +0000 (15:50 +1100)]
checkpatch: correctly track the end of preprocessor commands in context
When looking for a statement we currently run on through preprocessor
commands. This means that a header file with just definitions is parsed
over and over again combining all of the lines from the current line to
the end of file leading to severe performance issues.
Fix up context accumulation to track preprocessor commands and stop when
reaching the end of them. At the same time vastly simplify the #define
handling.
Signed-off-by: Andy Whitcroft <apw@canonical.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joe Perches [Fri, 16 Dec 2011 04:50:15 +0000 (15:50 +1100)]
checkpatch: update signature "might be better as" warning
Email header lines can look like signature tags. It's valid to have
multiple email recipients on a single line but not valid to have multiple
signatures on a single line.
Validate signatures only when not in the email headers.
Clear the $in_commit_log flag when the patch filename appears.
Add '-' to the valid chars in a message header for headers
like "Message-Id:" and "In-Reply-To:".
Signed-off-by: Joe Perches <joe@perches.com> Reported-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Axel Lin [Fri, 16 Dec 2011 04:50:15 +0000 (15:50 +1100)]
drivers/leds/leds-mc13783.c: fix off-by-one for checking num_leds
The LED id begins from 0. Thus the maximum number of leds should be
MC13783_LED_MAX + 1.
Signed-off-by: Axel Lin <axel.lin@gmail.com> Acked-by: Philippe Retornaz <philippe.retornaz@epfl.ch> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
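The off-by-one can be shown with a small check. This is a Python sketch; the value given to MC13783_LED_MAX and the lower bound of 1 are assumptions for illustration, not taken from the driver.

```python
# Hypothetical value: stands for the highest valid LED id.
MC13783_LED_MAX = 11


def num_leds_valid(num_leds):
    """Ids run from 0 to MC13783_LED_MAX, so there can be MAX + 1 LEDs."""
    return 1 <= num_leds <= MC13783_LED_MAX + 1


all_ids = list(range(0, MC13783_LED_MAX + 1))  # ids 0..MAX inclusive
```

Comparing num_leds against MC13783_LED_MAX alone would reject a platform that legitimately uses every id.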
Cc: NeilBrown <neilb@suse.de>
WARNING: please write a paragraph that describes the config symbol fully
#31: FILE: drivers/leds/Kconfig:394:
+ help
WARNING: line over 80 characters
#69: FILE: drivers/leds/leds-tca6507.c:16:
+ * each 8msec that the led is 'on'. The levels are named MASTER, BANK0 and BANK1.
WARNING: line over 80 characters
#71: FILE: drivers/leds/leds-tca6507.c:18:
+ * There are two different blink rates that can be programmed, each with separate
WARNING: line over 80 characters
#75: FILE: drivers/leds/leds-tca6507.c:22:
+ * This drivers does not support double-blink so 'second-off' always matches 'off'.
WARNING: line over 80 characters
#93: FILE: drivers/leds/leds-tca6507.c:40:
+ * brightness is used. As 'full' is always available, the worst case would be to
WARNING: line over 80 characters
#97: FILE: drivers/leds/leds-tca6507.c:44:
+ * Each bank (BANK0 and BANK1) have two usage counts - Leds using the brightness and
WARNING: line over 80 characters
#102: FILE: drivers/leds/leds-tca6507.c:49:
+ * there is a flag saying if it was explicitly requested or defaulted. Similarly
WARNING: line over 80 characters
#103: FILE: drivers/leds/leds-tca6507.c:50:
+ * the banks know if each time was explicit or a default. Defaults are permitted
ERROR: open brace '{' following function declarations go on the next line
#170: FILE: drivers/leds/leds-tca6507.c:117:
+static inline int TO_LEVEL(int brightness) {
ERROR: open brace '{' following function declarations go on the next line
#174: FILE: drivers/leds/leds-tca6507.c:121:
+static inline int TO_BRIGHT(int level) {
WARNING: line over 80 characters
#203: FILE: drivers/leds/leds-tca6507.c:150:
+ int blink; /* 1 if we are hardware-blinking */
WARNING: line over 80 characters
#222: FILE: drivers/leds/leds-tca6507.c:169:
+ * the first pair so there is more change-time visible (i.e. it is softer).
ERROR: space required before the open parenthesis '('
#300: FILE: drivers/leds/leds-tca6507.c:247:
+ switch(bank) {
total: 3 errors, 10 warnings, 732 lines checked
./patches/leds-add-driver-for-tca6507-led-controller.patch has style problems, please review.
If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.
Please run checkpatch prior to sending patches
Cc: NeilBrown <neilb@suse.de> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Axel Lin [Fri, 16 Dec 2011 04:50:14 +0000 (15:50 +1100)]
drivers/leds/leds-netxbig.c: use gpio_request_one()
Use gpio_request_one() instead of multiple gpiolib calls. This also
simplifies error handling a bit.
Signed-off-by: Axel Lin <axel.lin@gmail.com> Cc: Simon Guinot <sguinot@lacie.com> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Axel Lin [Fri, 16 Dec 2011 04:50:13 +0000 (15:50 +1100)]
drivers/leds/leds-bd2802.c: use gpio_request_one()
Use gpio_request_one() instead of multiple gpiolib calls.
Signed-off-by: Axel Lin <axel.lin@gmail.com> Cc: Kim Kyuwon <q1.kim@samsung.com> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Axel Lin <axel.lin@gmail.com> Cc: Samu Onkalo <samu.p.onkalo@nokia.com> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Axel Lin [Fri, 16 Dec 2011 04:50:13 +0000 (15:50 +1100)]
leds: convert leds-dac124s085 to module_spi_driver
Factor out some boilerplate code for spi driver registration into
module_spi_driver.
Signed-off-by: Axel Lin <axel.lin@gmail.com> Cc: Haojian Zhuang <hzhuang1@marvell.com> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Michael Hennerich <hennerich@blackfin.uclinux.org> Cc: Mike Rapoport <mike@compulab.co.il> Acked-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Axel Lin [Fri, 16 Dec 2011 04:50:12 +0000 (15:50 +1100)]
leds: convert led i2c drivers to module_i2c_driver
Factor out some boilerplate code for i2c driver registration
into module_i2c_driver.
Signed-off-by: Axel Lin <axel.lin@gmail.com> Cc: Haojian Zhuang <hzhuang1@marvell.com> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Michael Hennerich <hennerich@blackfin.uclinux.org> Cc: Mike Rapoport <mike@compulab.co.il> Cc: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Axel Lin [Fri, 16 Dec 2011 04:50:12 +0000 (15:50 +1100)]
leds: convert led platform drivers to module_platform_driver
Factor out some boilerplate code for platform driver registration into
module_platform_driver.
Signed-off-by: Axel Lin <axel.lin@gmail.com> Acked-by: Haojian Zhuang <hzhuang1@marvell.com> [led-88pm860x.c] Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Michael Hennerich <hennerich@blackfin.uclinux.org> Cc: Mike Rapoport <mike@compulab.co.il> Cc: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
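As an illustration of the boilerplate that the module_*_driver conversions
above factor out, here is a minimal userspace mock-up. It is only a sketch:
struct mock_driver, mock_driver_register(), mock_driver_unregister() and
MODULE_MOCK_DRIVER() are hypothetical stand-ins, not the kernel's
definitions, and the mock macro only generates an init/exit pair without
hooking them into any module machinery.

```c
/* Mock driver type; the real kernel structures carry far more state. */
struct mock_driver {
	const char *name;
};

static int registered;	/* number of currently registered mock drivers */

/* Stand-in for a bus-specific register call; always succeeds here. */
static int mock_driver_register(struct mock_driver *drv)
{
	(void)drv;
	registered++;
	return 0;
}

/* Stand-in for the matching unregister call. */
static void mock_driver_unregister(struct mock_driver *drv)
{
	(void)drv;
	registered--;
}

/*
 * One line per driver replaces a hand-written init/exit pair.
 * Expands to <drv>_init() and <drv>_exit().
 */
#define MODULE_MOCK_DRIVER(drv)					\
static int drv##_init(void)					\
{								\
	return mock_driver_register(&(drv));			\
}								\
static void drv##_exit(void)					\
{								\
	mock_driver_unregister(&(drv));				\
}

static struct mock_driver demo_led_driver = { .name = "demo-led" };

MODULE_MOCK_DRIVER(demo_led_driver)
```

Calling demo_led_driver_init() registers the mock driver and
demo_led_driver_exit() removes it again; each converted driver then needs
only the single macro line instead of two near-identical functions.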
Jingoo Han [Fri, 16 Dec 2011 04:50:11 +0000 (15:50 +1100)]
drivers/video/backlight/ep93xx_bl.c: remove duplicated header include
module.h is included twice.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Acked-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Ryan Mallon <rmallon@gmail.com> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Donghwa Lee [Fri, 16 Dec 2011 04:50:11 +0000 (15:50 +1100)]
backlight/ld9040.c: regulator control in the driver
This patch adds regulator power control to the driver. Until now, the
power on/off sequence of the ld9040 driver was controlled by a callback
function in the board file. With this change, there is no need to register
an LCD power on/off callback function in the board file.
Signed-off-by: Donghwa Lee <dh09.lee@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Axel Lin [Fri, 16 Dec 2011 04:50:11 +0000 (15:50 +1100)]
backlight: convert drivers/video/backlight/* to use module_platform_driver()
Convert the drivers in drivers/video/backlight/* to use the
module_platform_driver() macro which makes the code smaller and a bit
simpler.
Signed-off-by: Axel Lin <axel.lin@gmail.com> Acked-by: Haojian Zhuang <haojian.zhuang@gmail.com> Acked-by: H Hartley Sweeten <hsweeten@visionengravers.com> [ep93xx_bl.c] Cc: Mike Rapoport <mike@compulab.co.il> Cc: Richard Purdie <rpurdie@rpsys.net> Acked-by: Michael Hennerich <michael.hennerich@analog.com> Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Paul Bolle [Fri, 16 Dec 2011 04:50:10 +0000 (15:50 +1100)]
backlight: remove ADX backlight device support
Support for the Avionic Design Xanthos backlight device got added in
commit 3b96ea9ef8 ("backlight: Add support for the Avionic Design Xanthos
backlight device."). That support depends on ARCH_PXA_ADX. The code that
should have provided that Kconfig symbol never got submitted. It has
never been possible to even build this driver. Remove it.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Acked-by: Thierry Reding <thierry.reding@avionic-design.de> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kyungmin Park [Fri, 16 Dec 2011 04:50:10 +0000 (15:50 +1100)]
devfreq: add devfreq maintainer entry
Now that devfreq has been merged into mainline, update the maintainer entry.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Cc: Kevin Hilman <khilman@ti.com> Cc: MyungJoo Ham <myungjoo.ham@samsung.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joe Perches [Fri, 16 Dec 2011 04:50:08 +0000 (15:50 +1100)]
MAINTAINERS: update tulip F: patterns
commit a88394cfb58 ("ewrk3/tulip: Move the DEC - Tulip drivers") moved the
files, update the patterns.
Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Grant Grundler <grundler@parisc-linux.org> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Tobias Ringstrom <tori@unhappy.mine.nu> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: David Davies <davies@maniac.ultranet.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joe Perches [Fri, 16 Dec 2011 04:50:08 +0000 (15:50 +1100)]
MAINTAINERS: update sdhci F: patterns
commit 38576af1f8c ("mmc: sdhci: make sdhci-of device drivers self
registered") moved the files around. Update the patterns.
Signed-off-by: Joe Perches <joe@perches.com> Cc: Shawn Guo <shawn.guo@linaro.org> Cc: Chris Ball <cjb@laptop.org> Acked-by: Anton Vorontsov <cbouatmailru@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joe Perches [Fri, 16 Dec 2011 04:50:07 +0000 (15:50 +1100)]
MAINTAINERS: update bt8xx gpio F: patterns
Commit c103de240439d ("gpio: reorganize drivers") renamed the file, update
the pattern.
Signed-off-by: Joe Perches <joe@perches.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Michael Buesch <m@bues.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joe Perches [Fri, 16 Dec 2011 04:50:06 +0000 (15:50 +1100)]
MAINTAINERS: update adp gpio F: patterns
Commit c103de240439df ("gpio: reorganize drivers") renamed the files,
update the patterns.
Signed-off-by: Joe Perches <joe@perches.com> Cc: Grant Likely <grant.likely@secretlab.ca> Acked-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ian Campbell [Fri, 16 Dec 2011 04:50:06 +0000 (15:50 +1100)]
get_maintainers.pl: follow renames when looking up commit signers
I happen to have had a commit to various network drivers since the big
renaming/reorg which happened to drivers/net recently. This means that I
now appear to be in the top few commit signers (by %age) for many of them
so am getting sent all sorts of stuff and people who are involved with the
driver are not. e.g. (to pick one at random):
$ ./scripts/get_maintainer.pl -f drivers/net/ethernet/nvidia/forcedeth.c
"David S. Miller" <davem@davemloft.net> (commit_signer:5/7=71%)
Ian Campbell <ian.campbell@citrix.com> (commit_signer:2/7=29%)
Eric Dumazet <eric.dumazet@gmail.com> (commit_signer:1/7=14%)
Jeff Kirsher <jeffrey.t.kirsher@intel.com> (commit_signer:1/7=14%)
Jiri Pirko <jpirko@redhat.com> (commit_signer:1/7=14%)
netdev@vger.kernel.org (open list:NETWORKING DRIVERS)
linux-kernel@vger.kernel.org (open list)
With the following patch the renames are followed and the result appears
much more sensible:
Andi Kleen [Fri, 16 Dec 2011 04:50:04 +0000 (15:50 +1100)]
brlocks/lglocks: clean up code
lglocks and brlocks are currently generated with some complicated macros
in lglock.h. But there's no reason I can see to not just use common
utility functions that get pointers to the lglock.
Since there are at least two users it makes sense to share this code in a
library.
This will also make it later possible to dynamically allocate lglocks.
In general the users now look more like normal function calls with
pointers, not magic macros.
The patch is rather large because I move over all users in one go to keep
it bisectable. This impacts the VFS somewhat in terms of lines changed.
But no actual behaviour change.
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jesper Juhl [Fri, 16 Dec 2011 04:50:04 +0000 (15:50 +1100)]
audit: always follow va_copy() with va_end()
A call to va_copy() should always be followed by a call to va_end() in the
same function. In kernel/audit.c::audit_log_vformat() this is not always
done. This patch makes sure va_end() is always called.
Signed-off-by: Jesper Juhl <jj@chaosbits.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
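The va_copy()/va_end() pairing described in the entry above can be sketched
in plain userspace C. This is not the audit code: format_alloc() and
format_string() are hypothetical helpers, chosen only because they need to
walk the argument list twice and have more than one exit path.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: measure the formatted length first, then format
 * into a freshly allocated buffer. Every exit path ends the copy.
 */
static char *format_alloc(const char *fmt, va_list args)
{
	va_list args2;
	char *buf;
	int len;

	/* Keep a copy: the first vsnprintf() consumes 'args'. */
	va_copy(args2, args);

	len = vsnprintf(NULL, 0, fmt, args);
	if (len < 0) {
		va_end(args2);	/* early exit still ends the copy */
		return NULL;
	}

	buf = malloc((size_t)len + 1);
	if (!buf) {
		va_end(args2);	/* and so does this one */
		return NULL;
	}

	vsnprintf(buf, (size_t)len + 1, fmt, args2);
	va_end(args2);
	return buf;
}

/* Variadic front end; the caller frees the returned string. */
static char *format_string(const char *fmt, ...)
{
	va_list args;
	char *s;

	va_start(args, fmt);
	s = format_alloc(fmt, args);
	va_end(args);
	return s;
}
```

The point of the sketch is the two early returns: each one calls va_end()
on the copy before leaving, so no path out of the function skips it.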
Heiko Carstens [Fri, 16 Dec 2011 04:50:03 +0000 (15:50 +1100)]
mm,slub,x86: decouple size of struct page from CONFIG_CMPXCHG_LOCAL
While implementing cmpxchg_double() on s390 I realized that we don't set
CONFIG_CMPXCHG_LOCAL despite the fact that we have support for it.
However setting that option will increase the size of struct page by eight
bytes on 64 bit, which we certainly do not want. Also, it doesn't make
sense that a present cpu feature should increase the size of struct page.
Besides that, it looks like the dependency on CMPXCHG_LOCAL is wrong and
that it should depend on CMPXCHG_DOUBLE instead.
This patch:
If an architecture supports CMPXCHG_LOCAL this shouldn't result
automatically in larger struct pages if the SLUB allocator is used.
Instead introduce a new config option "HAVE_ALIGNED_STRUCT_PAGE" which can
be selected if a double word aligned struct page is required. Also update
x86 Kconfig so that it should work as before.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
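The size cost of a double word aligned structure, as discussed in the entry
above, can be shown with a small userspace sketch. The seven-word layout is
a made-up stand-in and not the real struct page, and the alignment is
requested with the GCC/Clang aligned attribute as an assumption about the
toolchain.

```c
#include <stddef.h>

/* Stand-in descriptor holding seven machine words (made-up layout). */
struct desc_plain {
	unsigned long words[7];
};

/* Same fields, but forced to double word alignment. */
struct desc_aligned {
	unsigned long words[7];
} __attribute__((aligned(2 * sizeof(unsigned long))));

/* Bytes added per object by the stricter alignment. */
static size_t alignment_cost(void)
{
	return sizeof(struct desc_aligned) - sizeof(struct desc_plain);
}
```

Because sizeof must be a multiple of the alignment, the seven-word
structure is padded to eight words, so alignment_cost() is one word. On a
typical 64-bit build that is the eight bytes mentioned above, which is why
it helps to make the alignment something an architecture selects only when
it needs it.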
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
WARNING: please, no spaces at the start of a line
#57: FILE: arch/m68k/amiga/config.c:515:
+ __noreturn;$
total: 0 errors, 1 warnings, 106 lines checked
./patches/treewide-convert-uses-of-attrib_noreturn-to-__noreturn.patch has style problems, please review.
If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.
Please run checkpatch prior to sending patches
Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Shaohua Li [Fri, 16 Dec 2011 04:49:59 +0000 (15:49 +1100)]
intel_idle: fix API misuse
smp_call_function() only lets all other CPUs execute a specific function,
while we expect all CPUs to do so in intel_idle. Without the fix, we could
have one CPU which has auto_demotion enabled or has no broadcast timer set
up. Usually we don't see any impact because auto demotion just harms power
and the intel_idle init is called on CPU 0, where the broadcast timer
delivers interrupts, but this still could be a problem.
Signed-off-by: Shaohua Li <shaohua.li@intel.com> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
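The difference between "all other CPUs" and "all CPUs" in the entry above
can be simulated in userspace. This is only a model: NR_CPUS_SKETCH,
setup_cpu(), call_on_others() and call_on_each() are hypothetical names,
and nothing here runs on more than one real CPU.

```c
#include <stdbool.h>

#define NR_CPUS_SKETCH 4	/* made-up CPU count for the simulation */

static bool setup_done[NR_CPUS_SKETCH];

/* Per-CPU setup step; in the simulation it just records that it ran. */
static void setup_cpu(int cpu)
{
	setup_done[cpu] = true;
}

/* Mimics "run on all other CPUs": the calling CPU is skipped. */
static void call_on_others(int self, void (*fn)(int))
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		if (cpu != self)
			fn(cpu);
}

/* Mimics "run on every CPU", the caller included. */
static void call_on_each(int self, void (*fn)(int))
{
	call_on_others(self, fn);
	fn(self);
}

/* Count the CPUs that missed the setup step. */
static int cpus_missed(void)
{
	int missed = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		if (!setup_done[cpu])
			missed++;
	return missed;
}
```

After call_on_others(0, setup_cpu) exactly one CPU, the caller, is left
without its setup; call_on_each(0, setup_cpu) leaves none.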
Magnus Lynch [Fri, 16 Dec 2011 04:49:59 +0000 (15:49 +1100)]
hpet: factor timer allocate from open
The current implementation of the /dev/hpet driver couples opening the
device with allocating one of the (scarce) timers (aka comparators). This
is a limitation in that the main counter may be valuable to applications
that want a high-resolution timer but have no use for the
interrupt-generating functionality of the comparators.
This patch alters the open semantics so that when the device is opened, no
timer is allocated. Operations that depend on a timer being in context
implicitly attempt allocating a timer, to maintain backward compatibility.
There is also an IOCTL (HPET_ALLOC_TIMER _IO) added so that the
allocation may be done explicitly. (I prefer the explicit open then
allocate pattern but don't know how practical it would be to require all
existing code to be changed.)
/dev/hpet is accessed via mmap(). This is the only interface of /dev/hpet
that is actually used in practice.
[akpm@linux-foundation.org: coding-style tweaks]
[arnd@arndb.de: fix build] Signed-off-by: Magnus Lynch <maglyx@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Acked-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
tracepoint: add tracepoints for debugging oom_score_adj
oom_score_adj is used for guarding processes from the OOM killer. One
problem is that it is inherited at fork(). When a daemon sets oom_score_adj
and creates children, it is hard to know where the value was set.
This patch adds three tracepoints useful for debugging:
- creating new task
- renaming a task (exec)
- set oom_score_adj
To debug, users need to enable some tracepoints. Maybe filtering is useful as
KOSAKI Motohiro [Fri, 16 Dec 2011 04:49:57 +0000 (15:49 +1100)]
mm: simplify find_vma_prev()
commit 297c5eee37 ("mm: make the vma list be doubly linked") added the
vm_prev member to vm_area_struct. We can simplify find_vma_prev() by
using it. Also, this change helps to improve page fault performance
because it has stronger locality of reference.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
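The idea in the find_vma_prev() entry above, taking the predecessor from a
back link instead of tracking it during the search, can be sketched in
userspace. The types and functions below are simplified stand-ins with
hypothetical names; the lookup is a linear list walk for brevity, and the
handling of an address above every area is this sketch's own choice rather
than something stated in the entry.

```c
#include <stddef.h>

/* Simplified stand-in for a memory area: covers [vm_start, vm_end). */
struct vma_sketch {
	unsigned long vm_start, vm_end;
	struct vma_sketch *vm_next, *vm_prev;	/* address-sorted list links */
};

struct mm_sketch {
	struct vma_sketch *mmap;	/* head of the sorted list */
};

/* First area with addr < vm_end, or NULL if addr is above all areas. */
static struct vma_sketch *find_vma_sketch(struct mm_sketch *mm,
					  unsigned long addr)
{
	struct vma_sketch *vma;

	for (vma = mm->mmap; vma; vma = vma->vm_next)
		if (addr < vma->vm_end)
			return vma;
	return NULL;
}

/* Same lookup, also reporting the preceding area through *pprev. */
static struct vma_sketch *find_vma_prev_sketch(struct mm_sketch *mm,
					       unsigned long addr,
					       struct vma_sketch **pprev)
{
	struct vma_sketch *vma = find_vma_sketch(mm, addr);

	if (vma) {
		*pprev = vma->vm_prev;	/* one pointer load via the back link */
	} else {
		/* addr is above every area: the predecessor is the last one */
		struct vma_sketch *last = mm->mmap;

		while (last && last->vm_next)
			last = last->vm_next;
		*pprev = last;
	}
	return vma;
}
```

With two areas [0x1000, 0x2000) and [0x3000, 0x4000), a lookup of 0x3500
returns the second area and reports the first as its predecessor, without
the search itself having to remember the previous node.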