]> git.karo-electronics.de Git - karo-tx-linux.git/log
karo-tx-linux.git
12 years agoTesting from the XFS folk revealed that there is still too much I/O from
Mel Gorman [Wed, 24 Aug 2011 23:46:57 +0000 (09:46 +1000)]
Testing from the XFS folk revealed that there is still too much I/O from
the end of the LRU in kswapd.  Previously it was considered acceptable by
VM people for a small number of pages to be written back from reclaim with
testing generally showing about 0.3% of pages reclaimed were written back
(higher if memory was low).  That writing back a small number of pages is
ok has been heavily disputed for quite some time and Dave Chinner
explained it well;

It doesn't have to be a very high number to be a problem. IO
is orders of magnitude slower than the CPU time it takes to
flush a page, so the cost of making a bad flush decision is
very high. And single page writeback from the LRU is almost
always a bad flush decision.

To complicate matters, filesystems respond very differently to requests
from reclaim according to Christoph Hellwig;

xfs tries to write it back if the requester is kswapd
ext4 ignores the request if it's a delayed allocation
btrfs ignores the request

As a result, each filesystem has different performance characteristics
when under memory pressure and there are many pages being dirtied.  In
some cases, the request is ignored entirely so the VM cannot depend on the
IO being dispatched.

The objective of this series is to reduce writing of filesystem-backed
pages from reclaim, play nicely with writeback that is already in progress
and throttle reclaim appropriately when writeback pages are encountered.
The assumption is that the flushers will always write pages faster than if
reclaim issues the IO.  The new problem is that reclaim has very little
control over how long before a page in a particular zone or container is
cleaned which is discussed later.  A secondary goal is to avoid the
problem whereby direct reclaim splices two potentially deep call stacks
together.

Patch 1 disables writeback of filesystem pages from direct reclaim
entirely. Anonymous pages are still written.

Patch 2 removes dead code in lumpy reclaim as it is no longer able
to synchronously write pages. This hurts lumpy reclaim but
there is an expectation that compaction is used for hugepage
allocations these days and lumpy reclaim's days are numbered.

Patches 3-4 add warnings to XFS and ext4 if called from
direct reclaim. With patch 1, this "never happens" and is
intended to catch regressions in this logic in the future.

Patch 5 disables writeback of filesystem pages from kswapd unless
the priority is raised to the point where kswapd is considered
to be in trouble.

Patch 6 throttles reclaimers if too many dirty pages are being
encountered and the zones or backing devices are congested.

Patch 7 invalidates dirty pages found at the end of the LRU so they
are reclaimed quickly after being written back rather than
waiting for a reclaimer to find them

I consider this series to be orthogonal to the writeback work but it is
worth noting that the writeback work affects the viability of patch 8 in
particular.

I tested this on ext4 and xfs using fs_mark, a simple writeback test based
on dd and a micro benchmark that does a streaming write to a large mapping
(exercises use-once LRU logic) followed by streaming writes to a mix of
anonymous and file-backed mappings.  The command line for fs_mark when
botted with 512M looked something like

./fs_mark -d  /tmp/fsmark-2676  -D  100  -N  150  -n  150  -L  25  -t  1  -S0  -s  10485760

The number of files was adjusted depending on the amount of available
memory so that the files created was about 3xRAM.  For multiple threads,
the -d switch is specified multiple times.

The test machine is x86-64 with an older generation of AMD processor with
4 cores.  The underlying storage was 4 disks configured as RAID-0 as this
was the best configuration of storage I had available.  Swap is on a
separate disk.  Dirty ratio was tuned to 40% instead of the default of
20%.

Testing was run with and without monitors to both verify that the patches
were operating as expected and that any performance gain was real and not
due to interference from monitors.

Here is a summary of results based on testing XFS.

512M1P-xfs           Files/s  mean                 32.69 ( 0.00%)     34.44 ( 5.08%)
512M1P-xfs           Elapsed Time fsmark                    51.41     48.29
512M1P-xfs           Elapsed Time simple-wb                114.09    108.61
512M1P-xfs           Elapsed Time mmap-strm                113.46    109.34
512M1P-xfs           Kswapd efficiency fsmark                 62%       63%
512M1P-xfs           Kswapd efficiency simple-wb              56%       61%
512M1P-xfs           Kswapd efficiency mmap-strm              44%       42%
512M-xfs             Files/s  mean                 30.78 ( 0.00%)     35.94 (14.36%)
512M-xfs             Elapsed Time fsmark                    56.08     48.90
512M-xfs             Elapsed Time simple-wb                112.22     98.13
512M-xfs             Elapsed Time mmap-strm                219.15    196.67
512M-xfs             Kswapd efficiency fsmark                 54%       56%
512M-xfs             Kswapd efficiency simple-wb              54%       55%
512M-xfs             Kswapd efficiency mmap-strm              45%       44%
512M-4X-xfs          Files/s  mean                 30.31 ( 0.00%)     33.33 ( 9.06%)
512M-4X-xfs          Elapsed Time fsmark                    63.26     55.88
512M-4X-xfs          Elapsed Time simple-wb                100.90     90.25
512M-4X-xfs          Elapsed Time mmap-strm                261.73    255.38
512M-4X-xfs          Kswapd efficiency fsmark                 49%       50%
512M-4X-xfs          Kswapd efficiency simple-wb              54%       56%
512M-4X-xfs          Kswapd efficiency mmap-strm              37%       36%
512M-16X-xfs         Files/s  mean                 60.89 ( 0.00%)     65.22 ( 6.64%)
512M-16X-xfs         Elapsed Time fsmark                    67.47     58.25
512M-16X-xfs         Elapsed Time simple-wb                103.22     90.89
512M-16X-xfs         Elapsed Time mmap-strm                237.09    198.82
512M-16X-xfs         Kswapd efficiency fsmark                 45%       46%
512M-16X-xfs         Kswapd efficiency simple-wb              53%       55%
512M-16X-xfs         Kswapd efficiency mmap-strm              33%       33%

Up until 512-4X, the FSmark improvements were statistically significant.
For the 4X and 16X tests the results were within standard deviations but
just barely.  The time to completion for all tests is improved which is an
important result.  In general, kswapd efficiency is not affected by
skipping dirty pages.

1024M1P-xfs          Files/s  mean                 39.09 ( 0.00%)     41.15 ( 5.01%)
1024M1P-xfs          Elapsed Time fsmark                    84.14     80.41
1024M1P-xfs          Elapsed Time simple-wb                210.77    184.78
1024M1P-xfs          Elapsed Time mmap-strm                162.00    160.34
1024M1P-xfs          Kswapd efficiency fsmark                 69%       75%
1024M1P-xfs          Kswapd efficiency simple-wb              71%       77%
1024M1P-xfs          Kswapd efficiency mmap-strm              43%       44%
1024M-xfs            Files/s  mean                 35.45 ( 0.00%)     37.00 ( 4.19%)
1024M-xfs            Elapsed Time fsmark                    94.59     91.00
1024M-xfs            Elapsed Time simple-wb                229.84    195.08
1024M-xfs            Elapsed Time mmap-strm                405.38    440.29
1024M-xfs            Kswapd efficiency fsmark                 79%       71%
1024M-xfs            Kswapd efficiency simple-wb              74%       74%
1024M-xfs            Kswapd efficiency mmap-strm              39%       42%
1024M-4X-xfs         Files/s  mean                 32.63 ( 0.00%)     35.05 ( 6.90%)
1024M-4X-xfs         Elapsed Time fsmark                   103.33     97.74
1024M-4X-xfs         Elapsed Time simple-wb                204.48    178.57
1024M-4X-xfs         Elapsed Time mmap-strm                528.38    511.88
1024M-4X-xfs         Kswapd efficiency fsmark                 81%       70%
1024M-4X-xfs         Kswapd efficiency simple-wb              73%       72%
1024M-4X-xfs         Kswapd efficiency mmap-strm              39%       38%
1024M-16X-xfs        Files/s  mean                 42.65 ( 0.00%)     42.97 ( 0.74%)
1024M-16X-xfs        Elapsed Time fsmark                   103.11     99.11
1024M-16X-xfs        Elapsed Time simple-wb                200.83    178.24
1024M-16X-xfs        Elapsed Time mmap-strm                397.35    459.82
1024M-16X-xfs        Kswapd efficiency fsmark                 84%       69%
1024M-16X-xfs        Kswapd efficiency simple-wb              74%       73%
1024M-16X-xfs        Kswapd efficiency mmap-strm              39%       40%

All FSMark tests up to 16X had statistically significant improvements.
For the most part, tests are completing faster with the exception of the
streaming writes to a mixture of anonymous and file-backed mappings which
were slower in two cases

In the cases where the mmap-strm tests were slower, there was more
swapping due to dirty pages being skipped.  The number of additional pages
swapped is almost identical to the fewer number of pages written from
reclaim.  In other words, roughly the same number of pages were reclaimed
but swapping was slower.  As the test is a bit unrealistic and stresses
memory heavily, the small shift is acceptable.

4608M1P-xfs          Files/s  mean                 29.75 ( 0.00%)     30.96 ( 3.91%)
4608M1P-xfs          Elapsed Time fsmark                   512.01    492.15
4608M1P-xfs          Elapsed Time simple-wb                618.18    566.24
4608M1P-xfs          Elapsed Time mmap-strm                488.05    465.07
4608M1P-xfs          Kswapd efficiency fsmark                 93%       86%
4608M1P-xfs          Kswapd efficiency simple-wb              88%       84%
4608M1P-xfs          Kswapd efficiency mmap-strm              46%       45%
4608M-xfs            Files/s  mean                 27.60 ( 0.00%)     28.85 ( 4.33%)
4608M-xfs            Elapsed Time fsmark                   555.96    532.34
4608M-xfs            Elapsed Time simple-wb                659.72    571.85
4608M-xfs            Elapsed Time mmap-strm               1082.57   1146.38
4608M-xfs            Kswapd efficiency fsmark                 89%       91%
4608M-xfs            Kswapd efficiency simple-wb              88%       82%
4608M-xfs            Kswapd efficiency mmap-strm              48%       46%
4608M-4X-xfs         Files/s  mean                 26.00 ( 0.00%)     27.47 ( 5.35%)
4608M-4X-xfs         Elapsed Time fsmark                   592.91    564.00
4608M-4X-xfs         Elapsed Time simple-wb                616.65    575.07
4608M-4X-xfs         Elapsed Time mmap-strm               1773.02   1631.53
4608M-4X-xfs         Kswapd efficiency fsmark                 90%       94%
4608M-4X-xfs         Kswapd efficiency simple-wb              87%       82%
4608M-4X-xfs         Kswapd efficiency mmap-strm              43%       43%
4608M-16X-xfs        Files/s  mean                 26.07 ( 0.00%)     26.42 ( 1.32%)
4608M-16X-xfs        Elapsed Time fsmark                   602.69    585.78
4608M-16X-xfs        Elapsed Time simple-wb                606.60    573.81
4608M-16X-xfs        Elapsed Time mmap-strm               1549.75   1441.86
4608M-16X-xfs        Kswapd efficiency fsmark                 98%       98%
4608M-16X-xfs        Kswapd efficiency simple-wb              88%       82%
4608M-16X-xfs        Kswapd efficiency mmap-strm              44%       42%

Unlike the other tests, the fsmark results are not statistically
significant but the min and max times are both improved and for the most
part, tests completed faster.

There are other indications that this is an improvement as well.  For
example, in the vast majority of cases, there were fewer pages scanned by
direct reclaim implying in many cases that stalls due to direct reclaim
are reduced.  KSwapd is scanning more due to skipping dirty pages which is
unfortunate but the CPU usage is still acceptable

In an earlier set of tests, I used blktrace and in almost all cases
throughput throughout the entire test was higher.  However, I ended up
discarding those results as recording blktrace data was too heavy for my
liking.

On a laptop, I plugged in a USB stick and ran a similar tests of tests
using it as backing storage.  A desktop environment was running and for
the entire duration of the tests, firefox and gnome terminal were
launching and exiting to vaguely simulate a user.

1024M-xfs            Files/s  mean               0.41 ( 0.00%)        0.44 ( 6.82%)
1024M-xfs            Elapsed Time fsmark               2053.52   1641.03
1024M-xfs            Elapsed Time simple-wb            1229.53    768.05
1024M-xfs            Elapsed Time mmap-strm            4126.44   4597.03
1024M-xfs            Kswapd efficiency fsmark              84%       85%
1024M-xfs            Kswapd efficiency simple-wb           92%       81%
1024M-xfs            Kswapd efficiency mmap-strm           60%       51%
1024M-xfs            Avg wait ms fsmark                5404.53     4473.87
1024M-xfs            Avg wait ms simple-wb             2541.35     1453.54
1024M-xfs            Avg wait ms mmap-strm             3400.25     3852.53

The mmap-strm results were hurt because firefox launching had a tendency
to push the test out of memory.  On the postive side, firefox launched
marginally faster with the patches applied.  Time to completion for many
tests was faster but more importantly - the "Avg wait" time as measured by
iostat was far lower implying the system would be more responsive.  It was
also the case that "Avg wait ms" on the root filesystem was lower.  I
tested it manually and while the system felt slightly more responsive
while copying data to a USB stick, it was marginal enough that it could be
my imagination.

This patch: do not writeback filesystem pages in direct reclaim.

When kswapd is failing to keep zones above the min watermark, a process
will enter direct reclaim in the same manner kswapd does.  If a dirty page
is encountered during the scan, this page is written to backing storage
using mapping->writepage.

This causes two problems.  First, it can result in very deep call stacks,
particularly if the target storage or filesystem are complex.  Some
filesystems ignore write requests from direct reclaim as a result.  The
second is that a single-page flush is inefficient in terms of IO.  While
there is an expectation that the elevator will merge requests, this does
not always happen.  Quoting Christoph Hellwig;

The elevator has a relatively small window it can operate on,
and can never fix up a bad large scale writeback pattern.

This patch prevents direct reclaim writing back filesystem pages by
checking if current is kswapd.  Anonymous pages are still written to swap
as there is not the equivalent of a flusher thread for anonymous pages.
If the dirty pages cannot be written back, they are placed back on the LRU
lists.  There is now a direct dependency on dirty page balancing to
prevent too many pages in the system being dirtied which would prevent
reclaim making forward progress.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Alex Elder <aelder@sgi.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoadd missing ;
Andrew Morton [Wed, 24 Aug 2011 23:46:56 +0000 (09:46 +1000)]
add missing ;

Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoAdd comments to explain the page statistics field in the mm_struct.
Christoph Lameter [Wed, 24 Aug 2011 23:46:56 +0000 (09:46 +1000)]
Add comments to explain the page statistics field in the mm_struct.

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoSome kernel components pin user space memory (infiniband and perf) (by
Christoph Lameter [Wed, 24 Aug 2011 23:46:54 +0000 (09:46 +1000)]
Some kernel components pin user space memory (infiniband and perf) (by
increasing the page count) and account that memory as "mlocked".

The difference between mlocking and pinning is:

A. mlocked pages are marked with PG_mlocked and are exempt from
   swapping. Page migration may move them around though.
   They are kept on a special LRU list.

B. Pinned pages cannot be moved because something needs to
   directly access physical memory. They may not be on any
   LRU list.

I recently saw an mlockalled process where mm->locked_vm became
bigger than the virtual size of the process (!) because some
memory was accounted for twice:

Once when the page was mlocked and once when the Infiniband
layer increased the refcount because it needt to pin the RDMA
memory.

This patch introduces a separate counter for pinned pages and
accounts them seperately.

Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Mike Marciniszyn <infinipath@qlogic.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThe nr_force_scan[] tuple holds the effective scan numbers for anon and
Johannes Weiner [Wed, 24 Aug 2011 23:46:54 +0000 (09:46 +1000)]
The nr_force_scan[] tuple holds the effective scan numbers for anon and
file pages in case the situation called for a forced scan and the
regularly calculated scan numbers turned out zero.

However, the effective scan number can always be assumed to be
SWAP_CLUSTER_MAX right before the division into anon and file.  The
numerators and denominator are properly set up for all cases, be it force
scan for just file, just anon, or both, to do the right thing.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Ying Han <yinghan@google.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agofix comment
Andrew Morton [Wed, 24 Aug 2011 23:46:53 +0000 (09:46 +1000)]
fix comment

Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoWithout swap, anonymous pages are not scanned. As such, they should not
Johannes Weiner [Wed, 24 Aug 2011 23:46:52 +0000 (09:46 +1000)]
Without swap, anonymous pages are not scanned.  As such, they should not
count when considering force-scanning a small target if there is no swap.

Otherwise, targets are not force-scanned even when their effective scan
number is zero and the other conditions--kswapd/memcg--apply.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Ying Han <yinghan@google.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoWhen we get a bad_page bug report, it's useful to see what modules the
Dave Jones [Wed, 24 Aug 2011 23:46:51 +0000 (09:46 +1000)]
When we get a bad_page bug report, it's useful to see what modules the
user had loaded.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoAdd the leading word "tmpfs" to the Kconfig string to make it blindingly
Robert P. J. Day [Wed, 24 Aug 2011 23:46:51 +0000 (09:46 +1000)]
Add the leading word "tmpfs" to the Kconfig string to make it blindingly
obvious that this selection refers to tmpfs.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoAfter selecting a task to kill, the oom killer iterates all processes and
David Rientjes [Wed, 24 Aug 2011 23:46:49 +0000 (09:46 +1000)]
After selecting a task to kill, the oom killer iterates all processes and
kills all other threads that share the same mm_struct in different thread
groups.  It would not otherwise be helpful to kill a thread if its memory
would not be subsequently freed.

A kernel thread, however, may assume a user thread's mm by using
use_mm().  This is only temporary and should not result in sending a
SIGKILL to that kthread.

This patch ensures that only user threads and not kthreads are sent a
SIGKILL if they share the same mm_struct as the oom killed task.

Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThe tracing ring-buffer used this function briefly, but not anymore.
Johannes Weiner [Wed, 24 Aug 2011 23:46:49 +0000 (09:46 +1000)]
The tracing ring-buffer used this function briefly, but not anymore.
Make it local to the writeback code again.

Also, move the function so that no forward declaration needs to be
reintroduced.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoper-task block plug can reduce block queue lock contention and increase
Shaohua Li [Wed, 24 Aug 2011 23:46:48 +0000 (09:46 +1000)]
per-task block plug can reduce block queue lock contention and increase
request merge.  Currently page reclaim doesn't support it.  I originally
thought page reclaim doesn't need it, because kswapd thread count is
limited and file cache write is done at flusher mostly.

When I test a workload with heavy swap in a 4-node machine, each CPU is
doing direct page reclaim and swap.  This causes block queue lock
contention.  In my test, without below patch, the CPU utilization is about
2% ~ 7%.  With the patch, the CPU utilization is about 1% ~ 3%.  Disk
throughput isn't changed.  This should improve normal kswapd write and
file cache write too (increase request merge for example), but might not
be so obvious as I explain above.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoradix_tree_tag_get()'s BUG (when it sees a tag after saw_unset_tag) was
Hugh Dickins [Wed, 24 Aug 2011 23:46:48 +0000 (09:46 +1000)]
radix_tree_tag_get()'s BUG (when it sees a tag after saw_unset_tag) was
unsafe and removed in 2.6.34, but the pointless saw_unset_tag left behind.

Remove it now, and return 0 as soon as we see unset tag - we already rely
upon the root tag to be correct, returning 0 immediately if it's not set.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agounmap_and_move() is one a big messy function. Clean it up.
Minchan Kim [Wed, 24 Aug 2011 23:46:46 +0000 (09:46 +1000)]
unmap_and_move() is one a big messy function.  Clean it up.

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoIn __zone_reclaim case, we don't want to shrink mapped page. Nonetheless,
Minchan Kim [Wed, 24 Aug 2011 23:46:45 +0000 (09:46 +1000)]
In __zone_reclaim case, we don't want to shrink mapped page.  Nonetheless,
we have isolated mapped page and re-add it into LRU's head.  It's
unnecessary CPU overhead and makes LRU churning.

Of course, when we isolate the page, the page might be mapped but when we
try to migrate the page, the page would be not mapped.  So it could be
migrated.  But race is rare and although it happens, it's no big deal.

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoIn async mode, compaction doesn't migrate dirty or writeback pages. So,
Minchan Kim [Wed, 24 Aug 2011 23:46:44 +0000 (09:46 +1000)]
In async mode, compaction doesn't migrate dirty or writeback pages.  So,
it's meaningless to pick the page and re-add it to lru list.

Of course, when we isolate the page in compaction, the page might be dirty
or writeback but when we try to migrate the page, the page would be not
dirty, writeback.  So it could be migrated.  But it's very unlikely as
isolate and migration cycle is much faster than writeout.

So, this patch helps cpu overhead and prevent unnecessary LRU churning.

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoChange ISOLATE_XXX macro with bitwise isolate_mode_t type. Normally,
Minchan Kim [Wed, 24 Aug 2011 23:46:44 +0000 (09:46 +1000)]
Change ISOLATE_XXX macro with bitwise isolate_mode_t type.  Normally,
macro isn't recommended as it's type-unsafe and making debugging harder as
symbol cannot be passed throught to the debugger.

Quote from Johannes
" Hmm, it would probably be cleaner to fully convert the isolation mode
into independent flags.  INACTIVE, ACTIVE, BOTH is currently a
tri-state among flags, which is a bit ugly."

This patch moves isolate mode from swap.h to mmzone.h by memcontrol.h

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoacct_isolated of compaction uses page_lru_base_type which returns only
Minchan Kim [Wed, 24 Aug 2011 23:46:42 +0000 (09:46 +1000)]
acct_isolated of compaction uses page_lru_base_type which returns only
base type of LRU list so it never returns LRU_ACTIVE_ANON or
LRU_ACTIVE_FILE.  In addtion, cc->nr_[anon|file] is used in only
acct_isolated so it doesn't have fields in conpact_control.

This patch removes fields from compact_control and makes clear function of
acct_issolated which counts the number of anon|file pages isolated.

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago> You might get some speed benefit by optimising for the small copies
Christopher Yeoh [Wed, 24 Aug 2011 23:46:42 +0000 (09:46 +1000)]
> You might get some speed benefit by optimising for the small copies
> here.  Define a local on-stack array of N page*'s and point
> process_pages at that if the number of pages is <= N.  Saves a
> malloc/free and is more cache-friendly.  But only if the result is
> measurable!

I have done some benchmarking on this, and it gains about 5-7% on a
microbenchmark with 4kb size copies and about a 1% gain with a more
realistic (but modified for smaller copies) hpcc benchmark. The
performance gain disappears into the noise by about 64kb sized copies.
No measurable overhead for larger copies. So I think its worth including

Included below is the patch (based on v4) - for ease of review the first diff
is just against the latest version of CMA which has been posted here previously.
The second is the entire CMA patch.

Signed-off-by: Chris Yeoh <cyeoh@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: <linux-man@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years ago- Add x86_64 specific wire up
Christopher Yeoh [Wed, 24 Aug 2011 23:46:41 +0000 (09:46 +1000)]
- Add x86_64 specific wire up

- Change behaviour so process_vm_readv and process_vm_writev return
  the number of bytes successfully read or written even if an error
  occurs

- Add more kernel doc interface comments

- rename some internal functions (process_vm_rw_check_iovecs,
  process_vm_rw) so they make more sense.

- Add licence message

- Fix kernel-doc comment format

Still need to do benchmarking to see if the optimisation for small copies
using a local on-stack array in process_vm_rw_core is worth it.

Signed-off-by: Chris Yeoh <cyeoh@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThe basic idea behind cross memory attach is to allow MPI programs doing
Christopher Yeoh [Wed, 24 Aug 2011 23:46:40 +0000 (09:46 +1000)]
The basic idea behind cross memory attach is to allow MPI programs doing
intra-node communication to do a single copy of the message rather than a
double copy of the message via shared memory.

The following patch attempts to achieve this by allowing a destination
process, given an address and size from a source process, to copy memory
directly from the source process into its own address space via a system
call.  There is also a symmetrical ability to copy from the current
process's address space into a destination process's address space.

- Use of /proc/pid/mem has been considered, but there are issues with
  using it:
  - Does not allow for specifying iovecs for both src and dest, assuming
    preadv or pwritev was implemented either the area read from or
  written to would need to be contiguous.
  - Currently mem_read allows only processes who are currently
  ptrace'ing the target and are still able to ptrace the target to read
  from the target. This check could possibly be moved to the open call,
  but its not clear exactly what race this restriction is stopping
  (reason  appears to have been lost)
  - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
  domain socket is a bit ugly from a userspace point of view,
  especially when you may have hundreds if not (eventually) thousands
  of processes  that all need to do this with each other
  - Doesn't allow for some future use of the interface we would like to
  consider adding in the future (see below)
  - Interestingly reading from /proc/pid/mem currently actually
  involves two copies! (But this could be fixed pretty easily)

As mentioned previously use of vmsplice instead was considered, but has
problems.  Since you need the reader and writer working co-operatively if
the pipe is not drained then you block.  Which requires some wrapping to
do non blocking on the send side or polling on the receive.  In all to all
communication it requires ordering otherwise you can deadlock.  And in the
example of many MPI tasks writing to one MPI task vmsplice serialises the
copying.

There are some cases of MPI collectives where even a single copy interface
does not get us the performance gain we could.  For example in an
MPI_Reduce rather than copy the data from the source we would like to
instead use it directly in a mathops (say the reduce is doing a sum) as
this would save us doing a copy.  We don't need to keep a copy of the data
from the source.  I haven't implemented this, but I think this interface
could in the future do all this through the use of the flags - eg could
specify the math operation and type and the kernel rather than just
copying the data would apply the specified operation between the source
and destination and store it in the destination.

Although we don't have a "second user" of the interface (though I've had
some nibbles from people who may be interested in using it for intra
process messaging which is not MPI).  This interface is something which
hardware vendors are already doing for their custom drivers to implement
fast local communication.  And so in addition to this being useful for
OpenMPI it would mean the driver maintainers don't have to fix things up
when the mm changes.

There was some discussion about how much faster a true zero copy would
go. Here's a link back to the email with some testing I did on that:

http://marc.info/?l=linux-mm&m=130105930902915&w=2

There is a basic man page for the proposed interface here:

http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

This has been implemented for x86 and powerpc, other architecture should
mainly (I think) just need to add syscall numbers for the process_vm_readv
and process_vm_writev. There are 32 bit compatibility versions for
64-bit kernels.

For arch maintainers there are some simple tests to be able to quickly
verify that the syscalls are working correctly here:

http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz

Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: <linux-man@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoWhen we get corruption reports, it's useful to see if the kernel was
Dave Jones [Wed, 24 Aug 2011 23:46:39 +0000 (09:46 +1000)]
When we get corruption reports, it's useful to see if the kernel was
tainted, to rule out problems we can't do anything about.

Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoWhen we get corruption reports, it's useful to see if the kernel was
Dave Jones [Wed, 24 Aug 2011 23:46:38 +0000 (09:46 +1000)]
When we get corruption reports, it's useful to see if the kernel was
tainted, to rule out problems we can't do anything about.

Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoUnbreak alpha build.
Andrew Morton [Wed, 24 Aug 2011 23:46:38 +0000 (09:46 +1000)]
Unbreak alpha build.

Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoUnbreak alpha build.
Andrew Morton [Wed, 24 Aug 2011 23:46:37 +0000 (09:46 +1000)]
Unbreak alpha build.

Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoUnbreak the alpha build.
Andrew Morton [Wed, 24 Aug 2011 23:46:36 +0000 (09:46 +1000)]
Unbreak the alpha build.

Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThe address limit is already set in flush_old_exec() so this assignment of
Mathias Krause [Wed, 24 Aug 2011 23:46:35 +0000 (09:46 +1000)]
The address limit is already set in flush_old_exec() so this assignment of
USER_DS is redundant.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agobrd_make_request() always returns 0, which doesn't make much sense.
Eric Miao [Wed, 24 Aug 2011 23:46:34 +0000 (09:46 +1000)]
brd_make_request() always returns 0, which doesn't make much sense.

Signed-off-by: Eric Miao <eric.miao@linaro.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoInstead of open coding this function use kstrtoul_from_user() directly.
Stephen Boyd [Wed, 24 Aug 2011 23:46:34 +0000 (09:46 +1000)]
Instead of open coding this function use kstrtoul_from_user() directly.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThis does involve additional use of the spin lock in idr.c. Is this an
Jonathan Cameron [Wed, 24 Aug 2011 23:46:33 +0000 (09:46 +1000)]
This does involve additional use of the spin lock in idr.c.  Is this an
issue?

Also, some error mangling was needed to keep the interface the same.  Does
this matter or can we return -ENOSPC instead of -EBUSY?

Signed-off-by: Jonathan Cameron <jic23@cam.ac.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoSome mangling of errors was necessary to maintain current interface.
Jonathan Cameron [Wed, 24 Aug 2011 23:46:32 +0000 (09:46 +1000)]
Some mangling of errors was necessary to maintain current interface.

Signed-off-by: Jonathan Cameron <jic23@cam.ac.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoWe leak in drivers/scsi/aacraid/commctrl.c::aac_send_raw_srb() :
Jesper Juhl [Wed, 24 Aug 2011 23:46:32 +0000 (09:46 +1000)]
We leak in drivers/scsi/aacraid/commctrl.c::aac_send_raw_srb() :

We allocate memory:
        ...
                        struct user_sgmap* usg;
                        usg = kmalloc(actual_fibsize - sizeof(struct aac_srb)
                          + sizeof(struct sgmap), GFP_KERNEL);
and then neglect to free it:
        ...
                        for (i = 0; i < usg->count; i++) {
                                u64 addr;
                                void* p;
                                if (usg->sg[i].count >
                                    ((dev->adapter_info.options &
                                     AAC_OPT_NEW_COMM) ?
                                      (dev->scsi_host_ptr->max_sectors << 9) :
                                      65536)) {
                                        rcode = -EINVAL;
                                        goto cleanup;
        ... this 'goto' makes 'usg' go out of scope and leak the memory we
            allocated.
            Other exits properly kfree(usg), it's just here it is neglected.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoFix sparse warnings of right shift bigger than source value size:
Randy Dunlap [Wed, 24 Aug 2011 23:46:31 +0000 (09:46 +1000)]
Fix sparse warnings of right shift bigger than source value size:

drivers/scsi/megaraid.c:311:65: warning: right shift by bigger than source value
drivers/scsi/megaraid.c:313:65: warning: right shift by bigger than source value
drivers/scsi/megaraid.c:317:67: warning: right shift by bigger than source value
drivers/scsi/megaraid.c:319:67: warning: right shift by bigger than source value

Patch suggestion from email by Al Viro:

"Since both are claimed to be strings, I really suspect that this >> 8 is
misspelled >> 4 and they have a character followed by pair of two-digit
packed decimals in there..."

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Neela Syam Kolli <megaraidlinux@lsi.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoFor headers that get exported to userland and make use of u32 style
Alexander Shishkin [Wed, 24 Aug 2011 23:46:30 +0000 (09:46 +1000)]
For headers that get exported to userland and make use of u32 style
type names, it is advised to include linux/types.h.

This fixes a headers_check warning.

Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agokernel/rtmutex.c: In function '__rt_mutex_slowlock':
Andrew Morton [Wed, 24 Aug 2011 23:46:30 +0000 (09:46 +1000)]
kernel/rtmutex.c: In function '__rt_mutex_slowlock':
kernel/rtmutex.c:605: warning: suggest parentheses around assignment used as tru

Fixes "rcu: Permit rt_mutex_unlock() with irqs disabled".

Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThe current implementation of dmi_name_in_vendors() is an invitation to
Jean Delvare [Wed, 24 Aug 2011 23:46:29 +0000 (09:46 +1000)]
The current implementation of dmi_name_in_vendors() is an invitation to
lazy coding and false positives [1].  Searching for a string in 8 know
what you're looking for, so you should know where to look.  strstr isn't
fast, especially when it fails, so we should avoid calling it when it just
can't succeed.

Looking at the current users of the function, it seems clear to me that
they are looking for a system or board vendor name, so let's limit
dmi_name_in_vendors to these two DMI fields.  This much better matches the
function name, BTW.

[1] We currently have code looking for short names in DMI data, such
as "IBM", "ASUS" or "Acer". I let you guess what will happen the day
other vendors ship products named, for example, "SCHREIBMEISTER",
"PEGASUS" or "Acerola".

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoWhen do pci remove/rescan on system that have more iommus, got
Yinghai Lu [Wed, 24 Aug 2011 23:46:28 +0000 (09:46 +1000)]
When do pci remove/rescan on system that have more iommus, got

[  894.089745] Set context mapping for c4:00.0
[  894.110890] mpt2sas3: Allocated physical memory: size(4293 kB)
[  894.112556] mpt2sas3: Current Controller Queue Depth(1883), Max Controller Queue Depth(2144)
[  894.127278] mpt2sas3: Scatter Gather Elements per IO(128)
[  894.361295] DRHD: handling fault status reg 2
[  894.364053] DMAR:[DMA Read] Request device [c4:00.0] fault addr fffbe000
[  894.364056] DMAR:[fault reason 02] Present bit in context entry is cl

it turns out when remove/rescan, pci dev will be freed and will get
another new dev.  but drhd units still keep old one...  so
dmar_find_matched_drhd_unit will return wrong drhd and iommu for the
device that is not on first iommu.

So need to update devices in drhd_units during pci remove/rescan.

Could save domain/bus/device/function aside in the list and according that
info restore new dev to drhd_units later.  Then
dmar_find_matched_drdh_unit and device_to_iommu could return right drhd
and iommu.

Add remove_dev_from_drhd/restore_dev_to_drhd functions to do the real
work.  call them in device ADD_DEVICE and UNBOUND_DRIVER

Need to do the samething to atsr.  (expose dmar_atsr_units and add
atsru->segment)

After patch, will right iommu for the new dev and will not get DMAR error
any more.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThe address limit is already set in flush_old_exec() so those calls to
Mathias Krause [Wed, 24 Aug 2011 23:46:27 +0000 (09:46 +1000)]
The address limit is already set in flush_old_exec() so those calls to
set_fs(USER_DS) are redundant.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThe dqc_bitmap field of struct ocfs2_local_disk_chunk is 32-bit aligned,
Akinobu Mita [Wed, 24 Aug 2011 23:46:27 +0000 (09:46 +1000)]
The dqc_bitmap field of struct ocfs2_local_disk_chunk is 32-bit aligned,
but not 64-bit aligned.  The dqc_bitmap is accessed by ocfs2_set_bit(),
ocfs2_clear_bit(), ocfs2_test_bit(), or ocfs2_find_next_zero_bit().  These
are wrapper macros for ext2_*_bit() which need to take an unsigned long
aligned address (though some architectures are able to handle unaligned
address correctly)

So some 64bit architectures may not be able to access the dqc_bitmap
correctly.

This avoids such unaligned access by using another wrapper functions for
ext2_*_bit().  The code is taken from fs/ext4/mballoc.c which also need to
handle unaligned bitmap access.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoext4_{set,clear}_bit() is defined as __test_and_{set,clear}_bit_le() for
Akinobu Mita [Wed, 24 Aug 2011 23:46:26 +0000 (09:46 +1000)]
ext4_{set,clear}_bit() is defined as __test_and_{set,clear}_bit_le() for
ext4.  Only two ext4_{set,clear}_bit() calls check the return value.  The
rest of calls ignore the return value and they can be replaced with
__{set,clear}_bit_le().

This changes ext4_{set,clear}_bit() from __test_and_{set,clear}_bit_le()
to __{set,clear}_bit_le() and introduces ext4_test_and_{set,clear}_bit()
for the two places where old bit needs to be returned.

This ext4_{set,clear}_bit() change is considered safe, because if someone
uses these macros without noticing the change, new ext4_{set,clear}_bit
don't have return value and causes compiler errors where the return value
is used.

This also removes unused ext4_find_first_zero_bit().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agodel_timer_sync() calls debug_object_assert_init() to assert that a timer
Christine Chan [Wed, 24 Aug 2011 23:46:25 +0000 (09:46 +1000)]
del_timer_sync() calls debug_object_assert_init() to assert that a timer
has been initialized before calling lock_timer_base().  lock_timer_base()
would spin forever on a NULL(uninit-ed) base.  The check is added to
del_timer() to prevent silent failure, even though it would not get stuck
in an infinite loop.

Signed-off-by: Christine Chan <cschan@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoAdd new check (assert_init) to make sure objects are initialized and
Christine Chan [Wed, 24 Aug 2011 23:46:25 +0000 (09:46 +1000)]
Add new check (assert_init) to make sure objects are initialized and
tracked by debugobjects.

Signed-off-by: Christine Chan <cschan@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThe address limit is already set in flush_old_exec() so this
Mathias Krause [Wed, 24 Aug 2011 23:46:24 +0000 (09:46 +1000)]
The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThe address limit is already set in flush_old_exec() so this
Mathias Krause [Wed, 24 Aug 2011 23:46:24 +0000 (09:46 +1000)]
The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoshow_stat handler of the /proc/stat file relies on kstat_cpu(cpu)
Michal Hocko [Wed, 24 Aug 2011 23:46:24 +0000 (09:46 +1000)]
show_stat handler of the /proc/stat file relies on kstat_cpu(cpu)
statistics when priting information about idle and iowait times.  This is
OK if we are not using tickless kernel (CONFIG_NO_HZ) because counters are
updated periodically.

With NO_HZ things got more tricky because we are not doing idle/iowait
accounting while we are tickless so the value might get outdated.  Users
of /proc/stat will notice that by unchanged idle/iowait values which is
then interpreted as 0% idle/iowait time.  From the user space POV this is
an unexpected behavior and a change of the interface.

Let's fix this by using get_cpu_{idle,iowait}_time_us which accounts the
total idle/iowait time since boot and it doesn't rely on sampling or any
other periodic activity.  Fall back to the previous behavior if NO_HZ is
disabled or not configured.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Dave Jones <davej@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoget_cpu_{idle,iowait}_time_us update idle/iowait counters unconditionally
Michal Hocko [Wed, 24 Aug 2011 23:46:23 +0000 (09:46 +1000)]
get_cpu_{idle,iowait}_time_us update idle/iowait counters unconditionally
if the given CPU is in the idle loop.  This doesn't work well outside of
CPU governors which are singletons so nobody (except for IRQ) can race
with them.

We will need to use both functions from /proc/stat handler to properly
handle nohz idle/iowait times.

Let's update those counters only if the given last_update_time parameter
is non-NULL which means that the caller is interested in updating.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Dave Jones <davej@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoupdate_ts_time_stat currently updates idle time even if we are in iowait
Michal Hocko [Wed, 24 Aug 2011 23:46:23 +0000 (09:46 +1000)]
update_ts_time_stat currently updates idle time even if we are in iowait
loop at the moment.  The only real users of the idle counter (via
get_cpu_idle_time_us) are CPU governors and they expect to get cumulative
time for both idle and iowait times.  The value (idle_sleeptime) is also
printed to userspace by print_cpu but it prints both idle and iowait times
so the idle part is misleading.

Let's clean this up and fix update_ts_time_stat to account both counters
properly and update consumers of idle to consider iowait time as well.  If
we do this we might use get_cpu_{idle,iowait}_time_us from other contexts
as well and we will get expected values.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Dave Jones <davej@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThis patchset aims at addressing /proc/stat issue which has been
Michal Hocko [Wed, 24 Aug 2011 23:46:22 +0000 (09:46 +1000)]
This patchset aims at addressing /proc/stat issue which has been
introduced with tickless kernel.  In short, show_stat (proc handler)
relies on kstat_cpu(i).cpustat statistics which are updated periodically
so those numbers are more or less accurate.

This is, however, not true with tickless kernel for idle and iowait
counters because those are not updated while the cpu is in the tickless
state.  As the time when CPU might be tickless is not bounded, we can see
really outdated values.

The biggest problem is that tools which read /proc/stat interpret
unchanged idle/iowait numbers as 0% idle/iowait which might confuse those
who rely on them.

The first patch in this series is just a minor clean-up.

The second one changes update_ts_time_stat semantic.  The current
implementation updates idle counter regardless we are in iowait loop at
the moment.  I see it as an optimization because cpufreq drivers, which
are only users of those counters, care about busy vs.  non-busy states so
idle+iowait makes perfect sense.  This, however, makes idle counter
useless for others.

I think that using get_cpu_idle_time_us + get_cpu_iowait_time_us should
have the same meaning (at least this is what we do for jiffies variants).

The third patch changes get_cpu_{idle,iowait}_time_us semantic.  Both
functions call update_ts_time_stat so they update counters as a side
effect.  This should be OK most of the time as governors (the only users)
are singletons.  I can still see a potential problem because they might
race with IRQ:

irq_enter
  tick_check_idle
    tick_check_nohz
      tick_nohz_stop_idle

but this is a separate issue IMO.

Anyway, we shouldn't update those counters from other contexts so let's
make updating conditional based on the last_update_time parameter.

The final patch is the actual fix.  It uses get_cpu_{idle,iowait}_time_us
to get precise counters.  We still fall back to kstat_cpu if tickless
kernel is disabled.

The patchset is based on top of and gave it some testing (although I am
still not sure about the cpufreq part and possible side effects).  My
testing was quite trivial (8 CPU machine):

mount -t cgroup -o cpuset none /mnt/cgroup
mkdir /mnt/cgroup/a
echo 0-5 >  /mnt/cgroup/a/cpuset.cpus
echo 0 > /mnt/cgroup/a/cpuset.mems
for i in `cat /mnt/cgroup/tasks`; do echo $i > /mnt/cgroup/a/tasks; done
[only kernel threads will stay in the root cgroup]
mkdir /mnt/cgroup/b
echo 6,7 >  /mnt/cgroup/a/cpuset.cpus
echo 0 > /mnt/cgroup/a/cpuset.mems
[no task in that group so CPU6,7 should be idle most of the time]

Without the last patch I can see unchanged values for CPU[67] taking up to
several seconds.

This patch:

Get rid of semicolon so that those expressions can be used also somewhere
else than just in an assignment.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Jones <davej@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoA straightforward looking use of idr for a device id.
Jonathan Cameron [Wed, 24 Aug 2011 23:46:22 +0000 (09:46 +1000)]
A straightforward looking use of idr for a device id.

Signed-off-by: Jonathan Cameron <jic23@cam.ac.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agohwmon was using an idr with a NULL pointer, so convert to an
Jonathan Cameron [Wed, 24 Aug 2011 23:46:21 +0000 (09:46 +1000)]
hwmon was using an idr with a NULL pointer, so convert to an
ida which then allows use of Rusty's ida_simple_get.

Signed-off-by: Jonathan Cameron <jic23@cam.ac.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agofb_set_suspend() must be called with the console semaphore held, which
Andrea Righi [Wed, 24 Aug 2011 23:46:21 +0000 (09:46 +1000)]
fb_set_suspend() must be called with the console semaphore held, which
means the code path coming in here will first take the console_lock() and
then call lock_fb_info().

However several framebuffer ioctl commands acquire these locks in reverse
order (lock_fb_info() and then console_lock()).  This gives rise to
potential AB-BA deadlock.

Fix this by changing the order of acquisition in the ioctl commands that
make use of console_lock().

Signed-off-by: Andrea Righi <arighi@develer.com>
Reported-by: Peter Nordström (Palm GBU) <peter.nordstrom@palm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agokbuf is a buffer that is local to this function, so all of the error paths
Julia Lawall [Wed, 24 Aug 2011 23:46:21 +0000 (09:46 +1000)]
kbuf is a buffer that is local to this function, so all of the error paths
leaving the function should release it.

Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoSome messing with error codes to return 0 on out id's and match
Jonathan Cameron [Wed, 24 Aug 2011 23:46:20 +0000 (09:46 +1000)]
Some messing with error codes to return 0 on out id's and match
current situation.  Is this necessary? Looks a touch 'interesting'.

Signed-off-by: Jonathan Cameron <jic23@cam.ac.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThe RETE bit in IECSR is cleared by writing a 1 to it.
Liu Gang-B34182 [Wed, 24 Aug 2011 23:46:20 +0000 (09:46 +1000)]
The RETE bit in IECSR is cleared by writing a 1 to it.

Signed-off-by: Liu Gang <Gang.Liu@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoDon't dereference em if it's NULL or an error pointer.
Roel Kluin [Wed, 24 Aug 2011 23:46:19 +0000 (09:46 +1000)]
Don't dereference em if it's NULL or an error pointer.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoA call to va_copy() should always be followed by a call to va_end() in the
Jesper Juhl [Wed, 24 Aug 2011 23:46:19 +0000 (09:46 +1000)]
A call to va_copy() should always be followed by a call to va_end() in the
same function.  In kernel/autit.c::audit_log_vformat() this is not always
done.  This patch makes sure va_end() is always called.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThe address limit is already set in flush_old_exec() so this
Mathias Krause [Wed, 24 Aug 2011 23:46:19 +0000 (09:46 +1000)]
The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoDon't allow everybody to use a modem.
Vasiliy Kulikov [Wed, 24 Aug 2011 23:46:18 +0000 (09:46 +1000)]
Don't allow everybody to use a modem.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Srinidhi Kasagar <srinidhi.kasagar@stericsson.com>
Cc: Linus Walleij <linus.walleij@stericsson.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThe current interrupt traces from irq_handler_entry and irq_handler_exit
Vaibhav Nagarnaik [Wed, 24 Aug 2011 23:46:18 +0000 (09:46 +1000)]
The current interrupt traces from irq_handler_entry and irq_handler_exit
provide when an interrupt is handled.  They provide good data about when
the system has switched to kernel space and how it affects the currently
running processes.

There are some IRQ vectors which trigger the system into kernel space,
which are not handled in generic IRQ handlers.  Tracing such events gives
us the information about IRQ interaction with other system events.

The trace also tells where the system is spending its time.  We want to
know which cores are handling interrupts and how they are affecting other
processes in the system.  Also, the trace provides information about when
the cores are idle and which interrupts are changing that state.

The following patch adds the event definition and trace instrumentation
for interrupt vectors.  For x86, a lookup table is provided to print out
readable IRQ vector names.  The template can be used to provide interrupt
vector lookup tables on other architectures.

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Michael Rubin <mrubin@google.com>
Cc: David Sharp <dhsharp@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThe x86 timer interrupt handler is the only handler not traced in the
Vaibhav Nagarnaik [Wed, 24 Aug 2011 23:46:17 +0000 (09:46 +1000)]
The x86 timer interrupt handler is the only handler not traced in the
irq/irq_handler_{entry|exit} trace events.

Add tracepoints to the interrupt handler to trace it.

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Michael Rubin <mrubin@google.com>
Cc: David Sharp <dhsharp@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoReplace the bubble sort in sanitize_e820_map() with a call to the generic
Mike Ditto [Wed, 24 Aug 2011 23:46:17 +0000 (09:46 +1000)]
Replace the bubble sort in sanitize_e820_map() with a call to the generic
kernel sort function to avoid pathological performance with large maps.

On large (thousands of entries) E820 maps, the previous code took minutes
to run; with this change it's now milliseconds.

Signed-off-by: Mike Ditto <mditto@google.com>
Cc: Stefan Assmann <sassmann@kpanic.de>
Cc: Nancy Yuen <yuenn@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoCc: Ed Wildgoose <git@wildgooses.com>
Andrew Morton [Wed, 24 Aug 2011 23:46:17 +0000 (09:46 +1000)]
Cc: Ed Wildgoose <git@wildgooses.com>
Cc: Ed Wildgoose <kernel@wildgooses.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThis new driver replaces the old PCEngines Alix 2/3 LED driver with a new
Ed Wildgoose [Wed, 24 Aug 2011 23:46:16 +0000 (09:46 +1000)]
This new driver replaces the old PCEngines Alix 2/3 LED driver with a new
driver that controls the LEDs through the leds-gpio driver.  The old
driver accessed GPIOs directly, which created a conflict and prevented
also loading the cs5535-gpio driver to read other GPIOs on the Alix board.
 With this new driver, we hook into leds-gpio which in turn uses GPIO to
control the LEDs and therefore it's possible to control both the LEDs and
access onboard GPIOs

Driver is moved to platform/geode and any other geode initialisation
modules should move here also.

This driver is inspired by leds-net5501.c by Alessandro Zummo.

Ideally, leds-net5501.c should also be moved to platform/geode.
Additionally the driver relies on parts of the patch: 7f131cf3ed ("leds:
leds-alix2c - take port address from MSR) by Daniel Mack to perform
detection of the Alix board.

Signed-off-by: Ed Wildgoose <kernel@wildgooses.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Daniel Mack <daniel@caiaq.de>
Reviewed-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoOn x86_32 casting the unsigned int result of get_random_int() to long may
Ludwig Nussel [Wed, 24 Aug 2011 23:46:16 +0000 (09:46 +1000)]
On x86_32 casting the unsigned int result of get_random_int() to long may
result in a negative value.  On x86_32 the range of mmap_rnd() therefore
was -255 to 255.  The 32bit mode on x86_64 used 0 to 255 as intended.

The bug was introduced by 675a081 ("x86: unify mmap_{32|64}.c") in January
2008.

Signed-off-by: Ludwig Nussel <ludwig.nussel@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThis makes the iris driver use the platform API, so it is properly exposed
Shérab [Wed, 24 Aug 2011 23:46:15 +0000 (09:46 +1000)]
This makes the iris driver use the platform API, so it is properly exposed
in /sys.

[akpm@linux-foundation.org: remove commented-out code, add missing space to printk, clean up code layout]
Signed-off-by: Shérab <Sebastien.Hinderer@ens-lyon.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoAdd support for Aspire 1410 BIOS v1.3314. Fixes the following error:
Clay Carpenter [Wed, 24 Aug 2011 23:46:15 +0000 (09:46 +1000)]
Add support for Aspire 1410 BIOS v1.3314.  Fixes the following error:

acerhdf: unknown (unsupported) BIOS version Acer/Aspire 1410/v1.3314,
please report, aborting!

Signed-off-by: Clay Carpenter <claycarpenter@gmail.com>
Signed-off-by: Peter Feuerer <peter@piie.net>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoSince the commit below which added O_PATH support to the *at() calls, the
Andy Whitcroft [Wed, 24 Aug 2011 23:46:15 +0000 (09:46 +1000)]
Since the commit below which added O_PATH support to the *at() calls, the
error return for readlink/readlinkat for the empty pathname has switched
from ENOENT to EINVAL:

  commit 65cfc6722361570bfe255698d9cd4dccaf47570d
  Author: Al Viro <viro@zeniv.linux.org.uk>
  Date:   Sun Mar 13 15:56:26 2011 -0400

    readlinkat(), fchownat() and fstatat() with empty relative pathnames

This is both unexpected for userspace and makes readlink/readlinkat
inconsistant with all other interfaces; and inconsistant with our stated
return for these pathnames.

As the readlinkat call does not have a flags parameter we cannot use the
AT_EMPTY_PATH approach used in the other calls.  Therefore expose whether
the original path is infact entry via a new user_path_at_empty() path
lookup function.  Use this to determine whether to default to EINVAL or
ENOENT for failures.

BugLink: http://bugs.launchpad.net/bugs/817187
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThe parameter's origin type is long. On an i386 architecture, it can
hank [Wed, 24 Aug 2011 23:46:14 +0000 (09:46 +1000)]
The parameter's origin type is long.  On an i386 architecture, it can
easily be larger than 0x80000000, causing this function to convert it to a
sign-extended u64 type.  Change the type to unsigned long so we get the
correct result.

[akpm@linux-foundation.org: build fix]
Signed-off-by: hank <pyu@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoFix the following memory leak:
WANG Cong [Wed, 24 Aug 2011 23:46:14 +0000 (09:46 +1000)]
Fix the following memory leak:

unreferenced object 0xffff880107266800 (size 512):
  comm "sched-powersave", pid 3718, jiffies 4323097853 (age 27495.450s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81133940>] create_object+0x187/0x28b
    [<ffffffff814ac103>] kmemleak_alloc+0x73/0x98
    [<ffffffff811232ba>] __kmalloc_node+0x104/0x159
    [<ffffffff81044b98>] kzalloc_node.clone.97+0x15/0x17
    [<ffffffff8104cb90>] build_sched_domains+0xb7/0x7f3
    [<ffffffff8104d4df>] partition_sched_domains+0x1db/0x24a
    [<ffffffff8109ee4a>] do_rebuild_sched_domains+0x3b/0x47
    [<ffffffff810a00c7>] rebuild_sched_domains+0x10/0x12
    [<ffffffff8104d5ba>] sched_power_savings_store+0x6c/0x7b
    [<ffffffff8104d5df>] sched_mc_power_savings_store+0x16/0x18
    [<ffffffff8131322c>] sysdev_class_store+0x20/0x22
    [<ffffffff81193876>] sysfs_write_file+0x108/0x144
    [<ffffffff81135b10>] vfs_write+0xaf/0x102
    [<ffffffff81135d23>] sys_write+0x4d/0x74
    [<ffffffff814c8a42>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoFix kconfig unmet dependency warning. BACKLIGHT_CLASS_DEVICE depends on
Randy Dunlap [Wed, 24 Aug 2011 23:46:13 +0000 (09:46 +1000)]
Fix kconfig unmet dependency warning.  BACKLIGHT_CLASS_DEVICE depends on
BACKLIGHT_LCD_SUPPORT, so select the latter along with the former.

warning: (DRM_RADEON_KMS && DRM_I915 && STUB_POULSBO && FB_BACKLIGHT && PANEL_SHARP_LS037V7DW01 && PANEL_ACX565AKM && USB_APPLEDISPLAY && FB_OLPC_DCON && ASUS_LAPTOP && SONY_LAPTOP && THINKPAD_ACPI && EEEPC_LAPTOP && ACPI_ASUS && ACPI_CMPC && SAMSUNG_Q10) selects BACKLIGHT_CLASS_DEVICE which has unmet direct dependencies (HAS_IOMEM && BACKLIGHT_LCD_SUPPORT)

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: David Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoWhen no floppy is found the module code can be released while a timer
Carsten Emde [Wed, 24 Aug 2011 23:46:13 +0000 (09:46 +1000)]
When no floppy is found the module code can be released while a timer
function is pending or about to be executed.

CPU0                                  CPU1
      floppy_init()
timer_softirq()
   spin_lock_irq(&base->lock);
   detach_timer();
   spin_unlock_irq(&base->lock);
   -> Interrupt
del_timer();
        return -ENODEV;
                                      module_cleanup();
   <- EOI
   call_timer_fn();
   OOPS

Use del_timer_sync() to prevent this.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoBecause of x86-implement-strict-user-copy-checks-for-x86_64.patch
KAMEZAWA Hiroyuki [Wed, 24 Aug 2011 23:46:12 +0000 (09:46 +1000)]
Because of x86-implement-strict-user-copy-checks-for-x86_64.patch

When compiling mm/mempolicy.c the following warning is shown.

In file included from arch/x86/include/asm/uaccess.h:572,
                 from include/linux/uaccess.h:5,
                 from include/linux/highmem.h:7,
                 from include/linux/pagemap.h:10,
                 from include/linux/mempolicy.h:70,
                 from mm/mempolicy.c:68:
In function `copy_from_user',
    inlined from `compat_sys_get_mempolicy' at mm/mempolicy.c:1415:
arch/x86/include/asm/uaccess_64.h:64: warning: call to `copy_from_user_overflow' declared with attribute warning: copy_from_user() buffer size is not provably correct
  LD      mm/built-in.o

Fix this by passing correct buffer size value.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agobd2802_unregister_led_classdev() should unregister all registered
Axel Lin [Wed, 24 Aug 2011 23:46:12 +0000 (09:46 +1000)]
bd2802_unregister_led_classdev() should unregister all registered
instances of led_classdev class that had registered by
bd2802_register_led_classdev().

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Kim Kyuwon <q1.kim@samsung.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoFix the following build errors:
WANG Cong [Wed, 24 Aug 2011 23:46:11 +0000 (09:46 +1000)]
Fix the following build errors:

src/drivers/tty/serial/8250_early.c:160: error: 'BASE_BAUD' undeclared (first use in this function): 1 errors in 1 logs
        v3.0/cris/cris-allyesconfig
src/drivers/tty/serial/8250_early.c:160: error: (Each undeclared identifier is reported only once: 1 errors in 1 logs
        v3.0/cris/cris-allyesconfig
src/drivers/tty/serial/8250_early.c:160: error: for each function it appears in.): 1 errors in 1 logs
        v3.0/cris/cris-allyesconfig
src/drivers/tty/serial/8250_early.c:37:24: error: asm/serial.h: No such file or directory: 1 errors in 1 logs
        v3.0/cris/cris-allyesconfig

I am not sure if (1843200 / 16) is suitable for cris, but most other
arch's define it as this value.

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoSince 43cc71eed1250 ("platform: prefix MODALIAS with "platform:""), the
Axel Lin [Wed, 24 Aug 2011 23:46:11 +0000 (09:46 +1000)]
Since 43cc71eed1250 ("platform: prefix MODALIAS with "platform:""), the
platform modalias is prefixed with "platform:".

This patch changes the MODULE_ALIAS to "platform:ab8500-pwm".

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Arun Murthy <arun.murthy@stericsson.com>
Cc: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoMake sure we are passing the same cookie in all calls to
Axel Lin [Wed, 24 Aug 2011 23:46:10 +0000 (09:46 +1000)]
Make sure we are passing the same cookie in all calls to
request_threaded_irq() and free_irq().

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Donggeun Kim <dg77.kim@samsung.com>
Cc: Minkyu Kang <mk7.kang@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThis is a i2c driver, not a platform driver, thus use "i2c" prefix for the
Axel Lin [Wed, 24 Aug 2011 23:46:10 +0000 (09:46 +1000)]
This is a i2c driver, not a platform driver, thus use "i2c" prefix for the
module alias.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Michael Hennerich <michael.hennerich@analog.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoWe need a callback to do some things after pwm_enable, pwm_disable
Dilan Lee [Wed, 24 Aug 2011 23:46:10 +0000 (09:46 +1000)]
We need a callback to do some things after pwm_enable, pwm_disable
and pwm_config.

Signed-off-by: Dilan Lee <dilee@nvidia.com>
Reviewed-by: Robert Morell <rmorell@nvidia.com>
Reviewed-by: Arun Murthy <arun.murthy@stericsson.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoCommit 79dfdac ("memcg: make oom_lock 0 and 1 based rather than counter")
Johannes Weiner [Wed, 24 Aug 2011 23:46:09 +0000 (09:46 +1000)]
Commit 79dfdac ("memcg: make oom_lock 0 and 1 based rather than counter")
tried to oom lock the hierarchy and roll back upon encountering an already
locked memcg.

The code is confused when it comes to detecting a locked memcg, though, so
it would fail and rollback after locking one memcg and encountering an
unlocked second one.

The result is that oom-locking hierarchies fails unconditionally and that
every oom killer invocation simply goes to sleep on the oom waitqueue
forever.  The tasks practically hang forever without anyone intervening,
possibly holding locks that trip up unrelated tasks, too.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoAdd missing include of linux/module.h for drivers that use interfaces from
Axel Lin [Wed, 24 Aug 2011 23:46:09 +0000 (09:46 +1000)]
Add missing include of linux/module.h for drivers that use interfaces from
linux/module.h.  This patch fixes build errors.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Jonathan McDowell <noodles@earth.li>
Acked-by: Kristoffer Ericson <kristoffer.ericson@gmail.com>
Cc: Magnus Damm <damm@opensource.se>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoep93xx_bl.c uses interfaces from linux/module.h,
Axel Lin [Wed, 24 Aug 2011 23:46:08 +0000 (09:46 +1000)]
ep93xx_bl.c uses interfaces from linux/module.h,
so it should include that file. This patch fixes below build errors.

  CC [M]  drivers/video/backlight/ep93xx_bl.o
drivers/video/backlight/ep93xx_bl.c:138: error: 'THIS_MODULE' undeclared here (not in a function)
drivers/video/backlight/ep93xx_bl.c:158: error: expected declaration specifiers or '...' before string constant
drivers/video/backlight/ep93xx_bl.c:158: warning: data definition has no type or storage class
drivers/video/backlight/ep93xx_bl.c:158: warning: type defaults to 'int' in declaration of 'MODULE_DESCRIPTION'
drivers/video/backlight/ep93xx_bl.c:158: warning: function declaration isn't a prototype
drivers/video/backlight/ep93xx_bl.c:159: error: expected declaration specifiers or '...' before string constant
drivers/video/backlight/ep93xx_bl.c:159: warning: data definition has no type or storage class
drivers/video/backlight/ep93xx_bl.c:159: warning: type defaults to 'int' in declaration of 'MODULE_AUTHOR'
drivers/video/backlight/ep93xx_bl.c:159: warning: function declaration isn't a prototype
drivers/video/backlight/ep93xx_bl.c:160: error: expected declaration specifiers or '...' before string constant
drivers/video/backlight/ep93xx_bl.c:160: warning: data definition has no type or storage class
drivers/video/backlight/ep93xx_bl.c:160: warning: type defaults to 'int' in declaration of 'MODULE_LICENSE'
drivers/video/backlight/ep93xx_bl.c:160: warning: function declaration isn't a prototype
drivers/video/backlight/ep93xx_bl.c:161: error: expected declaration specifiers or '...' before string constant
drivers/video/backlight/ep93xx_bl.c:161: warning: data definition has no type or storage class
drivers/video/backlight/ep93xx_bl.c:161: warning: type defaults to 'int' in declaration of 'MODULE_ALIAS'
drivers/video/backlight/ep93xx_bl.c:161: warning: function declaration isn't a prototype

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoReplace/remove use of RIO v.1.2 registers/bits that are not
Alexandre Bounine [Wed, 24 Aug 2011 23:46:08 +0000 (09:46 +1000)]
Replace/remove use of RIO v.1.2 registers/bits that are not
forward-compatible with newer versions of RapidIO specification.

RapidIO specification v.  1.3 removed Write Port CSR, Doorbell CSR,
Mailbox CSR and Mailbox and Doorbell bits of the PEF CAR.

Use of removed (since RIO v.1.3) register bits affects users of currently
available 1.3 and 2.x compliant devices who may use not so recent kernel
versions.

Removing checks for unsupported bits makes corresponding routines
compatible with all versions of RapidIO specification.  Therefore,
backporting makes stable kernel versions compliant with RIO v.1.3 and
later as well.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Chul Kim <chul.kim@idt.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoZONE_CONGESTED is only cleared in kswapd, but pages can be freed in any
Shaohua Li [Wed, 24 Aug 2011 23:46:07 +0000 (09:46 +1000)]
ZONE_CONGESTED is only cleared in kswapd, but pages can be freed in any
task.  It's possible ZONE_CONGESTED isn't cleared in some cases: 1.  the
zone is already balanced just entering balance_pgdat() for order-0 because
concurrent tasks free memory.  In this case, later check will skip the
zone as it's balanced so the flag isn't cleared.  2.  high order balance
fallbacks to order-0.  quote from Mel: At the end of balance_pgdat(),
kswapd uses the following logic;

 If reclaiming at high order {
     for each zone {
             if all_unreclaimable
                     skip
             if watermark is not met
                     order = 0
                     loop again

             /* watermark is met */
             clear congested
     }
 }

i.e.  it clears ZONE_CONGESTED if it the zone is balanced.  if not, it
restarts balancing at order-0.  However, if the higher zones are balanced
for order-0, kswapd will miss clearing ZONE_CONGESTED as that only happens
after a zone is shrunk.  This can mean that wait_iff_congested() stalls
unnecessarily.  This patch makes kswapd clear ZONE_CONGESTED during its
initial highmem->dma scan for zones that are already balanced.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoIt seems that 7bf693951a8e ("console: allow to retain boot console via
Nishanth Aravamudan [Wed, 24 Aug 2011 23:46:07 +0000 (09:46 +1000)]
It seems that 7bf693951a8e ("console: allow to retain boot console via
boot option keep_bootcon") doesn't always achieve what it aims, as when
printk_late_init() runs it unconditionally turns off all boot consoles.
With this patch, I am able to see more messages on the boot console in KVM
guests than I can without, when keep_bootcon is specified.

I think it is appropriate for the relevant -stable trees.  However, it's
more of an annoyance than a serious bug (ideally you don't need to keep
the boot console around as console handover should be working -- I was
encountering a situation where the console handover wasn't working and not
having the boot console available meant I couldn't see why).

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg KH <gregkh@suse.de>
Acked-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Cc: <stable@kernel.org> [2.6.39.x, 3.0.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoPaul said: I left Google at the end of last week - if it's not bouncing
Wanlong Gao [Wed, 24 Aug 2011 23:46:06 +0000 (09:46 +1000)]
Paul said: I left Google at the end of last week - if it's not bouncing
already, menage@google.com isn't going to work for much longer.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Acked-by: Paul Menage <paul@paulmenage.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoI get below warnning:
Shaohua Li [Wed, 24 Aug 2011 23:46:06 +0000 (09:46 +1000)]
I get below warnning:
BUG: using smp_processor_id() in preemptible [00000000] code: bash/746
caller is native_sched_clock+0x37/0x6e
Pid: 746, comm: bash Tainted: G        W   3.0.0+ #254
Call Trace:
 [<ffffffff813435c6>] debug_smp_processor_id+0xc2/0xdc
 [<ffffffff8104158d>] native_sched_clock+0x37/0x6e
 [<ffffffff81116219>] try_to_free_mem_cgroup_pages+0x7d/0x270
 [<ffffffff8114f1f8>] mem_cgroup_force_empty+0x24b/0x27a
 [<ffffffff8114ff21>] ? sys_close+0x38/0x138
 [<ffffffff8114ff21>] ? sys_close+0x38/0x138
 [<ffffffff8114f257>] mem_cgroup_force_empty_write+0x17/0x19
 [<ffffffff810c72fb>] cgroup_file_write+0xa8/0xba
 [<ffffffff811522d2>] vfs_write+0xb3/0x138
 [<ffffffff8115241a>] sys_write+0x4a/0x71
 [<ffffffff8114ffd9>] ? sys_close+0xf0/0x138
 [<ffffffff8176deab>] system_call_fastpath+0x16/0x1b

sched_clock() can't be used with preempt enabled.  And we don't need fast
approach to get clock here, so let's use ktime API.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThe various basic memory allocation function return NULL, not an ERR_PTR.
Thomas Meyer [Wed, 24 Aug 2011 23:46:06 +0000 (09:46 +1000)]
The various basic memory allocation function return NULL, not an ERR_PTR.

The semantic patch that makes this change is available in
scripts/coccinelle/null/eno.cocci.

More information about semantic patching is available at
http://coccinelle.lip6.fr/

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Cc: Niranjana Vishwanathapura <nvishwan@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThe test for bad usage of min_t() and max_t() is missing the --ignore
Hui Zhu [Wed, 24 Aug 2011 23:46:05 +0000 (09:46 +1000)]
The test for bad usage of min_t() and max_t() is missing the --ignore
type.  Add it.

Signed-off-by: Hui Zhu <teawater@gmail.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoChange to new git tree -
Ralf Thielow [Wed, 24 Aug 2011 23:46:05 +0000 (09:46 +1000)]
Change to new git tree -
(git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git).

Signed-off-by: Ralf Thielow <ralf.thielow@googlemail.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoCommit d1a05b6 ("memcg do not try to drain per-cpu caches without pages")
Johannes Weiner [Wed, 24 Aug 2011 23:46:04 +0000 (09:46 +1000)]
Commit d1a05b6 ("memcg do not try to drain per-cpu caches without pages")
added a drain_local_stock() call to a preemptible section.

The draining task looks up the cpu-local stock twice to set the
draining-flag, then to drain the stock and clear the flag again.  If the
task is migrated to a different CPU in between, noone will clear the flag
on the first stock and it will be forever undrainable.  Its charge can not
be recovered and the cgroup can not be deleted anymore.

Properly pin the task to the executing CPU while draining stocks.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoSigned-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Evgeniy Polyakov [Wed, 24 Aug 2011 23:46:04 +0000 (09:46 +1000)]
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThe for loop was looking for i <= 0 instead of i >= 0 so this function
Dan Carpenter [Wed, 24 Aug 2011 23:46:02 +0000 (09:46 +1000)]
The for loop was looking for i <= 0 instead of i >= 0 so this function
never did anything.  Also we started with i = NB_SYSFS_BIN_FILES instead
of "NB_SYSFS_BIN_FILES - 1" which is an off by one bug.

Reported-by: Bojan Prtvar <prtvar.b@gmail.com>
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Jean-Franois Dagenais <dagenaisj@sonatest.com>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoWARNING: line over 80 characters
Andrew Morton [Wed, 24 Aug 2011 23:46:01 +0000 (09:46 +1000)]
WARNING: line over 80 characters
#110: FILE: arch/alpha/kernel/osf_sys.c:637:
+  w = (current_thread_info()->flags >> ALPHA_UAC_SHIFT) & UAC_BITMASK;

ERROR: code indent should use tabs where possible
#110: FILE: arch/alpha/kernel/osf_sys.c:637:
+ ^I^Iw = (current_thread_info()->flags >> ALPHA_UAC_SHIFT) & UAC_BITMASK;$

WARNING: please, no space before tabs
#110: FILE: arch/alpha/kernel/osf_sys.c:637:
+ ^I^Iw = (current_thread_info()->flags >> ALPHA_UAC_SHIFT) & UAC_BITMASK;$

WARNING: please, no spaces at the start of a line
#110: FILE: arch/alpha/kernel/osf_sys.c:637:
+ ^I^Iw = (current_thread_info()->flags >> ALPHA_UAC_SHIFT) & UAC_BITMASK;$

WARNING: line over 80 characters
#121: FILE: arch/alpha/kernel/osf_sys.c:761:
+ new = new | (w & UAC_BITMASK) << ALPHA_UAC_SHIFT;

total: 1 errors, 4 warnings, 58 lines checked

NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
      scripts/cleanfile

./patches/alpha-unbreak-osf_setsysinfossi_nvpairs.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoThe bug was accidentally found by the following program:
Sergei Trofimovich [Wed, 24 Aug 2011 23:46:01 +0000 (09:46 +1000)]
The bug was accidentally found by the following program:

    #include <asm/sysinfo.h>
    #include <asm/unistd.h>
    #include <sys/syscall.h>
    static int setsysinfo(unsigned long op, void *buffer, unsigned long size,
                          int *start, void *arg, unsigned long flag) {
        return syscall(__NR_osf_setsysinfo, op, buffer, size, start, arg, flag);
    }

    int main(int argc, char **argv) {
        short x[10];
        unsigned int buf[2] = { SSIN_UACPROC, UAC_SIGBUS, };
        setsysinfo(SSI_NVPAIRS, buf, 1, 0, 0, 0);

        int  *y = (int*) (x+1);
        *y = 0;
        return 0;
    }

The program shoud fail on SIGBUS, but didn't.

The patch is a second part of userspace flag fix:
> commit 745dd2405e281d96c0a449103bdf6a895048f28c
> Author: Michael Cree <mcree@orcon.net.nz>
> Date:   Mon Nov 30 22:44:40 2009 -0500
>
>    Alpha: Rearrange thread info flags fixing two regressions
>
>    Both regressions fixed by (1) rearranging TIF_NOTIFY_RESUME flag to be
>    in lower 8 bits of the thread info flags, and (2) making sure that
>    ALPHA_UAC_SHIFT matches the rearrangement of the thread info flags.

Deleted outdated out-of-sync 'UAC_SHIFT' (the cause of bug) in favour of
'ALPHA_UAC_SHIFT'.

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Acked-by: Michael Cree <mcree@orcon.net.nz>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoFound on allmodconfig build (ARCH=alpha)
Sergei Trofimovich [Wed, 24 Aug 2011 23:46:00 +0000 (09:46 +1000)]
Found on allmodconfig build (ARCH=alpha)

    drivers/misc/pti.c: In function 'get_id':
    drivers/misc/pti.c:249: error: implicit declaration of function 'kmalloc'
    drivers/misc/pti.c: In function 'pti_char_write':
    drivers/misc/pti.c:658: error: implicit declaration of function 'copy_from_user'

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: J Freyensee <james_p_freyensee@linux.intel.com>
Cc: Jeremy Rocher <rocher.jeremy@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 years agoMerge remote-tracking branch 'kvmtool/master'
Stephen Rothwell [Thu, 25 Aug 2011 05:30:09 +0000 (15:30 +1000)]
Merge remote-tracking branch 'kvmtool/master'

Conflicts:
include/net/9p/9p.h

12 years agoMerge remote-tracking branch 'moduleh/module.h-split'
Stephen Rothwell [Thu, 25 Aug 2011 05:23:32 +0000 (15:23 +1000)]
Merge remote-tracking branch 'moduleh/module.h-split'

Conflicts:
arch/arm/mach-bcmring/mm.c
drivers/staging/iio/accel/adis16220_core.c
include/linux/dmaengine.h
include/linux/mtd/mtd.h

12 years agoMerge remote-tracking branch 'writeback/next'
Stephen Rothwell [Thu, 25 Aug 2011 05:07:12 +0000 (15:07 +1000)]
Merge remote-tracking branch 'writeback/next'

12 years agoMerge remote-tracking branch 'tmem/linux-next'
Stephen Rothwell [Thu, 25 Aug 2011 05:02:10 +0000 (15:02 +1000)]
Merge remote-tracking branch 'tmem/linux-next'

12 years agoMerge remote-tracking branch 'staging/staging-next'
Stephen Rothwell [Thu, 25 Aug 2011 04:48:50 +0000 (14:48 +1000)]
Merge remote-tracking branch 'staging/staging-next'

Conflicts:
drivers/staging/xgifb/XGI_main_26.c