Anton Vorontsov [Thu, 18 Jun 2009 23:49:02 +0000 (16:49 -0700)]
powerpc/86xx: add MMC SPI support for MPC8610HPCD boards
This patch adds spi and mmc-spi-slot nodes, plus a gpio-controller for
PIXIS' sdcsr bank that is used for managing SPI chip-select and for
reading card's states.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Cc: Kumar Gala <galak@gate.crashing.org> Cc: David Brownell <david-b@pacbell.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Vorontsov [Thu, 18 Jun 2009 23:49:01 +0000 (16:49 -0700)]
spi_mpc83xx: add small delay after asserting chip-select line
This is needed for some underlaying GPIO controllers that may be a bit
slow, or if chip-select signal need some time to stabilize.
For what it's worth, we already have the similar delay for chip-select
de-assertion case.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Cc: Kumar Gala <galak@gate.crashing.org> Cc: David Brownell <david-b@pacbell.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Vorontsov [Thu, 18 Jun 2009 23:49:01 +0000 (16:49 -0700)]
spi_mpc83xx: quieten down the "Requested speed is too low" message
When a platform is running at high frequencies it's not always possible to
scale-down a frequency to a requested value, and using mmc_spi driver this
leads to the following printk flood during card polling:
...
mmc_spi spi32766.0: Requested speed is too low: 400000 Hz. Will use
520828 Hz instead.
mmc_spi spi32766.0: Requested speed is too low: 400000 Hz. Will use
520828 Hz instead.
...
Fix this by using WARN_ONCE(), it's better than the flood, and also better
than turning dev_err() into dev_dbg(), since we actually want to warn that
some things may not work correctly.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Cc: Kumar Gala <galak@gate.crashing.org> Cc: David Brownell <david-b@pacbell.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Vorontsov [Thu, 18 Jun 2009 23:48:59 +0000 (16:48 -0700)]
spi_mpc83xx: handle other Freescale processors
With this patch we'll able to select spi_mpc83xx driver on the MPC86xx
platforms. Let the driver depend on FSL_SOC, so we don't have to worry
about Kconfig anymore.
Also remove the "experimental" dependency, the driver has been tested to
work on a various hardware, and surely not experimental anymore.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Cc: Kumar Gala <galak@gate.crashing.org> Cc: David Brownell <david-b@pacbell.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Baruch Siach [Thu, 18 Jun 2009 23:48:58 +0000 (16:48 -0700)]
gpio: driver for PrimeCell PL061 GPIO controller
This is a driver for the ARM PrimeCell PL061 GPIO AMBA peripheral. The
driver is implemented using the gpiolib framework.
This driver also includes support for the use of the PL061 as an interrupt
controller (secondary).
Signed-off-by: Baruch Siach <baruch@tkos.co.il> Cc: David Brownell <david-b@pacbell.net> Acked-by: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 18 Jun 2009 21:07:46 +0000 (14:07 -0700)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd2: clean up jbd2_journal_try_to_free_buffers()
ext4: Don't update ctime for non-extent-mapped inodes
ext4: Fix up whitespace issues in fs/ext4/inode.c
ext4: Fix 64-bit block type problem on 32-bit platforms
ext4: teach the inode allocator to use a goal inode number
ext4: Use a hash of the topdir directory name for the Orlov parent group
ext4: document the "abort" mount option
ext4: move the abort flag from s_mount_opts to s_mount_flags
ext4: update the s_last_mounted field in the superblock
ext4: change s_mount_opt to be an unsigned int
ext4: online defrag -- Add EXT4_IOC_MOVE_EXT ioctl
ext4: avoid unnecessary spinlock in critical POSIX ACL path
ext3: avoid unnecessary spinlock in critical POSIX ACL path
ext4: convert instrumentation from markers to tracepoints
jbd2: convert instrumentation from markers to tracepoints
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (56 commits)
sh: Fix declaration of __kernel_sigreturn and __kernel_rt_sigreturn
sh: Enable soc-camera in ap325rxa/migor/se7724 defconfigs.
sh: remove stray markers.
sh: defconfig updates.
sh: pci: Initial PCI-Express support for SH7786 Urquell board.
sh: Generic HAVE_PERF_COUNTER support.
SH: convert migor to soc-camera as platform-device
SH: convert ap325rxa to soc-camera as platform-device
soc-camera: unify i2c camera device platform data
sh: add platform data for r8a66597-hcd in setup-sh7723
sh: add platform data for r8a66597-hcd in setup-sh7366
sh: x3proto: add platform data for r8a66597-hcd
sh: highlander: add platform data for r8a66597-hcd
sh: sh7785lcr: add platform data for r8a66597-hcd
sh: turn off irqs when disabling CMT/TMU timers
sh: use kzalloc() for cpg clocks
sh: unbreak WARN_ON()
sh: Use generic atomic64_t implementation.
sh: Revised clock function in highlander
sh: Update r7780mp defconfig
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (55 commits)
netxen: fix tx ring accounting
netxen: fix detection of cut-thru firmware mode
forcedeth: fix dma api mismatches
atm: sk_wmem_alloc initial value is one
net: correct off-by-one write allocations reports
via-velocity : fix no link detection on boot
Net / e100: Fix suspend of devices that cannot be power managed
TI DaVinci EMAC : Fix rmmod error
net: group address list and its count
ipv4: Fix fib_trie rebalancing, part 2
pkt_sched: Update drops stats in act_police
sky2: version 1.23
sky2: add GRO support
sky2: skb recycling
sky2: reduce default transmit ring
sky2: receive counter update
sky2: fix shutdown synchronization
sky2: PCI irq issues
sky2: more receive shutdown
sky2: turn off pause during shutdown
...
Manually fix trivial conflict in net/core/skbuff.c due to kmemcheck
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
[WATCHDOG] hpwdt: Add NMI sourcing
[WATCHDOG] iTCO_wdt: Fix ICH7+ reboot issue.
[WATCHDOG] iTCO_wdt: fix memory corruption when RCBA is disabled by hardware
[WATCHDOG] Correct WDIOF_MAGICCLOSE flag
[WATCHDOG] move platform probe and remove function to devinit and devexit
[WATCHDOG] Some more general cleanup
[WATCHDOG] iTCO_wdt: Cleanup code
Linus Torvalds [Thu, 18 Jun 2009 20:11:50 +0000 (13:11 -0700)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md: (39 commits)
md/raid5: correctly update sync_completed when we reach max_resync
md/raid5: add missing call to schedule() after prepare_to_wait()
md/linear: use call_rcu to free obsolete 'conf' structures.
md linear: Protecting mddev with rcu locks to avoid races
md: Move check for bitmap presence to personality code.
md: remove chunksize rounding from common code.
md: raid0/linear: ensure device sizes are rounded to chunk size.
md: move assignment of ->utime so that it never gets skipped.
md: Push down reconstruction log message to personality code.
md: merge reconfig and check_reshape methods.
md: remove unnecessary arguments from ->reconfig method.
md: raid5: check stripe cache is large enough in start_reshape
md: raid0: chunk_sectors cleanups.
md: fix some comments.
md/raid5: Use is_power_of_2() in raid5_reconfig()/raid6_reconfig().
md: convert conf->chunk_size and conf->prev_chunk to sectors.
md: Convert mddev->new_chunk to sectors.
md: Make mddev->chunk_size sector-based.
md: raid0 :Enables chunk size other than powers of 2.
md: prepare for non-power-of-two chunk sizes
...
Florian Fainelli [Wed, 17 Jun 2009 23:28:38 +0000 (16:28 -0700)]
lib: add lib/gcd.c
This patch adds lib/gcd.c which contains a greatest common divider
implementation taken from sound/core/pcm_timer.c
Several usages of this new library function will be sent to subsystem
maintainers.
[akpm@linux-foundation.org: use swap() (pointed out by Joe)]
[akpm@linux-foundation.org: just add gcd.o to obj-y, remove Kconfig changes] Signed-off-by: Florian Fainelli <florian@openwrt.org> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Simon Horman <horms@verge.net.au> Cc: Julius Volz <juliusv@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rodolfo Giometti [Wed, 17 Jun 2009 23:28:37 +0000 (16:28 -0700)]
LinuxPPS: core support
This patch adds the kernel side of the PPS support currently named
"LinuxPPS".
PPS means "pulse per second" and a PPS source is just a device which
provides a high precision signal each second so that an application can
use it to adjust system clock time.
Common use is the combination of the NTPD as userland program with a GPS
receiver as PPS source to obtain a wallclock-time with sub-millisecond
synchronisation to UTC.
To obtain this goal the userland programs shoud use the PPS API
specification (RFC 2783 - Pulse-Per-Second API for UNIX-like Operating
Systems, Version 1.0) which in part is implemented by this patch. It
provides a set of chars devices, one per PPS source, which can be used to
get the time signal. The RFC's functions can be implemented by accessing
to these char devices.
Signed-off-by: Rodolfo Giometti <giometti@linux.it> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jack Steiner [Wed, 17 Jun 2009 23:28:35 +0000 (16:28 -0700)]
gru: remove references to the obsolete global status handle
Delete references to the SGI GRU GSH hardware resources. These GRU
resources have been deleted from the hardware. (These resources have
never benn used, anyway).
Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jack Steiner [Wed, 17 Jun 2009 23:28:34 +0000 (16:28 -0700)]
gru: fixes to grudump utility
Minor fixes to the SGI GRU grudump facility:
- fix address where user data is written
- add gru number to data passed to user
- indicate if context is locked
Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jack Steiner [Wed, 17 Jun 2009 23:28:33 +0000 (16:28 -0700)]
gru: fix potential use-after-free when purging GRU tlbs
Fix potential SGI GRU bug that could cause a use-after-free. If one
thread in a task is flushing the GRU and another thread destroys the GRU
context, there is the potential to access a table after it has been freed.
Copy the gms pointer to a local variable before unlocking the gts table.
Note that no refcnt is needed for the gms - the reference is held
indirectly by the task's mm_struct.
Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jack Steiner [Wed, 17 Jun 2009 23:28:32 +0000 (16:28 -0700)]
gru: generic infrastructure for context options
Change the user GRU request for specifying the "task_slice" option to use
a generic infrastructure that can be expanded in the future to include
additional context options. No new capabilities are added with this
patch.
Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jack Steiner [Wed, 17 Jun 2009 23:28:31 +0000 (16:28 -0700)]
gru: delete user request for fetching chiplet status
Delete the user request for fetching the status of a GRU chiplet. This
request has been made obsolete by other changes. Note: this is not a
change to a user API - there are no compatibility issues with this change.
Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jack Steiner [Wed, 17 Jun 2009 23:28:30 +0000 (16:28 -0700)]
gru: collect per-context user statistics
Collect GRU statistics for each user GRU context. Statistics are kept for
TLB misses & content resource contention. Add user request for retrieving
the statistics.
Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jack Steiner [Wed, 17 Jun 2009 23:28:28 +0000 (16:28 -0700)]
gru: fix cache coherency issues with instruction retry
Fix two problems related to GRU instruction failures. Cache coherency is
not maintained for CBEs except when loading or unloading contexts. When
reading a CBE to extract error information, the CBE must first be flushed
from the cache.
The function that reads kerrnel CBEs was reading the wrong CBE.
Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jack Steiner [Wed, 17 Jun 2009 23:28:27 +0000 (16:28 -0700)]
gru: update to rev 0.9 of gru spec
Update GRU driver to the latest version of the GRU spec. This consists
of minor updates:
- changes & additions to error status bits
- new restriction on handling of TLB misses while in FMM mode
- new field (not used by software) in TFH
Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jack Steiner [Wed, 17 Jun 2009 23:28:25 +0000 (16:28 -0700)]
gru: support instruction completion interrupts
Add support for interrupts generated by GRU instruction completion.
Previously, the only interrupts were for TLB misses. The hardware also
supports interrupts on instruction completion. This will be supported for
instructions issued by the kernel.
Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jack Steiner [Wed, 17 Jun 2009 23:28:24 +0000 (16:28 -0700)]
gru: check context state on reload
Check whether the gru state being loaded into a gru is from a new context
or a previously unloaded context. If new, simply zero out the hardware
context; if unloaded and valid, reload the old state.
This change is primarily for reloading kernel contexts where the previous
is not required to be saved.
Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jack Steiner [Wed, 17 Jun 2009 23:28:23 +0000 (16:28 -0700)]
gru: fix handling of mesq failures
Fix endcase in handling GRU message queue failures due to NACKs of PUT
requests. Must ensure that the "present" bits are cleared before
resending the message.
Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jack Steiner [Wed, 17 Jun 2009 23:28:23 +0000 (16:28 -0700)]
gru: support contexts with zero dsrs or cbrs
Support alocation of GRU contexts that contain zero DSR or CBR resources.
Some instructions do not require DSR resources. Contexts without CBR
resources are useful for diagnostics.
Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jack Steiner [Wed, 17 Jun 2009 23:28:22 +0000 (16:28 -0700)]
gru: change resource assignment for kernel threads
Change the way GRU resources are assigned for kernel threads. GRU
contexts for kernel threads are now allocated on demand and can be stolen
by user processes when idle. This allows MPI jobs to use ALL of the GRU
resources when the kernel is not using them.
Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jack Steiner [Wed, 17 Jun 2009 23:28:21 +0000 (16:28 -0700)]
gru: support cch_allocate for kernel threads
Change the interface to cch_allocate so that it can be used to allocate
GRU contexts for kernel threads. Kernel threads use the GRU in unmapped
mode and do not require ASIDs for the GRU TLB.
Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jack Steiner [Wed, 17 Jun 2009 23:28:20 +0000 (16:28 -0700)]
gru: change context load and unload
Remove "static" from the functions for loading/unloading GRU contexts.
These functions will be called from other GRU files. Fix bug in unlocking
gru context.
Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jack Steiner [Wed, 17 Jun 2009 23:28:20 +0000 (16:28 -0700)]
gru: dynamic allocation of kernel contexts
Change the interface to gru_alloc_gts() so that it can be used to allocate
GRU contexts for kernel threads. Kernel threads do not have vdata
structures for the GRU contexts. The GRU resource count are now passed
explicitly instead of inside the vdata structure.
Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jack Steiner [Wed, 17 Jun 2009 23:28:19 +0000 (16:28 -0700)]
gru: bug fixes for GRU exception handling
Bug fixes for GRU exception handling. Additional fields from the CBR must
be returned to the user to allow the user to correctly diagnose GRU
exceptions.
Handle endcase in TFH TLB miss handling. Verify that TFH actually
indicates a pending exception.
Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Neil Horman [Wed, 17 Jun 2009 23:28:17 +0000 (16:28 -0700)]
kexec: sysrq: simplify sysrq-c handler
Currently the sysrq-c handler is bit over-engineered. Its behavior is
dependent on a few compile time and run time factors that alter its
behavior which is really unnecessecary.
If CONFIG_KEXEC is not configured, sysrq-c, crashes the system with a NULL
pointer dereference. If CONFIG_KEXEC is configured, it calls crash_kexec
directly, which implies that the kexec kernel will either be booted (if
its been previously loaded), or it will simply do nothing (the no kexec
kernel has been loaded).
It would be much easier to just simplify the whole thing to dereference a
NULL pointer all the time regardless of configuration. That way, it will
always try to crash the system, and if a kexec kernel has been loaded into
reserved space, it will still boot from the page fault trap handler
(assuming panic_on_oops is set appropriately).
[akpm@linux-foundation.org: build fix] Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: Brayan Arraes <brayan@yack.com.br> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Mack [Wed, 17 Jun 2009 23:28:15 +0000 (16:28 -0700)]
w1-gpio: add external pull-up enable callback
On embedded devices, sleep mode conditions can be tricky to handle,
Especially when processors tend to pull-down the w1 bus during sleep. Bus
slaves (such as the ds2760) may interpret this as a reason for power-down
conditions and entirely switch off the device.
This patch adds a callback function pointer to let users switch on and off
the external pull-up resistor. This lets the outside world know whether
the processor is currently actively driving the bus or not.
When this callback is not provided, the code behaviour won't change.
Signed-off-by: Daniel Mack <daniel@caiaq.de> Acked-by: Ville Syrjala <syrjala@sci.fi> Acked-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FUJITA Tomonori [Wed, 17 Jun 2009 23:28:10 +0000 (16:28 -0700)]
dma-mapping: add asm-generic/dma-mapping-common.h
We unified x86 and IA64's handling of multiple dma mapping operations
(struct dma_map_ops in linux/dma-mapping.h) so we can remove duplication
in their arch/include/asm/dma-mapping.h.
This patchset adds include/asm-generic/dma-mapping-common.h that provides
some generic dma mapping function definitions for the users of struct
dma_map_ops. This enables us to remove about 100 lines. This also
enables us to easily add CONFIG_DMA_API_DEBUG support, which only x86
supports for now. The 4th patch adds CONFIG_DMA_API_DEBUG support to IA64
by adding only 8 lines.
This patch:
This header file provides some mapping function definitions that the users
of struct dma_map_ops can use.
Enable gcov profiling of the entire kernel on x86_64. Required changes
include disabling profiling for:
* arch/kernel/acpi/realmode and arch/kernel/boot/compressed:
not linked to main kernel
* arch/vdso, arch/kernel/vsyscall_64 and arch/kernel/hpet:
profiling causes segfaults during boot (incompatible context)
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Li Wei <W.Li@Sun.COM> Cc: Michael Ellerman <michaele@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com> Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Enable the use of GCC's coverage testing tool gcov [1] with the Linux
kernel. gcov may be useful for:
* debugging (has this code been reached at all?)
* test improvement (how do I change my test to cover these lines?)
* minimizing kernel configurations (do I need this option if the
associated code is never run?)
The profiling patch incorporates the following changes:
* change kbuild to include profiling flags
* provide functions needed by profiling code
* present profiling data as files in debugfs
Note that on some architectures, enabling gcc's profiling option
"-fprofile-arcs" for the entire kernel may trigger compile/link/
run-time problems, some of which are caused by toolchain bugs and
others which require adjustment of architecture code.
For this reason profiling the entire kernel is initially restricted
to those architectures for which it is known to work without changes.
This restriction can be lifted once an architecture has been tested
and found compatible with gcc's profiling. Profiling of single files
or directories is still available on all platforms (see config help
text).
[1] http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Li Wei <W.Li@Sun.COM> Cc: Michael Ellerman <michaele@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com> Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
seq_write() can be used to construct seq_files containing arbitrary data.
Required by the gcov-profiling interface to synthesize binary profiling
data files.
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Li Wei <W.Li@Sun.COM> Cc: Michael Ellerman <michaele@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com> Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Call constructors (gcc-generated initcall-like functions) during kernel
start and module load. Constructors are e.g. used for gcov data
initialization.
Disable constructor support for usermode Linux to prevent conflicts with
host glibc.
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Li Wei <W.Li@Sun.COM> Cc: Michael Ellerman <michaele@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com> Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harry Ciao [Wed, 17 Jun 2009 23:28:00 +0000 (16:28 -0700)]
edac: cpc925 MC platform device setup
Fix up the number of cells for the values of CPC925 Memory Controller,
and setup related platform device during system booting up, against
which CPC925 Memory Controller EDAC driver would be matched.
Signed-off-by: Harry Ciao <qingtao.cao@windriver.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Kumar Gala <galak@gate.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harry Ciao [Wed, 17 Jun 2009 23:27:59 +0000 (16:27 -0700)]
edac: add edac_device_alloc_index()
Add edac_device_alloc_index(), because for MAPLE platform there may
exist several EDAC driver modules that could make use of
edac_device_ctl_info structure at the same time. The index allocation
for these structures should be taken care of by EDAC core.
[akpm@linux-foundation.org: cleanups] Signed-off-by: Harry Ciao <qingtao.cao@windriver.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Kumar Gala <galak@gate.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harry Ciao [Wed, 17 Jun 2009 23:27:58 +0000 (16:27 -0700)]
edac: add CPC925 Memory Controller driver
Introduce IBM CPC925 EDAC driver, which makes use of ECC, CPU and
HyperTransport Link error detections and corrections on the IBM
CPC925 Bridge and Memory Controller.
[akpm@linux-foundation.org: cleanup] Signed-off-by: Harry Ciao <qingtao.cao@windriver.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Kumar Gala <galak@gate.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matt Helsley [Wed, 17 Jun 2009 23:27:58 +0000 (16:27 -0700)]
futex: documentation: fix inconsistent description of futex list_op_pending
Strictly speaking list_op_pending points to the 'lock entry', not the
'lock word' (which is actually at 'offset' from 'lock entry'). We can
infer this based on reading the code in kernel/futex.c:
Alexey Dobriyan [Wed, 17 Jun 2009 23:27:56 +0000 (16:27 -0700)]
nsproxy: extract create_nsproxy()
clone_nsproxy() does useless copying of old nsproxy -- every pointer will
be rewritten to new ns or to old ns. Remove copying, rename
clone_nsproxy(), create_nsproxy() will be used by C/R code to create fresh
nsproxy on restart.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Wed, 17 Jun 2009 23:27:52 +0000 (16:27 -0700)]
pidns: make create_pid_namespace() accept parent pidns
create_pid_namespace() creates everything, but caller has to assign parent
pidns by hand, which is unnatural. At the moment of call new ->level has
to be taken from somewhere and parent pidns is already available.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
find_task_by_pid_type_ns is only used to implement find_task_by_vpid and
find_task_by_pid_ns, but both of them pass PIDTYPE_PID as first argument.
So just fold find_task_by_pid_type_ns into find_task_by_pid_ns and use
find_task_by_pid_ns to implement find_task_by_vpid.
While we're at it also remove the exports for find_task_by_pid_ns and
find_task_by_vpid - we don't have any modular callers left as the only
modular caller of he old pre pid namespace find_task_by_pid (gfs2) was
switched to pid_task which operates on a struct pid pointer instead of a
pid_t. Given the confusion about pid_t values vs namespace that's
generally the better option anyway and I think we're better of restricting
modules to do it that way.
Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Slaby [Wed, 17 Jun 2009 23:27:48 +0000 (16:27 -0700)]
Char: isicom: fix build warning
Fix this:
isicom.c: In function `isicom_probe':
isicom.c:1587: warning: `signature' may be used uninitialized in this function
by uninitialized_var(), because if the signature is not initialized in
reset_card(), we won't use it.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Wed, 17 Jun 2009 23:27:45 +0000 (16:27 -0700)]
kthreads: simplify migration_thread() exit path
Now that kthread_stop() can be used even if the task has already exited,
we can kill the "wait_to_die:" loop in migration_thread(). But we must
pin rq->migration_thread after creation.
Actually, I don't think CPU_UP_CANCELED or CPU_DEAD should wait for
->migration_thread exit. Perhaps we can simplify this code a bit more.
migration_call() can set ->should_stop and forget about this thread. But
we need a new helper in kthred.c for that.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Vitaliy Gusev <vgusev@openvz.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Wed, 17 Jun 2009 23:27:45 +0000 (16:27 -0700)]
kthreads: rework kthread_stop()
Based on Eric's patch which in turn was based on my patch.
kthread_stop() has the nasty problems:
- it runs unpredictably long with the global semaphore held.
- it deadlocks if kthread itself does kthread_stop() before it obeys
the kthread_should_stop() request.
- it is not useable if kthread exits on its own, see for example the
ugly "wait_to_die:" hack in migration_thread()
- it is not possible to just tell kthread it should stop, we must always
wait for its exit.
With this patch kthread() allocates all neccesary data (struct kthread) on
its own stack, globals kthread_stop_xxx are deleted. ->vfork_done is used
as a pointer into "struct kthread", this means kthread_stop() can easily
wait for kthread's exit.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Vitaliy Gusev <vgusev@openvz.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Wed, 17 Jun 2009 23:27:43 +0000 (16:27 -0700)]
kthreads: simplify the startup synchronization
We use two completions two create the kernel thread, this is a bit ugly.
kthread() wakes up create_kthread() via ->started, then create_kthread()
wakes up the caller kthread_create() via ->done. But kthread() does not
need to wait for kthread(), it can just return. Instead kthread() itself
can wake up the caller of kthread_create().
Kill kthread_create_info->started, ->done is enough. This improves the
scalability a bit and sijmplifies the code.
The only problem if kernel_thread() fails, in that case create_kthread()
must do complete(&create->done).
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Vitaliy Gusev <vgusev@openvz.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Wed, 17 Jun 2009 23:27:42 +0000 (16:27 -0700)]
do_wait: fix the theoretical race with stop/trace/cont
do_wait:
current->state = TASK_INTERRUPTIBLE;
read_lock(&tasklist_lock);
... search for the task to reap ...
In theory, the ->state changing can leak into the critical section. Since
the child can change its status under read_lock(tasklist) in parallel
(finish_stop/ptrace_stop), we can miss the wakeup if __wake_up_parent()
sees us in TASK_RUNNING state. Add the barrier.
Also, use __set_current_state() to set TASK_RUNNING.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Wed, 17 Jun 2009 23:27:41 +0000 (16:27 -0700)]
do_wait: kill the old BUG_ON, use while_each_thread()
do_wait() does BUG_ON(tsk->signal != current->signal), this looks like a
raher obsolete check. At least, I don't think do_wait() is the best place
to verify that all threads have the same ->signal. Remove it.
Also, change the code to use while_each_thread().
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that we don't pass &retval down to other helpers we can simplify
the code more.
- kill tsk_result, just use retval
- add the "notask" label right after the main loop, and
s/got end/goto notask/ after the fastpath pid check.
This way we don't need to initialize retval before this
check and the code becomes a bit more clean, if this pid
has no attached tasks we should just skip the list search.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Wed, 17 Jun 2009 23:27:39 +0000 (16:27 -0700)]
shift "ptrace implies WUNTRACED" from ptrace_do_wait() to wait_task_stopped()
No functional changes, preparation for the next patch.
ptrace_do_wait() adds WUNTRACED to options for wait_task_stopped() which
should always accept the stopped tracee, even if do_wait() was called
without WUNTRACED.
Change wait_task_stopped() to check "ptrace || WUNTRACED" instead. This
makes the code more explicit, and "int options" argument becomes const in
do_wait() pathes.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Wed, 17 Jun 2009 23:27:37 +0000 (16:27 -0700)]
copy_process(): remove the unneeded clear_tsk_thread_flag(TIF_SIGPENDING)
The forked child can have TIF_SIGPENDING if it was copied from parent's
ti->flags. But this is harmless and actually almost never happens,
because copy_process() can't succeed if signal_pending() == T.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Wed, 17 Jun 2009 23:27:36 +0000 (16:27 -0700)]
ptrace: don't take tasklist to get/set ->last_siginfo
Change ptrace_getsiginfo/ptrace_setsiginfo to use lock_task_sighand()
without tasklist_lock. Perhaps it makes sense to make a single helper
with "bool rw" argument.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Wed, 17 Jun 2009 23:27:35 +0000 (16:27 -0700)]
ptrace: do_notify_parent_cldstop: fix the wrong ->nsproxy usage
If the non-traced sub-thread calls do_notify_parent_cldstop(), we send the
notification to group_leader->real_parent and we report group_leader's
pid.
But, if group_leader is traced we use the wrong ->parent->nsproxy->pid_ns,
the tracer and parent can live in different namespaces. Change the code
to use "parent" instead of tsk->parent.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Wed, 17 Jun 2009 23:27:33 +0000 (16:27 -0700)]
ptrace: do not use task_lock() for attach
Remove the "Nasty, nasty" lock dance in ptrace_attach()/ptrace_traceme() -
from now task_lock() has nothing to do with ptrace at all.
With the recent changes nobody uses task_lock() to serialize with ptrace,
but in fact it was never needed and it was never used consistently.
However ptrace_attach() calls __ptrace_may_access() and needs task_lock()
to pin task->mm for get_dumpable(). But we can call __ptrace_may_access()
before we take tasklist_lock, ->cred_exec_mutex protects us against
do_execve() path which can change creds and MMF_DUMP* flags.
(ugly, but we can't use ptrace_may_access() because it hides the error
code, so we have to take task_lock() and use __ptrace_may_access()).
NOTE: this change assumes that LSM hooks, security_ptrace_may_access() and
security_ptrace_traceme(), can be called without task_lock() held.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Chris Wright <chrisw@sous-sol.org> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Wed, 17 Jun 2009 23:27:32 +0000 (16:27 -0700)]
ptrace: cleanup check/set of PT_PTRACED during attach
ptrace_attach() and ptrace_traceme() are the last functions which look as
if the untraced task can have task->ptrace != 0, this must not be
possible. Change the code to just check ->ptrace != 0 and s/|=/=/ to set
PT_PTRACED.
Also, a couple of trivial whitespace cleanups in ptrace_attach().
And move ptrace_traceme() up near ptrace_attach() to keep them close to
each other.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Chris Wright <chrisw@sous-sol.org> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Wed, 17 Jun 2009 23:27:31 +0000 (16:27 -0700)]
ptrace: ptrace_attach: check PF_KTHREAD + exit_state instead of ->mm
- Add PF_KTHREAD check to prevent attaching to the kernel thread
with a borrowed ->mm.
With or without this change we can race with daemonize() which
can set PF_KTHREAD or clear ->mm after ptrace_attach() does the
check, but this doesn't matter because reparent_to_kthreadd()
does ptrace_unlink().
- Kill "!task->mm" check. We don't really care about ->mm != NULL,
and the task can call exit_mm() right after we drop task_lock().
What we need is to make sure we can't attach after exit_notify(),
check task->exit_state != 0 instead.
Also, move the "already traced" check down for cosmetic reasons.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Chris Wright <chrisw@sous-sol.org> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Wed, 17 Jun 2009 23:27:29 +0000 (16:27 -0700)]
ptrace: mm_need_new_owner: use ->real_parent to search in the siblings
"Search in the siblings" should use ->real_parent, not ->parent. If the
task is traced then ->parent == tracer, while the task's parent is always
->real_parent.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Wed, 17 Jun 2009 23:27:23 +0000 (16:27 -0700)]
allow_signal: kill the bogus ->mm check, add a note about CLONE_SIGHAND
allow_signal() checks ->mm == NULL. Not sure why. Perhaps to make sure
current is the kernel thread. But this helper must not be used unless we
are the kernel thread, kill this check.
Also, document the fact that the CLONE_SIGHAND kthread must not use
allow_signal(), unless the caller really wants to change the parent's
->sighand->action as well.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Try to fix memcg's lru rotation sanity: make memcg use the same logic as
the global LRU does.
Now, at __isolate_lru_page() retruns -EBUSY, the page is rotated to the
tail of LRU in global LRU's isolate LRU pages. But in memcg, it's not
handled. This makes memcg do the same behavior as global LRU and rotate
LRU in the page is busy.
memcg: fix behavior under memory.limit equals to memsw.limit
A user can set memcg.limit_in_bytes == memcg.memsw.limit_in_bytes when the
user just want to limit the total size of applications, in other words,
not very interested in memory usage itself. In this case, swap-out will
be done only by global-LRU.
But, under current implementation, memory.limit_in_bytes is checked at
first and try_to_free_page() may do swap-out. But, that swap-out is
useless for memsw.limit_in_bytes and the thread may hit limit again.
This patch tries to fix the current behavior at memory.limit ==
memsw.limit case. And documentation is updated to explain the behavior of
this special case.