Thomas Petazzoni [Wed, 21 Nov 2012 22:12:54 +0000 (23:12 +0100)]
ARM: mvebu: update defconfig with I2C and RTC support
Now that we have support for the I2C busses on Armada 370/XP, and
support for the RTC on the OpenBlocks AX3-4 platform, include the
necessary options in mvebu_defconfig.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
ARM: mvebu: Add support for I2C on OpenBlocks AX3-4
The OpenBlocks AX3-4 board, based on the Armada XP SoC, has an I2C
bus. This patch enables this bus and sets the clock frequency of the
bus.
[Thomas Petazzoni: updated with other changes on OpenBlocks, rephrased
commit log.] Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
ARM: mvebu: Add support for I2C controllers in Armada 370/XP
The Armada 370 and Armada XP have the same I2C controllers as previous
Marvell SoCs, so the existing mv64xxx-i2c driver works fine.
[Thomas Petazzoni: updated on top of other Armada 370/XP changes,
rephrased the commit log]. Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Jason Cooper [Wed, 21 Nov 2012 20:02:46 +0000 (20:02 +0000)]
Merge tag 'marvell-hwiocc-for-3.8' of git://github.com/MISL-EBU-System-SW/mainline-public into mvebu/everything
Add hardware I/O coherency support for Armada 370/XP
The purpose of this patch set is to add hardware I/O Coherency support
for Armada 370 and Armada XP. Theses SoCs come with an unit called
coherency fabric. A beginning of the support for this unit have been
introduced with the SMP patch set. This series extend this support:
the coherency fabric unit allows to use the Armada XP and the Armada
370 as nearly coherent architectures.
The third patches enables this new feature and register our own set
of DMA ops, to benefit this hardware enhancement.
The first patches exports a dma operation function needed to register
our own set of dma ops.
The second patch introduces a new flag for the address decoding
configuration in order to be able to set the memory windows as
shared memory.
Jason Cooper [Wed, 21 Nov 2012 20:01:15 +0000 (20:01 +0000)]
Merge tag 'marvell-armadaxp-smp-for-3.8' of git://github.com/MISL-EBU-System-SW/mainline-public into mvebu/everything
SMP support for Armada XP
The purpose of this series is to add the SMP support for the Armada XP
SoCs. Beside the SMP support itself brought by the last 3 commits,
this series also adds the support for the coherency fabric unit and
the power management service unit.
The coherency fabric is responsible for ensuring hardware coherency
between all CPUs and between CPUs and I/O masters. This unit is also
available for Armada 370 and will be used in an incoming patch set
for hardware I/O cache coherency.
The power management service unit is responsible for powering down and
waking up CPUs and other SOC units.
Gregory CLEMENT [Fri, 12 Oct 2012 17:20:36 +0000 (19:20 +0200)]
arm: mvebu: Add hardware I/O Coherency support
Armada 370 and XP come with an unit called coherency fabric. This unit
allows to use the Armada 370/XP as a nearly coherent architecture. The
coherency mechanism uses snoop filters to ensure the coherency between
caches, DRAM and devices. This mechanism needs a synchronization
barrier which guarantees that all the memory writes initiated by the
devices have reached their target and do not reside in intermediate
write buffers. That's why the architecture is not totally coherent and
we need to provide our own functions for some DMA operations.
Beside the use of the coherency fabric, the device units will have to
set the attribute flag of the decoding address window to select the
accurate coherency process for the memory transaction. This is done
each device driver programs the DRAM address windows. The value of the
attribute set by the driver is retrieved through the
orion_addr_map_cfg struct filled during the early initialization of
the platform.
Gregory CLEMENT [Fri, 12 Oct 2012 15:59:48 +0000 (17:59 +0200)]
arm: plat-orion: Add coherency attribute when setup mbus target
Recent SoC such as Armada 370/XP came with the possibility to deal
with the I/O coherency by hardware. In this case the transaction
attribute of the window must be flagged as "Shared transaction". Once
this flag is set, then the transactions will be forced to be sent
through the coherency block, in other case transaction is driven
directly to DRAM.
Gregory CLEMENT [Wed, 21 Nov 2012 08:39:19 +0000 (09:39 +0100)]
arm: dma mapping: Export a dma ops function arm_dma_set_mask
Expose another DMA operations function: arm_dma_set_mask. This
function will be added to a custom DMA ops for Armada 370/XP.
Depending of its configuration Armada 370/XP can be set as a "nearly"
coherent architecture. In this case the DMA ops is made of:
- specific functions for this architecture
- already exposed arm DMA related functions
- the arm_dma_set_mask which was not exposed yet.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Gregory CLEMENT [Wed, 14 Nov 2012 21:51:08 +0000 (22:51 +0100)]
arm: mvebu: Add SMP support for Armada XP
This enables SMP support on the Armada XP processor. It adds the
mandatory functions to support SMP such as: the SMP initialization
functions in platsmp.c, the secondary CPU entry point in headsmp.S and
the CPU hotplug initial support in hotplug.c.
Gregory CLEMENT [Wed, 3 Oct 2012 09:58:07 +0000 (11:58 +0200)]
arm: mm: Add support for PJ4B cpu and init routines
PJ4B is an implementation of the ARMv7 (such as the Cortex A9 for
example) released by Marvell. This CPU is currently found in
Armada 370 and Armada XP SoCs. This patch provides a support for the
specific initialization of this CPU.
Gregory CLEMENT [Thu, 2 Aug 2012 08:17:51 +0000 (11:17 +0300)]
arm: mvebu: Add initial support for power managmement service unit
The Armada 370 and Armada XP SOCs have a power management service unit
which is responsible for powering down and waking up CPUs and other
SOC units. This patch adds support for this unit.
Gregory CLEMENT [Thu, 2 Aug 2012 08:16:29 +0000 (11:16 +0300)]
arm: mvebu: Add support for coherency fabric in mach-mvebu
The Armada 370 and Armada XP SOCs have a coherency fabric unit which
is responsible for ensuring hardware coherency between all CPUs and
between CPUs and I/O masters. This patch provides the basic support
needed for SMP.
Thomas Petazzoni [Mon, 19 Nov 2012 13:19:29 +0000 (14:19 +0100)]
arm: mvebu: remove 'clock-frequency' properties from Armada 370/XP Ethernet nodes
The mvneta driver for the Marvell Armada 370/XP Ethernet devices has
gained proper clock framework integration, and the corresponding
Device Tree nodes now have a correct 'clocks' pointer.
The 'clock-frequency' properties in the various .dts files for Armada
370/XP boards have therefore become useless.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Mon, 19 Nov 2012 13:18:09 +0000 (14:18 +0100)]
arm: mvebu: add 'clocks' property to Ethernet nodes for Armada 370/XP SoCs
The mvneta driver now understands a standard 'clocks' clock pointer
property in the Device Tree nodes for the Ethernet devices, so we add
the right clock reference for the different Ethernet ports of the
Armada 370/XP SoCs.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Mon, 19 Nov 2012 13:40:02 +0000 (14:40 +0100)]
net: mvneta: fix section mismatch warning caused by mvneta_deinit()
mvneta_deinit() can be called from the ->probe() hook in the error
path, so it shouldn't be marked as __devexit. It fixes the following
section mismatch warning:
WARNING: vmlinux.o(.devinit.text+0x239c): Section mismatch in reference
from the function mvneta_probe() to the function .devexit.text:mvneta_deinit()
The function __devinit mvneta_probe() references
a function __devexit mvneta_deinit().
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Mon, 19 Nov 2012 13:15:25 +0000 (14:15 +0100)]
net: mvneta: add clk support
Now that the Armada 370/XP platform has gained proper integration with
the clock framework, we add clk support in the Marvell Armada 370/XP
Ethernet driver.
Since the existing Device Tree binding that exposes a
'clock-frequency' property has never been exposed in any stable kernel
release, we take the freedom of removing this property to replace it
with the standard 'clocks' clock pointer property.
The Device Tree binding documentation is updated accordingly.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Mon, 19 Nov 2012 10:41:25 +0000 (11:41 +0100)]
net: mvneta: adjust multiline comments to net/ style
As reported by checkpatch, the multiline comments for net/ and
drivers/net/ have a slightly different format than the one used in the
rest of the kernel, so we adjust our multiline comments accordingly.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Mon, 19 Nov 2012 10:40:15 +0000 (11:40 +0100)]
net: mvmdio: adjust multiline comment to net/ style
As reported by checkpatch, the multiline comments for net/ and
drivers/net/ have a slightly different format than the one used in the
rest of the kernel, so we adjust our multiline comment accordingly.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Andrew Lunn [Sun, 18 Nov 2012 10:44:56 +0000 (11:44 +0100)]
dma: mv_xor: Add a device_control function
The dmatest module for DMA engines calls
device_control(dtc->chan, DMA_TERMINATE_ALL, 0);
after completing the tests. The documentation in
include/linux/dmaengine.h suggests this function is optional and
dma_async_device_register() also does not BUG_ON() when not passed a
function. However, dmatest is not the only code in the kernel
unconditionally calling device_control. So add an implementation
indicating all operations are not implemented.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Thu, 15 Nov 2012 15:47:58 +0000 (16:47 +0100)]
dma: mv_xor: add Device Tree binding
This patch finally adds a Device Tree binding to the mv_xor
driver. Thanks to the previous cleanup patches, the Device Tree
binding is relatively simply: one DT node per XOR engine, with
sub-nodes for each XOR channel of the XOR engine. The binding
obviously comes with the necessary documentation.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: devicetree-discuss@lists.ozlabs.org
Thomas Petazzoni [Thu, 15 Nov 2012 14:55:30 +0000 (15:55 +0100)]
dma: mv_xor: remove the pool_size from platform_data
The pool_size is always PAGE_SIZE, and since it is a software
configuration paramter (and not a hardware description parameter), we
cannot make it part of the Device Tree binding, so we'd better remove
it from the platform_data as well.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Thu, 15 Nov 2012 14:36:37 +0000 (15:36 +0100)]
dma: mv_xor: remove hw_id field from platform_data
There is no need for the platform_data to give this ID, it is simply
the channel number, so we can compute it inside the driver when
registering the channels.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Thu, 15 Nov 2012 14:29:53 +0000 (15:29 +0100)]
dma: mv_xor: rename mv_xor_private to mv_xor_device
Now that mv_xor_device is no longer used to designate the per-channel
DMA devices, use it know to designate the XOR engine themselves
(currently composed of two XOR channels).
So, now we have the nice organization where:
- mv_xor_device represents each XOR engine in the system
- mv_xor_chan represents each XOR channel of a given XOR engine
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Thu, 15 Nov 2012 14:17:05 +0000 (15:17 +0100)]
dma: mv_xor: merge mv_xor_device and mv_xor_chan
Even though the DMA engine infrastructure has support for multiple
channels per device, the mv_xor driver registers one DMA engine device
for each channel, because the mv_xor channels inside the same XOR
engine have different capabilities, and the DMA engine infrastructure
only allows to express capabilities at the DMA engine device level.
The mv_xor driver has therefore been registering one DMA engine device
and one DMA engine channel for each XOR channel since its introduction
in the kernel. However, it kept two separate internal structures,
mv_xor_device and mv_xor_channel, which didn't make a lot of sense
since there was a 1:1 mapping between those structures.
This patch gets rid of this duplication, and merges everything into
the mv_xor_chan structure.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Thu, 15 Nov 2012 14:09:42 +0000 (15:09 +0100)]
dma: mv_xor: use mv_xor_chan pointers as arguments to self-test functions
In preparation for the removal of the mv_xor_device structure, we
directly pass mv_xor_chan pointers to the self-test functions included
in the driver. These functions were anyway selecting the first (and
only channel) available in each DMA device, so the behaviour is
unchanged.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Thu, 15 Nov 2012 14:00:25 +0000 (15:00 +0100)]
dma: mv_xor: in mv_xor_device, rename 'common' to 'dmadev'
The mv_xor_device structure embeds a 'struct dma_device', which is
named 'common', a not very meaningful name. Rename it to 'dmadev',
which will help avoid confusions later as we merge the mv_xor_device
and mv_xor_chan structures together.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Thu, 15 Nov 2012 13:57:44 +0000 (14:57 +0100)]
dma: mv_xor: in mv_xor_chan, rename 'common' to 'dmachan'
The mv_xor_chan structure embeds a 'struct dma_chan', which is named
'common', a not very meaningful name. Rename it to 'dmachan', which
will help avoid confusions later as we merge the mv_xor_device and
mv_xor_chan structures together.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Thu, 15 Nov 2012 13:17:18 +0000 (14:17 +0100)]
dma: mv_xor: introduce a mv_chan_to_devp() helper
In many place, we need to get the 'struct device *' pointer from a
'struct mv_chan *', so we add a helper that makes this a bit
easier. It will also help reducing the change noise in further
patches.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
In mv_xor_memcpy_self_test() and mv_xor_xor_self_test(), all DMA
functions are called by passing dma_chan->device->dev as the 'device
*', except the calls to dma_sync_single_for_cpu() which uselessly goes
through mv_chan->device->pdev->dev.
Simplify this by using dma_chan->device->dev direclty in
dma_sync_single_for_cpu() calls.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Tue, 30 Oct 2012 10:59:42 +0000 (11:59 +0100)]
dma: mv_xor: change the driver name to 'mv_xor'
Since we got rid of the per-XOR channel 'mv_xor' driver, now the
per-XOR engine driver that used to be called 'mv_xor_shared' can
simply be named 'mv_xor'.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Tue, 30 Oct 2012 10:58:14 +0000 (11:58 +0100)]
dma: mv_xor: rename mv_xor_shared_platform_data to mv_xor_platform_data
'struct mv_xor_shared_platform_data' used to be the platform_data
structure for the 'mv_xor_shared', but this driver is going to be
renamed simply 'mv_xor', so also rename its platform_data structure
accordingly.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Tue, 30 Oct 2012 10:56:26 +0000 (11:56 +0100)]
dma: mv_xor: rename mv_xor_platform_data to mv_xor_channel_data
mv_xor_platform_data used to be the platform_data structure associated
to the 'mv_xor' driver. This driver no longer exists, and this data
structure really contains the properties of each XOR channel part of a
given XOR engine. Therefore 'struct mv_xor_channel_data' is a more
appropriate name.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Tue, 30 Oct 2012 10:54:34 +0000 (11:54 +0100)]
dma: mv_xor: remove sub-driver 'mv_xor'
Now that XOR channels are directly registered by the main
'mv_xor_shared' device ->probe() function and all users of the
'mv_xor' device have been removed, we can get rid of the latter.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Tue, 30 Oct 2012 10:11:36 +0000 (11:11 +0100)]
arm: plat-orion: convert the registration of the xor1 engine to the single driver
Instead of registering one 'mv_xor_shared' device for the XOR engine,
and then two 'mv_xor' devices for the XOR channels, pass the channels
properties as platform_data for the main 'mv_xor_shared' device.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Mon, 29 Oct 2012 16:45:21 +0000 (17:45 +0100)]
arm: plat-orion: convert the registration of the xor0 engine to the single driver
Instead of registering one 'mv_xor_shared' device for the XOR engine,
and then two 'mv_xor' devices for the XOR channels, pass the channels
properties as platform_data for the main 'mv_xor_shared' device.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Mon, 29 Oct 2012 15:54:49 +0000 (16:54 +0100)]
dma: mv_xor: allow channels to be registered directly from the main device
Extend the XOR engine driver (currently called "mv_xor_shared") so
that XOR channels can be passed in the platform_data structure, and be
registered from there.
This will allow the users of the driver to be converted to the single
platform_driver variant of the mv_xor driver.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Mon, 29 Oct 2012 15:45:46 +0000 (16:45 +0100)]
dma: mv_xor: split initialization/cleanup of XOR channels
Instead of doing the initialization/cleanup of the XOR channels
directly in the ->probe() and ->remove() hooks, we create separate
utility functions mv_xor_channel_add() and mv_xor_channel_remove().
This will allow to easily introduce in a future patch a different way
of registering XOR channels: instead of having one platform_device per
channel, we'll trigger the registration of all XOR channels of a given
XOR engine directly from the XOR engine ->probe() function.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Thomas Petazzoni [Mon, 29 Oct 2012 15:27:34 +0000 (16:27 +0100)]
dma: mv_xor: do not use pool_size from platform_data within the driver
The driver currently pokes into the platform_data structure during its
normal operation to get the pool_size value. Poking into the
platform_data structure is not nice when moving to the Device Tree, so
this commit adds a new pool_size field in the mv_xor_device structure,
which gets initialized at ->probe() time. The driver then uses this
field instead of the platform_data.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Gregory CLEMENT [Fri, 26 Oct 2012 12:30:46 +0000 (14:30 +0200)]
arm: mvebu: increase atomic coherent pool size for armada 370/XP
For Armada 370/XP we have the same problem that for the commit cb01b63, so we applied the same solution: "The default 256 KiB
coherent pool may be too small for some of the Kirkwood devices, so
increase it to make sure that devices will be able to allocate their
buffers with GFP_ATOMIC flag"
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Andrew Lunn [Sat, 17 Nov 2012 14:22:28 +0000 (15:22 +0100)]
ARM: Kirkwood: switch to DT clock providers
With true DT clock providers available switch Kirkwood clock setup in
DT- enabled boards. While AUXDATA can be removed completely from bus
probing, some devices still don't know about DT. Therefore, some clkdev
aliases are created until these devices also move to DT.
With true DT clock providers available switch Dove clock setup in DT-
enabled boards. While AUXDATA can be removed completely from bus probing,
some devices still don't know about DT at all. Therefore, some clock
aliases are created until the devices also move to DT.
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
clk: mvebu: add clock gating control provider for DT
This driver allows to provide DT clocks for clock gates found on
Marvell Dove and Kirkwood SoCs. The clock gates are referenced by
the phandle index of the corresponding bit in the clock gating control
register to ease lookup in the datasheet.
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
This driver allows to provide DT clocks for core clocks found on
Marvell Kirkwood, Dove & 370/XP SoCs. The core clock frequencies and
ratios are determined by decoding the Sample-At-Reset registers.
Although technically correct, using a divider of 0 will lead to
div_by_zero panic. Let's use a ratio of 0/1 instead to fail later
with a zero clock.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by Gregory CLEMENT <gregory.clement@free-electrons.com>
Linus Torvalds [Fri, 16 Nov 2012 23:26:38 +0000 (15:26 -0800)]
Merge branch 'akpm' (Fixes from Andrew)
Merge misc fixes from Andrew Morton.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (12 patches)
revert "mm: fix-up zone present pages"
tmpfs: change final i_blocks BUG to WARNING
tmpfs: fix shmem_getpage_gfp() VM_BUG_ON
mm: highmem: don't treat PKMAP_ADDR(LAST_PKMAP) as a highmem address
mm: revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures"
rapidio: fix kernel-doc warnings
swapfile: fix name leak in swapoff
memcg: fix hotplugged memory zone oops
mips, arc: fix build failure
memcg: oom: fix totalpages calculation for memory.swappiness==0
mm: fix build warning for uninitialized value
mm: add anon_vma_lock to validate_mm()
Andrew Morton [Fri, 16 Nov 2012 22:15:06 +0000 (14:15 -0800)]
revert "mm: fix-up zone present pages"
Revert commit 7f1290f2f2a4 ("mm: fix-up zone present pages")
That patch tried to fix a issue when calculating zone->present_pages,
but it caused a regression on 32bit systems with HIGHMEM. With that
change, reset_zone_present_pages() resets all zone->present_pages to
zero, and fixup_zone_present_pages() is called to recalculate
zone->present_pages when the boot allocator frees core memory pages into
buddy allocator. Because highmem pages are not freed by bootmem
allocator, all highmem zones' present_pages becomes zero.
Various options for improving the situation are being discussed but for
now, let's return to the 3.6 code.
Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David Rientjes <rientjes@google.com> Tested-by: Chris Clayton <chris2553@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Fri, 16 Nov 2012 22:15:04 +0000 (14:15 -0800)]
tmpfs: change final i_blocks BUG to WARNING
Under a particular load on one machine, I have hit shmem_evict_inode()'s
BUG_ON(inode->i_blocks), enough times to narrow it down to a particular
race between swapout and eviction.
It comes from the "if (freed > 0)" asymmetry in shmem_recalc_inode(),
and the lack of coherent locking between mapping's nrpages and shmem's
swapped count. There's a window in shmem_writepage(), between lowering
nrpages in shmem_delete_from_page_cache() and then raising swapped
count, when the freed count appears to be +1 when it should be 0, and
then the asymmetry stops it from being corrected with -1 before hitting
the BUG.
One answer is coherent locking: using tree_lock throughout, without
info->lock; reasonable, but the raw_spin_lock in percpu_counter_add() on
used_blocks makes that messier than expected. Another answer may be a
further effort to eliminate the weird shmem_recalc_inode() altogether,
but previous attempts at that failed.
So far undecided, but for now change the BUG_ON to WARN_ON: in usual
circumstances it remains a useful consistency check.