Linus Walleij [Tue, 9 Jun 2009 07:11:42 +0000 (08:11 +0100)]
[ARM] 5546/1: ARM PL022 SSP/SPI driver v3
This adds a driver for the ARM PL022 PrimeCell SSP/SPI
block found in the U300 platforms as well as in some
ARM reference hardware, and in a modified version on the
Nomadik board.
Reviewed-by: Alessandro Rubini <rubini-list@gnudd.com> Reviewed-by: Russell King <linux@arm.linux.org.uk> Reviewed-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch updates omap_4430sdp_defconfig to add SMP and LOCAL_TIMER
support for OMAP4430 SDP platform.
Additionally, the defconfig is brought in sync with 2.6.30-rc7.
ARM: OMAP4: SMP: Add mpu timer support for OMAP4430
This patch adds SMP platform specific parts for local (MPU) timer support
for the OMAP4430 platform. Each Cortex-A9 core has its own local timer in the
MPU domain. These timers are not in the wakeup domain.
This patch adds SMP platform files support for OMAP4430SDP. TI's OMAP4430
SOC is based on ARM Cortex-A9 SMP architecture. It's a dual core SOC
with GIC used for interrupt handling and SCU for cache coherency.
Nicolas Pitre [Wed, 3 Jun 2009 01:43:45 +0000 (21:43 -0400)]
[ARM] Kirkwood: create a mapping for the Security Accelerator SRAM
Always creating the physical mapping should do no harm, so let's remove
the interface that was provided for its optional creation and make the
mapping static.
The security accelerator which can act as a puppet player for the crypto
engine requires its commands in the sram. This patch adds support for the
phys mapping and creates a platform device for the actual driver.
[ nico: renamed device name from "mv,orion5x-crypto" to "mv_crypto"
so to match the module name and be more generic for Kirkwood use ]
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Signed-off-by: Nicolas Pitre <nico@marvell.com>
Thomas Reitmayr [Mon, 1 Jun 2009 11:38:33 +0000 (13:38 +0200)]
[ARM] orion5x: Change names of defines for Reset-Out-Mask register
The name of the define for the Reset-Out-Mask register as well as its
bit for the watchdog reset are changed to match the names used for
Kirkwood (which in turn match the processor specification more
closely). There is no functional change.
This patch prepares for adding orion5x_wdt as a platform device to
Kirkwood.
Signed-off-by: Thomas Reitmayr <treitmayr@devbase.at> Signed-off-by: Nicolas Pitre <nico@marvell.com>
Nicolas Pitre [Wed, 27 May 2009 02:06:25 +0000 (22:06 -0400)]
[ARM] Kirkwood: only map peripheral register space once
Just like commit 1419468ab548, let's save some TLB entries by making
ioremap() return pointers into the boot-time Kirkwood peripheral
iotable mapping whenever someone tries to ioremap any part of the Kirkwood
peripheral register space.
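A minimal sketch of the lookup idea, not the actual Kirkwood code: if a requested physical range falls entirely inside the peripheral register window, hand back the matching address in the boot-time static mapping instead of creating a new one. The function name regs_static_lookup() and the base and size constants are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; the real bases and size come from platform headers. */
#define REGS_PHYS_BASE 0xf1000000u
#define REGS_VIRT_BASE 0xfed00000u
#define REGS_SIZE      0x00100000u

/*
 * Return the address inside the static mapping if [paddr, paddr + size)
 * lies entirely within the register window, or 0 to tell the caller to
 * fall back to a regular ioremap().
 */
static uintptr_t regs_static_lookup(uint32_t paddr, uint32_t size)
{
    uint32_t off;

    if (size == 0 || paddr < REGS_PHYS_BASE)
        return 0;

    off = paddr - REGS_PHYS_BASE;
    if (off >= REGS_SIZE || size > REGS_SIZE - off)
        return 0;

    return (uintptr_t)REGS_VIRT_BASE + off;
}
```

Every caller that hits the window shares the same TLB entries, which is where the saving comes from.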
Nicolas Pitre [Fri, 15 May 2009 04:42:36 +0000 (00:42 -0400)]
[ARM] orion: make sure sched_clock() usage of cnt32_to_63() is safe
With a TCLK = 200MHz, the half period of the hardware timer is roughly
10 seconds. Because cnt32_to_63() must be called at least once per
half period of the base hardware counter, it is a bit risky to rely
solely on scheduling to generate frequent enough calls. Let's use a
kernel timer to ensure this.
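The "roughly 10 seconds" figure follows from the counter width: the top bit of a 32-bit counter flips every 2^31 ticks. A small sketch of that arithmetic (half_period_ms() is a hypothetical helper, not part of the patch):

```c
#include <stdint.h>

/*
 * Half period of a free-running 32-bit counter, in milliseconds.
 * The counter's top bit changes state every 2^31 ticks.
 */
static uint64_t half_period_ms(uint32_t tclk_hz)
{
    return ((uint64_t)1 << 31) * 1000u / tclk_hz;
}
```

With tclk_hz = 200000000 this gives 10737 ms, so any periodic timer used to refresh cnt32_to_63() has to fire comfortably more often than that.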
Stefan Agner [Tue, 12 May 2009 17:30:41 +0000 (10:30 -0700)]
[ARM] orion: sched_clock implementation for orion platforms
sched_clock implementation for the orion platform. It's realized using a
free-running clocksource timer, which provides a resolution of 7.5ns
(depending on tclk). It's derived from PXA's sched_clock implementation.
[ nico: renamed orion2ns to tclk2ns, fixed max value in the comment ]
Signed-off-by: Stefan Agner <stefan.agner@yahoo.com> Signed-off-by: Nicolas Pitre <nico@marvell.com>
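To make the "7.5ns (depending on tclk)" figure concrete, here is a naive sketch of the tick-to-time conversion. It is not the tclk2ns code from the patch; tick_period_ps() and ticks_to_ns() are hypothetical helpers that use plain 64-bit division rather than the scaled math a real sched_clock() would want.

```c
#include <stdint.h>

/* Length of one timer tick in picoseconds for a given tclk. */
static uint64_t tick_period_ps(uint32_t tclk_hz)
{
    return 1000000000000ull / tclk_hz;
}

/*
 * Convert a tick count to nanoseconds. Whole seconds and the remainder
 * are converted separately so the intermediate product cannot overflow
 * 64 bits for the remainder part.
 */
static uint64_t ticks_to_ns(uint64_t ticks, uint32_t tclk_hz)
{
    uint64_t secs = ticks / tclk_hz;
    uint64_t rem = ticks % tclk_hz;

    return secs * 1000000000ull + rem * 1000000000ull / tclk_hz;
}
```

A tclk of about 133 MHz yields a 7500 ps tick, and a 200 MHz tclk yields 5000 ps.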
Rabeeh Khoury [Tue, 24 Mar 2009 14:10:15 +0000 (16:10 +0200)]
[ARM] Kirkwood: CPU idle driver
The patch adds support for Kirkwood cpu idle.
Two idle states are defined:
1. Wait-for-interrupt (replacing default kirkwood wfi)
2. Wait-for-interrupt and DDR self refresh
Signed-off-by: Rabeeh Khoury <rabeeh@marvell.com> Signed-off-by: Nicolas Pitre <nico@marvell.com>
Martin Fuzzey [Sat, 6 Jun 2009 14:36:44 +0000 (16:36 +0200)]
MXC : update i.MX21 clock support for USB host.
* Use correct clkdev style usb clock name
* Implement rate setting for USB clock
* Introduce _clk_generic_round_rate to factorize the (now 3) uses of rounding code.
Signed-off-by: Martin Fuzzey <mfuzzey@gmail.com> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
[ARM] 5541/1: serial/amba-pl011.c: add support for the modified port found in Nomadik
The Nomadik 8815 SoC has a slightly modified version of the PL011 block.
The patch uses the different ID value as a key to select a vendor
structure that is used to keep track of the differences, as suggested
by Russell King.
Signed-off-by: Alessandro Rubini <rubini@unipv.it> Acked-by: Andrea Gallo <andrea.gallo@stericsson.com> Acked-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Sascha Hauer wrote:
> On Tue, Jun 02, 2009 at 04:18:42PM -0400, Daniel Schaeffer wrote:
>> Add basic support for the Logic i.MX27LITE board.
>>
>> Signed-off-by: Daniel Schaeffer <daniel.schaeffer@timesys.com>
>
> Besides the comment made by Fabio this looks ok to me.
>
> Sascha
>
>
Fixed issues pointed out by Fabio and Magnus, and rebased to mxc-master head.
Signed-off-by: Daniel Schaeffer <daniel.schaeffer@timesys.com> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Daniel Mack [Sun, 31 May 2009 10:57:22 +0000 (12:57 +0200)]
Support for lilly-1131 modules and baseboards [v2]
On Thu, May 28, 2009 at 08:42:23PM +0200, Sascha Hauer wrote:
> > > Mail-Followup-To: Daniel Mack <daniel@caiaq.de>,
> > > linux-arm-kernel@lists.arm.linux.org.uk
> >
> > ... which causes my mutt to only reply to the list.
>
> Ah, ok. /me hacking in muttrc... Does it work now?
Yep :)
> > mxc_register_device(&mxc_uart_device0, &uart_pdata);
> > + mxc_register_device(&mxc_uart_device1, &uart_pdata);
> > + mxc_register_device(&mxc_uart_device2, &uart_pdata);
>
> What about the RXD3/TXD3 pins?
You're right - I got the IOMUX tables wrong and thought UART0 pins are
selected unconditionally. But as it turns out TXD1/RXD1 is for UART0
(mxc_uart_device0), TXD2/RXD2 for UART1 (mxc_uart_device1) etc.
Below is a new patch.
Thanks,
Daniel
From e7eb5fa0fed09d667a4b2f168fe466e2cc645abb Mon Sep 17 00:00:00 2001
From: Daniel Mack <daniel@caiaq.de>
Date: Wed, 27 May 2009 12:22:51 +0200
Subject: [PATCH] ARM: MX3: add two more UARTs to lilly-1131-db
Signed-off-by: Daniel Mack <daniel@caiaq.de> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Alan Cox [Tue, 2 Jun 2009 15:58:10 +0000 (16:58 +0100)]
parport: quickfix the proc registration bug
Ideally we should have a directory of drivers and a link to the 'active'
driver. For now just show the first device which is effectively the existing
semantics without a warning.
This is an update on the original buggy patch that I then forgot to
resubmit. Confusingly it was proposed by Red Hat, written by Etched Pixels,
fixed and submitted by Intel ...
Resolves-Bug: http://bugzilla.kernel.org/show_bug.cgi?id=9749 Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 2 Jun 2009 16:47:21 +0000 (09:47 -0700)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: prevent deadlock in xfs_qm_shake()
xfs: fix overflow in xfs_growfs_data_private
xfs: fix double unlock in xfs_swap_extents()
Minoru Usui [Tue, 2 Jun 2009 09:17:34 +0000 (02:17 -0700)]
net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup
This patch fixes a bug in which an unconfigured struct tcf_proto stays
chained in tc_ctl_tfilter(), and avoids a kernel panic in
cls_cgroup_classify() when we use cls_cgroup.
When we execute 'tc filter add', tcf_proto is allocated, initialized
by classifier's init(), and chained. After it's chained,
tc_ctl_tfilter() calls classifier's change(). When classifier's
change() fails, tc_ctl_tfilter() does not free tcf_proto and leaves it
chained. In addition, cls_cgroup is initialized in change(), not in init().
It accesses the unconfigured struct tcf_proto, which was chained before
change(), and then hits an Oops.
Signed-off-by: Minoru Usui <usui@mxm.nes.nec.co.jp> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Tested-by: Minoru Usui <usui@mxm.nes.nec.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
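The ordering problem described in the net_cls commit can be modelled with a toy list. This is only a sketch of one way to avoid leaving an unconfigured entry reachable (configure first, chain only on success, free on failure); struct proto, add_filter() and the two sample callbacks are hypothetical and do not mirror the real tc_ctl_tfilter() code.

```c
#include <stdlib.h>

struct proto {
    struct proto *next;
    int configured;
};

/* Hypothetical stand-in for a classifier's change() callback. */
typedef int (*change_fn)(struct proto *tp);

static int change_ok(struct proto *tp)   { tp->configured = 1; return 0; }
static int change_fail(struct proto *tp) { (void)tp; return -1; }

/* Allocate a new entry, configure it, and only then link it into the chain. */
static int add_filter(struct proto **chain, change_fn change)
{
    struct proto *tp = calloc(1, sizeof(*tp));
    int err;

    if (!tp)
        return -1;

    err = change(tp);
    if (err) {
        free(tp);        /* never visible to the classify path */
        return err;
    }

    tp->next = *chain;   /* chain only fully configured entries */
    *chain = tp;
    return 0;
}
```

With this ordering a failed change() leaves the chain exactly as it was, so a lookup can never reach a half-initialized entry.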
Neil Horman [Tue, 2 Jun 2009 08:29:58 +0000 (01:29 -0700)]
e1000: add missing length check to e1000 receive routine
Patch to fix bad length checking in e1000. E1000 by default does two
things:
1) Spans rx descriptors for packets that don't fit into 1 skb on receive
2) Strips the crc from a frame by subtracting 4 bytes from the length prior to
doing an skb_put
Since the e1000 driver isn't written to support receiving packets that span
multiple rx buffers, it checks the End of Packet bit of every frame, and
discards it if it's not set. This places us in a situation where, if we have a
spanning packet, the first part is discarded, but the second part is not (since
it is the end of packet, and it passes the EOP bit test). If the second part of
the frame is small (4 bytes or less), we subtract 4 from it to remove its crc,
underflow the length, and wind up in skb_over_panic, when we try to skb_put a
huge number of bytes into the skb. This amounts to a remote DOS attack through
careful selection of frame size in relation to interface MTU. The fix for this
is already in the e1000e driver, as well as the e1000 sourceforge driver, but no
one ever pushed it to e1000. This is lifted straight from e1000e, and prevents
small frames from causing the underflow described above.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Tested-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
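The underflow in the e1000 commit is ordinary unsigned wraparound. A minimal sketch, assuming a 32-bit unsigned length; the helper names are hypothetical and this is not the driver code itself:

```c
#include <stdbool.h>
#include <stdint.h>

#define CRC_LEN 4u /* Ethernet frame check sequence, in bytes */

/* Unchecked: wraps to a huge value when length is smaller than CRC_LEN. */
static uint32_t strip_crc_unchecked(uint32_t length)
{
    return length - CRC_LEN;
}

/* Checked: reject trailing fragments of 4 bytes or less before subtracting. */
static bool strip_crc_checked(uint32_t length, uint32_t *payload_len)
{
    if (length <= CRC_LEN)
        return false;

    *payload_len = length - CRC_LEN;
    return true;
}
```

A 2-byte trailing fragment turns into a length of 0xFFFFFFFE in the unchecked version, which is the "huge number of bytes" that ends up in skb_put().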
Ed Swierk [Tue, 2 Jun 2009 07:19:52 +0000 (00:19 -0700)]
forcedeth: add phy_power_down parameter, leave phy powered up by default (v2)
Add a phy_power_down parameter to forcedeth: set to 1 to power down the
phy and disable the link when an interface goes down; set to 0 to always
leave the phy powered up.
The phy power state persists across reboots; Windows, some BIOSes, and
older versions of Linux don't bother to power up the phy again, forcing
users to remove all power to get the interface working (see
http://bugzilla.kernel.org/show_bug.cgi?id=13072). Leaving the phy
powered on is the safest default behavior. Users accustomed to seeing
the link state reflect the interface state and/or wanting to minimize
power consumption can set phy_power_down=1 if compatibility with other
OSes is not an issue.
Signed-off-by: Ed Swierk <eswierk@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Blyakher [Mon, 1 Jun 2009 18:13:24 +0000 (13:13 -0500)]
xfs: prevent deadlock in xfs_qm_shake()
It's possible to recurse into the filesystem from a memory
allocation, which deadlocks in xfs_qm_shake(). Add a check
for __GFP_FS, and bail out if it is not set.
Signed-off-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Hedi Berriche <hedi@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
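The shape of the xfs_qm_shake() guard can be sketched as follows. GFP_FS_FLAG is a hypothetical stand-in for __GFP_FS, and the return-value convention (0 meaning "nothing reclaimed") is an assumption made for this illustration, not taken from the patch.

```c
/* Hypothetical stand-in for the kernel's __GFP_FS bit. */
#define GFP_FS_FLAG 0x80u

/*
 * Toy shrinker callback: refuse to do any filesystem work when the
 * allocation that triggered reclaim is not allowed to re-enter the
 * filesystem, otherwise pretend everything requested was reclaimed.
 */
static int shake(unsigned int gfp_mask, int nr_to_scan)
{
    if (!(gfp_mask & GFP_FS_FLAG))
        return 0; /* bail out: caller may already hold fs locks */

    return nr_to_scan;
}
```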
Eric Sandeen [Sat, 23 May 2009 19:30:12 +0000 (14:30 -0500)]
xfs: fix overflow in xfs_growfs_data_private
In the case where growing a filesystem would leave the last AG
too small, the fixup code has an overflow in the calculation
of the new size with one fewer ag, because "nagcount" is a 32
bit number. If the new filesystem has > 2^32 blocks in it
this causes a problem resulting in an EINVAL return from growfs:
Reported-by: richard.ems@cape-horn-eng.com Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
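The nagcount overflow in the growfs fix is a 32-bit multiply whose result is only widened afterwards. A sketch of the difference, assuming a 32-bit blocks-per-AG value as well; the function and parameter names are hypothetical:

```c
#include <stdint.h>

/* The product is computed in 32 bits and wraps before being widened. */
static uint64_t blocks_wrapping(uint32_t nagcount, uint32_t agblocks)
{
    return nagcount * agblocks;
}

/* Widening one operand first makes the multiply happen in 64 bits. */
static uint64_t blocks_widened(uint32_t nagcount, uint32_t agblocks)
{
    return (uint64_t)nagcount * agblocks;
}
```

For example, 17 allocation groups of 2^28 blocks each come to 2^32 + 2^28 blocks, but the wrapping version reports only 2^28.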
Felix Blyakher [Fri, 8 May 2009 00:49:45 +0000 (19:49 -0500)]
xfs: fix double unlock in xfs_swap_extents()
Regression from commit ef8f7fc, which rearranged the code in
xfs_swap_extents() leading to double unlock of xfs inode ilock.
That resulted in xfs_fsr deadlocking itself on platforms that
don't handle double unlock of rw_semaphore nicely. It caused the
count to go negative, which represents the write holder, without
really having one. ia64 is one of the platforms where deadlock
was easily reproduced and the fix was tested.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Felix Blyakher <felixb@sgi.com>
Add fan limit alarm 'max_alarm' to the alarm section.
Signed-off-by: Christian Engelmayer <christian.engelmayer@frequentis.com> Acked-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
Several EISA device IDs for 3c509 family network cards are missing from
the driver, making the cards unusable in their EISA mode. Here's a fix to
add them based on the EISA configuration files distributed by 3Com and our
eisa.ids database.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vašut [Thu, 21 May 2009 12:11:05 +0000 (13:11 +0100)]
[ARM] 5522/1: PalmLD: IDE support
Support for Palm LifeDrive's internal harddrive.
Signed-off-by: Marek Vasut <marek.vasut@gmail.com> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Herbert Xu [Sun, 31 May 2009 13:09:22 +0000 (23:09 +1000)]
crypto: hash - Fix handling of sg entry that crosses page boundary
A quirk that we've always supported is having an sg entry that's
bigger than a page, or more generally an sg entry that crosses
page boundaries. Even though it would be better to explicitly have
two sg entries for this, we need to support it for the existing users,
in particular, IPsec.
The new ahash sg walking code did try to handle this, but there was
a bug where we didn't increment the page, so we kept on walking on the
first page over and over again.
This patch fixes it.
Tested-by: Martin Willi <martin@strongswan.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
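Walking an sg entry that crosses page boundaries amounts to splitting it into per-page chunks and advancing the page after each one. The following is a standalone sketch of that loop, not the ahash walk code; struct chunk, split_entry() and the 4096-byte page size are assumptions for illustration.

```c
#include <stddef.h>

#define PAGE_SZ 4096u /* assumed page size for this sketch */

struct chunk {
    size_t page;   /* page index relative to the entry's first page */
    size_t offset; /* offset inside that page */
    size_t len;    /* bytes to process in that page */
};

/* Split [offset, offset + length) into per-page chunks; returns the count. */
static size_t split_entry(size_t offset, size_t length,
                          struct chunk *out, size_t max_chunks)
{
    size_t page = offset / PAGE_SZ;
    size_t off = offset % PAGE_SZ;
    size_t n = 0;

    while (length > 0 && n < max_chunks) {
        size_t len = PAGE_SZ - off;

        if (len > length)
            len = length;

        out[n].page = page;
        out[n].offset = off;
        out[n].len = len;
        n++;

        length -= len;
        off = 0;
        page++; /* forgetting this step re-walks the first page forever */
    }
    return n;
}
```

An entry starting at offset 4000 with 5000 bytes splits into chunks of 96, 4096 and 808 bytes on three consecutive pages.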
Linus Torvalds [Sat, 30 May 2009 14:57:33 +0000 (07:57 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
fsldma: Fix compile warnings
fsldma: fix memory leak on error path in fsl_dma_prep_memcpy()
fsldma: snooping is not enabled for last entry in descriptor chain
fsldma: fix infinite loop on multi-descriptor DMA chain completion
fsldma: fix "DMA halt timeout!" errors
fsldma: fix check on potential fdev->chan[] overflow
fsldma: update mailling list address in MAINTAINERS
Yevgeny Petrilin [Mon, 25 May 2009 20:57:21 +0000 (20:57 +0000)]
mlx4_en: Fix a kernel panic when waking tx queue
When the transmit queue gets full we enable interrupts for TX completions.
There was a race in which we handled the TX queue both from the interrupt context
and from the transmit function. Using "spin_trylock_irq()" ensures this
doesn't happen.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
Frans Pop [Fri, 22 May 2009 08:23:40 +0000 (10:23 +0200)]
ACPI processor: remove spurious newline from warning message
Commit 4973b22a ("ACPI processor: reset the throttling state once it's
invalid") introduced a new warning which prints a spurious newline.
The ACPI_WARNING macro that is used already takes care of adding a
newline, after adding ACPI_CA_VERSION to the message. Remove the newline
to avoid the message getting split into two lines.
Signed-off-by: Frans Pop <elendil@planet.nl> Signed-off-by: Len Brown <len.brown@intel.com>
Currently acpi_video_exit() is exported as well as using __exit which causes:
WARNING: drivers/acpi/video.o(__ksymtab+0x0): Section mismatch in reference from the variable __ksymtab_acpi_video_exit to the function .exit.text:acpi_video_exit()
The symbol acpi_video_exit is exported and annotated __exit
Fix this by removing the __exit annotation of acpi_video_exit or drop the export.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Tue, 26 May 2009 19:11:06 +0000 (15:11 -0400)]
ACPI: sanity check _PSS frequency to prevent cpufreq crash
When BIOS SETUP is changed to disable EIST, some BIOS
hand the OS an un-initialized _PSS:
Name (_PSS, Package (0x06)
{
    Package (0x06)
    {
        0x80000000, // frequency [MHz]
        0x80000000, // power [mW]
        0x80000000, // latency [us]
        0x80000000, // BM latency [us]
        0x80000000, // control
        0x80000000 // status
    },
    ...
These are outrageous values for frequency,
power and latency, raising the question where to draw
the line between legal and illegal. We tend to survive
garbage in the power and latency fields, but we can BUG_ON
when garbage is in the frequency field.
Cpufreq multiplies the frequency by 1000 and stores it in a u32 KHz.
So disregard a _PSS with a frequency so large
that it can't be represented by cpufreq.
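The sanity check reduces to asking whether the frequency in MHz, multiplied by 1000, still fits in 32 bits. A minimal sketch of that test; pss_freq_representable() is a hypothetical name and the patch itself may phrase the comparison differently:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if freq_mhz * 1000 fits in the u32 kHz value cpufreq stores. */
static bool pss_freq_representable(uint64_t freq_mhz)
{
    return freq_mhz <= UINT32_MAX / 1000u;
}
```

The garbage value 0x80000000 fails this test, while any plausible CPU frequency passes with a wide margin.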
Linus Torvalds [Fri, 29 May 2009 23:07:39 +0000 (16:07 -0700)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] update mach-types
[ARM] Add cmpxchg support for ARMv6+ systems (v5)
[ARM] barriers: improve xchg, bitops and atomic SMP barriers
Gemini: Fix SRAM/ROM location after memory swap
MAINTAINER: Add F: entries for Gemini and FA526
[ARM] disable NX support for OABI-supporting kernels
[ARM] add coherent DMA mask for mv643xx_eth
[ARM] pxa/palm: fix PalmLD/T5/TX AC97 MFP
[ARM] pxa: add parameter to clksrc_read() for pxa168/910
[ARM] pxa: fix the incorrectly defined drive strength macros for pxa{168,910}
[ARM] Orion: Remove explicit name for platform device resources
[ARM] Kirkwood: Correct MPP for SATA activity/presence LEDs of QNAP TS-119/TS-219.
[ARM] pxa/ezx: fix pin configuration for low power mode
[ARM] pxa/spitz: provide spitz_ohci_exit() that unregisters USB_HOST GPIO
[ARM] pxa: enable GPIO receivers after configuring pins
[ARM] pxa: allow gpio_reset drive high during normal work
[ARM] pxa: save/restore PGSR on suspend/resume.
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6:
Revert "USB: Correct Makefile to make isp1760 buildable"
usb-serial: fix crash when sub-driver updates firmware
USB: isp1760: urb_dequeue doesn't always find the urbs
USB: Yet another Conexant Clone to add to cdc-acm.c
USB: atmel_usb_udc: Use kzalloc() to allocate ep structures
USB: atmel-usba-udc : fix control out requests.
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
Driver Core: do not oops when driver_unregister() is called for unregistered drivers
sysfs: file.c: use create_singlethread_workqueue()
Linus Torvalds [Fri, 29 May 2009 15:49:09 +0000 (08:49 -0700)]
Merge branch 'for-2.6.30' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.30' of git://linux-nfs.org/~bfields/linux:
svcrdma: dma unmap the correct length for the RPCRDMA header page.
nfsd: Revert "svcrpc: take advantage of tcp autotuning"
nfsd: fix hung up of nfs client while sync write data to nfs server
Linus Torvalds [Fri, 29 May 2009 15:48:25 +0000 (08:48 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: libps2 - better handle bad scheduler decisions
Input: usb1400_ts - fix access to "device data" in resume function
Input: multitouch - augment event semantics documentation
Input: multitouch - add tracking ID to the protocol
Linus Torvalds [Fri, 29 May 2009 15:48:13 +0000 (08:48 -0700)]
Merge branch 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel
* 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel:
i915: Set object to gtt domain when faulting it back in
drm/i915: Apply a big hammer to 865 GEM object CPU cache flushing.
drm/i915: Fix tiling pitch handling on 8xx.
Linus Torvalds [Fri, 29 May 2009 15:47:53 +0000 (08:47 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: hda - Compaq Presario CQ60 patching for Conexant
sound: usb-audio: make the MotU Fastlane work again
ALSA: Enable PCM hw_ptr_jiffies check only in xrun_debug mode
ALSA: Fix invalid jiffies check after pause
Alan Cox [Thu, 28 May 2009 13:01:35 +0000 (14:01 +0100)]
8250: Fix oops from setserial
If you setserial a port which has never been initialised we change the type
but don't update the I/O method pointers. The same problem is true if you
change the io type of a port - but nobody ever does that so nobody noticed!
Remember the old type and when attaching if the type has changed reload the
port accessor pointers. We can't do it blindly as some 8250 drivers load custom
accessors and we must not stomp those.
Tested-by: Victor Seryodkin <vvscore@gmail.com>
Closes-bug: #13367 Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harry Ciao [Thu, 28 May 2009 21:34:43 +0000 (14:34 -0700)]
edac: AMD8111 & AMD8131 Kconfig fixup
The amd8111_edac.c driver will fail allmodconfig on architectures other
than PPC. Introduce a Kconfig dependency to avoid this, since both AMD8111
and AMD8131 chips are only used on Maple so far.
Signed-off-by: Harry Ciao <qingtao.cao@windriver.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
hugetlbfs reserves huge pages but does not fault them at mmap() time to
ensure that future faults succeed. The reservation behaviour differs
depending on whether the mapping was mapped MAP_SHARED or MAP_PRIVATE.
For MAP_SHARED mappings, hugepages are reserved when mmap() is first
called and are tracked based on information associated with the inode.
Other processes mapping MAP_SHARED use the same reservation. MAP_PRIVATE
mappings track the reservations based on the VMA created as part of the mmap()
operation. Each process mapping MAP_PRIVATE must make its own
reservation.
hugetlbfs currently checks if a VMA is MAP_SHARED with the VM_SHARED flag
and not VM_MAYSHARE. For file-backed mappings, such as hugetlbfs,
VM_SHARED is set only if the mapping is MAP_SHARED and the file was opened
read-write. If a shared memory mapping was mapped shared-read-write for
populating of data and mapped shared-read-only by other processes, then
hugetlbfs would account for the mapping as if it was MAP_PRIVATE. This
causes processes to fail to map the file MAP_SHARED even though it should
succeed as the reservation is there.
This patch alters mm/hugetlb.c and replaces VM_SHARED with VM_MAYSHARE
when the intent of the code was to check whether the VMA was mapped
MAP_SHARED or MAP_PRIVATE.
Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <starlight@binnacle.cx> Cc: Eric B Munson <ebmunson@us.ibm.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
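The difference between the two flags in the hugetlbfs accounting fix can be modelled in a few lines. This sketch only encodes the rule stated in the commit message (VM_SHARED needs MAP_SHARED plus a read-write open, VM_MAYSHARE needs only MAP_SHARED); the flag values and helper names are illustrative assumptions.

```c
#include <stdbool.h>

/* Illustrative flag values for this sketch. */
#define VM_SHARED   0x08ul
#define VM_MAYSHARE 0x80ul

/* Model the flags a file-backed mapping ends up with. */
static unsigned long mapping_flags(bool map_shared, bool file_opened_rw)
{
    unsigned long flags = 0;

    if (map_shared) {
        flags |= VM_MAYSHARE;
        if (file_opened_rw)
            flags |= VM_SHARED;
    }
    return flags;
}

/* Old test: misclassifies shared read-only mappings as private. */
static bool is_shared_old(unsigned long flags)
{
    return (flags & VM_SHARED) != 0;
}

/* New test: true for every MAP_SHARED mapping. */
static bool is_shared_new(unsigned long flags)
{
    return (flags & VM_MAYSHARE) != 0;
}
```

A MAP_SHARED mapping of a file opened read-only is the case where the two tests disagree, which is exactly the mapping that was being accounted as private.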
On x86 and x86-64, it is possible that page tables are shared between
shared mappings backed by hugetlbfs. As part of this,
page_table_shareable() checks a pair of vma->vm_flags and they must match
if they are to be shared. All VMA flags are taken into account, including
VM_LOCKED.
The problem is that VM_LOCKED is cleared on fork(). When a process with a
shared memory segment forks() to exec() a helper, there will be shared
VMAs with different flags. The impact is that the shared segment is
sometimes considered shareable and other times not, depending on what
process is checking.
What happens is that the segment page tables are being shared but the
count is inaccurate depending on the ordering of events. As the page
tables are freed with put_page(), bad pmd's are found when some of the
children exit. The hugepage counters also get corrupted and the Total and
Free count will no longer match even when all the hugepage-backed regions
are freed. This requires a reboot of the machine to "fix".
This patch addresses the problem by comparing all flags except VM_LOCKED
when deciding if pagetables should be shared or not for hugetlbfs-backed
mapping.
Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <starlight@binnacle.cx> Cc: Eric B Munson <ebmunson@us.ibm.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
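Comparing all flags except VM_LOCKED is a mask-and-compare. A minimal sketch, with an illustrative flag value and a hypothetical helper name rather than the real page_table_shareable() code:

```c
#include <stdbool.h>

/* Illustrative flag value for this sketch. */
#define VM_LOCKED 0x2000ul

/* Two VMAs may share page tables if their flags match apart from VM_LOCKED. */
static bool flags_shareable(unsigned long a, unsigned long b)
{
    return (a & ~VM_LOCKED) == (b & ~VM_LOCKED);
}
```

A parent with VM_LOCKED set and a forked child with it cleared now give the same answer regardless of which process performs the check.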