git.karo-electronics.de Git - linux-beck.git/log

ARM: mx51: Add audmux support

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

ARM: mx51: add imx-ssi devices

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

ARM: mx51: dynamically register imx-uart devices

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

ARM: mx5: Add Nand clock support

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

ARM: mx5/iomux-mx51: Fix input path of some pins in gpio mode

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

ARM: mx5/iomux-mx51: Add aud3 primary function defines

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

ARM: mx5/iomux-mx51: Add SPI controller pads

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

ARM: mx5/mx51_3ds: add SPI NOR flash in the board init stage

A 2M bytes SPI NOR flash(sst25vf016b) is soldered on the mx51_3ds
board. So add the corresponding device for it.

Signed-off-by: Jason Wang <jason77.wang@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

ARM: mx5/mx51_3ds: add eCSPI2 support on the imx51_3ds board

Signed-off-by: Jason Wang <jason77.wang@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

ARM: mx5/iomux-mx51: add iomux definitions for eCSPI2 on the imx51_3ds board

On the imx51_3ds board, eCSPI2 is connected to a SPI NOR flash,
now add iomux definitions for those used pins.

Signed-off-by: Jason Wang <jason77.wang@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

ARM: mx5/clock-mx51: add spi clocks

Signed-off-by: Jason Wang <jason77.wang@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

ARM: mx5/clock-mx51: new macro that defines a clk with all members

Acked-by: Jason Wang <jason77.wang@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

ARM: mx5/clock-mx51: refactor ccgr callbacks to use common code

Acked-by: Jason Wang <jason77.wang@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

ARM: mx5: add spi_imx device registration

Acked-by: Jason Wang <jason77.wang@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

ARM: imx: use platform ids for spi_imx devices

The driver recently learned to handle platform ids. Make use of this
new feature. The up side is that the driver needs less knowledge about
the spi interfaces used on different SoCs.

Acked-by: Jason Wang <jason77.wang@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

spi/imx: add support for imx51's eCSPI and CSPI

i.MX51 comes with two eCSPI interfaces (that are quite different from
what was known before---the tried and tested Freescale way) and a CSPI
interface that is identical to the devices found on i.MX25 and i.MX35.

This patch is a merge of two very similar patches (by Jason Wang and Sascha
Hauer resp.) plus a (now hopefully correct) reimplementation of the
clock calculation.

Acked-by: Jason Wang <jason77.wang@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

spi/imx: save the spi chip select in config struct, not the gpio to use

This prepares adding support for imx51's eCSPI. This IP has seperate
control and config bits for all four supported chip selects, so the
config routine needs to know which chip select is being used even if
the chipselect is realized by a gpio.

Acked-by: Jason Wang <jason77.wang@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

spi/imx: get rid of more ifs depending on the used cpu

Nearly everything that is needed is provided by the version of the SPI IP.
Now the only checks left using cpu_is_... are clk divider tuning on mx21/mx27
and autodetection (which will die soon).

Acked-by: Jason Wang <jason77.wang@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

spi/imx: convert driver to use platform ids

This has the advantage not to need to much cpu_is_... macros.  Still more
when imx51 support is added which has two different spi interfaces which
would introduce additional checks on the device id.

With this setup it's not possible for the compiler anymore to detect the
unused functions, so four additional kconfig symbols are introduced to
ifdef out the unneeded functions in the callback array and all these
functions are marked with __maybe_unused to suppress the corresponding
gcc warnings.

Comparing the driver footprint with and without the patch for a mx27
kernel yields:

add/remove: 2/0 grow/shrink: 2/0 up/down: 280/0 (280)
function                                     old     new   delta
spi_imx_devtype                                -     192    +192
spi_imx_probe                                980    1032     +52
spi_imx_devtype_data                           -      32     +32
spi_imx_setupxfer                            276     280      +4

Later when the platform code is updated to use the platform ids, the
autodetection can be removed which will make the driver a bit smaller
again.  (~60 Bytes in my test.)

Acked-by: Jason Wang <jason77.wang@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

spi/imx: default to m on platforms that have such devices

Acked-by: Jason Wang <jason77.wang@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

ARM: imx: fix name of macros to add imx-i2c devices

This is a follow up to

c698715 (ARM: imx: dynamically register imx-i2c devices (imx27))
2b92084 (ARM: imx: dynamically register imx-i2c devices (imx21))
6348e6b (ARM: imx: dynamically register imx-i2c devices (imx1))

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

ARM: mx5: dynamically register imx-i2c devices

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

ARM: imx: reorganize imx-i2c device registration to use a struct per SoC

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: imx: dynamically allocate imx-ssi devices

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: imx: change the way imx-uarts are registered

For mx1_defconfig this yields:

add/remove: 1/0 grow/shrink: 1/4 up/down: 49/-108 (-59)
function                                     old     new   delta
imx1_imx_uart_data                             -      48     +48
kernel_config_data                          7277    7278      +1
imx_add_imx_uart_1irq                        132     128      -4
imx_add_imx_uart_3irq                        164     156      -8
scb9328_init                                  96      64     -32
mx1ads_init                                  220     156     -64

for mx21_defconfig this yields:

add/remove: 1/0 grow/shrink: 0/3 up/down: 64/-52 (12)
function                                     old     new   delta
imx21_imx_uart_data                            -      64     +64
imx_add_imx_uart_3irq                        160     156      -4
imx_add_imx_uart_1irq                        140     136      -4
mx21ads_board_init                           220     176     -44

for a random mx25 config this yields:

add/remove: 1/0 grow/shrink: 0/5 up/down: 80/-56 (24)
function                                     old     new   delta
imx25_imx_uart_data                            -      80     +80
imx_add_imx_uart_3irq                        160     156      -4
imx_add_imx_uart_1irq                        140     136      -4
mx25pdk_init                                 288     272     -16
eukrea_mbimxsd_baseboard_init                272     256     -16
eukrea_cpuimx25_init                         252     236     -16

for mx27_defconfig this yields:

add/remove: 1/0 grow/shrink: 0/10 up/down: 96/-280 (-184)
function                                     old     new   delta
imx27_imx_uart_data                            -      96     +96
imx_add_imx_uart_3irq                        160     156      -4
imx_add_imx_uart_1irq                        140     136      -4
pca100_init                                  560     544     -16
mx27pdk_init                                 112      96     -16
mx27lite_init                                 92      76     -16
eukrea_cpuimx27_init                         332     316     -16
pcm038_init                                  388     348     -40
mxt_td60_board_init                          320     280     -40
eukrea_mbimx27_baseboard_init                476     436     -40
mx27ads_board_init                           368     280     -88

and finally for mx3_defconfig:

add/remove: 2/0 grow/shrink: 0/9 up/down: 128/-344 (-216)
function                                     old     new   delta
imx31_imx_uart_data                            -      80     +80
imx35_imx_uart_data                            -      48     +48
imx_add_imx_uart_1irq                        132     128      -4
imx_add_imx_uart_3irq                        164     152     -12
mx31moboard_devboard_init                    360     344     -16
mx31lite_db_init                             176     160     -16
mx31moboard_smartbot_init                    384     360     -24
kzm_board_init                               232     208     -24
armadillo5x0_init                            392     364     -28
mx31lilly_db_init                            248     208     -40
mxc_board_init                              3760    3580    -180

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: imx: change the way spi-imx devices are registered

Group soc specific data in a global struct instead of repeating it for
each call to imxXX_add_spi_imxX. The structs holding the actual data
are placed in .init.constdata and so don't do much harm. Compared to
the previous approach this reduces code size to call imx_add_spi_imx.

Acked-by: Jason Wang <jason77.wang@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: mx51: use IMX_IO_ADDRESS to define MX51_IO_ADDRESS

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: mx51: fix naming of spi related defines

The names used now match the processor's reference manual. Also remove
MXC from the interrupt defines to match the other imx platforms.

Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Jason Wang <jason77.wang@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: mx51: clean up mx51 header

This makes the header more look like the other ones, i.e.

- sort #defines by value
- use lowercase hex constants
- use a consistently named header guard

Acked-by: Jason Wang <jason77.wang@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: mx51_defconfig: add new boards MACH_MX51_3DS and MACH_EUKREA_CPUIMX51

Further remove FIXED_PHY as it breaks the ethernet.
VGA_CONSOLE isn't selectable anymore since fb78b51cb11e.
EXT3_DEFAULTS_TO_ORDERED defaults to y since aa32a796389b.
INOTIFY is gone since 2dfc1cae4c42.
CONFIG_DETECT_SOFTLOCKUP is gone since e16bb1d7fe07.
Enable TMPFS for udev.
KEYS is selected by NFS_USE_KERNEL_DNS

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: remove mx31pdk_defconfig

This machine is enabled in mx3_defconfig and so mx31pdk_defconfig isn't
really useful.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: mx3_defconfig: add new machine MACH_EUKREA_CPUIMX35

Furthermore INOTIFY is gone since

2dfc1ca (inotify: remove inotify in kernel interface)

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: mx27_defconfig: enable switches used in mx27 code

- enable all mx27 machines (MACH_CPUIMX27, MACH_IMX27_VISSTRIM_M10,
  MACH_PCA100, MACH_MXT_TD60) including optional features for
  CPUIMX27
- eukrea_mbimx27-baseboard.c uses TOUCHSCREEN_ADS7846
- mach-cpuimx27.c uses SERIAL_8250
- several machines make use of SPI_IMX (selects SPI_BITBANG)
- drop VGA_CONSOLE as this isn't selectable anymore since fb78b51cb11e
- several machines make use of USB_ULPI (depends on USB, but don't
  enable USB_DEVICE_CLASS as it's deprecated)

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: mx5/mx51_babbage: don't use PHYS_OFFSET

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: mx5/mx51_babbage: fix compiler warnings

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: imx: ehci: use void __iomem * to hold i/o addresses

This fixes:

arch/arm/plat-mxc/ehci.c: In function 'mxc_initialize_usb_hw':
arch/arm/plat-mxc/ehci.c:260: warning: assignment makes integer from pointer without a cast
arch/arm/plat-mxc/ehci.c:263: warning: assignment makes integer from pointer without a cast
arch/arm/plat-mxc/ehci.c:270: warning: assignment makes integer from pointer without a cast

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: imx: remove #ifdefery for unmerged flexcan driver

The flexcan driver was merged as e955cead.

Cc: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: mx5/mx51_babbage: Add FEC support

Tested it by booting a rootfs via NFS.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: mx3/imx35: Add EPIT support

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: imx: Add EPIT support

The Enhanced Periodic Interrupt Timer (EPIT) is found on newer
i.MX SoCs and can be used as an alternative system timer.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: mx3/mx35_3ds: add usb host2 support

we still have to toggle two pins on the mc9sdz60:

/* MUX3_CTR to be low for USB Host2 DP&DM */
pmic_gpio_set_bit_val(MCU_GPIO_REG_GPIO_CONTROL_2, 6, 0);

/* CAN_PWDN to be high for USB Host2 Power&OC */
pmic_gpio_set_bit_val(MCU_GPIO_REG_GPIO_CONTROL_2, 1, 1);

until we've a proper driver for the mx9sdz60 in linux we'll do this in
barebox (a.k.a. u-boot-v2)

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: mx3/mx35_3ds: rename usb otg platform data variable name

Rename the variable holding the usb otg platform data to avoid clash
with usb host platform data variable.

usb_pdata -> usb_otg_pdata

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: mx3/mx35_3ds: add NAND flash

The mx35_3ds comes with 2 GiByte NAND flash. This adds the
corresponding platform device.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: mx3/mx35_3ds: add physmap-flash NOR at CS0

The mx35_3ds comes with 64 MiByte for NOR flash at CS0, add physmap-flash
platform device for it.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ARM: imx: Add support for Vista Silicon Visstrim_m10 board

Vista Silicon Visstrim_m10 i.MX27 based board is used
as multimedia streaming server, access control and other
custom applications.

Signed-off-by: Javier Martin <javier.martin@vista-silicon.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

Linux 2.6.36-rc4

docbook: skip files with no docs since they generate scary warnings

Fix docbook templates that reference files that do not contain the
expected kernel-doc notation.

Fixes these warnings:

  Warning(arch/x86/include/asm/unaligned.h): no structured comments found
  Warning(lib/vsprintf.c): no structured comments found

These cause errors in the generated html output, like below, so drop
these lines.

  Name
  arch/x86/include/asm/unaligned.h - Document generation inconsistency
  Oops
  Warning
  The template for this document tried to insert the structured comment from the file arch/x86/include/asm/unaligned.h at this point, but none was found. This dummy section is inserted to allow generation to continue.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

docbook: warn on unused doc entries

When you don't use !E or !I but only !F, then it's very easy to miss
including some functions, structs etc.  in documentation.  To help
finding which ones were missed, allow printing out the unused ones as
warnings.

For example, using this on mac80211 yields a lot of warnings like this:

  Warning: didn't use docs for DOC: mac80211 workqueue
  Warning: didn't use docs for ieee80211_max_queues
  Warning: didn't use docs for ieee80211_bss_change
  Warning: didn't use docs for ieee80211_bss_conf

when generating the documentation for it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel-doc: ignore case when stripping attributes

There are valid attributes that could have upper case letters, but we
still want to remove, like for example
__attribute__((aligned(NETDEV_ALIGN)))
as encountered in the wireless code.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6

* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM / Hibernate: Avoid hitting OOM during preallocation of memory
  PM QoS: Correct pr_debug() misuse and improve parameter checks
  PM: Prevent waiting forever on asynchronous resume after failing suspend

Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] fix use-after-free in scsi_init_io()
  [SCSI] sd: fix medium-removal bug
  [SCSI] qla2xxx: Update version number to 8.03.04-k0.
  [SCSI] qla2xxx: Check for empty slot in request queue before posting Command type 6 request.
  [SCSI] qla2xxx: Cover UNDERRUN case where SCSI status is set.
  [SCSI] qla2xxx: Correctly set fw hung and complete only waiting mbx.
  [SCSI] qla2xxx: Reset seconds_since_last_heartbeat correctly.
  [SCSI] qla2xxx: make rport deletions explicit during vport removal
  [SCSI] qla2xxx: Fix vport delete issues
  [SCSI] sd, sym53c8xx: Remove warnings after vsprintf %pV introducation.
  [SCSI] Fix warning: zero-length gnu_printf format string
  [SCSI] hpsa: disable doorbell reset on reset_devices
  [SCSI] be2iscsi: Fix for Login failure
  [SCSI] fix bio.bi_rw handling

PM / Hibernate: Avoid hitting OOM during preallocation of memory

There is a problem in hibernate_preallocate_memory() that it calls
preallocate_image_memory() with an argument that may be greater than
the total number of available non-highmem memory pages. If that's
the case, the OOM condition is guaranteed to trigger, which in turn
can cause significant slowdown to occur during hibernation.

To avoid that, make preallocate_image_memory() adjust its argument
before calling preallocate_image_pages(), so that the total number of
saveable non-highem pages left is not less than the minimum size of
a hibernation image. Change hibernate_preallocate_memory() to try to
allocate from highmem if the number of pages allocated by
preallocate_image_memory() is too low.

Modify free_unnecessary_pages() to take all possible memory
allocation patterns into account.

Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: M. Vefa Bicakci <bicave@superonline.com>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (28 commits)
  ipheth: remove incorrect devtype to WWAN
  MAINTAINERS: Add CAIF
  sctp: fix test for end of loop
  KS8851: Correct RX packet allocation
  udp: add rehash on connect()
  net: blackhole route should always be recalculated
  ipv4: Suppress lockdep-RCU false positive in FIB trie (3)
  niu: Fix kernel buffer overflow for ETHTOOL_GRXCLSRLALL
  ipvs: fix active FTP
  gro: Re-fix different skb headrooms
  via-velocity: Turn scatter-gather support back off.
  ipv4: Fix reverse path filtering with multipath routing.
  UNIX: Do not loop forever at unix_autobind().
  PATCH: b44 Handle RX FIFO overflow better (simplified)
  irda: off by one
  3c59x: Fix deadlock in vortex_error()
  netfilter: discard overlapping IPv6 fragment
  ipv6: discard overlapping fragment
  net: fix tx queue selection for bridged devices implementing select_queue
  bonding: Fix jiffies overflow problems (again)
  ...

Fix up trivial conflicts due to the same cgroup API thinko fix going
through both Andrew and the networking tree.  However, there were small
differences between the two, with Andrew's version generally being the
nicer one, and the one I merged first. So pick that one.

Conflicts in: include/linux/cgroup.h and kernel/cgroup.c

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc: Kill all BKL usage.

Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, tsc: Fix a preemption leak in restore_sched_clock_state()
sched: Move sched_avg_update() to update_cpu_load()

x86, tsc: Fix a preemption leak in restore_sched_clock_state()

Doh, a real life genuine preemption leak..

This caused a suspend failure.

Reported-bisected-and-tested-by-the-invaluable: Jeff Chua <jeff.chua.linux@gmail.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Nico Schottelius <nico-linux-20100709@schottelius.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Florian Pritz <flo@xssn.at>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: <stable@kernel.org> # Greg, please apply after: cd7240c ("x86, tsc, sched: Recompute cyc2ns_offset's during resume from")
sleep states
LKML-Reference: <1284150773.402.122.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Merge branch 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ickle/drm-intel

* 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ickle/drm-intel:
  drm/i915: don't enable self-refresh on Ironlake
  drm/i915: Double check that the wait_request is not pending before warning
  Revert "drm/i915: Warn if we run out of FIFO space for a mode"
  Revert "drm/i915: Allow LVDS on pipe A on gen4+"
  Revert "drm/i915: Enable RC6 on Ironlake."

Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: log IO completion workqueue is a high priority queue
xfs: prevent reading uninitialized stack memory

x86, tsc: Fix a preemption leak in restore_sched_clock_state()

A real life genuine preemption leak..

Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

PM QoS: Correct pr_debug() misuse and improve parameter checks

Correct some pr_debug() misuse and add a stronger parameter check to
pm_qos_write() for the ASCII hex value case. Thanks to Dan Carpenter
for pointing out the problem!

Signed-off-by: mark gross <markgross@thegnar.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

xfs: log IO completion workqueue is a high priority queue

The workqueue implementation in 2.6.36-rcX has changed, resulting
in the workqueues no longer having dedicated threads for work
processing. This has caused severe livelocks under heavy parallel
create workloads because the log IO completions have been getting
held up behind metadata IO completions. Hence log commits would
stall, memory allocation would stall because pages could not be
cleaned, and lock contention on the AIL during inode IO completion
processing was being seen to slow everything down even further.

By making the log Io completion workqueue a high priority workqueue,
they are queued ahead of all data/metadata IO completions and
processed before the data/metadata completions. Hence the log never
gets stalled, and operations needed to clean memory can continue as
quickly as possible. This avoids the livelock conditions and allos
the system to keep running under heavy load as per normal.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

execve: make responsive to SIGKILL with large arguments

An execve with a very large total of argument/environment strings
can take a really long time in the execve system call.  It runs
uninterruptibly to count and copy all the strings.  This change
makes it abort the exec quickly if sent a SIGKILL.

Note that this is the conservative change, to interrupt only for
SIGKILL, by using fatal_signal_pending().  It would be perfectly
correct semantics to let any signal interrupt the string-copying in
execve, i.e. use signal_pending() instead of fatal_signal_pending().
We'll save that change for later, since it could have user-visible
consequences, such as having a timer set too quickly make it so that
an execve can never complete, though it always happened to work before.

Signed-off-by: Roland McGrath <roland@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

execve: improve interactivity with large arguments

This adds a preemption point during the copying of the argument and
environment strings for execve, in copy_strings(). There is already
a preemption point in the count() loop, so this doesn't add any new
points in the abstract sense.

When the total argument+environment strings are very large, the time
spent copying them can be much more than a normal user time slice.
So this change improves the interactivity of the rest of the system
when one process is doing an execve with very large arguments.

Signed-off-by: Roland McGrath <roland@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

setup_arg_pages: diagnose excessive argument size

The CONFIG_STACK_GROWSDOWN variant of setup_arg_pages() does not
check the size of the argument/environment area on the stack.
When it is unworkably large, shift_arg_pages() hits its BUG_ON.
This is exploitable with a very large RLIMIT_STACK limit, to
create a crash pretty easily.

Check that the initial stack is not too large to make it possible
to map in any executable.  We're not checking that the actual
executable (or intepreter, for binfmt_elf) will fit.  So those
mappings might clobber part of the initial stack mapping.  But
that is just userland lossage that userland made happen, not a
kernel problem.

Signed-off-by: Roland McGrath <roland@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm

* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Perform hardware_enable in CPU_STARTING callback
  KVM: i8259: fix migration
  KVM: fix i8259 oops when no vcpus are online
  KVM: x86 emulator: fix regression with cmpxchg8b on i386 hosts

Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread
  perf symbols: Fix multiple initialization of symbol system
  perf: Fix CPU hotplug
  perf, trace: Fix module leak
  tracing/kprobe: Fix handling of C-unlike argument names
  tracing/kprobes: Fix handling of argument names
  perf probe: Fix handling of arguments names
  perf probe: Fix return probe support
  tracing/kprobe: Fix a memory leak in error case
  tracing: Do not allow llseek to set_ftrace_filter

KEYS: Fix bug in keyctl_session_to_parent() if parent has no session keyring

Fix a bug in keyctl_session_to_parent() whereby it tries to check the ownership
of the parent process's session keyring whether or not the parent has a session
keyring [CVE-2010-2960].

This results in the following oops:

  BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
  IP: [<ffffffff811ae4dd>] keyctl_session_to_parent+0x251/0x443
  ...
  Call Trace:
   [<ffffffff811ae2f3>] ? keyctl_session_to_parent+0x67/0x443
   [<ffffffff8109d286>] ? __do_fault+0x24b/0x3d0
   [<ffffffff811af98c>] sys_keyctl+0xb4/0xb8
   [<ffffffff81001eab>] system_call_fastpath+0x16/0x1b

if the parent process has no session keyring.

If the system is using pam_keyinit then it mostly protected against this as all
processes derived from a login will have inherited the session keyring created
by pam_keyinit during the log in procedure.

To test this, pam_keyinit calls need to be commented out in /etc/pam.d/.

Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Tavis Ormandy <taviso@cmpxchg8b.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

KEYS: Fix RCU no-lock warning in keyctl_session_to_parent()

There's an protected access to the parent process's credentials in the middle
of keyctl_session_to_parent().  This results in the following RCU warning:

  ===================================================
  [ INFO: suspicious rcu_dereference_check() usage. ]
  ---------------------------------------------------
  security/keys/keyctl.c:1291 invoked rcu_dereference_check() without protection!

  other info that might help us debug this:

  rcu_scheduler_active = 1, debug_locks = 0
  1 lock held by keyctl-session-/2137:
   #0:  (tasklist_lock){.+.+..}, at: [<ffffffff811ae2ec>] keyctl_session_to_parent+0x60/0x236

  stack backtrace:
  Pid: 2137, comm: keyctl-session- Not tainted 2.6.36-rc2-cachefs+ #1
  Call Trace:
   [<ffffffff8105606a>] lockdep_rcu_dereference+0xaa/0xb3
   [<ffffffff811ae379>] keyctl_session_to_parent+0xed/0x236
   [<ffffffff811af77e>] sys_keyctl+0xb4/0xb6
   [<ffffffff81001eab>] system_call_fastpath+0x16/0x1b

The code should take the RCU read lock to make sure the parents credentials
don't go away, even though it's holding a spinlock and has IRQ disabled.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: Range check cpu in blk_cpu_to_group
  scatterlist: prevent invalid free when alloc fails
  writeback: Fix lost wake-up shutting down writeback thread
  writeback: do not lose wakeup events when forking bdi threads
  cciss: fix reporting of max queue depth since init
  block: switch s390 tape_block and mg_disk to elevator_change()
  block: add function call to switch the IO scheduler from a driver
  fs/bio-integrity.c: return -ENOMEM on kmalloc failure
  bio-integrity.c: remove dependency on __GFP_NOFAIL
  BLOCK: fix bio.bi_rw handling
  block: put dev->kobj in blk_register_queue fail path
  cciss: handle allocation failure
  cfq-iosched: Documentation help for new tunables
  cfq-iosched: blktrace print per slice sector stats
  cfq-iosched: Implement tunable group_idle
  cfq-iosched: Do group share accounting in IOPS when slice_idle=0
  cfq-iosched: Do not idle if slice_idle=0
  cciss: disable doorbell reset on reset_devices
  blkio: Fix return code for mkdir calls

Merge branch 'at91-fixes-for-linus' of git://github.com/at91linux/linux-2.6-at91

* 'at91-fixes-for-linus' of git://github.com/at91linux/linux-2.6-at91:
  AT91: at91sam9261ek: remove C99 comments but keep information
  AT91: at91sam9261ek board: remove warnings related to use of SPI or SD/MMC
  AT91: dm9000 initialization update
  AT91: SAM9G45 - add a separate clock entry for every single TC block
  AT91: clock: peripheral clocks can have other parent than mck
  AT91: change dma resource index

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: rawmidi: fix the get next midi device ioctl
  ALSA: hda - Fix wrong HP pin detection in snd_hda_parse_pin_def_config()
  ALSA: seq/oss - Fix double-free at error path of snd_seq_oss_open()
  ALSA: msnd-classic: Fix invalid cfg parameter
  ALSA: hda - Enable PC-beep for EeePC with ALC269 codec
  ALSA: hda - Add errata initverb sequence for CS42xx codecs
  ALSA: usb - Release capture substream URBs properly
  ALSA: virtuoso: fix setting of Xonar DS line-in/mic-in controls
  ALSA: virtuoso: work around missing reset in the Xonar DS Windows driver
  ALSA: hda - Add quirk for Lenovo T400s
  ALSA: usb-audio: fix detection of vendor-specific device protocol settings
  ALSA: usb-audio: Assume first control interface is for audio
  ALSA: hda - Add a new hp-laptop model for Conexant 5066, tested on HP G60

drm/i915: don't enable self-refresh on Ironlake

We don't know how to enable it safely, especially as outputs turn on and
off. When disabling LP1 we also need to make sure LP2 and 3 are already
disabled.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29173
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29082
Reported-by: Chris Lord <chris@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@kernel.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

xfs: prevent reading uninitialized stack memory

The XFS_IOC_FSGETXATTR ioctl allows unprivileged users to read 12
bytes of uninitialized stack memory, because the fsxattr struct
declared on the stack in xfs_ioc_fsgetxattr() does not alter (or zero)
the 12-byte fsx_pad member before copying it back to the user. This
patch takes care of it.

Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

AT91: at91sam9261ek: remove C99 comments but keep information

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>

AT91: at91sam9261ek board: remove warnings related to use of SPI or SD/MMC

The sd/mmc data structure is not used if SPI is selected. The configuration
of PIO on the board prevent from using both interfaces at the same time
(board dependent).
Remove the warnings at compilation time adding a preprocessor condition.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>

AT91: dm9000 initialization update

Add information in dm9000 mac/phy chip initialization:
- irq resource details
- platform data details

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>

block: Range check cpu in blk_cpu_to_group

While testing CPU DLPAR, the following problem was discovered.
We were DLPAR removing the first CPU, which in this case was
logical CPUs 0-3. CPUs 0-2 were already marked offline and
we were in the process of offlining CPU 3. After marking
the CPU inactive and offline in cpu_disable, but before the
cpu was completely idle (cpu_die), we ended up in __make_request
on CPU 3. There we looked at the topology map to see which CPU
to complete the I/O on and found no CPUs in the cpu_sibling_map.
This resulted in the block layer setting the completion cpu
to be NR_CPUS, which then caused an oops when we tried to
complete the I/O.

Fix this by sanity checking the value we return from blk_cpu_to_group
to be a valid cpu value.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

Merge branch 'fix/hda' into for-linus

Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent

Merge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/urgent

Merge branch 'vhost-net' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

ipheth: remove incorrect devtype to WWAN

The 'wwan' devtype is meant for devices that require preconfiguration
and *every* time setup before the ethernet interface can be used, like
cellular modems which require a series of setup commands on serial ports
or other mechanisms before the ethernet interface will handle packets.

As ipheth only requires one-per-hotplug pairing setup with no
preconfiguration (like APN, phone #, etc) and the network interface is
usable at any time after that initial setup, remove the incorrect
devtype wwan.

Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: Add CAIF

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  libata-sff: Reenable Port Multiplier after libata-sff remodeling.
  libata: skip EH autopsy and recovery during suspend
  ahci: AHCI and RAID mode SATA patch for Intel Patsburg DeviceIDs
  ata_piix: IDE Mode SATA patch for Intel Patsburg DeviceIDs
  libata,pata_via: revert ata_wait_idle() removal from ata_sff/via_tf_load()
  ahci: fix hang on failed softreset
  pata_artop: Fix device ID parity check

tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread

Be sure to avoid entering t_show() with FTRACE_ITER_HASH set without
having properly started the iterator to iterate the hash.  This case is
degenerate and, as discovered by Robert Swiecki, can cause t_hash_show()
to misuse a pointer.  This causes a NULL ptr deref with possible security
implications.  Tracked as CVE-2010-3079.

Cc: Robert Swiecki <swiecki@google.com>
Cc: Eugene Teo <eugene@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

libata-sff: Reenable Port Multiplier after libata-sff remodeling.

Keep track of the link on the which the current request is in progress.
It allows support of links behind port multiplier.

Not all libata-sff is PMP compliant. Code for native BMDMA controller
does not take in accound PMP.

Tested on Marvell 7042 and Sil7526.

Signed-off-by: Gwendal Grignou <gwendal@google.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata: skip EH autopsy and recovery during suspend

For some mysterious reason, certain hardware reacts badly to usual EH
actions while the system is going for suspend. As the devices won't
be needed until the system is resumed, ask EH to skip usual autopsy
and recovery and proceed directly to suspend.

Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Stephan Diestelhorst <stephan.diestelhorst@amd.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci: AHCI and RAID mode SATA patch for Intel Patsburg DeviceIDs

This patch adds the Intel Patsburg (PCH) SATA AHCI and RAID Controller
DeviceIDs.

Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ata_piix: IDE Mode SATA patch for Intel Patsburg DeviceIDs

This patch adds the Intel Patsburg (PCH) IDE mode SATA Controller DeviceIDs.

Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata,pata_via: revert ata_wait_idle() removal from ata_sff/via_tf_load()

Commit 978c0666 (libata: Remove excess delay in the tf_load path)
removed ata_wait_idle() from ata_sff_tf_load() and via_tf_load().
This caused obscure detection problems in sata_sil.

https://bugzilla.kernel.org/show_bug.cgi?id=16606

The commit was pure performance optimization. Revert it for now.

Reported-by: Dieter Plaetinck <dieter@plaetinck.be>
Reported-by: Jan Beulich <JBeulich@novell.com>
Bisected-by: gianluca <gianluca@sottospazio.it>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

minix: fix regression in minix_mkdir()

Commit 9eed1fb721c ("minix: replace inode uid,gid,mode init with helper")
broke directory creation on minix filesystems.

Fix it by passing the needed mode flag to inode init helper.

Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@kernel.org> [2.6.35.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: page allocator: drain per-cpu lists after direct reclaim allocation fails

When under significant memory pressure, a process enters direct reclaim
and immediately afterwards tries to allocate a page.  If it fails and no
further progress is made, it's possible the system will go OOM.  However,
on systems with large amounts of memory, it's possible that a significant
number of pages are on per-cpu lists and inaccessible to the calling
process.  This leads to a process entering direct reclaim more often than
it should increasing the pressure on the system and compounding the
problem.

This patch notes that if direct reclaim is making progress but allocations
are still failing that the system is already under heavy pressure.  In
this case, it drains the per-cpu lists and tries the allocation a second
time before continuing.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: page allocator: calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake

Ordinarily watermark checks are based on the vmstat NR_FREE_PAGES as it is
cheaper than scanning a number of lists.  To avoid synchronization
overhead, counter deltas are maintained on a per-cpu basis and drained
both periodically and when the delta is above a threshold.  On large CPU
systems, the difference between the estimated and real value of
NR_FREE_PAGES can be very high.  If NR_FREE_PAGES is much higher than
number of real free page in buddy, the VM can allocate pages below min
watermark, at worst reducing the real number of pages to zero.  Even if
the OOM killer kills some victim for freeing memory, it may not free
memory if the exit path requires a new page resulting in livelock.

This patch introduces a zone_page_state_snapshot() function (courtesy of
Christoph) that takes a slightly more accurate view of an arbitrary vmstat
counter.  It is used to read NR_FREE_PAGES while kswapd is awake to avoid
the watermark being accidentally broken.  The estimate is not perfect and
may result in cache line bounces but is expected to be lighter than the
IPI calls necessary to continually drain the per-cpu counters while kswapd
is awake.

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: page allocator: update free page counters after pages are placed on the free list

When allocating a page, the system uses NR_FREE_PAGES counters to
determine if watermarks would remain intact after the allocation was made.
This check is made without interrupts disabled or the zone lock held and
so is race-prone by nature.  Unfortunately, when pages are being freed in
batch, the counters are updated before the pages are added on the list.
During this window, the counters are misleading as the pages do not exist
yet.  When under significant pressure on systems with large numbers of
CPUs, it's possible for processes to make progress even though they should
have been stalled.  This is particularly problematic if a number of the
processes are using GFP_ATOMIC as the min watermark can be accidentally
breached and in extreme cases, the system can livelock.

This patch updates the counters after the pages have been added to the
list.  This makes the allocator more cautious with respect to preserving
the watermarks and mitigates livelock possibilities.

[akpm@linux-foundation.org: avoid modifying incoming args]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vmstat: update zone stat threshold when onlining a cpu

refresh_zone_stat_thresholds() calculates parameter based on the number of
online cpus. It's called at cpu offlining but needs to be called at
onlining, too.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vfs: take O_NONBLOCK out of the O_* uniqueness test

O_NONBLOCK on parisc has a dual value:

#define O_NONBLOCK 000200004 /* HPUX has separate NDELAY & NONBLOCK */

It is caught by the O_* bits uniqueness check and leads to a parisc
compile error. The fix would be to take O_NONBLOCK out.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Cc: Jamie Lokier <jamie@shareable.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

swap: discard while swapping only if SWAP_FLAG_DISCARD

Tests with recent firmware on Intel X25-M 80GB and OCZ Vertex 60GB SSDs
show a shift since I last tested in December: in part because of firmware
updates, in part because of the necessary move from barriers to awaiting
completion at the block layer. While discard at swapon still shows as
slightly beneficial on both, discarding 1MB swap cluster when allocating
is now disadvanteous: adds 25% overhead on Intel, adds 230% on OCZ (YMMV).

Surrender: discard as presently implemented is more hindrance than help
for swap; but might prove useful on other devices, or with improvements.
So continue to do the discard at swapon, but make discard while swapping
conditional on a SWAP_FLAG_DISCARD to sys_swapon() (which has been using
only the lower 16 bits of int flags).

We can add a --discard or -d to swapon(8), and a "discard" to swap in
/etc/fstab: matching the mount option for btrfs, ext4, fat, gfs2, nilfs2.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Nigel Cunningham <nigel@tuxonice.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <jaxboe@fusionio.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

swap: do not send discards as barriers

The swap code already uses synchronous discards, no need to add I/O
barriers.

This fixes the worst of the terrible slowdown in swap allocation for
hibernation, reported on 2.6.35 by Nigel Cunningham; but does not entirely
eliminate that regression.

[tj@kernel.org: superflous newlines removed]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Nigel Cunningham <nigel@tuxonice.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Jens Axboe <jaxboe@fusionio.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

swap: prevent reuse during hibernation

Move the hibernation check from scan_swap_map() into try_to_free_swap():
to catch not only the common case when hibernation's allocation itself
triggers swap reuse, but also the less likely case when concurrent page
reclaim (shrink_page_list) might happen to try_to_free_swap from a page.

Hibernation already clears __GFP_IO from the gfp_allowed_mask, to stop
reclaim from going to swap: check that to prevent swap reuse too.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ondrej Zary <linux@rainbow-software.org>
Cc: Andrea Gelmini <andrea.gelmini@gmail.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nigel Cunningham <nigel@tuxonice.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

swap: revert special hibernation allocation

Please revert 2.6.36-rc commit d2997b1042ec150616c1963b5e5e919ffd0b0ebf
"hibernation: freeze swap at hibernation". It complicated matters by
adding a second swap allocation path, just for hibernation; without in any
way fixing the issue that it was intended to address - page reclaim after
fixing the hibernation image might free swap from a page already imaged as
swapcache, letting its swap be reallocated to store a different page of
the image: resulting in data corruption if the imaged page were freed as
clean then swapped back in. Pages freed to si->swap_map were still in
danger of being reallocated by the alternative allocation path.

I guess it inadvertently fixed slow SSD swap allocation for hibernation,
as reported by Nigel Cunningham: by missing out the discards that occur on
the usual swap allocation path; but that was unintentional, and needs a
separate fix.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ondrej Zary <linux@rainbow-software.org>
Cc: Andrea Gelmini <andrea.gelmini@gmail.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nigel Cunningham <nigel@tuxonice.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>