sched: Fix SCHED_MC regression caused by change in sched cpu_power

On platforms such as a dual-socket quad-core system, the scheduler load
balancer fails to detect load imbalances in certain scenarios. This
can leave one socket completely busy (all 4 cores running 4 tasks) while
the other socket sits completely idle. Performance suffers because those
4 tasks share the memory controller, last-level cache bandwidth etc. We
also won't take advantage of turbo-mode as much as we would like.

Some of the comparisons in the scheduler load balancing code are
comparing the "weighted cpu load that is scaled wrt sched_group's
cpu_power" with the "weighted average load per task that is not scaled
wrt sched_group's cpu_power". While this has probably been broken for a
longer time (for multi socket numa nodes etc), the problem got aggravated
via this recent change:

|
|  commit f93e65c186ab3c05ce2068733ca10e34fd00125e
|  Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
|  Date:   Tue Sep 1 10:34:32 2009 +0200
|
| sched: Restore __cpu_power to a straight sum of power
|

Also with this change, the sched group cpu power alone no longer reflects
the group capacity that is needed to implement MC, MT performance
(default) and power-savings (user-selectable) policies.

We need to use the computed group capacity (sgs.group_capacity, that is
computed using the SD_PREFER_SIBLING logic in update_sd_lb_stats()) to
find out if the group with the max load is above its capacity and how
much load to move etc.

Reported-by: Ma Ling <ling.ma@intel.com>
Initial-Analysis-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
[ -v2: build fix ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org> # [2.6.32.x, 2.6.33.x]
LKML-Reference: <1266970432.11588.22.camel@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: Don't use possibly stale sched_class

setscheduler() saves task->sched_class outside of the rq->lock held
region for a check after the setscheduler changes have become
effective. That might result in checking a stale value.

rtmutex_setprio() has the same problem, though it is protected by
p->pi_lock against setscheduler(), but for correctness sake (and to
avoid bad examples) it needs to be fixed as well.

Retrieve task->sched_class inside of the rq->lock held region.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@kernel.org

Merge branch 'sched/urgent' into sched/core

Conflicts: kernel/sched.c

Necessary due to the urgent fixes which conflict with the code move
from sched.c to sched_fair.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

sched: Fix race between ttwu() and task_rq_lock()

Thomas found that due to ttwu() changing a task's cpu without holding
the rq->lock, task_rq_lock() might end up locking the wrong rq.

Avoid this by serializing against TASK_WAKING.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1266241712.15770.420.camel@laptop>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

sched: Fix SMT scheduler regression in find_busiest_queue()

Fix a SMT scheduler performance regression that is leading to a scenario
where SMT threads in one core are completely idle while both the SMT threads
in another core (on the same socket) are busy.

This is caused by this commit (with the problematic code highlighted)

   commit bdb94aa5dbd8b55e75f5a50b61312fe589e2c2d1
   Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
   Date:   Tue Sep 1 10:34:38 2009 +0200

   sched: Try to deal with low capacity

   @@ -4203,15 +4223,18 @@ find_busiest_queue()
   ...
        for_each_cpu(i, sched_group_cpus(group)) {
   +            unsigned long power = power_of(i);

   ...

   -            wl = weighted_cpuload(i);
   +            wl = weighted_cpuload(i) * SCHED_LOAD_SCALE;
   +            wl /= power;

   -            if (rq->nr_running == 1 && wl > imbalance)
   +            if (capacity && rq->nr_running == 1 && wl > imbalance)
                        continue;

On a SMT system, power of the HT logical cpu will be 589 and
the scheduler load imbalance (for scenarios like the one mentioned above)
can be approximately 1024 (SCHED_LOAD_SCALE). The above change of scaling
the weighted load with the power will result in "wl > imbalance",
ultimately making find_busiest_queue() return NULL and causing
load_balance() to think that the load is well balanced. But in fact
one of the tasks can be moved to the idle core for optimal performance.

We don't need to use the weighted load (wl) scaled by the cpu power to
compare with imbalance. In that condition, we already know there is only a
single task ("rq->nr_running == 1"), and the comparison between imbalance and
wl is there to make sure that we select the correct priority thread which
matches imbalance. So we really need to compare the imbalance with the original
weighted load of the cpu and not the scaled load.

But in other conditions where we want the most hammered (busiest) cpu, we can
use the scaled load to ensure that we consider the cpu power in addition to the
actual load on that cpu, so that we can move the load away from the
guy that is getting most hammered with respect to the actual capacity,
as compared with the rest of the cpus in that busiest group.

Fix it.
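
The arithmetic above can be checked with a small stand-alone sketch. This
is not the kernel code: the helper names are hypothetical, the "capacity"
term of the real condition is left out, and the inputs (one task with a
weighted load of 1024, cpu power 589, imbalance 1024) are assumptions
taken from the scenario described in this message.

```c
#include <stdbool.h>

#define SCHED_LOAD_SCALE 1024UL

/* Weighted load scaled by cpu power, as done by the offending commit. */
unsigned long scaled_load(unsigned long load, unsigned long power)
{
	return load * SCHED_LOAD_SCALE / power;
}

/* Offending check: skip a single-task queue if its scaled load exceeds imbalance. */
bool skip_queue_scaled(unsigned long nr_running, unsigned long load,
		       unsigned long power, unsigned long imbalance)
{
	return nr_running == 1 && scaled_load(load, power) > imbalance;
}

/* Check as argued for above: compare the original, unscaled load. */
bool skip_queue_unscaled(unsigned long nr_running, unsigned long load,
			 unsigned long imbalance)
{
	return nr_running == 1 && load > imbalance;
}
```

With these inputs the scaled load is 1024 * 1024 / 589 = 1780, which
exceeds the imbalance of 1024, so the only busy queue is skipped. The
unscaled load of 1024 does not exceed it, so the queue stays a candidate.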

Reported-by: Ma Ling <ling.ma@intel.com>
Initial-Analysis-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1266023662.2808.118.camel@sbs-t61.sc.intel.com>
Cc: stable@kernel.org [2.6.32.x]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

sched: Fix sched_mc_power_savings for !SMT

Fix sched_mc_power_savings for pre-Nehalem platforms.
The child sched domain should clear SD_PREFER_SIBLING if the parent will
have SD_POWERSAVINGS_BALANCE, because the two flags contradict each other.

Set the flags correctly based on sched_mc_power_savings.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100208100555.GD2931@dirshya.in.ibm.com>
Cc: stable@kernel.org [2.6.32.x]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

kthread, sched: Remove reference to kthread_create_on_cpu

kthread_create_on_cpu doesn't exist so update a comment in
kthread.c to reflect this.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100209040740.GB3702@kryten>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: cpuacct: Use bigger percpu counter batch values for stats counters

When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are
enabled we can call cpuacct_update_stats with values much larger
than percpu_counter_batch.  This means the call to
percpu_counter_add will always add to the global count which is
protected by a spinlock and we end up with a global spinlock in
the scheduler.

Based on an idea by KOSAKI Motohiro, this patch scales the batch
value by cputime_one_jiffy such that we have the same batch
limit as we would if CONFIG_VIRT_CPU_ACCOUNTING was disabled.
His patch did this once at boot but that initialisation happened
too early on PowerPC (before time_init) and it was never updated
at runtime as a result of a hotplug cpu add/remove.

This patch instead scales percpu_counter_batch by
cputime_one_jiffy at runtime, which keeps the batch correct even
after cpu hotplug operations.  We cap it at INT_MAX in case of
overflow.

For architectures that do not support
CONFIG_VIRT_CPU_ACCOUNTING, cputime_one_jiffy is the constant 1
and gcc is smart enough to optimise min(s32
percpu_counter_batch, INT_MAX) to just percpu_counter_batch at
least on x86 and PowerPC.  So there is no need to add an #ifdef.
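
A minimal sketch of the scaling described above, not the actual patch.
cpuacct_batch() is a hypothetical helper, and the 64-bit intermediate is
an assumption made here so that the multiplication itself does not
overflow for the values shown.

```c
#include <limits.h>

/*
 * Hypothetical helper: scale the percpu counter batch by the number of
 * cputime units in one jiffy and cap the result at INT_MAX.
 */
int cpuacct_batch(int percpu_counter_batch, long long cputime_one_jiffy)
{
	long long batch = (long long)percpu_counter_batch * cputime_one_jiffy;

	return batch > INT_MAX ? INT_MAX : (int)batch;
}
```

When cputime_one_jiffy is 1 the helper returns the batch unchanged, which
matches the case of architectures without CONFIG_VIRT_CPU_ACCOUNTING.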

On a 64 thread PowerPC box with CONFIG_VIRT_CPU_ACCOUNTING and
CONFIG_CGROUP_CPUACCT enabled, a context switch microbenchmark
is 234x faster and almost matches a CONFIG_CGROUP_CPUACCT
disabled kernel:

CONFIG_CGROUP_CPUACCT disabled:   16906698 ctx switches/sec
CONFIG_CGROUP_CPUACCT enabled:       61720 ctx switches/sec
CONFIG_CGROUP_CPUACCT + patch:    16663217 ctx switches/sec

Tested with:

wget http://ozlabs.org/~anton/junkcode/context_switch.c
make context_switch
for i in `seq 0 63`; do taskset -c $i ./context_switch & done
vmstat 1

Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

percpu_counter: Make __percpu_counter_add an inline function on UP

Even though batch isn't used on UP, we may want to pass one in
to keep the SMP and UP code paths similar. Convert
__percpu_counter_add to an inline function so we won't get
variable unused warnings if we do.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Merge branch 'sched/urgent' into sched/core

Merge reason: Merge dependent fix, update to latest -rc.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

kernel/sched.c: Suppress unused var warning

On UP:

kernel/sched.c: In function 'wake_up_new_task':
kernel/sched.c:2631: warning: unused variable 'cpu'

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Linux 2.6.33-rc7

Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging

* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  hwmon: (w83781d) Request I/O ports individually for probing
  hwmon: (lm78) Request I/O ports individually for probing
  hwmon: (adt7462) Wrong ADT7462_VOLT_COUNT

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel:
  drm/i915: Fix leak of relocs along do_execbuffer error path
  drm/i915: slow acpi_lid_open() causes flickering - V2
  drm/i915: Disable SR when more than one pipe is enabled
  drm/i915: page flip support for Ironlake
  drm/i915: Fix the incorrect DMI string for Samsung SX20S laptop
  drm/i915: Add support for SDVO composite TV
  drm/i915: don't trigger ironlake vblank interrupt at irq install
  drm/i915: handle non-flip pending case when unpinning the scanout buffer
  drm/i915: Fix the device info of Pineview
  drm/i915: enable vblank interrupt on ironlake
  drm/i915: Prevent use of uninitialized pointers along error path.
  drm/i915: disable hotplug detect before Ironlake CRT detect

Fix potential crash with sys_move_pages

We incorrectly depended on the 'node_state/node_isset()' functions
testing the node range, rather than checking it explicitly. That's not
reliable, even if it might often happen to work. So do the proper
explicit test.
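
A sketch of what an explicit range test can look like, under stated
assumptions: check_node_range() is a hypothetical helper, MAX_NUMNODES is
given an arbitrary value here (it is configuration dependent in the
kernel), and the -ENODEV return value is an assumption of this sketch.

```c
#include <errno.h>

#define MAX_NUMNODES 64	/* arbitrary value for this sketch */

/* Reject node ids that must not be used to index a node mask. */
int check_node_range(int node)
{
	if (node < 0 || node >= MAX_NUMNODES)
		return -ENODEV;
	return 0;
}
```

Only after this test passes is it safe to hand the node id to bitmap
helpers, which may not validate the index themselves.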

Reported-by: Marcus Meissner <meissner@suse.de>
Acked-and-tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ASoC: pandora: Add APLL supply to fix audio output
  ALSA: ice1724 - aureon - fix wm8770 volume offset
  ALSA: cosmetic: make hda intel interrupt name consistent with others
  ALSA: hda - Delay switching to polling mode if an interrupt was missing
  ALSA: ctxfi - fix PTP address initialization

hwmon: (w83781d) Request I/O ports individually for probing

Different motherboards have different PNP declarations for
W83781D/W83782D chips. Some declare the whole range of I/O ports (8
ports), some declare only the useful ports (2 ports at offset 5) and
some declare fancy ranges, for example 4 ports at offset 4. To
properly handle all cases, request all ports individually for probing.
After we have determined that we really have a W83781D or W83782D
chip, the useful port range will be requested again, as a single
block.

I did not see a board which needs this yet, but I know of one for lm78
driver and I'd like to keep the logic of these two drivers in sync.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@kernel.org

hwmon: (lm78) Request I/O ports individually for probing

Different motherboards have different PNP declarations for LM78/LM79
chips. Some declare the whole range of I/O ports (8 ports), some
declare only the useful ports (2 ports at offset 5) and some declare
fancy ranges, for example 4 ports at offset 4. To properly handle all
cases, request all ports individually for probing. After we have
determined that we really have an LM78 or LM79 chip, the useful port
range will be requested again, as a single block.

This fixes the driver on the Olivetti M3000 DT 540, at least.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@kernel.org

hwmon: (adt7462) Wrong ADT7462_VOLT_COUNT

The #define ADT7462_VOLT_COUNT is wrong, it should be 13 not 12. All the
for loops that use this as a limit count are of the typical form, "for
(n = 0; n < ADT7462_VOLT_COUNT; n++)", so to loop through all voltages
w/o missing the last one it is necessary for the count to be one greater
than it is. (Specifically, you will miss the +1.5V 3GPIO input with count
= 12 vs. 13.)
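
The off-by-one can be illustrated with a short sketch. It is not driver
code: last_input_visited() and NUM_VOLTAGE_INPUTS are hypothetical names,
and the only fact used is that there are 13 inputs with indices 0..12.

```c
#include <stdbool.h>

#define NUM_VOLTAGE_INPUTS 13	/* indices 0..12, per the description above */

/* Returns true if a loop bounded by volt_count reaches the last input. */
bool last_input_visited(int volt_count)
{
	bool visited[NUM_VOLTAGE_INPUTS] = { false };
	int n;

	for (n = 0; n < volt_count && n < NUM_VOLTAGE_INPUTS; n++)
		visited[n] = true;

	return visited[NUM_VOLTAGE_INPUTS - 1];
}
```

With a count of 12 the loop stops at index 11 and never touches the last
input; with 13 it does.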

Signed-off-by: Ray Copeland <ray.copeland@aprius.com>
Acked-by: "Darrick J. Wong" <djwong@us.ibm.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@kernel.org

Merge remote branch 'alsa/fixes' into for-linus

Merge branch 'fix/asoc' into for-linus

Merge branch 'fix/hda' into for-linus

Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  [libata] Call flush_dcache_page after PIO data transfers in libata-sff.c
  ahci: add Acer G725 to broken suspend list
  libata: fix ata_id_logical_per_physical_sectors
  libata-scsi passthru: fix bug which truncated LBA48 return values

CS5536: apply pci quirk for BIOS SMBUS bug

The new cs5535-* drivers use PCI header config info rather than MSRs to
determine the memory region to use for things like GPIOs and MFGPTs.  As
anticipated, we've run into a buggy BIOS:

[    0.081818] pci 0000:00:14.0: reg 10: [io  0x6000-0x7fff]
[    0.081906] pci 0000:00:14.0: reg 14: [io  0x6100-0x61ff]
[    0.082015] pci 0000:00:14.0: reg 18: [io  0x6200-0x63ff]
[    0.082917] pci 0000:00:14.2: reg 20: [io  0xe000-0xe00f]
[    0.083551] pci 0000:00:15.0: reg 10: [mem 0xa0010000-0xa0010fff]
[    0.084436] pci 0000:00:15.1: reg 10: [mem 0xa0011000-0xa0011fff]
[    0.088816] PCI: pci_cache_line_size set to 32 bytes
[    0.088938] pci 0000:00:14.0: address space collision: [io 0x6100-0x61ff] already in use
[    0.089052] pci 0000:00:14.0: can't reserve [io  0x6100-0x61ff]

This is a Soekris board, and its BIOS sets the size of the PCI ISA bridge
device's BAR0 to 8k.  In reality, it should be 8 bytes (BAR0 is used for
SMBus stuff).  This quirk checks for an incorrect size, and resets it
accordingly.
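
The check-and-reset logic can be sketched as follows. This is an
illustration only: struct io_region, region_len() and fixup_smbus_bar()
are hypothetical stand-ins for the PCI resource handling, and the region
is assumed to be described by an inclusive start/end pair.

```c
#define SMBUS_BAR_SIZE 8UL	/* expected BAR0 size in bytes, per the text above */

struct io_region {
	unsigned long start;
	unsigned long end;	/* inclusive */
};

unsigned long region_len(const struct io_region *r)
{
	return r->end - r->start + 1;
}

/* Returns 1 if the region had to be shrunk, 0 if it was already correct. */
int fixup_smbus_bar(struct io_region *bar0)
{
	if (region_len(bar0) == SMBUS_BAR_SIZE)
		return 0;

	bar0->end = bar0->start + SMBUS_BAR_SIZE - 1;
	return 1;
}
```

Applied to the 0x6000-0x7fff range from the log above, the region shrinks
to 0x6000-0x6007 and no longer overlaps 0x6100-0x61ff.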

Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Tested-by: Leigh Porter <leigh@leighporter.org>
Tested-by: Jens Rottmann <JRottmann@LiPPERTEmbedded.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

percpu: add __percpu for sparse

This is to make the annotation of percpu variables during the next merge
window less painful.

Extracted from a patch by Rusty Russell.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6

* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/radeon/kms: fix r300 vram width calculations
  drm/radeon/kms: rs400/480 MC setup is different than r300.
  drm/radeon/kms: make initial state of load detect property correct.
  drm/radeon/kms: disable HDMI audio for now on rv710/rv730
  drm/radeon/kms: don't call suspend path before cleaning up GPU
  drivers/gpu/drm/radeon/radeon_combios.c: fix warning
  ati_pcigart: fix printk format warning
  drm/r100/kms: Emit cache flush to the end of command buffer. (v2)
  drm/radeon/kms: fix regression rendering issue on R6XX/R7XX
  drm/radeon/kms: move blit initialization after we disabled VGA

Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: apply updated fallocate i_size fix
  Btrfs: do not try and lookup the file extent when finishing ordered io
  Btrfs: Fix oopsen when dropping empty tree.
  Btrfs: remove BUG_ON() due to mounting bad filesystem
  Btrfs: make error return negative in btrfs_sync_file()
  Btrfs: fix race between allocate and release extent buffer.

ASoC: pandora: Add APLL supply to fix audio output

Pandora's external DAC is using 256*Fs output from the TWL4030
codec, and TWL4030 needs to have APLL enabled for its 256*Fs
output to function.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi@nokia.com>
Acked-by: Liam Girdwood <lrg@slimlogic.co.uk>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>

ALSA: ice1724 - aureon - fix wm8770 volume offset

The volume register is from 0..0x7f and 0..0x1a range is mute.
Also, fix mute combining in wm_vol_put(). The wrong behaviour was
noticed by Peter Christensen.

Signed-off-by: Jaroslav Kysela <perex@perex.cz>

ALSA: cosmetic: make hda intel interrupt name consistent with others

This renames the interrupt name in /proc/interrupts:
HDA Intel -> hda_intel

This also eliminates the space from the name, probably helping some
parsers.
I don't think anybody depends on this name in userspace.

Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda - Delay switching to polling mode if an interrupt was missing

My sound codec (ALC268) sometimes, though very rarely, seems to omit interrupts.
However, interrupt mode still works.
Thus if we get a timeout, poll the codec once.

If we get 3 such polls in a row, then switch to polling mode.
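
One reading of this policy, as a stand-alone sketch. The structure and
function names are hypothetical and do not come from the driver; the only
value taken from this message is the threshold of 3 polls in a row.

```c
#include <stdbool.h>

#define MAX_POLLS_IN_A_ROW 3	/* threshold from the description above */

struct codec_state {
	int polls_in_a_row;
	bool polling_mode;
};

/* A response arrived via interrupt in time: the streak is broken. */
void on_interrupt_response(struct codec_state *s)
{
	s->polls_in_a_row = 0;
}

/* A timeout forced a one-off poll; switch modes after 3 in a row. */
void on_timeout_poll(struct codec_state *s)
{
	if (s->polling_mode)
		return;

	if (++s->polls_in_a_row >= MAX_POLLS_IN_A_ROW)
		s->polling_mode = true;
}
```

An occasional missed interrupt therefore costs one poll, and only a
persistent failure moves the codec to polling mode.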

This patch may be a band-aid, but it might serve as a workaround for a hardware bug.

Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

drm/radeon/kms: fix r300 vram width calculations

This was incorrect according to the docs and the UMS driver does
it like this.

Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon/kms: rs400/480 MC setup is different than r300.

Boot testing on my rs480 laptop found the MC idle never happened
on startup, a quick check with AMD found the idle bit is in a different
place on the rs4xx than r300.

Implement a new rs400 mc idle function to fix this.

Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon/kms: make initial state of load detect property correct.

This was incorrect on my rs480.

Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon/kms: disable HDMI audio for now on rv710/rv730

Support isn't correct yet and we are getting green tinges on the
displays.

Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon/kms: don't call suspend path before cleaning up GPU

In the suspend path we unmap the GART table, while in the cleanup
path we unbind buffers and thus try to write to the unmapped
GART, leading to an oops. To avoid this, don't call the
suspend path from the cleanup path. The cleanup path is clever enough
to deactivate the GPU like the suspend path does, so the call was
redundant.

Tested on: RV370, R420, RV515, RV570, RV610, RV770 (all PCIE)

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drivers/gpu/drm/radeon/radeon_combios.c: fix warning

drivers/gpu/drm/radeon/radeon_combios.c: In function 'radeon_combios_get_lvds_info':
drivers/gpu/drm/radeon/radeon_combios.c:893: warning: comparison is always false due to limited range of data type

Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>

ati_pcigart: fix printk format warning

Fix ati_pcigart printk format warning:

drivers/gpu/drm/ati_pcigart.c:115: warning: format '%Lx' expects type 'long long unsigned int', but argument 3 has type 'dma_addr_t'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/r100/kms: Emit cache flush to the end of command buffer. (v2)

Cache flush is required in case CPU is accessing rendered data.

This fixes glean/readPixSanity test case and random rendering
errors in sauerbraten and warzone2100.

v2 Fix comment ordering in r100_fence_ring_emit and remove extra
defines added in first version.

Signed-off-by: Pauli Nieminen <suokkos@gmail.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon/kms: fix regression rendering issue on R6XX/R7XX

It seems that some R6XX/R7XX silently ignore HDP flush when
programmed through the ring. This patch adds back an ioctl callback
to allow R6XX/R7XX hw to perform such a flush through MMIO in
order to fix a regression. For more details see:

http://bugzilla.kernel.org/show_bug.cgi?id=15186

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon/kms: move blit initialization after we disabled VGA

VGA might be overwriting VRAM and corrupting our blit shader, leading
to corruption. It likely won't happen if you load fbcon right after
radeon. Thanks to Shawn Starr and Andre Maasikas for tracking down
this issue.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge master.kernel.org:/home/rmk/linux-2.6-arm

* master.kernel.org:/home/rmk/linux-2.6-arm:
  ARM: Fix wrong register in proc-arm6_7.S data abort handler
  ARM: 5909/1: ARM: Correct the FPSCR bits setting when raising exceptions
  ARM: 5904/1: ARM: Always generate the IT instruction when compiling for Thumb-2
  ARM: 5907/1: ARM: Fix the reset on the RealView PBX Development board
  mx35: add a missing comma in a pad definition
  mx25: make the FEC AHB clk secondary of the IPG
  mx25: fix time accounting
  mx25: properly initialize clocks
  mx25: remove unused mx25_clocks_init() argument
  i.MX25: implement secondary clocks for uarts and fec
  i.MX25: Allow secondary clocks in DEFINE_CLOCK
  ARM: MX3: Fixed typo in declared enum type name.
  MXC: Add AUDMUXv2 register decode to debugfs
  mx31ads: Provide an IRQ range to the WM835x on the 1133-EV1 module
  mx31ads: Provide a name for EXPIO interrupt chip
  mx31ads: Allow enable/disable of switchable supplies

Merge branch 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6

* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
  omap: Disable serial port autoidle by default
  omap: Fix access to already released memory in clk_debugfs_register_one()
  omap: Fix arch/arm/mach-omap2/mux.c: Off by one error
  omap: Fix 3630 mux errors
  OMAP2/3: GPMC: ensure valid clock pointer
  OMAP2/3: IRQ: ensure valid base address
  ARCH OMAP : enable ARCH_HAS_HOLES_MEMORYMODEL for OMAP
  omap: Remove old unused defines for OMAP_32KSYNCT_BASE
  omap: define _toggle_gpio_edge_triggering only for OMAP1

Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6

* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Don't clobber the attribute type in nfs_update_inode()
  NFS: Fix a umount race
  NFS: Fix an Oops when truncating a file
  NFS: Ensure that we handle NFS4ERR_STALE_STATEID correctly
  NFSv4.1: Don't call nfs4_schedule_state_recovery() unnecessarily
  NFSv4: Don't allow posix locking against servers that don't support it
  NFSv4: Ensure that the NFSv4 locking can recover from stateid errors
  NFS: Avoid warnings when CONFIG_NFS_V4=n
  NFS: Make nfs_commitdata_release static
  NFS: Try to commit unstable writes in nfs_release_page()
  NFS: Fix a reference leak in nfs_wb_cancel_page()

Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  futex: Handle futex value corruption gracefully
  futex: Handle user space corruption gracefully
  futex_lock_pi() key refcnt fix
  softlockup: Add sched_clock_tick() to avoid kernel warning on kgdb resume

Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
  GFS2: Extend umount wait coverage to full glock lifetime
  GFS2: Wait for unlock completion on umount

idr: revert misallocation bug fix

Commit 859ddf09743a8cc680af33f7259ccd0fd36bfe9d tried to fix a
misallocation bug but broke full bit marking by not clearing
pa[idp->layers], and it is also causing X failures due to lookup failure
in drm code. The cause of the latter hasn't been found yet. Revert
the fix for now.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ALSA: ctxfi - fix PTP address initialization

After hours of debugging, I finally found the reason why some source
and runtime combinations do not work. The PTP (page table pages)
address must be aligned. I am not sure by how much, but alignment to
PAGE_SIZE is sufficient. Also, use ALSA's page allocation routines
to ensure proper virtual -> physical address translation.

Cc: <stable@kernel.org>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>

drm/i915: Fix leak of relocs along do_execbuffer error path

Following a gpu hang, we would leak the relocation buffer. So simply
rearrange the error path to always free the relocation buffer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>

drm/i915: slow acpi_lid_open() causes flickering - V2

acpi_lid_open() could take up to 10ms on my computer.  Some component is
calling the drm GETCONNECTOR ioctl many times in a row.  This results in
flickering (for example, when starting a video).  Fix it by assuming an
always connected lid status.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Eric Anholt <eric@anholt.net>

drm/i915: Disable SR when more than one pipe is enabled

Self Refresh should be disabled on dual plane configs. Otherwise, as
the SR watermark is not calculated for such configs, switching to non
VGA mode causes FIFO underrun and display flicker.

This fixes Korg Bug #14897.

Signed-off-by: David John <davidjon@xenontk.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: stable@kernel.org
Signed-off-by: Eric Anholt <eric@anholt.net>

Btrfs: apply updated fallocate i_size fix

This version of the i_size fix for fallocate makes sure we only update
the i_size when the current fallocate is really operating outside of
i_size.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: do not try and lookup the file extent when finishing ordered io

When running the following fio job

[torrent]
filename=torrent-test
rw=randwrite
size=4g
filesize=4g
bs=4k
ioengine=sync

you would see long stalls where no work was being done.  That is because we were
doing all this extra work to read in the file extent outside of the transaction.
In the random io case this ends up hurting us because the file extents
are not there to begin with.  So axe this logic, since we end up reading in the
file extent when we go to update it anyway.  This took the fio job from 11 mb/s
with several ~10 second stalls to 24 mb/s with a couple of 1-2 second stalls.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Fix oopsen when dropping empty tree.

When dropping an empty tree, walk_down_tree() skips checking
extent information for the tree root. This will trigger a
BUG_ON in walk_up_proc().

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: remove BUG_ON() due to mounting bad filesystem

Mounting a bad filesystem caused a BUG_ON(). The following steps
reproduce it.
# mkfs.btrfs /dev/sda2
# mount /dev/sda2 /mnt
# mkfs.btrfs /dev/sda1 /dev/sda2
(the program says that /dev/sda2 was mounted, and then exits. )
# umount /mnt
# mount /dev/sda1 /mnt

At the third step, mkfs.btrfs exited partway through making the filesystem, so
the initialization of the filesystem didn't finish. The filesystem was therefore
bad, and mounting it triggered the BUG_ON(). But a BUG_ON() should be triggered
by wrong code, not by a user's operation, so I think it is a bug in btrfs.

This patch fixes it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: make error return negative in btrfs_sync_file()

It appears the error return should be negative.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix race between allocate and release extent buffer.

Increase extent buffer's reference count while holding the lock.
Otherwise it can race with try_release_extent_buffer.

Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

sched: Remove member rt_se from struct rt_rq

It's a duplicate of tg->rt_se[cpu] and the only usage is in
sched_rt_rq_dequeue() and sched_rt_rq_enqueue(). After the
first patch to those two functions, rt_se can be removed.

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <2674af741001282258q38781619u653ca4a7dd267347@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: Change usage of rt_rq->rt_se to rt_rq->tg->rt_se[cpu]

This is the first step to remove rt_rq member rt_se because it has the
same meaning as tg->rt_se[cpu]. The latter style is also used by
the fair scheduling class.

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <2674af741001282257r28c97a92o9f90cf16fe8d3d84@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

[libata] Call flush_dcache_page after PIO data transfers in libata-sff.c

flush_dcache_page() must be called after (!ATA_TFLAG_WRITE) the
data copying to avoid D-cache aliasing with user space or I-D cache
coherency issues (when reading data from an ATA device using PIO,
the kernel dirties the D-cache but there is no flush_dcache_page()
required on Harvard architectures).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci: add Acer G725 to broken suspend list

Acer G725 shares the same suspend problem with the HP laptops which
lose ATA devices on resume.  New firmware which fixes the problem is
already available.  Add G725 with old firmwares to the broken suspend
list.

This problem has been reported in bko#15104.

  http://bugzilla.kernel.org/show_bug.cgi?id=15104

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jani-Matti Hätinen <jani-matti.hatinen@iki.fi>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata: fix ata_id_logical_per_physical_sectors

The value we get from the low byte of the ATA_ID_SECTOR_SIZE word is not
a plain multiple, but the log of it, so fix the helper to give the correct
answer. Without this we'll get an incorrect minimal I/O size in the block
limits VPD page for 4k sector drives.

Also change the return value of ata_id_logical_per_physical_sectors to u16
for the unlikely case of very large logical sectors.
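
As a rough user-space sketch of the difference (not the actual libata helper):
the function names below are hypothetical, the word index 106 and the 4-bit
mask are assumptions taken from the ATA IDENTIFY DEVICE layout, and the "buggy"
variant only illustrates reading the field as a plain multiple.

```c
#include <stdint.h>

/* Assumed index of the sector-size word in the IDENTIFY DEVICE data. */
#define ID_SECTOR_SIZE_WORD 106

/* Illustrative buggy reading: treats the low byte as a plain multiple. */
static uint16_t logical_per_physical_buggy(const uint16_t *id)
{
	return id[ID_SECTOR_SIZE_WORD] & 0xff;
}

/*
 * Fixed reading: the field holds log2 of the ratio, so shift instead.
 * The 0xf mask (low four bits) is an assumption based on the ATA spec.
 */
static uint16_t logical_per_physical_fixed(const uint16_t *id)
{
	return (uint16_t)(1u << (id[ID_SECTOR_SIZE_WORD] & 0xf));
}
```

For a drive with 4k physical and 512-byte logical sectors the field holds 3,
so the fixed variant yields 8 logical sectors per physical sector, while the
plain-multiple reading yields 3.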

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata-scsi passthru: fix bug which truncated LBA48 return values

Fix assignment which overwrote SAT ATA PASS-THROUGH command EXTEND
bit setting (ATA_TFLAG_LBA48)
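
A minimal sketch of this class of bug, assuming hypothetical flag values and
function names (the message does not say which assignment was at fault): a
plain '=' after the EXTEND handling discards the LBA48 bit, while '|=' keeps it.

```c
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define TOY_TFLAG_LBA48  (1u << 0)
#define TOY_TFLAG_DEVICE (1u << 1)

/* Buggy: the later plain assignment overwrites the LBA48 bit. */
static uint32_t toy_flags_buggy(int extend)
{
	uint32_t flags = 0;

	if (extend)
		flags |= TOY_TFLAG_LBA48;
	flags = TOY_TFLAG_DEVICE;	/* clobbers everything set so far */
	return flags;
}

/* Fixed: OR the new bit in, preserving the LBA48 bit. */
static uint32_t toy_flags_fixed(int extend)
{
	uint32_t flags = 0;

	if (extend)
		flags |= TOY_TFLAG_LBA48;
	flags |= TOY_TFLAG_DEVICE;
	return flags;
}
```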

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze

* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: fix interrupt state restore
microblaze: Defconfig update

Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6:
saa7146: stop DMA before de-allocating DMA scatter/gather page buffers
V4L/DVB: saa7134: remove stray unlock_kernel

omap: Disable serial port autoidle by default

Currently the omap serial clocks are autoidled after 5 seconds.
However, this causes lost characters on the serial ports. As this
is considered non-standard behaviour for Linux, disable the timeout.

Note that this will also cause blocking of any deeper omap sleep
states.

To enable the autoidling of the serial ports, do something like
this for each serial port:

# echo 5 > /sys/devices/platform/serial8250.0/sleep_timeout
# echo 5 > /sys/devices/platform/serial8250.1/sleep_timeout
...

Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>

omap: Fix access to already released memory in clk_debugfs_register_one()

I have found an access to already released memory in
clk_debugfs_register_one() function.

Signed-off-by: Marek Skuczynski <mareksk7@gmail.com>
Acked-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>

omap: Fix arch/arm/mach-omap2/mux.c: Off by one error

David Binderman ran the sourceforge tool cppcheck over the source code of the
new Linux kernel 2.6.33-rc6:

[./arm/mach-omap2/mux.c:492]: (error) Buffer access out-of-bounds

13 characters + 1 digit + 1 zero byte is more than 14 characters.

Also add a comment on mode0 name length in case new omaps
start using longer names.
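
The arithmetic can be checked with a small user-space sketch. The helper name
and the "name followed by one digit" format are assumptions inferred from the
description above, not the code in mux.c.

```c
#include <stdio.h>

/*
 * Bytes needed to hold "<mode0 name><mode digit>" including the
 * terminating NUL. snprintf() with a zero-sized buffer only measures.
 */
static size_t toy_mux_name_size(const char *mode0_name, int mode)
{
	int len = snprintf(NULL, 0, "%s%d", mode0_name, mode);

	return len < 0 ? 0 : (size_t)len + 1;
}
```

With a 13-character mode0 name and a single-digit mode this gives 15, one
more than a 14-byte buffer can hold.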

Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>

omap: Fix 3630 mux errors

3630 has more mux signals than 34xx. The additional pins
exist in omap36xx_cbp_subset, but are not initialized
as the superset is missing these offsets. This causes
the following errors during the boot:

mux: Unknown entry offset 0x236
mux: Unknown entry offset 0x22e
mux: Unknown entry offset 0x1ec
mux: Unknown entry offset 0x1ee
mux: Unknown entry offset 0x1f4
mux: Unknown entry offset 0x1f6
mux: Unknown entry offset 0x1f8
mux: Unknown entry offset 0x1fa
mux: Unknown entry offset 0x1fc
mux: Unknown entry offset 0x22a
mux: Unknown entry offset 0x226
mux: Unknown entry offset 0x230
mux: Unknown entry offset 0x22c
mux: Unknown entry offset 0x228

Fix this by adding the missing offsets to omap3 superset.
Note that additionally the uninitialized pins need to be
skipped on 34xx.

Based on an earlier patch by Allen Pais <allen.pais@ti.com>.

Reported-by: Allen Pais <allen.pais@ti.com>
Signed-off-by: Allen Pais <allen.pais@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>

OMAP2/3: GPMC: ensure valid clock pointer

Ensure valid clock pointer during GPMC init. Fixes compiler
warning about potential use of uninitialized variable.

Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>

OMAP2/3: IRQ: ensure valid base address

Ensure valid base address during IRQ init. Fixes compiler warning
about potential use of uninitialized variable.

Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>

ARCH OMAP : enable ARCH_HAS_HOLES_MEMORYMODEL for OMAP

OMAP platforms (like OMAP3530) include DSP or other co-processors
for media acceleration. When carving out memory for the
accelerators we can end up creating a hole in the memory map
of this sort:
<kernel memory><hole(memory for accelerator)><kernel memory>

To handle such a memory configuration ARCH_HAS_HOLES_MEMORYMODEL
has to be enabled. For further information refer to the discussion at:
http://www.mail-archive.com/linux-omap@vger.kernel.org/msg15262.html

Signed-off-by: Sriramakrishnan <srk@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>

omap: Remove old unused defines for OMAP_32KSYNCT_BASE

Remove old unused defines for OMAP_32KSYNCT_BASE

Signed-off-by: Tony Lindgren <tony@atomide.com>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
nilfs2: fix potential leak of dirty data on umount

ARM: Fix wrong register in proc-arm6_7.S data abort handler

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

futex: Handle futex value corruption gracefully

The WARN_ON in lookup_pi_state which complains about a mismatch
between pi_state->owner->pid and the pid which we retrieved from the
user space futex is completely bogus.

The code just emits the warning and then continues despite the fact
that it detected an inconsistent state of the futex. A convenient way
for user space to spam the syslog.

Replace the WARN_ON by a consistency check. If the values do not match
return -EINVAL and let user space deal with the mess it created.

This also fixes the missing task_pid_vnr() when we compare the
pi_state->owner pid with the futex value.

Reported-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>

futex: Handle user space corruption gracefully

If the owner of a PI futex dies we fix up the pi_state and set
pi_state->owner to NULL. When a malicious or just sloppily programmed
user space application sets the futex value to 0 e.g. by calling
pthread_mutex_init(), then the futex can be acquired again. A new
waiter manages to enqueue itself on the pi_state w/o damage, but on
unlock the kernel dereferences pi_state->owner and oopses.

Prevent this by checking pi_state->owner in the unlock path. If
pi_state->owner is not current we know that user space manipulated the
futex value. Ignore the mess and return -EINVAL.

This catches the above case and also the case where a task hijacks the
futex by setting the tid value and then tries to unlock it.
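
A toy user-space model of the unlock-path check, with hypothetical struct and
function names (this is not the futex code itself): any owner other than the
caller, including NULL, is rejected with -EINVAL before it is dereferenced.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the task and pi_state structures. */
struct toy_task { int pid; };
struct toy_pi_state { struct toy_task *owner; };

/* Returns 0 if 'caller' may unlock, -EINVAL if the state is inconsistent. */
static int toy_unlock_check(const struct toy_pi_state *state,
			    const struct toy_task *caller)
{
	if (state->owner != caller)
		return -EINVAL;	/* covers owner == NULL and hijacked futexes */
	return 0;
}
```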

Reported-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>

futex_lock_pi() key refcnt fix

This fixes a futex key reference count bug in futex_lock_pi(),
where a key's reference count is incremented twice but decremented
only once, causing the backing object to not be released.

If the futex is created in a temporary file in an ext3 file system,
this bug causes the file's inode to become an "undead" orphan,
which causes an oops from a BUG_ON() in ext3_put_super() when the
file system is unmounted. glibc's test suite is known to trigger this,
see <http://bugzilla.kernel.org/show_bug.cgi?id=14256>.

The bug is a regression from 2.6.28-git3, namely Peter Zijlstra's
38d47c1b7075bd7ec3881141bb3629da58f88dab "[PATCH] futex: rely on
get_user_pages() for shared futexes". That commit made get_futex_key()
also increment the reference count of the futex key, and updated its
callers to decrement the key's reference count before returning.
Unfortunately the normal exit path in futex_lock_pi() wasn't corrected:
the reference count is incremented by get_futex_key() and queue_lock(),
but the normal exit path only decrements once, via unqueue_me_pi().
The fix is to call put_futex_key() after unqueue_me_pi(). Since 2.6.31
this is easily done by 'goto out_put_key' rather than 'goto out'.
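
The imbalance can be modelled with a toy counter. The names below are
hypothetical and each call merely stands in for the kernel function named in
its comment; the sketch only shows the get/put accounting described above.

```c
/* Toy reference-counted key, for illustration only. */
struct toy_key { int refs; };

static void toy_key_get(struct toy_key *k) { k->refs++; }
static void toy_key_put(struct toy_key *k) { k->refs--; }

/* Returns the references still held after the normal exit path. */
static int toy_lock_pi_exit(struct toy_key *k, int use_out_put_key)
{
	toy_key_get(k);		/* stands in for get_futex_key() */
	toy_key_get(k);		/* stands in for queue_lock() */
	toy_key_put(k);		/* stands in for unqueue_me_pi() */
	if (use_out_put_key)
		toy_key_put(k);	/* the added put_futex_key() */
	return k->refs;
}
```

Without the extra put one reference leaks per call, which in this model is
what keeps the backing object from being released.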

Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Darren Hart <dvhltc@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@kernel.org>

NFS: Don't clobber the attribute type in nfs_update_inode()

If the NFS_ATTR_FATTR_TYPE field isn't set in fattr->valid, then we should
not set the S_IFMT part of inode->i_mode.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: Fix a umount race

Ensure that we unregister the bdi before kill_anon_super() calls
ida_remove() on our device name.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org

NFS: Fix an Oops when truncating a file

The VM/VFS does not allow mapping->a_ops->invalidatepage() to fail.
Unfortunately, nfs_wb_page_cancel() may fail if a fatal signal occurs.
Since the NFS code assumes that the page stays mapped for as long as the
writeback is active, we can end up Oopsing (among other things).

The only safe fix here is to convert nfs_wait_on_request(), so as to make
it uninterruptible (as is already the case with wait_on_page_writeback()).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org

microblaze: fix interrupt state restore

Interrupts must be disabled while an interrupt state restore
(prep for interrupt return) is in progress.
Code to do this was lost in the port to the mainline kernel.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>

GFS2: Extend umount wait coverage to full glock lifetime

Although all glocks are, by the time of the umount glock wait,
scheduled for demotion, some of them haven't made it far
enough through the process for the original set of waiting
code to wait for them.

This extends the ref count to the whole glock lifetime in order
to ensure that the waiting does catch all glocks. It does make
it a bit more invasive, but it seems the only sensible solution
at the moment.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: Wait for unlock completion on umount

This patch adds a wait on umount between the point at which we
dispose of all glocks and the point at which we unmount the
lock protocol. This ensures that we've received all the replies
to our unlock requests before we stop the locking.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Fabio M. Di Nitto <fdinitto@redhat.com>

microblaze: Defconfig update

There were several changes in the Microblaze defconfig, so it
is good to update the defconfigs.

Signed-off-by: Michal Simek <monstr@monstr.eu>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
kernel/cred.c: use kmem_cache_free

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (23 commits)
  connector: Delete buggy notification code.
  be2net: use eq-id to calculate cev-isr reg offset
  Bluetooth: Use the control channel for raw HID reports
  Bluetooth: Add DFU driver for Atheros Bluetooth chipset AR3011
  Bluetooth: Redo checks in IRQ handler for shared IRQ support
  Bluetooth: Fix memory leak in L2CAP
  Bluetooth: Remove double free of SKB pointer in L2CAP
  cdc_ether: Partially revert "usbnet: Set link down initially ..."
  be2net: Fix memset() arg ordering.
  bonding: bond_open error return value
  ixgbe: if ixgbe_copy_dcb_cfg is going to fail learn about it early
  ixgbe: set the correct DCB bit for pg tx settings
  igbvf: fix issue w/ mapped_as_page being left set after unmap
  drivers/net: ks8851_mll ethernet network driver
  be2net: Bug fix to support newer generation of BE ASIC
  starfire: clean up properly if firmware loading fails
  mac80211: fix NULL pointer dereference when ftrace is enabled
  netfilter: ctnetlink: fix expectation mask dump
  ipv6: conntrack: Add member of user to nf_ct_frag6_queue structure
  ath9k: fix eeprom INI values override for 2GHz-only cards
  ...

pktcdvd: removing device does not remove its sysfs dir

This is the counterpart to cba767175becadc5c4016cceb7bfdd2c7fe722f4
("pktcdvd: remove broken dev_t export of class devices").  Device is not
registered using dev_t, so it should not be destroyed using device_destroy
which looks up the device by dev_t.  This will fail and adding the device
again will fail with the "duplicate name" error.  This is fixed using
device_unregister instead of device_destroy.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memory hotplug: fix a bug on /dev/mem for 64-bit kernels

Newly added memory can not be accessed via /dev/mem, because we do not
update the variables high_memory, max_pfn and max_low_pfn.

Add a function update_end_of_memory_vars() to update these variables for
64-bit kernels.

[akpm@linux-foundation.org: simplify comment]
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Li Haicheng <haicheng.li@intel.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fault injection: correct function names in documentation

init_fault_attr_entries() should be init_fault_attr_dentries().

cleanup_fault_attr_entries() should be cleanup_fault_attr_dentries().

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hugetlb: fix section mismatches

hugetlb_sysfs_add_hstate is called by hugetlb_register_node directly
during init and also indirectly via sysfs after init.

This patch removes the __init tag from hugetlb_sysfs_add_hstate.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

uartlite: fix crash when using as console

Move the ulite_console_setup to the .devinit section since it might be
called on probe, which is in devinit. Fixes the crash below where the
uartlite hw is probed after the .init section is freed from the kernel.

uartlite: ttyUL0 at MMIO 0xc8000100 (irq = 30) is a uartlite
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<c176720e>] ulite_console_setup+0x6f/0xa8
*pdpt = 0000000036fb0001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.1/host0/uevent
Modules linked in: puffin(+) serio_raw

Pid: 151, comm: modprobe Not tainted (2.6.31.5-1.0.b1-b1 #1) POULSBO
EIP: 0060:[<c176720e>] EFLAGS: 00010246 CPU: 0
EIP is at ulite_console_setup+0x6f/0xa8
EAX: c16ec824 EBX: c16ec824 ECX: c176719f EDX: 00000000
ESI: 00000000 EDI: c17b42c4 EBP: f6fd1cf0 ESP: f6fd1cd8
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process modprobe (pid: 151, ti=f6fd0000 task=f6fa1020 task.ti=f6fd0000)
Stack:
c1031f51 00000000 00000000 00000246 c182237c f7742000 f6fd1d5c c11fd316
<0> c16ec85c f77420d4 0000001e 00000000 00000000 c1633e78 4f494d4d 63783020
<0> 30303038 00303031 f6fd1d3c c10e0786 f6fd1d48 00000000 f6fd1d48 00000000
Call Trace:
[<c1031f51>] ? register_console+0xf6/0x1fc
[<c11fd316>] ? uart_add_one_port+0x237/0x2bb
[<c10e0786>] ? sysfs_add_one+0x13/0xd3
[<c10e142f>] ? sysfs_do_create_link+0xba/0xfc
[<c146f200>] ? ulite_probe+0x198/0x1eb
[<c12064ee>] ? platform_drv_probe+0xc/0xe
[<c120597b>] ? driver_probe_device+0x79/0x105
[<c1205a8e>] ? __device_attach+0x28/0x30
[<c120511f>] ? bus_for_each_drv+0x3d/0x67
[<c1205af9>] ? device_attach+0x44/0x58
[<c1205a66>] ? __device_attach+0x0/0x30
[<c1204fb8>] ? bus_probe_device+0x1f/0x34
[<c1203e68>] ? device_add+0x385/0x4c0
[<c148491f>] ? _write_unlock+0x8/0x1f
[<c1206aac>] ? platform_device_add+0xd9/0x11c
[<c120c685>] ? mfd_add_devices+0x165/0x1bc
[<f831b378>] ? puffin_probe+0x2d0/0x390 [puffin]
[<c11a08ef>] ? pci_match_device+0xa0/0xa7
[<c11a07bc>] ? local_pci_probe+0xe/0x10
[<c11a11db>] ? pci_device_probe+0x43/0x66
[<c120597b>] ? driver_probe_device+0x79/0x105
[<c1205a4a>] ? __driver_attach+0x43/0x5f
[<c120535d>] ? bus_for_each_dev+0x3d/0x67
[<c1205852>] ? driver_attach+0x14/0x16
[<c1205a07>] ? __driver_attach+0x0/0x5f
[<c1204dea>] ? bus_add_driver+0xf9/0x220
[<c1205c8f>] ? driver_register+0x8b/0xeb
[<c11a1518>] ? __pci_register_driver+0x43/0x9f
[<c10477ef>] ? __blocking_notifier_call_chain+0x40/0x4c
[<f831f000>] ? puffin_init+0x0/0x48 [puffin]
[<f831f017>] ? puffin_init+0x17/0x48 [puffin]
[<c1001139>] ? do_one_initcall+0x4c/0x131
[<c105607b>] ? sys_init_module+0xa7/0x1b7
[<c1002a61>] ? syscall_call+0x7/0xb
Code: 6e 74 00 00 00 92 33 00 00 18 00 0e 01 73 79 6e 63 65 2d 72 65 67 69 73 74 72 79 0c 00 49 32
00 00 14 00 09 01 61 6c 73 61 2d 69 <6e> 66 6f 00 00 00 42 37 00 00 10 00 07 01 6b 69 6c 6c 61 6c 6c
EIP: [<c176720e>] ulite_console_setup+0x6f/0xa8 SS:ESP 0068:f6fd1cd8
CR2: 0000000000000000

Signed-off-by: Richard Röjfors <richard.rojfors@pelagicore.com>
Acked-by: Peter Korsgaard <jacmet@sunsite.dk>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

imxfb: correct location of callbacks in suspend and resume

The probe function passes a pointer to a struct fb_info to
platform_set_drvdata(), so don't interpret the return value of
platform_get_drvdata() as a pointer to struct imxfb_info.

In the real struct imxfb_info, backlight_power was NULL, but through the
misinterpreted pointer in imxfb_suspend it read as 4. This resulted in an
oops, because imxfb_suspend calls imxfb_disable_controller(fbi), which in
turn has

	if (fbi->backlight_power)
		fbi->backlight_power(0);

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Sascha Hauer <kernel@pengutronix.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cgroups: fix to return errno in a failure path

In cgroup_create(), if alloc_css_id() returns failure, the errno is not
propagated to userspace, so mkdir will fail silently.

To trigger this bug, we mount blkio (or the memory subsystem), and create more
than 65534 cgroups. (The number of cgroups is limited to 65535 if a
subsystem has use_id == 1.)

# mount -t cgroup -o blkio xxx /mnt
# for ((i = 0; i < 65534; i++)); do mkdir /mnt/$i; done
# mkdir /mnt/65534
(should return ENOSPC)
#

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Paul Menage <menage@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

markup_oops.pl: fix $func_offset error with x86_64

When I use markup_oops.pl to parse an x86_64 oops, I get:

objdump: --start-address: bad number: NaN
No matching code found
This is because:
main::(./m.pl:228): open(FILE, "objdump -dS --adjust-vma=$vmaoffset --start-address=$decodestart --stop-address=$decodestop $filename |") || die "Cannot start objdump";
  DB<3> p $decodestart
NaN

This NaN is from:
main::(./m.pl:176): my $decodestart = Math::BigInt->from_hex("0x$target") - Math::BigInt->from_hex("0x$func_offset");
  DB<2> p $func_offset
0x175

There is already a "0x" in $func_offset, another 0x makes it a NaN.

The $func_offset is from line:

if ($line =~ /RIP: 0010:\[\<[0-9a-f]+\>\]  \[\<[0-9a-f]+\>\] ([a-zA-Z0-9\_]+)\+(0x[0-9a-f]+)\/0x[a-f0-9]/) {
$function = $1;
$func_offset = $2;
}

This patch changes "(0x[0-9a-f]+)\/0x[a-f0-9]/)" to "0x([0-9a-f]+)\/0x[a-f0-9]/)".
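
The same pitfall can be reproduced outside Perl. The following is a C analogue
only (Math::BigInt reports NaN, whereas strtoull() stops at the second 'x');
the helper name is hypothetical. A strict parse rejects a string that carries
the "0x" prefix twice and accepts it once the captured offset no longer
includes its own prefix.

```c
#include <errno.h>
#include <stdlib.h>

/* Returns 0 and stores the value if 's' is entirely one hex number, else -1. */
static int toy_parse_hex(const char *s, unsigned long long *out)
{
	char *end;
	unsigned long long v;

	errno = 0;
	v = strtoull(s, &end, 16);
	if (errno || end == s || *end != '\0')
		return -1;	/* trailing garbage such as "x175" */
	*out = v;
	return 0;
}
```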

Signed-off-by: Hui Zhu <teawater@gmail.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

get_maintainer.pl: teach git log to use --no-color

When git has been set to always use color in .gitconfig then I get the
warning message

Bad divisor in main::vcs_assign: 0

This is caused by vcs_file_signoffs not matching any commits because the
pattern does not understand the colour codes. Fix this by telling git log to
never use colour.

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

devmem: fix kmem write bug on memory holes

write_kmem() used to assume vwrite() always returns the full buffer length.
However, vwrite() can now return 0 to indicate a memory hole. This
creates a bug where "buf" is not advanced accordingly.

Fix it by simply ignoring the return value, and hence the memory hole.
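
A toy user-space model of the effect, with hypothetical names and a fixed
chunk size (this is not write_kmem() itself): when the source pointer advances
by the per-chunk return value, a hole leaves it behind and later chunks receive
the wrong data; advancing by the chunk size regardless skips the hole cleanly.

```c
#include <stddef.h>
#include <string.h>

#define TOY_CHUNK 4

/*
 * Copy 'nchunks' chunks from 'buf' into 'mem', skipping chunks marked as
 * holes. With 'fixed' == 0 the source advances by the bytes written (0 on
 * a hole); with 'fixed' != 0 it always advances by the chunk size.
 */
static void toy_write_kmem(char *mem, const int *is_hole, size_t nchunks,
			   const char *buf, int fixed)
{
	for (size_t i = 0; i < nchunks; i++) {
		size_t written = 0;

		if (!is_hole[i]) {
			memcpy(mem + i * TOY_CHUNK, buf, TOY_CHUNK);
			written = TOY_CHUNK;
		}
		buf += fixed ? TOY_CHUNK : written;
	}
}
```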

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

devmem: check vmalloc address on kmem read/write

Otherwise vmalloc_to_page() will BUG().

This also makes the kmem read/write implementation aligned with mem(4):
"References to nonexistent locations cause errors to be returned." Here we
return -ENXIO (inspired by Hugh) if no bytes have been transferred to/from
user space, otherwise return partial read/write results.
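
The return convention can be sketched as follows. The helper name and its
parameters are hypothetical; only the -ENXIO versus partial-count rule is
taken from the description above.

```c
#include <errno.h>
#include <sys/types.h>

/*
 * Result of a read/write that may have stopped at an invalid address:
 * -ENXIO if nothing was transferred, otherwise the partial byte count.
 */
static ssize_t toy_kmem_result(size_t transferred, int hit_bad_address)
{
	if (hit_bad_address && transferred == 0)
		return -ENXIO;
	return (ssize_t)transferred;
}
```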

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: flush dcache before writing into page to avoid alias

The cache alias problem happens if changes made through the user shared mapping
are not flushed before copying: the user and kernel mappings may then land
in two different cache lines, and it is impossible to guarantee coherence
after iov_iter_copy_from_user_atomic. So the right steps should be:

flush_dcache_page(page);
kmap_atomic(page);
write to page;
kunmap_atomic(page);
flush_dcache_page(page);

More precisely, we might create two new APIs flush_dcache_user_page and
flush_dcache_kern_page to replace the two flush_dcache_page accordingly.

Here is a snippet tested on omap2430 with VIPT cache, and I think it is
not ARM-specific:

int val = 0x11111111;
fd = open("abc", O_RDWR);
addr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
*(addr+0) = 0x44444444;
tmp = *(addr+0);
*(addr+1) = 0x77777777;
write(fd, &val, sizeof(int));
close(fd);

The results are not always 0x11111111 0x77777777 at the beginning as expected. Sometimes we see 0x44444444 0x77777777.

Signed-off-by: Anfei <anfei.zhou@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: <linux-arch@vger.kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kfifo: fix kernel-doc notation

Fix kfifo kernel-doc warnings:

Warning(kernel/kfifo.c:361): No description found for parameter 'total'
Warning(kernel/kfifo.c:402): bad line: @ @lenout: pointer to output variable with copied data
Warning(kernel/kfifo.c:412): No description found for parameter 'lenout'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>