Suresh Siddha [Wed, 24 Feb 2010 00:13:52 +0000 (16:13 -0800)]
sched: Fix SCHED_MC regression caused by change in sched cpu_power
On platforms like dual socket quad-core platform, the scheduler load
balancer is not detecting the load imbalances in certain scenarios. This
is leading to scenarios like where one socket is completely busy (with
all the 4 cores running with 4 tasks) and leaving another socket
completely idle. This causes performance issues as those 4 tasks share
the memory controller, last-level cache bandwidth etc. Also we won't be
taking advantage of turbo-mode as much as we would like, etc.
Some of the comparisons in the scheduler load balancing code are
comparing the "weighted cpu load that is scaled wrt sched_group's
cpu_power" with the "weighted average load per task that is not scaled
wrt sched_group's cpu_power". While this has probably been broken for a
longer time (for multi socket numa nodes etc), the problem got aggrevated
via this recent change:
|
| commit f93e65c186ab3c05ce2068733ca10e34fd00125e
| Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
| Date: Tue Sep 1 10:34:32 2009 +0200
|
| sched: Restore __cpu_power to a straight sum of power
|
Also with this change, the sched group cpu power alone no longer reflects
the group capacity that is needed to implement MC, MT performance
(default) and power-savings (user-selectable) policies.
We need to use the computed group capacity (sgs.group_capacity, that is
computed using the SD_PREFER_SIBLING logic in update_sd_lb_stats()) to
find out if the group with the max load is above its capacity and how
much load to move etc.
Thomas Gleixner [Wed, 17 Feb 2010 08:05:48 +0000 (09:05 +0100)]
sched: Don't use possibly stale sched_class
setscheduler() saves task->sched_class outside of the rq->lock held
region for a check after the setscheduler changes have become
effective. That might result in checking a stale value.
rtmutex_setprio() has the same problem, though it is protected by
p->pi_lock against setscheduler(), but for correctness sake (and to
avoid bad examples) it needs to be fixed as well.
Retrieve task->sched_class inside of the rq->lock held region.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: stable@kernel.org
Peter Zijlstra [Mon, 15 Feb 2010 13:45:54 +0000 (14:45 +0100)]
sched: Fix race between ttwu() and task_rq_lock()
Thomas found that due to ttwu() changing a task's cpu without holding
the rq->lock, task_rq_lock() might end up locking the wrong rq.
Avoid this by serializing against TASK_WAKING.
Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1266241712.15770.420.camel@laptop> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Suresh Siddha [Sat, 13 Feb 2010 01:14:22 +0000 (17:14 -0800)]
sched: Fix SMT scheduler regression in find_busiest_queue()
Fix a SMT scheduler performance regression that is leading to a scenario
where SMT threads in one core are completely idle while both the SMT threads
in another core (on the same socket) are busy.
This is caused by this commit (with the problematic code highlighted)
On a SMT system, power of the HT logical cpu will be 589 and
the scheduler load imbalance (for scenarios like the one mentioned above)
can be approximately 1024 (SCHED_LOAD_SCALE). The above change of scaling
the weighted load with the power will result in "wl > imbalance" and
ultimately resulting in find_busiest_queue() return NULL, causing
load_balance() to think that the load is well balanced. But infact
one of the tasks can be moved to the idle core for optimal performance.
We don't need to use the weighted load (wl) scaled by the cpu power to
compare with imabalance. In that condition, we already know there is only a
single task "rq->nr_running == 1" and the comparison between imbalance,
wl is to make sure that we select the correct priority thread which matches
imbalance. So we really need to compare the imabalnce with the original
weighted load of the cpu and not the scaled load.
But in other conditions where we want the most hammered(busiest) cpu, we can
use scaled load to ensure that we consider the cpu power in addition to the
actual load on that cpu, so that we can move the load away from the
guy that is getting most hammered with respect to the actual capacity,
as compared with the rest of the cpu's in that busiest group.
Fix it.
Reported-by: Ma Ling <ling.ma@intel.com> Initial-Analysis-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1266023662.2808.118.camel@sbs-t61.sc.intel.com> Cc: stable@kernel.org [2.6.32.x] Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix for sched_mc_powersavigs for pre-Nehalem platforms.
Child sched domain should clear SD_PREFER_SIBLING if parent will have
SD_POWERSAVINGS_BALANCE because they are contradicting.
Sets the flags correctly based on sched_mc_power_savings.
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100208100555.GD2931@dirshya.in.ibm.com> Cc: stable@kernel.org [2.6.32.x] Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Anton Blanchard [Tue, 2 Feb 2010 22:46:13 +0000 (14:46 -0800)]
sched: cpuacct: Use bigger percpu counter batch values for stats counters
When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are
enabled we can call cpuacct_update_stats with values much larger
than percpu_counter_batch. This means the call to
percpu_counter_add will always add to the global count which is
protected by a spinlock and we end up with a global spinlock in
the scheduler.
Based on an idea by KOSAKI Motohiro, this patch scales the batch
value by cputime_one_jiffy such that we have the same batch
limit as we would if CONFIG_VIRT_CPU_ACCOUNTING was disabled.
His patch did this once at boot but that initialisation happened
too early on PowerPC (before time_init) and it was never updated
at runtime as a result of a hotplug cpu add/remove.
This patch instead scales percpu_counter_batch by
cputime_one_jiffy at runtime, which keeps the batch correct even
after cpu hotplug operations. We cap it at INT_MAX in case of
overflow.
For architectures that do not support
CONFIG_VIRT_CPU_ACCOUNTING, cputime_one_jiffy is the constant 1
and gcc is smart enough to optimise min(s32
percpu_counter_batch, INT_MAX) to just percpu_counter_batch at
least on x86 and PowerPC. So there is no need to add an #ifdef.
On a 64 thread PowerPC box with CONFIG_VIRT_CPU_ACCOUNTING and
CONFIG_CGROUP_CPUACCT enabled, a context switch microbenchmark
is 234x faster and almost matches a CONFIG_CGROUP_CPUACCT
disabled kernel:
Anton Blanchard [Tue, 2 Feb 2010 22:46:10 +0000 (14:46 -0800)]
percpu_counter: Make __percpu_counter_add an inline function on UP
Even though batch isn't used on UP, we may want to pass one in
to keep the SMP and UP code paths similar. Convert
__percpu_counter_add to an inline function so we wont get
variable unused warnings if we do.
Signed-off-by: Anton Blanchard <anton@samba.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus Torvalds [Sat, 6 Feb 2010 21:01:39 +0000 (13:01 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel:
drm/i915: Fix leak of relocs along do_execbuffer error path
drm/i915: slow acpi_lid_open() causes flickering - V2
drm/i915: Disable SR when more than one pipe is enabled
drm/i915: page flip support for Ironlake
drm/i915: Fix the incorrect DMI string for Samsung SX20S laptop
drm/i915: Add support for SDVO composite TV
drm/i915: don't trigger ironlake vblank interrupt at irq install
drm/i915: handle non-flip pending case when unpinning the scanout buffer
drm/i915: Fix the device info of Pineview
drm/i915: enable vblank interrupt on ironlake
drm/i915: Prevent use of uninitialized pointers along error path.
drm/i915: disable hotplug detect before Ironlake CRT detect
Linus Torvalds [Sat, 6 Feb 2010 00:16:50 +0000 (16:16 -0800)]
Fix potential crash with sys_move_pages
We incorrectly depended on the 'node_state/node_isset()' functions
testing the node range, rather than checking it explicitly. That's not
reliable, even if it might often happen to work. So do the proper
explicit test.
Jean Delvare [Fri, 5 Feb 2010 18:58:36 +0000 (19:58 +0100)]
hwmon: (w83781d) Request I/O ports individually for probing
Different motherboards have different PNP declarations for
W83781D/W83782D chips. Some declare the whole range of I/O ports (8
ports), some declare only the useful ports (2 ports at offset 5) and
some declare fancy ranges, for example 4 ports at offset 4. To
properly handle all cases, request all ports individually for probing.
After we have determined that we really have a W83781D or W83782D
chip, the useful port range will be requested again, as a single
block.
I did not see a board which needs this yet, but I know of one for lm78
driver and I'd like to keep the logic of these two drivers in sync.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: stable@kernel.org
Jean Delvare [Fri, 5 Feb 2010 18:58:36 +0000 (19:58 +0100)]
hwmon: (lm78) Request I/O ports individually for probing
Different motherboards have different PNP declarations for LM78/LM79
chips. Some declare the whole range of I/O ports (8 ports), some
declare only the useful ports (2 ports at offset 5) and some declare
fancy ranges, for example 4 ports at offset 4. To properly handle all
cases, request all ports individually for probing. After we have
determined that we really have an LM78 or LM79 chip, the useful port
range will be requested again, as a single block.
This fixes the driver on the Olivetti M3000 DT 540, at least.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: stable@kernel.org
Ray Copeland [Fri, 5 Feb 2010 18:58:35 +0000 (19:58 +0100)]
hwmon: (adt7462) Wrong ADT7462_VOLT_COUNT
The #define ADT7462_VOLT_COUNT is wrong, it should be 13 not 12. All the
for loops that use this as a limit count are of the typical form, "for
(n = 0; n < ADT7462_VOLT_COUNT; n++)", so to loop through all voltages
w/o missing the last one it is necessary for the count to be one greater
than it is. (Specifically, you will miss the +1.5V 3GPIO input with count
= 12 vs. 13.)
Signed-off-by: Ray Copeland <ray.copeland@aprius.com> Acked-by: "Darrick J. Wong" <djwong@us.ibm.com> Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: stable@kernel.org
Linus Torvalds [Fri, 5 Feb 2010 15:58:21 +0000 (07:58 -0800)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
[libata] Call flush_dcache_page after PIO data transfers in libata-sff.c
ahci: add Acer G725 to broken suspend list
libata: fix ata_id_logical_per_physical_sectors
libata-scsi passthru: fix bug which truncated LBA48 return values
Andres Salomon [Fri, 5 Feb 2010 06:42:43 +0000 (01:42 -0500)]
CS5536: apply pci quirk for BIOS SMBUS bug
The new cs5535-* drivers use PCI header config info rather than MSRs to
determine the memory region to use for things like GPIOs and MFGPTs. As
anticipated, we've run into a buggy BIOS:
This is a Soekris board, and its BIOS sets the size of the PCI ISA bridge
device's BAR0 to 8k. In reality, it should be 8 bytes (BAR0 is used for
SMBus stuff). This quirk checks for an incorrect size, and resets it
accordingly.
Linus Torvalds [Fri, 5 Feb 2010 15:24:01 +0000 (07:24 -0800)]
Merge branch 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms: fix r300 vram width calculations
drm/radeon/kms: rs400/480 MC setup is different than r300.
drm/radeon/kms: make initial state of load detect property correct.
drm/radeon/kms: disable HDMI audio for now on rv710/rv730
drm/radeon/kms: don't call suspend path before cleaning up GPU
drivers/gpu/drm/radeon/radeon_combios.c: fix warning
ati_pcigart: fix printk format warning
drm/r100/kms: Emit cache flush to the end of command buffer. (v2)
drm/radeon/kms: fix regression rendering issue on R6XX/R7XX
drm/radeon/kms: move blit initialization after we disabled VGA
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: apply updated fallocate i_size fix
Btrfs: do not try and lookup the file extent when finishing ordered io
Btrfs: Fix oopsen when dropping empty tree.
Btrfs: remove BUG_ON() due to mounting bad filesystem
Btrfs: make error return negative in btrfs_sync_file()
Btrfs: fix race between allocate and release extent buffer.
Jaroslav Kysela [Fri, 5 Feb 2010 09:19:41 +0000 (10:19 +0100)]
ALSA: ice1724 - aureon - fix wm8770 volume offset
The volume register is from 0..0x7f and 0..0x1a range is mute.
Also, fix mute combining in wm_vol_put(). The wrong behaviour was
noticed by Peter Christensen.
Maxim Levitsky [Thu, 4 Feb 2010 20:21:47 +0000 (22:21 +0200)]
ALSA: hda - Delay switching to polling mode if an interrupt was missing
My sound codec seems sometimes (very rarely) to omit interrupts (ALC268)
However, interrupt mode still works.
Thus if we get timeout, poll the codec once.
If we get 3 such polls in a row, then switch to polling mode.
This patch is maybe an bandaid, but this might be a workaround for hardware bug.
Dave Airlie [Fri, 5 Feb 2010 03:41:54 +0000 (13:41 +1000)]
drm/radeon/kms: rs400/480 MC setup is different than r300.
Boot testing on my rs480 laptop found the MC idle never happened
on startup, a quick check with AMD found the idle bit is in a different
place on the rs4xx than r300.
Implement a new rs400 mc idle function to fix this.
Jerome Glisse [Tue, 2 Feb 2010 10:51:45 +0000 (11:51 +0100)]
drm/radeon/kms: don't call suspend path before cleaning up GPU
In suspend path we unmap the GART table while in cleaning up
path we will unbind buffer and thus try to write to unmapped
GART leading to oops. In order to avoid this we don't call the
suspend path in cleanup path. Cleanup path is clever enough
to desactive GPU like the suspend path is doing, thus this was
redondant.
Tested on: RV370, R420, RV515, RV570, RV610, RV770 (all PCIE)
Signed-off-by: Jerome Glisse <jglisse@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
drivers/gpu/drm/radeon/radeon_combios.c: In function 'radeon_combios_get_lvds_info':
drivers/gpu/drm/radeon/radeon_combios.c:893: warning: comparison is always false due to limited range of data type
Cc: Dave Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
Randy Dunlap [Tue, 2 Feb 2010 22:40:33 +0000 (14:40 -0800)]
ati_pcigart: fix printk format warning
Fix ati_pcigart printk format warning:
drivers/gpu/drm/ati_pcigart.c:115: warning: format '%Lx' expects type 'long long unsigned int', but argument 3 has type 'dma_addr_t'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Dave Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
Jerome Glisse [Thu, 4 Feb 2010 19:36:39 +0000 (20:36 +0100)]
drm/radeon/kms: fix regression rendering issue on R6XX/R7XX
It seems that some R6XX/R7XX silently ignore HDP flush when
programmed through ring, this patch addback an ioctl callback
to allow R6XX/R7XX hw to perform such flush through MMIO in
order to fix a regression. For more details see:
http://bugzilla.kernel.org/show_bug.cgi?id=15186
Signed-off-by: Jerome Glisse <jglisse@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Jerome Glisse [Thu, 4 Feb 2010 16:27:27 +0000 (17:27 +0100)]
drm/radeon/kms: move blit initialization after we disabled VGA
VGA might be overwritting VRAM and corrupt our blit shader leading
to corruption, it likely won't happen if you load fbcon right after
radeon. Thanks to Shawn Starr and Andre Maasikas for tracking down
this issue.
Signed-off-by: Jerome Glisse <jglisse@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Linus Torvalds [Fri, 5 Feb 2010 00:09:01 +0000 (16:09 -0800)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
ARM: Fix wrong register in proc-arm6_7.S data abort handler
ARM: 5909/1: ARM: Correct the FPSCR bits setting when raising exceptions
ARM: 5904/1: ARM: Always generate the IT instruction when compiling for Thumb-2
ARM: 5907/1: ARM: Fix the reset on the RealView PBX Development board
mx35: add a missing comma in a pad definition
mx25: make the FEC AHB clk secondary of the IPG
mx25: fix time accounting
mx25: properly initialize clocks
mx25: remove unused mx25_clocks_init() argument
i.MX25: implement secondary clocks for uarts and fec
i.MX25: Allow secondary clocks in DEFINE_CLOCK
ARM: MX3: Fixed typo in declared enum type name.
MXC: Add AUDMUXv2 register decode to debugfs
mx31ads: Provide an IRQ range to the WM835x on the 1133-EV1 module
mx31ads: Provide a name for EXPIO interrupt chip
mx31ads: Allow enable/disable of switchable supplies
Linus Torvalds [Fri, 5 Feb 2010 00:08:42 +0000 (16:08 -0800)]
Merge branch 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
omap: Disable serial port autoidle by default
omap: Fix access to already released memory in clk_debugfs_register_one()
omap: Fix arch/arm/mach-omap2/mux.c: Off by one error
omap: Fix 3630 mux errors
OMAP2/3: GPMC: ensure valid clock pointer
OMAP2/3: IRQ: ensure valid base address
ARCH OMAP : enable ARCH_HAS_HOLES_MEMORYMODEL for OMAP
omap: Remove old unused defines for OMAP_32KSYNCT_BASE
omap: define _toggle_gpio_edge_triggering only for OMAP1
Linus Torvalds [Fri, 5 Feb 2010 00:08:15 +0000 (16:08 -0800)]
Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Don't clobber the attribute type in nfs_update_inode()
NFS: Fix a umount race
NFS: Fix an Oops when truncating a file
NFS: Ensure that we handle NFS4ERR_STALE_STATEID correctly
NFSv4.1: Don't call nfs4_schedule_state_recovery() unnecessarily
NFSv4: Don't allow posix locking against servers that don't support it
NFSv4: Ensure that the NFSv4 locking can recover from stateid errors
NFS: Avoid warnings when CONFIG_NFS_V4=n
NFS: Make nfs_commitdata_release static
NFS: Try to commit unstable writes in nfs_release_page()
NFS: Fix a reference leak in nfs_wb_cancel_page()
Linus Torvalds [Fri, 5 Feb 2010 00:07:41 +0000 (16:07 -0800)]
Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
futex: Handle futex value corruption gracefully
futex: Handle user space corruption gracefully
futex_lock_pi() key refcnt fix
softlockup: Add sched_clock_tick() to avoid kernel warning on kgdb resume
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
GFS2: Extend umount wait coverage to full glock lifetime
GFS2: Wait for unlock completion on umount
Tejun Heo [Thu, 4 Feb 2010 08:57:37 +0000 (17:57 +0900)]
idr: revert misallocation bug fix
Commit 859ddf09743a8cc680af33f7259ccd0fd36bfe9d tried to fix
misallocation bug but broke full bit marking by not clearing
pa[idp->layers] and also is causing X failures due to lookup failure
in drm code. The cause of the latter hasn't been found yet. Revert
the fix for now.
Jaroslav Kysela [Tue, 2 Feb 2010 18:58:25 +0000 (19:58 +0100)]
ALSA: ctxfi - fix PTP address initialization
After hours of debugging, I finally found the reason why some source
and runtime combination does not work. The PTP (page table pages)
address must be aligned. I am not sure how much, but alignment to
PAGE_SIZE is sufficient. Also, use ALSA's page allocation routines
to ensure proper virtual -> physical address translation.
Cc: <stable@kernel.org> Signed-off-by: Jaroslav Kysela <perex@perex.cz>
acpi_lid_open() could take up to 10ms on my computer. Some component is
calling the drm GETCONNECTOR ioctl many times in a row. This results in
flickering (for example, when starting a video). Fix it by assuming an
always connected lid status.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Eric Anholt <eric@anholt.net>
David John [Wed, 27 Jan 2010 09:49:08 +0000 (15:19 +0530)]
drm/i915: Disable SR when more than one pipe is enabled
Self Refresh should be disabled on dual plane configs. Otherwise, as
the SR watermark is not calculated for such configs, switching to non
VGA mode causes FIFO underrun and display flicker.
This fixes Korg Bug #14897.
Signed-off-by: David John <davidjon@xenontk.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: stable@kernel.org Signed-off-by: Eric Anholt <eric@anholt.net>
you would see long stalls where no work was being done. That is because we were
doing all this extra work to read in the file extent outside of the transaction,
however in the random io case this ends up hurting us because the file extents
are not there to begin with. So axe this logic, since we end up reading in the
file extent when we go to update it anyway. This took the fio job from 11 mb/s
with several ~10 second stalls to 24 mb/s to a couple of 1-2 second stalls.
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Miao Xie [Tue, 2 Feb 2010 08:46:44 +0000 (08:46 +0000)]
Btrfs: remove BUG_ON() due to mounting bad filesystem
Mounting a bad filesystem caused a BUG_ON(). The following is steps to
reproduce it.
# mkfs.btrfs /dev/sda2
# mount /dev/sda2 /mnt
# mkfs.btrfs /dev/sda1 /dev/sda2
(the program says that /dev/sda2 was mounted, and then exits. )
# umount /mnt
# mount /dev/sda1 /mnt
At the third step, mkfs.btrfs exited in the way of make filesystem. So the
initialization of the filesystem didn't finish. So the filesystem was bad, and
it caused BUG_ON() when mounting it. But BUG_ON() should be called by the wrong
code, not user's operation, so I think it is a bug of btrfs.
This patch fixes it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Yong Zhang [Fri, 29 Jan 2010 06:58:47 +0000 (14:58 +0800)]
sched: Remove member rt_se from struct rt_rq
It's a duplicate of tg->rt_se[cpu] and the only usage is
sched_rt_rq_dequeue() and sched_rt_rq_enqueue(). After the
first patch to those two function. rt_se can be removed.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <2674af741001282258q38781619u653ca4a7dd267347@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yong Zhang [Fri, 29 Jan 2010 06:57:52 +0000 (14:57 +0800)]
sched: Change usage of rt_rq->rt_se to rt_rq->tg->rt_se[cpu]
This is the first step to remove rt_rq member rt_se because it have the
same meaning with tg->rt_se[cpu]. And the latter style is also used by
the fair scheduling class.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <2674af741001282257r28c97a92o9f90cf16fe8d3d84@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Catalin Marinas [Thu, 4 Feb 2010 06:04:50 +0000 (01:04 -0500)]
[libata] Call flush_dcache_page after PIO data transfers in libata-sff.c
flush_dcache_page() must be called after (!ATA_TFLAG_WRITE) the
data copying to avoid D-cache aliasing with user space or I-D cache
coherency issues (when reading data from an ATA device using PIO,
the kernel dirties the D-cache but there is no flush_dcache_page()
required on Harvard architectures).
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Tejun Heo [Thu, 28 Jan 2010 07:04:15 +0000 (16:04 +0900)]
ahci: add Acer G725 to broken suspend list
Acer G725 shares the same suspend problem with the HP laptops which
lose ATA devices on resume. New firmware which fixes the problem is
already available. Add G725 with old firmwares to the broken suspend
list.
The value we get from the low byte of the ATA_ID_SECTOR_SIZE word is not not
a plain multiple, but the log of it, so fix the helper to give the correct
answer. Without this we'll get an incorrect minimal I/O size in the block
limits VPD page for 4k sector drives.
Also change the return value of ata_id_logical_per_physical_sectors to u16
for the unlikely case of very large logical sectors.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Tony Lindgren [Mon, 1 Feb 2010 20:34:31 +0000 (12:34 -0800)]
omap: Disable serial port autoidle by default
Currently the omap serial clocks are autoidled after 5 seconds.
However, this causes lost characters on the serial ports. As this
is considered non-standard behaviour for Linux, disable the timeout.
Note that this will also cause blocking of any deeper omap sleep
states.
To enable the autoidling of the serial ports, do something like
this for each serial port:
Tony Lindgren [Mon, 1 Feb 2010 19:22:54 +0000 (11:22 -0800)]
omap: Fix 3630 mux errors
3630 has more mux signals than 34xx. The additional pins
exist in omap36xx_cbp_subset, but are not initialized
as the superset is missing these offsets. This causes
the following errors during the boot:
Sriram [Fri, 29 Jan 2010 22:20:05 +0000 (14:20 -0800)]
ARCH OMAP : enable ARCH_HAS_HOLES_MEMORYMODEL for OMAP
OMAP platforms(like OMAP3530) include DSP or other co-processors
for media acceleration. when carving out memory for the
accelerators we can end up creating a hole in the memory map
of sort:
<kernel memory><hole(memory for accelerator)><kernel memory>
To handle such a memory configuration ARCH_HAS_HOLES_MEMORYMODEL
has to be enabled. For further information refer discussion at:
http://www.mail-archive.com/linux-omap@vger.kernel.org/msg15262.html.
Signed-off-by: Sriramakrishnan <srk@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
Thomas Gleixner [Wed, 3 Feb 2010 08:33:05 +0000 (09:33 +0100)]
futex: Handle futex value corruption gracefully
The WARN_ON in lookup_pi_state which complains about a mismatch
between pi_state->owner->pid and the pid which we retrieved from the
user space futex is completely bogus.
The code just emits the warning and then continues despite the fact
that it detected an inconsistent state of the futex. A conveniant way
for user space to spam the syslog.
Replace the WARN_ON by a consistency check. If the values do not match
return -EINVAL and let user space deal with the mess it created.
This also fixes the missing task_pid_vnr() when we compare the
pi_state->owner pid with the futex value.
Reported-by: Jermome Marchand <jmarchan@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@kernel.org>
Thomas Gleixner [Tue, 2 Feb 2010 10:40:27 +0000 (11:40 +0100)]
futex: Handle user space corruption gracefully
If the owner of a PI futex dies we fix up the pi_state and set
pi_state->owner to NULL. When a malicious or just sloppy programmed
user space application sets the futex value to 0 e.g. by calling
pthread_mutex_init(), then the futex can be acquired again. A new
waiter manages to enqueue itself on the pi_state w/o damage, but on
unlock the kernel dereferences pi_state->owner and oopses.
Prevent this by checking pi_state->owner in the unlock path. If
pi_state->owner is not current we know that user space manipulated the
futex value. Ignore the mess and return -EINVAL.
This catches the above case and also the case where a task hijacks the
futex by setting the tid value and then tries to unlock it.
Reported-by: Jermome Marchand <jmarchan@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@kernel.org>
This fixes a futex key reference count bug in futex_lock_pi(),
where a key's reference count is incremented twice but decremented
only once, causing the backing object to not be released.
If the futex is created in a temporary file in an ext3 file system,
this bug causes the file's inode to become an "undead" orphan,
which causes an oops from a BUG_ON() in ext3_put_super() when the
file system is unmounted. glibc's test suite is known to trigger this,
see <http://bugzilla.kernel.org/show_bug.cgi?id=14256>.
The bug is a regression from 2.6.28-git3, namely Peter Zijlstra's 38d47c1b7075bd7ec3881141bb3629da58f88dab "[PATCH] futex: rely on
get_user_pages() for shared futexes". That commit made get_futex_key()
also increment the reference count of the futex key, and updated its
callers to decrement the key's reference count before returning.
Unfortunately the normal exit path in futex_lock_pi() wasn't corrected:
the reference count is incremented by get_futex_key() and queue_lock(),
but the normal exit path only decrements once, via unqueue_me_pi().
The fix is to put_futex_key() after unqueue_me_pi(), since 2.6.31
this is easily done by 'goto out_put_key' rather than 'goto out'.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Darren Hart <dvhltc@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@kernel.org>
Trond Myklebust [Wed, 3 Feb 2010 13:27:22 +0000 (08:27 -0500)]
NFS: Fix an Oops when truncating a file
The VM/VFS does not allow mapping->a_ops->invalidatepage() to fail.
Unfortunately, nfs_wb_page_cancel() may fail if a fatal signal occurs.
Since the NFS code assumes that the page stays mapped for as long as the
writeback is active, we can end up Oopsing (among other things).
The only safe fix here is to convert nfs_wait_on_request(), so as to make
it uninterruptible (as is already the case with wait_on_page_writeback()).
Interrupts must be disabled while an interrupt state restore
(prep for interrupt return) is in progress.
Code to do this was lost in the port to the mainline kernel.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Signed-off-by: Michal Simek <monstr@monstr.eu>
GFS2: Extend umount wait coverage to full glock lifetime
Although all glocks are, by the time of the umount glock wait,
scheduled for demotion, some of them haven't made it far
enough through the process for the original set of waiting
code to wait for them.
This extends the ref count to the whole glock lifetime in order
to ensure that the waiting does catch all glocks. It does make
it a bit more invasive, but it seems the only sensible solution
at the moment.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch adds a wait on umount between the point at which we
dispose of all glocks and the point at which we unmount the
lock protocol. This ensures that we've received all the replies
to our unlock requests before we stop the locking.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reported-by: Fabio M. Di Nitto <fdinitto@redhat.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (23 commits)
connector: Delete buggy notification code.
be2net: use eq-id to calculate cev-isr reg offset
Bluetooth: Use the control channel for raw HID reports
Bluetooth: Add DFU driver for Atheros Bluetooth chipset AR3011
Bluetooth: Redo checks in IRQ handler for shared IRQ support
Bluetooth: Fix memory leak in L2CAP
Bluetooth: Remove double free of SKB pointer in L2CAP
cdc_ether: Partially revert "usbnet: Set link down initially ..."
be2net: Fix memset() arg ordering.
bonding: bond_open error return value
ixgbe: if ixgbe_copy_dcb_cfg is going to fail learn about it early
ixgbe: set the correct DCB bit for pg tx settings
igbvf: fix issue w/ mapped_as_page being left set after unmap
drivers/net: ks8851_mll ethernet network driver
be2net: Bug fix to support newer generation of BE ASIC
starfire: clean up properly if firmware loading fails
mac80211: fix NULL pointer dereference when ftrace is enabled
netfilter: ctnetlink: fix expectation mask dump
ipv6: conntrack: Add member of user to nf_ct_frag6_queue structure
ath9k: fix eeprom INI values override for 2GHz-only cards
...
pktcdvd: removing device does not remove its sysfs dir
This is the counterpart to cba767175becadc5c4016cceb7bfdd2c7fe722f4
("pktcdvd: remove broken dev_t export of class devices"). Device is not
registered using dev_t, so it should not be destroyed using device_destroy
which looks up the device by dev_t. This will fail and adding the device
again will fail with the "duplicate name" error. This is fixed using
device_unregister instead of device_destroy.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Peter Osterlund <petero2@telia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Röjfors [Tue, 2 Feb 2010 21:44:12 +0000 (13:44 -0800)]
uartlite: fix crash when using as console
Move the ulite_console_setup to the .devinit section since it might be
called on probe, which is in devinit. Fixes the crash below where the
uartlite hw is probed after the .init section is freed from the kernel.
uartlite: ttyUL0 at MMIO 0xc8000100 (irq = 30) is a uartlite
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<c176720e>] ulite_console_setup+0x6f/0xa8
*pdpt = 0000000036fb0001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.1/host0/uevent
Modules linked in: puffin(+) serio_raw
imxfb: correct location of callbacks in suspend and resume
The probe function passes a pointer to a struct fb_info to
platform_set_drvdata(), so don't interpret the return value of
platform_get_drvdata() as a pointer to struct imxfb_info.
The original imxfb_info *fbi backlight_power was NULL but in imxfb_suspend
it was 4 resulting in an oops as imxfb_suspend calls
imxfb_disable_controller(fbi) which in turn has
if (fbi->backlight_power)
fbi->backlight_power(0);
Li Zefan [Tue, 2 Feb 2010 21:44:10 +0000 (13:44 -0800)]
cgroups: fix to return errno in a failure path
In cgroup_create(), if alloc_css_id() returns failure, the errno is not
propagated to userspace, so mkdir will fail silently.
To trigger this bug, we mount blkio (or memory subsystem), and create more
then 65534 cgroups. (The number of cgroups is limited to 65535 if a
subsystem has use_id == 1)
# mount -t cgroup -o blkio xxx /mnt
# for ((i = 0; i < 65534; i++)); do mkdir /mnt/$i; done
# mkdir /mnt/65534
(should return ENOSPC)
#
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Paul Menage <menage@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hui Zhu [Tue, 2 Feb 2010 21:44:09 +0000 (13:44 -0800)]
markup_oops.pl: fix $func_offset error with x86_64
When I use markup_oops.pl parse a x8664 oops, I got:
objdump: --start-address: bad number: NaN
No matching code found
This is because:
main::(./m.pl:228): open(FILE, "objdump -dS --adjust-vma=$vmaoffset --start-address=$decodestart --stop-address=$decodestop $filename |") || die "Cannot start objdump";
DB<3> p $decodestart
NaN
This NaN is from:
main::(./m.pl:176): my $decodestart = Math::BigInt->from_hex("0x$target") - Math::BigInt->from_hex("0x$func_offset");
DB<2> p $func_offset
0x175
There is already a "0x" in $func_offset, another 0x makes it a NaN.
I make a patch to change "(0x[0-9a-f]+)\/0x[a-f0-9]/)" to "0x([0-9a-f]+)\/0x[a-f0-9]/)".
Signed-off-by: Hui Zhu <teawater@gmail.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Kennedy [Tue, 2 Feb 2010 21:44:07 +0000 (13:44 -0800)]
get_maintainer.pl: teach git log to use --no-color
When git has been set to always use color in .gitconfig then I get the
warning message
Bad divisor in main::vcs_assign: 0
This is caused by vcs_file_signoffs not matching any commits due to the
pattern not understand the colour codes. Fix this by telling git log to
never use colour.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Acked-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wu Fengguang [Tue, 2 Feb 2010 21:44:06 +0000 (13:44 -0800)]
devmem: fix kmem write bug on memory holes
write_kmem() used to assume vwrite() always return the full buffer length.
However now vwrite() could return 0 to indicate memory hole. This
creates a bug that "buf" is not advanced accordingly.
Fix it to simply ignore the return value, hence the memory hole.
This also makes the kmem read/write implementation aligned with mem(4):
"References to nonexistent locations cause errors to be returned." Here we
return -ENXIO (inspired by Hugh) if no bytes have been transfered to/from
user space, otherwise return partial read/write results.
anfei zhou [Tue, 2 Feb 2010 21:44:02 +0000 (13:44 -0800)]
mm: flush dcache before writing into page to avoid alias
The cache alias problem will happen if the changes of user shared mapping
is not flushed before copying, then user and kernel mapping may be mapped
into two different cache line, it is impossible to guarantee the coherence
after iov_iter_copy_from_user_atomic. So the right steps should be:
flush_dcache_page(page);
kmap_atomic(page);
write to page;
kunmap_atomic(page);
flush_dcache_page(page);
More precisely, we might create two new APIs flush_dcache_user_page and
flush_dcache_kern_page to replace the two flush_dcache_page accordingly.
Here is a snippet tested on omap2430 with VIPT cache, and I think it is
not ARM-specific:
Randy Dunlap [Tue, 2 Feb 2010 21:44:01 +0000 (13:44 -0800)]
kfifo: fix kernel-doc notation
Fix kfifo kernel-doc warnings:
Warning(kernel/kfifo.c:361): No description found for parameter 'total'
Warning(kernel/kfifo.c:402): bad line: @ @lenout: pointer to output variable with copied data
Warning(kernel/kfifo.c:412): No description found for parameter 'lenout'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Stefani Seibold <stefani@seibold.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>