Matt Carlson [Wed, 20 Jul 2011 10:20:50 +0000 (10:20 +0000)]
tg3: Fix io failures after chip reset
Commit f2096f94b514d88593355995d5dd276961e88af1, entitled
"tg3: Add 5720 H2BMC support", needed to add code to preserve some bits
set by firmware. Unfortunately the new code causes throughput to stop
after a chip reset because it enables state machines before they are
ready. This patch undoes the problematic code. The bits will be
restored later in the init sequence.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Reviewed-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes both the failure in the self-test on 578xx
and a hole in a parity recovery flow that this failure
has discovered:
- internal 'pending' state in a VLAN_MAC object wasn't been cleared
when the object state change was called with DRV_ONLY flag, which in
particular happens when a parity error happens during the self-test.
- bp->sp_state wasn't cleared in the similar circumstances as described
above.
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the parity errors recovery flow for 578xx:
- Add a separate column for the 578xx in the parity mask
registers DB.
- Fix the bnx2x_process_kill_chip_reset() to handle the blocks
newly introduced in the 578xx.
Cover ATC and PGLUE_B blocks for 57712 and 578xx.
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bnx2x: Read FIP mac from SHMEM in single function mode
Read FIP MAC address from SHMEM's "port" section
similar to what we do in a MF mode when we read it from
a "func" section of SHMEM.
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bnx2x: Implementation for netdev->ndo_fcoe_get_wwn
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Make Dell Latitude E6420 use reboot=pci
x86: Make Dell Latitude E5420 use reboot=pci
H. Peter Anvin [Thu, 21 Jul 2011 18:22:21 +0000 (11:22 -0700)]
x86: Make Dell Latitude E6420 use reboot=pci
Yet another variant of the Dell Latitude series which requires
reboot=pci.
From the E5420 bug report by Daniel J Blueman:
> The E6420 is affected also (same platform, different casing and
> features), which provides an external confirmation of the issue; I can
> submit a patch for that later or include it if you prefer:
> http://linux.koolsolutions.com/2009/08/04/howto-fix-linux-hangfreeze-during-reboots-and-restarts/
Reported-by: Daniel J Blueman <daniel.blueman@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@kernel.org>
Daniel J Blueman [Fri, 13 May 2011 01:04:59 +0000 (09:04 +0800)]
x86: Make Dell Latitude E5420 use reboot=pci
Rebooting on the Dell E5420 often hangs with the keyboard or ACPI
methods, but is reliable via the PCI method.
[ hpa: this was deferred because we believed for a long time that the
recent reshuffling of the boot priorities in commit 660e34cebf0a11d54f2d5dd8838607452355f321 fixed this platform.
Unfortunately that turned out to be incorrect. ]
Peter Zijlstra [Thu, 7 Jul 2011 09:39:45 +0000 (11:39 +0200)]
lockdep: Fix lockdep_no_validate against IRQ states
Thomas noticed that a lock marked with lockdep_set_novalidate_class()
will still trigger warnings for IRQ inversions. Cure this by skipping
those when marking irq state.
Robert Richter [Tue, 7 Jun 2011 09:49:55 +0000 (11:49 +0200)]
x86, perf: Make copy_from_user_nmi() a library function
copy_from_user_nmi() is used in oprofile and perf. Moving it to other
library functions like copy_from_user(). As this is x86 code for 32
and 64 bits, create a new file usercopy.c for unified code.
x86, perf: P4 PMU - Fix typos in comments and style cleanup
This patch:
- fixes typos in comments and clarifies the text
- renames obscure p4_event_alias::original and ::alter members to
::original and ::alternative as appropriate
- drops parenthesis from the return of p4_get_alias_event()
vfs: drop conditional inode prefetch in __do_lookup_rcu
It seems to hurt performance in real life. Yes, the inode will be used
later, but the conditional doesn't seem to predict all that well
(negative dentries are not uncommon) and it looks like the cost of
prefetching is simply higher than depending on the cache doing the right
thing.
The compiler, at least for ix86 and m68k, validly warns that the
comparison:
next <= (loff_t)-1
is always true (and it's always true also for x86-64 and probably all
other arches - as long as pgoff_t isn't wider than loff_t). The
intention appears to be to avoid wrapping of "next", so rather than
eliminating the pointless comparison, fix the loop to indeed get exited
when "next" would otherwise wrap.
On m68k the following warning is observed:
fs/fscache/page.c: In function '__fscache_uncache_all_inode_pages':
fs/fscache/page.c:979: warning: comparison is always false due to limited range of data type
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Suresh Jayaraman <sjayaraman@suse.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephan Baerwolf [Wed, 20 Jul 2011 12:46:59 +0000 (14:46 +0200)]
sched: Replace use of entity_key()
"entity_key()" is only used in "__enqueue_entity()" and
its only function is to subtract a tasks vruntime by
its groups minvruntime.
Before this patch a rbtree enqueue-decision is done by
comparing two tasks in the style:
So we do not need "entity_key()".
If "entity_before()" is inline we will also save one subtraction (only one,
because "entity_key(cfs_rq, se)" was cached in "key")
Jan H. Schönherr [Thu, 14 Jul 2011 16:32:43 +0000 (18:32 +0200)]
sched: Separate group-scheduling code more clearly
Clean up cfs/rt runqueue initialization by moving group scheduling
related code into the corresponding functions.
Also, keep group scheduling as an add-on, so that things are only done
additionally, i. e. remove the init_*_rq() calls from init_tg_*_entry().
(This removes a redundant initalization during sched_init()).
In case of group scheduling rt_rq->highest_prio.curr is now initialized
twice, but adding another #ifdef seems not worth it.
Richard Kennedy [Fri, 15 Jul 2011 10:41:31 +0000 (11:41 +0100)]
sched: Reorder root_domain to remove 64 bit alignment padding
Reorder root_domain to remove 8 bytes of alignment padding on 64 bit
builds, this shrinks the size from 1736 to 1728 bytes, therefore using
one fewer cachelines.
sched: Do not attempt to destroy uninitialized rt_bandwidth
If a task group is to be created and alloc_fair_sched_group() fails,
then the rt_bandwidth of the corresponding task group is not yet
initialized. The caller, sched_create_group(), starts a clean up
procedure which calls free_rt_sched_group() which unconditionally
destroys the not yet initialized rt_bandwidth.
This crashes or hangs the system in lock_hrtimer_base(): UP systems
dereference a NULL pointer, while SMP systems loop endlessly on a
condition that cannot become true.
This patch simply avoids the destruction of rt_bandwidth when the
initialization code path was not reached.
(This was discovered by accident with a custom kernel modification.)
Jan Schoenherr [Wed, 13 Jul 2011 18:13:32 +0000 (20:13 +0200)]
sched: Remove unused function cpu_cfs_rq()
The last reference to cpu_cfs_rq() was removed with commit 88ec22d3
("sched: Remove the cfs_rq dependency from set_task_cpu()"). Thus,
remove this function, too.
Peter Zijlstra [Wed, 13 Jul 2011 11:09:25 +0000 (13:09 +0200)]
sched, cgroup: Optimize load_balance_fair()
Use for_each_leaf_cfs_rq() instead of list_for_each_entry_rcu(), this
achieves that load_balance_fair() only iterates those task_groups that
actually have tasks on busiest, and that we iterate bottom-up, trying to
move light groups before the heavier ones.
No idea if it will actually work out to be beneficial in practice, does
anybody have a cgroup workload that might show a difference one way or
the other?
[ Also move update_h_load to sched_fair.c, loosing #ifdef-ery ]
Paul Turner [Thu, 7 Jul 2011 05:30:37 +0000 (22:30 -0700)]
sched: Don't update shares twice on on_rq parent
In dequeue_task_fair() we bail on dequeue when we encounter a parenting entity
with additional weight. However, we perform a double shares update on this
entity as we continue the shares update traversal from this point, despite
dequeue_entity() having already updated its queuing cfs_rq.
Avoid this by starting from the parent when we resume.
Paul Turner [Wed, 6 Jul 2011 02:07:21 +0000 (19:07 -0700)]
sched: update correct entity's runtime in check_preempt_wakeup()
While looking at check_preempt_wakeup() I realized that we are
potentially updating the wrong entity in the fair-group scheduling
case. In this case the current task's cfs_rq may not be the same as
the one used for the comparison between the waking task and the
existing task's vruntime.
This potentially results in us using a stale vruntime in the
pre-emption decision, providing a small false preference for the
previous task. The effects of this are bounded since we always
perform a hierarchal update on the tick.
Without the patch it hangs. After the patch SIGSTOP "injected" by the
tracer is not ignored and stops the tracee.
Note also that if this test-case uses, say, SIGWINCH instead of SIGCONT,
everything works without the patch. This can't be right, and this is
confusing.
The problem is that SIGSTOP (or any other sig_kernel_stop() signal) has
no effect without JOBCTL_STOP_DEQUEUED. This means it is simply ignored
after PTRACE_CONT unless JOBCTL_STOP_DEQUEUED was set "by accident", say
it wasn't cleared after initial SIGSTOP sent by PTRACE_ATTACH.
At first glance we could change ptrace_signal() to add STOP_DEQUEUED
after return from ptrace_stop(), but this is not right in case when the
tracer does not change the reported SIGSTOP and SIGCONT comes in between.
This is even more wrong with PT_SEIZED, SIGCONT adds JOBCTL_TRAP_NOTIFY
which will be "lost" during the TRAP_STOP | TRAP_NOTIFY report.
So lets add STOP_DEQUEUED _before_ we report the signal. It has no effect
unless sig_kernel_stop() == T after the tracer resumes us, and in the
latter case the pending STOP_DEQUEUED means no SIGCONT in between, we
should stop.
Note also that if SIGCONT was sent, PT_SEIZED tracee will correctly
report PTRACE_EVENT_STOP/SIGTRAP and thus the tracer can notice the fact
SIGSTOP was cancelled.
Also, move the current->ptrace check from ptrace_signal() to its caller,
get_signal_to_deliver(), this looks more natural.
Daniel Drake [Sun, 17 Jul 2011 15:38:41 +0000 (16:38 +0100)]
mmc: print debug messages for runtime PM actions
At http://www.mail-archive.com/linux-mmc@vger.kernel.org/msg08371.html
(thread: "mmc: sdio: reset card during power_restore") we found and
fixed a bug where mmc's runtime power management functions were not being
called. We have now also made improvements to the SDIO powerup routine
which could possibly mask this kind of issue in future.
Add debug messages to the runtime PM hooks so that it is easy to verify
if and when runtime PM is happening.
Signed-off-by: Daniel Drake <dsd@laptop.org> Signed-off-by: Chris Ball <cjb@laptop.org>
In the case where a driver returns -ENOSYS from its suspend handler
to indicate that the device should be powered down over suspend, the
remove routine of the driver was not being called, leading to lots of
confusion during resume.
The problem is that runtime PM is disabled during this process,
and when we reach mmc_sdio_remove, calling the runtime PM functions here
(validly) return errors, and this was causing us to skip the remove
function.
Fix this by ignoring the error value of pm_runtime_get_sync(), which
can return valid errors. This also matches the behaviour of
pci_device_remove().
Signed-off-by: Daniel Drake <dsd@laptop.org> Signed-off-by: Chris Ball <cjb@laptop.org>
Fix clock rate setting in the mxs-mmc driver. Previously, if div2 was 0
then the value for TIMING_CLOCK_RATE would have been 255 instead of 0.
The limits for div1 (TIMING_CLOCK_DIVIDE) and div2 (TIMING_CLOCK_RATE+1)
were also not correctly defined.
Can easily be reproduced on mx23evk: default clock for high speed sdio
cards is 50 MHz. With a SSP_CLK of 28.8 MHz default), this resulted in
an actual clock rate of about 56 kHz. Tested on mx23evk.
Signed-off-by: Koen Beel <koen.beel@barco.com> Reviewed-by: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Chris Ball <cjb@laptop.org>
Currently the tmio-mmc driver contains a recursive runtime PM method
invocation, which leads to a deadlock on a mutex. Avoid it by taking
care not to request DMA too early.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Chris Ball <cjb@laptop.org>
mmc: tmio: fix a recently introduced bug in DMA code
A recent commit "mmc: tmio: Share register access functions" has swapped
arguments of a macro and broken DMA with TMIO MMC. This patch fixes the
arguments back.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Chris Ball <cjb@laptop.org>
mmc: tmio: fix recursive spinlock, don't schedule with interrupts disabled
Calling mmc_request_done() under a spinlock with interrupts disabled
leads to a recursive spin-lock on request retry path and to
scheduling in atomic context. This patch fixes both these problems
by moving mmc_request_done() to the scheduler workqueue.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Chris Ball <cjb@laptop.org>
mmc: Added quirks for Ricoh 1180:e823 lower base clock frequency
Ricoh 1180:e823 does not recognize certain types of SD/MMC cards,
as reported at http://launchpad.net/bugs/773524. Lowering the SD
base clock frequency from 200Mhz to 50Mhz fixes this issue. This
solution was suggest by Koji Matsumuro, Ricoh Company, Ltd.
This change has no negative performance effect on standard SD
cards, though it's quite possible that there will be one on
UHS-1 cards.
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com> Tested-by: Daniel Manrique <daniel.manrique@canonical.com> Cc: Koji Matsumuro <matsumur@nts.ricoh.co.jp> Cc: <stable@kernel.org> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Chris Ball <cjb@laptop.org>
Andy Shevchenko [Wed, 13 Jul 2011 15:31:15 +0000 (11:31 -0400)]
mmc: omap_hsmmc: refactor duplicated code
There are a few places with the same functionality. This patch creates
two functions omap_hsmmc_set_bus_width() and omap_hsmmc_set_bus_mode()
to do the job.
Signed-off-by: Andy Shevchenko <ext-andriy.shevchenko@nokia.com> Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com> Signed-off-by: Chris Ball <cjb@laptop.org>
Andy Shevchenko [Fri, 6 May 2011 09:14:06 +0000 (12:14 +0300)]
mmc: omap_hsmmc: fix a few bugs when setting the clock divisor
There are two pieces of code which are similar, but not the same.
Each of them contains a bug.
The SYSCTL register should be read before writing to it in
omap_hsmmc_context_restore() to retain the state of the reserved bits.
Before setting the clock divisor and DTO bits the value from the SYSCTL
register should be masked properly. We were lucky to have no problems
with DTO bits. So, make sure we have clear DTO bits properly in
omap_hsmmc_set_ios().
Additionally get rid of msleep(1). The actual time is rarely higher
than 30us on OMAP 3630.
The resulting pieces of code are refactored into the
omap_hsmmc_set_clock() function.
Signed-off-by: Andy Shevchenko <ext-andriy.shevchenko@nokia.com> Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com> Signed-off-by: Chris Ball <cjb@laptop.org>
Andy Shevchenko [Fri, 6 May 2011 09:14:05 +0000 (12:14 +0300)]
mmc: omap_hsmmc: introduce start_clock and re-use stop_clock
There is similar code in two functions which enable the clock. Refactor
this code to omap_hsmmc_start_clock(). Re-use omap_hsmmc_stop_clock() in
omap_hsmmc_context_restore() as well.
Signed-off-by: Andy Shevchenko <ext-andriy.shevchenko@nokia.com> Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com> Signed-off-by: Chris Ball <cjb@laptop.org>
Andy Shevchenko [Tue, 10 May 2011 12:51:54 +0000 (15:51 +0300)]
mmc: omap_hsmmc: split duplicate code to calc_divisor() function
There are two places where the same calculations are done.
Let's split them into a separate function.
In addition, simplify by using the DIV_ROUND_UP kernel macro.
Signed-off-by: Andy Shevchenko <ext-andriy.shevchenko@nokia.com> Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com> Signed-off-by: Chris Ball <cjb@laptop.org>
Andy Shevchenko [Wed, 13 Jul 2011 15:16:29 +0000 (11:16 -0400)]
mmc: omap_hsmmc: move hardcoded frequency constants to defines
Move the min and max frequency constants to the definition block in
the source file.
Signed-off-by: Andy Shevchenko <ext-andriy.shevchenko@nokia.com> Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com> Signed-off-by: Chris Ball <cjb@laptop.org>
Chris Friesen [Thu, 21 Jul 2011 10:07:10 +0000 (12:07 +0200)]
netfilter: ipset: fix compiler warnings "'hash_ip4_data_next' declared inline after being called"
Some gcc versions warn about prototypes without "inline" when the declaration
includes the "inline" keyword. The fix generates a false error message
"marked inline, but without a definition" with sparse below 0.4.2.
Signed-off-by: Chris Friesen <chris.friesen@genband.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
Jiri Olsa [Thu, 14 Jul 2011 09:25:32 +0000 (11:25 +0200)]
perf tools: De-opt the parse_events function
Moving out the option parameter from parse_events function,
and adding new parse_events_option function instead.
The option parameter is used only to carry "struct perf_evlist"
pointer for chaining new events. Putting it away, enable us
to call parse_events from other places without using the
option parameter.
The perf_event_attr struct has two __u32's at the top and
they need to be swapped individually.
With this change I was able to analyze a perf.data collected in a
32-bit PPC VM on an x86 system. I tested both 32-bit and 64-bit
binaries for the Intel analysis side; both read the PPC perf.data
file correctly.
-v2:
- changed the existing perf_event__attr_swap() to swap only elements
of perf_event_attr and exported it for use in swapping the
attributes in the file header
- updated swap_ops used for processing events
Jiri Olsa [Wed, 13 Jul 2011 20:58:18 +0000 (22:58 +0200)]
perf tools: Add missing 'node' alias to the hw_cache[] array
Add "node" as a simple alias for NODE cache events.
The addition of NODE cache events broke the parse_alias
function, so any mismatched event caused the segfault, like:
# ./perf stat -e krava ls
The hw_cache/hw_cache_op/hw_cache_result arrays needs to follow
PERF_COUNT_HW_CACHE_*MAX enums. Adding those MAXs to be size
of those arrays, so possible ommision in future wil not lead to
segfault.
Adding read/write/prefetch as allowed operations for node cache
event.
Jean Delvare [Sat, 16 Jul 2011 15:42:00 +0000 (17:42 +0200)]
mutex: Make mutex_destroy() an inline function
The non-debug variant of mutex_destroy is a no-op, currently
implemented as a macro which does nothing. This approach fails
to check the type of the parameter, so an error would only show
when debugging gets enabled. Using an inline function instead,
offers type checking for earlier bug catching.
On backend change, we flushed out outstanding skbs
but forgot to update the used ring, so that
done entries were left in the ubuf_info ring.
As a result we lose heads or complete incorrect ones,
crashing the guest or leaking memory.
Fix by updating the used ring.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Jan Beulich [Tue, 19 Jul 2011 10:53:07 +0000 (11:53 +0100)]
x86: Serialize EFI time accesses on rtc_lock
The EFI specification requires that callers of the time related
runtime functions serialize with other CMOS accesses in the
kernel, as the EFI time functions may choose to also use the
legacy CMOS RTC.
Besides fixing a latent bug, this is a prerequisite to safely
enable the rtc-efi driver for x86, which ought to be preferred
over rtc-cmos on all EFI platforms.
Jan Beulich [Tue, 19 Jul 2011 10:39:03 +0000 (11:39 +0100)]
x86: Serialize SMP bootup CMOS accesses on rtc_lock
With CPU hotplug, there is a theoretical race between other CMOS
(namely RTC) accesses and those done in the SMP secondary
processor bringup path.
I am unware of the problem having been noticed by anyone in practice,
but it would very likely be rather spurious and very hard to reproduce.
So to be on the safe side, acquire rtc_lock around those accesses.
Jan Beulich [Tue, 19 Jul 2011 12:00:45 +0000 (13:00 +0100)]
x86: Fix write lock scalability 64-bit issue
With the write lock path simply subtracting RW_LOCK_BIAS there
is, on large systems, the theoretical possibility of overflowing
the 32-bit value that was used so far (namely if 128 or more
CPUs manage to do the subtraction, but don't get to do the
inverse addition in the failure path quickly enough).
A first measure is to modify RW_LOCK_BIAS itself - with the new
value chosen, it is good for up to 2048 CPUs each allowed to
nest over 2048 times on the read path without causing an issue.
Quite possibly it would even be sufficient to adjust the bias a
little further, assuming that allowing for significantly less
nesting would suffice.
However, as the original value chosen allowed for even more
nesting levels, to support more than 2048 CPUs (possible
currently only for 64-bit kernels) the lock itself gets widened
to 64 bits.
Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4E258E0D020000780004E3F0@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jan Beulich [Tue, 19 Jul 2011 12:00:19 +0000 (13:00 +0100)]
x86: Unify rwsem assembly implementation
Rather than having two functionally identical implementations
for 32- and 64-bit configurations, use the previously extended
assembly abstractions to fold the rwsem two implementations into
a shared one.
Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4E258DF3020000780004E3ED@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jan Beulich [Tue, 19 Jul 2011 11:59:51 +0000 (12:59 +0100)]
x86: Unify rwlock assembly implementation
Rather than having two functionally identical implementations
for 32- and 64-bit configurations, extend the existing assembly
abstractions enough to fold the two rwlock implementations into
a shared one.
Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4E258DD7020000780004E3EA@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dave Chinner [Mon, 18 Jul 2011 03:40:18 +0000 (03:40 +0000)]
xfs: convert AIL cursors to use struct list_head
The list of active AIL cursors uses a roll-your-own linked list with
special casing for the AIL push cursor. Simplify this code by
replacing the list with standard struct list_head lists, and use a
separate list_head to track the active cursors. This allows us to
treat the AIL push cursor as a generic cursor rather than as a
special case, further simplifying the code.
Further, fix the duplicate push cursor initialisation that the
special case handling was hiding, and clean up all the comments
around the active cursor list handling.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Mon, 18 Jul 2011 03:40:17 +0000 (03:40 +0000)]
xfs: remove confusing ail cursor wrapper
xfs_trans_ail_cursor_set() doesn't set the cursor to the current log
item, it sets it to the next item. There is already a function for
doing this - xfs_trans_ail_cursor_next() - and the _set function is
simply a two line wrapper. Remove it and open code the setting of
the cursor in the two locations that call it to remove the
confusion.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Mon, 18 Jul 2011 03:40:16 +0000 (03:40 +0000)]
xfs: use a cursor for bulk AIL insertion
Delayed logging can insert tens of thousands of log items into the
AIL at the same LSN. When the committing of log commit records
occur, we can get insertions occurring at an LSN that is not at the
end of the AIL. If there are thousands of items in the AIL on the
tail LSN, each insertion has to walk the AIL to find the correct
place to insert the new item into the AIL. This can consume large
amounts of CPU time and block other operations from occurring while
the traversals are in progress.
To avoid this repeated walk, use a AIL cursor to record
where we should be inserting the new items into the AIL without
having to repeat the walk. The cursor infrastructure already
provides this functionality for push walks, so is a simple extension
of existing code. While this will not avoid the initial walk, it
will avoid repeating it tens of thousands of times during a single
checkpoint commit.
This version includes logic improvements from Christoph Hellwig.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
J. Bruce Fields [Thu, 14 Jul 2011 20:50:36 +0000 (20:50 +0000)]
xfs: failure mapping nfs fh to inode should return ESTALE
On xfs exports, nfsd is incorrectly returning ENOENT instead of
ESTALE on attempts to use a filehandle of a deleted file (spotted
with pynfs test PUTFH3). The ENOENT was coming from xfs_iget.
(It's tempting to wonder whether we should just map all xfs_iget
errors to ESTALE, but I don't believe so--xfs_iget can also return
ENOMEM at least, which we wouldn't want mapped to ESTALE.)
While we're at it, the other return of ENOENT in xfs_nfs_get_inode()
also looks wrong.
Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Avoid creating superfluous NUMA domains on non-NUMA systems
sched: Allow for overlapping sched_domain spans
sched: Break out cpu_power from the sched_group structure
John Stultz [Wed, 20 Jul 2011 22:42:55 +0000 (15:42 -0700)]
time: Fix stupid KERN_WARN compile issue
Terribly embarassing. Don't know how I committed this, but its
KERN_WARNING not KERN_WARN.
This fixes the following compile error:
kernel/time/timekeeping.c: In function ‘__timekeeping_inject_sleeptime’:
kernel/time/timekeeping.c:608: error: ‘KERN_WARN’ undeclared (first use in this function)
kernel/time/timekeeping.c:608: error: (Each undeclared identifier is reported only once
kernel/time/timekeeping.c:608: error: for each function it appears in.)
kernel/time/timekeeping.c:608: error: expected ‘)’ before string constant
make[2]: *** [kernel/time/timekeeping.o] Error 1
Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: John Stultz <john.stultz@linaro.org>
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86. reboot: Make Dell Latitude E6320 use reboot=pci
x86, doc only: Correct real-mode kernel header offset for init_size
x86: Disable AMD_NUMA for 32bit for now
Balaji T K [Fri, 1 Jul 2011 16:39:35 +0000 (22:09 +0530)]
mmc: omap_hsmmc: add runtime pm support
* Add runtime pm support to HSMMC host controller.
* Use runtime pm API to enable/disable HSMMC clock.
* Use runtime autosuspend APIs to enable auto suspend delay.
Based on OMAP HSMMC runtime implementation by Kevin Hilman and
Kishore Kadiyala.
Signed-off-by: Balaji T K <balajitk@ti.com> Signed-off-by: Chris Ball <cjb@laptop.org>
Balaji T K [Fri, 1 Jul 2011 16:39:34 +0000 (22:09 +0530)]
mmc: omap_hsmmc: Remove lazy_disable
lazy_disable framework in OMAP HSMMC manages multiple low power states and
card is powered off after inactivity time of 8 seconds. Based on previous
discussion on the list, card power (regulator) handling (when to power
OFF/ON) should ideally be handled by core layer. Remove usage of lazy
disable to allow core layer _only_ to handle card power. With the removal
of lazy disable framework, MMC regulators are left ON until MMC_POWER_OFF
via set_ios.
Signed-off-by: Balaji T K <balajitk@ti.com> Signed-off-by: Chris Ball <cjb@laptop.org>
Philip Rakity [Wed, 6 Jul 2011 15:51:32 +0000 (08:51 -0700)]
mmc: core: Set non-default Drive Strength via platform hook
Non default Drive Strength cannot be set automatically. It is a function
of the board design and only if there is a specific platform handler can
it be set. The platform handler needs to take into account the board
design. Pass to the platform code the necessary information.
For example: The card and host controller may indicate they support HIGH
and LOW drive strength. There is no way to know what should be chosen
without specific board knowledge. Setting HIGH may lead to reflections
and setting LOW may not suffice. There is no mechanism (like ethernet
duplex or speed pulses) to determine what should be done automatically.
If no platform handler is defined -- use the default value.
Signed-off-by: Philip Rakity <prakity@marvell.com> Reviewed-by: Arindam Nath <arindam.nath@amd.com> Signed-off-by: Chris Ball <cjb@laptop.org>
Per Forlin [Fri, 1 Jul 2011 16:55:33 +0000 (18:55 +0200)]
mmc: block: add handling for two parallel block requests in issue_rw_rq
Change mmc_blk_issue_rw_rq() to become asynchronous.
The execution flow looks like this:
* The mmc-queue calls issue_rw_rq(), which sends the request
to the host and returns back to the mmc-queue.
* The mmc-queue calls issue_rw_rq() again with a new request.
* This new request is prepared in issue_rw_rq(), then it waits for
the active request to complete before pushing it to the host.
* When the mmc-queue is empty it will call issue_rw_rq() with a NULL
req to finish off the active request without starting a new request.
Signed-off-by: Per Forlin <per.forlin@linaro.org> Acked-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Venkatraman S <svenkatr@ti.com> Tested-by: Sourav Poddar <sourav.poddar@ti.com> Tested-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org>
Per Forlin [Fri, 1 Jul 2011 16:55:31 +0000 (18:55 +0200)]
mmc: queue: add a second mmc queue request member
Add an additional mmc queue request instance to make way for two active
block requests. One request may be active while the other request is
being prepared.
Signed-off-by: Per Forlin <per.forlin@linaro.org> Acked-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Venkatraman S <svenkatr@ti.com> Tested-by: Sourav Poddar <sourav.poddar@ti.com> Tested-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org>
Per Forlin [Fri, 1 Jul 2011 16:55:29 +0000 (18:55 +0200)]
mmc: block: add a block request prepare function
Break out code from mmc_blk_issue_rw_rq to create a block request prepare
function. This doesn't change any functionallity. This helps when handling
more than one active block request.
Signed-off-by: Per Forlin <per.forlin@linaro.org> Acked-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Venkatraman S <svenkatr@ti.com> Tested-by: Sourav Poddar <sourav.poddar@ti.com> Tested-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org>
Per Forlin [Sat, 9 Jul 2011 21:12:36 +0000 (17:12 -0400)]
mmc: block: add member in mmc queue struct to hold request data
The way the request data is organized in the mmc queue struct, it only
allows processing of one request at a time. This patch adds a new struct
to hold mmc queue request data such as sg list, request, blk request and
bounce buffers, and updates any functions depending on the mmc queue
struct. This prepares for using multiple active requests in one mmc queue.
Signed-off-by: Per Forlin <per.forlin@linaro.org> Acked-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Venkatraman S <svenkatr@ti.com> Tested-by: Sourav Poddar <sourav.poddar@ti.com> Tested-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org>
Per Forlin [Fri, 1 Jul 2011 16:55:27 +0000 (18:55 +0200)]
mmc: mmc_test: test to measure how sg_len affect performance
Add a test that measures how the mmc bandwidth depends on the numbers of
sg elements in the sg list. The transfer size if fixed and sg length goes
from a few up to 512. The purpose is to measure overhead caused by
multiple sg elements.
Signed-off-by: Per Forlin <per.forlin@linaro.org> Acked-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Venkatraman S <svenkatr@ti.com> Tested-by: Sourav Poddar <sourav.poddar@ti.com> Tested-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org>
Per Forlin [Fri, 1 Jul 2011 16:55:26 +0000 (18:55 +0200)]
mmc: mmc_test: add test for non-blocking transfers
Add four tests for read and write performance per
different transfer size, 4k to 4M.
* Read using blocking mmc request
* Read using non-blocking mmc request
* Write using blocking mmc request
* Write using non-blocking mmc request
The host driver must support pre_req() and post_req()
in order to run the non-blocking test cases.
Signed-off-by: Per Forlin <per.forlin@linaro.org> Acked-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Venkatraman S <svenkatr@ti.com> Tested-by: Sourav Poddar<sourav.poddar@ti.com> Tested-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org>
Per Forlin [Fri, 1 Jul 2011 16:55:24 +0000 (18:55 +0200)]
mmc: mmci: implement pre_req() and post_req()
pre_req() runs dma_map_sg() and prepares the dma descriptor for the next
mmc data transfer. post_req() runs dma_unmap_sg. If not calling pre_req()
before mmci_request(), mmci_request() will prepare the cache and dma just
like it did it before. It is optional to use pre_req() and post_req()
for mmci.
Signed-off-by: Per Forlin <per.forlin@linaro.org> Tested-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org>
Per Forlin [Fri, 1 Jul 2011 16:55:23 +0000 (18:55 +0200)]
mmc: omap_hsmmc: add support for pre_req and post_req
pre_req() runs dma_map_sg(), post_req() runs dma_unmap_sg. If not calling
pre_req() before omap_hsmmc_request(), dma_map_sg will be issued before
starting the transfer. It is optional to use pre_req(). If issuing
pre_req(), post_req() must be called as well.
Signed-off-by: Per Forlin <per.forlin@linaro.org> Reviewed-by: Venkatraman S <svenkatr@ti.com> Tested-by: Sourav Poddar <sourav.poddar@ti.com> Signed-off-by: Chris Ball <cjb@laptop.org>
Per Forlin [Fri, 1 Jul 2011 16:55:22 +0000 (18:55 +0200)]
mmc: core: add non-blocking mmc request function
Previously there has only been one function mmc_wait_for_req()
to start and wait for a request. This patch adds:
* mmc_start_req() - starts a request wihtout waiting
If there is on ongoing request wait for completion
of that request and start the new one and return.
Does not wait for the new command to complete.
This patch also adds new function members in struct mmc_host_ops
only called from core.c:
* pre_req - asks the host driver to prepare for the next job
* post_req - asks the host driver to clean up after a completed job
The intention is to use pre_req() and post_req() to do cache maintenance
while a request is active. pre_req() can be called while a request is
active to minimize latency to start next job. post_req() can be used after
the next job is started to clean up the request. This will minimize the
host driver request end latency. post_req() is typically used before
ending the block request and handing over the buffer to the block layer.
Add a host-private member in mmc_data to be used by pre_req to mark the
data. The host driver will then check this mark to see if the data is
prepared or not.
Signed-off-by: Per Forlin <per.forlin@linaro.org> Acked-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Venkatraman S <svenkatr@ti.com> Tested-by: Sourav Poddar <sourav.poddar@ti.com> Tested-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org>
mmc: MAINTAINERS: change omap_hsmmc maintainence to orphan
Update the OMAP HSMMC entry from the MAINTAINERS file as I will
no longer be able to maintain this driver.
Signed-off-by: Madhusudhan Chikkature <madhu.cr@ti.com>
[khilman@ti.com: change to Orphan rather than complete removal] Signed-off-by: Kevin Hilman <khilman@ti.com> Acked-by: Venkatraman S <svenkatr@ti.com> Signed-off-by: Chris Ball <cjb@laptop.org>