Steven Rostedt [Wed, 25 Feb 2009 20:54:30 +0000 (15:54 -0500)]
tracing: wrap arguments with PARAMS
Peter Zijlstra warned that TPPROTO and TPARGS might become something
other than a simple copy of itself. To prevent this from having
side effects in the TRACE_FORMAT macro in tracepoint.h, we add a
PARAMS() macro to be defined as just a wrapper.
Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Wed, 25 Feb 2009 20:49:52 +0000 (15:49 -0500)]
tracing: rename DEFINE_TRACE_FMT to just TRACE_FORMAT
There's been a bit confusion to whether DEFINE/DECLARE_TRACE_FMT should
be a DEFINE or a DECLARE. Ingo Molnar suggested simply calling it
TRACE_FORMAT.
Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Now that several per-cpu files can be read or spliced at the
same, we want the read/splice callbacks for tracing files to be
reentrants.
Until now, a single global mutex (trace_types_lock) serialized
the access to tracing_read_pipe(), tracing_splice_read_pipe(),
and the seq helpers.
Ie: it means that if a user tries to read trace_pipe0 and
trace_pipe1 at the same time, the access to the function
tracing_read_pipe() is contended and one reader must wait for
the other to finish its read call.
The trace_type_lock mutex is mostly here to serialize the access
to the global current tracer (current_trace), which can be
changed concurrently. Although the iter struct keeps a private
pointer to this tracer, its callbacks can be changed by another
function.
The method used here is to not keep anymore private reference to
the tracer inside the iterator but to make a copy of it inside
the iterator. Then it checks on subsequents read calls if the
tracer has changed. This is not costly because the current
tracer is not expected to be changed often, so we use a branch
prediction for that.
Moreover, we add a private mutex to the iterator (there is one
iterator per file descriptor) to serialize the accesses in case
of multiple consumers per file descriptor (which would be a
silly idea from the user). Note that this is not to protect the
ring buffer, since the ring buffer already serializes the
readers accesses. This is to prevent from traces weirdness in
case of concurrent consumers. But these mutexes can be dropped
anyway, that would not result in any crash. Just tell me what
you think about it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, on the tracing debugfs directory, three files are
available to the user to let him extracting the trace output:
- trace is an iterator through the ring-buffer. It's a reader
but not a consumer It doesn't block when no more traces are
available.
- trace pretty similar to the former, except that it adds more
informations such as prempt count, irq flag, ...
- trace_pipe is a reader and a consumer, it will also block
waiting for traces if necessary (heh, yes it's a pipe).
The traces coming from different cpus are curretly mixed up
inside these files. Sometimes it messes up the informations,
sometimes it's useful, depending on what does the tracer
capture.
The tracing_cpumask file is useful to filter the output and
select only the traces captured a custom defined set of cpus.
But still it is not enough powerful to extract at the same time
one trace buffer per cpu.
So this patch creates a new directory: /debug/tracing/per_cpu/.
Inside this directory, you will now find one trace_pipe file and
one trace file per cpu.
Which means if you have two cpus, you will have:
trace0
trace1
trace_pipe0
trace_pipe1
And of course, reading these files will have the same effect
than with the usual tracing files, except that you will only see
the traces from the given cpu.
The original all-in-one cpu trace file are still available on
their original place.
Until now, only one consumer was allowed on trace_pipe to avoid
racy consuming on the ring-buffer. Now the approach changed a
bit, you can have only one consumer per cpu.
Which means you are allowed to read concurrently trace_pipe0 and
trace_pipe1 But you can't have two readers on trace_pipe0 or
trace_pipe1.
Following the same logic, if there is one reader on the common
trace_pipe, you can not have at the same time another reader on
trace_pipe0 or in trace_pipe1. Because in trace_pipe is already
a consumer in all cpu buffers in essence.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Wed, 25 Feb 2009 10:03:44 +0000 (11:03 +0100)]
tracing: remove /debug/tracing/latency_trace
Impact: remove old debug/tracing API
/debug/tracing/latency_trace is an old legacy format we kept from
the old latency tracer. Remove the file for now. If there's any
useful bit missing then we'll propagate any useful output bits into
the /debug/tracing/trace output.
Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Tue, 24 Feb 2009 17:07:53 +0000 (12:07 -0500)]
tracing: add DEFINE_TRACE_FMT to tracepoint.h
This patch creates a DEFINE_TRACE_FMT to map to DECLARE_TRACE.
This allows for the developers to place format strings and
args in with their tracepoint declaration. A tracer may now
override the DEFINE_TRACE_FMT macro and use it to record
a default format.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Thu, 19 Feb 2009 18:41:27 +0000 (13:41 -0500)]
ftrace: break out modify loop immediately on detection of error
Impact: added precaution on failure detection
Break out of the modifying loop as soon as a failure is detected.
This is just an added precaution found by code review and was not
found by any bug chasing.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Thu, 19 Feb 2009 18:32:57 +0000 (13:32 -0500)]
ftrace: immediately stop code modification if failure is detected
Impact: fix to prevent NMI lockup
If the page fault handler produces a WARN_ON in the modifying of
text, and the system is setup to have a high frequency of NMIs,
we can lock up the system on a failure to modify code.
The modifying of code with NMIs allows all NMIs to modify the code
if it is about to run. This prevents a modifier on one CPU from
modifying code running in NMI context on another CPU. The modifying
is done through stop_machine, so only NMIs must be considered.
But if the write causes the page fault handler to produce a warning,
the print can slow it down enough that as soon as it is done
it will take another NMI before going back to the process context.
The new NMI will perform the write again causing another print and
this will hang the box.
This patch turns off the writing as soon as a failure is detected
and does not wait for it to be turned off by the process context.
This will keep NMIs from getting stuck in this back and forth
of print outs.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Tue, 17 Feb 2009 22:57:30 +0000 (17:57 -0500)]
ftrace, x86: make kernel text writable only for conversions
Impact: keep kernel text read only
Because dynamic ftrace converts the calls to mcount into and out of
nops at run time, we needed to always keep the kernel text writable.
But this defeats the point of CONFIG_DEBUG_RODATA. This patch converts
the kernel code to writable before ftrace modifies the text, and converts
it back to read only afterward.
The kernel text is converted to read/write, stop_machine is called to
modify the code, then the kernel text is converted back to read only.
The original version used SYSTEM_STATE to determine when it was OK
or not to change the code to rw or ro. Andrew Morton pointed out that
using SYSTEM_STATE is a bad idea since there is no guarantee to what
its state will actually be.
Instead, I moved the check into the set_kernel_text_* functions
themselves, and use a local variable to determine when it is
OK to change the kernel text RW permissions.
[ Update: Ingo Molnar suggested moving the prototypes to cacheflush.h ]
Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Sometimes it happens that KConfig dependencies are not handled
like in the following scenario:
- config A
bool
- config B
bool
depends on A
- config C
bool
select B
If one selects C, then it will select B without checking its
dependency to A, if A hasn't been selected elsewhere, it will
result in a build failure.
This is what happens on the following build error:
kernel/built-in.o: In function `marker_update_probe_range':
(.text+0x52f64): undefined reference to `tracepoint_probe_register_noupdate'
kernel/built-in.o: In function `marker_update_probe_range':
(.text+0x52f74): undefined reference to `tracepoint_probe_unregister_noupdate'
kernel/built-in.o: In function `marker_update_probe_range':
(.text+0x52fb9): undefined reference to `tracepoint_probe_unregister_noupdate'
kernel/built-in.o: In function `marker_update_probes':
marker.c:(.text+0x530ba): undefined reference to `tracepoint_probe_update_all'
CONFIG_KVM_TRACE will select CONFIG_MARKER, but the latter
depends on CONFIG_TRACEPOINTS which will not be selected.
Steven Rostedt [Tue, 17 Feb 2009 18:35:06 +0000 (13:35 -0500)]
ftrace: allow archs to preform pre and post process for code modification
This patch creates the weak functions: ftrace_arch_code_modify_prepare
and ftrace_arch_code_modify_post_process that are called before and
after the stop machine is called to modify the kernel text.
If the arch needs to do pre or post processing, it only needs to define
these functions.
[ Update: Ingo Molnar suggested using the name ftrace_arch_code_modify_*
over using ftrace_arch_modify_* ]
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
SLUB: Introduce and use SLUB_MAX_SIZE and SLUB_PAGE_SHIFT constants
As a preparational patch to bump up page allocator pass-through threshold,
introduce two new constants SLUB_MAX_SIZE and SLUB_PAGE_SHIFT and convert
mm/slub.c to use them.
Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Tested-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Ingo Molnar [Fri, 20 Feb 2009 07:04:13 +0000 (08:04 +0100)]
x86: use the right protections for split-up pagetables
Steven Rostedt found a bug in where in his modified kernel
ftrace was unable to modify the kernel text, due to the PMD
itself having been marked read-only as well in
split_large_page().
The fix, suggested by Linus, is to not try to 'clone' the
reference protection of a huge-page, but to use the standard
(and permissive) page protection bits of KERNPG_TABLE.
The 'cloning' makes sense for the ptes but it's a confused and
incorrect concept at the page table level - because the
pagetable entry is a set of all ptes and hence cannot
'clone' any single protection attribute - the ptes can be any
mixture of protections.
With the permissive KERNPG_TABLE, even if the pte protections
get changed after this point (due to ftrace doing code-patching
or other similar activities like kprobes), the resulting combined
protections will still be correct and the pte's restrictive
(or permissive) protections will control it.
Also update the comment.
This bug was there for a long time but has not caused visible
problems before as it needs a rather large read-only area to
trigger. Steve possibly hacked his kernel with some really
large arrays or so. Anyway, the bug is definitely worth fixing.
[ Huang Ying also experienced problems in this area when writing
the EFI code, but the real bug in split_large_page() was not
realized back then. ]
Linus Torvalds [Thu, 19 Feb 2009 17:14:35 +0000 (09:14 -0800)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, mce: fix ifdef for 64bit thermal apic vector clear on shutdown
x86, mce: use force_sig_info to kill process in machine check
x86, mce: reinitialize per cpu features on resume
x86, rcu: fix strange load average and ksoftirqd behavior
Linus Torvalds [Thu, 19 Feb 2009 17:14:22 +0000 (09:14 -0800)]
Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: limit the number of loops the ring buffer self test can make
tracing: have function trace select kallsyms
tracing: disable tracing while testing ring buffer
tracing/function-graph-tracer: trace the idle tasks
Linus Torvalds [Thu, 19 Feb 2009 16:35:52 +0000 (08:35 -0800)]
Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] fix "mem=" handling in case of standby memory
[S390] Fix timeval regression on s390
[S390] sclp: handle empty event buffers
Linus Torvalds [Thu, 19 Feb 2009 16:35:29 +0000 (08:35 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
sound: virtuoso: revert "do not overwrite EEPROM on Xonar D2/D2X"
ALSA: jack - Use card->shortname for input name
ALSA: usb-audio - Workaround for misdetected sample rate with CM6207
ALSA: usb-audio - Fix non-continuous rate detection
sound: usb-audio: fix uninitialized variable with M-Audio MIDI interfaces
Revert "Sound: hda - Restore PCI configuration space with interrupts off"
Makito SHIOKAWA [Thu, 19 Feb 2009 14:34:59 +0000 (15:34 +0100)]
[ARM] 5404/1: Fix condition in arm_elf_read_implies_exec() to set READ_IMPLIES_EXEC
READ_IMPLIES_EXEC must be set when:
o binary _is_ an executable stack (i.e. not EXSTACK_DISABLE_X)
o processor architecture is _under_ ARMv6 (XN bit is supported from ARMv6)
Signed-off-by: Makito SHIOKAWA <lkhmkt@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Heiko Carstens [Thu, 19 Feb 2009 14:19:01 +0000 (15:19 +0100)]
[S390] fix "mem=" handling in case of standby memory
Standby memory detected with the sclp interface gets always registered
with add_memory calls without considering the limitationt that the
"mem=" kernel paramater implies.
So fix this and only register standby memory that is below the specified
limit.
This fixes zfcpdump since it uses "mem=32M". In case there is appr.
2GB standby memory present all of usable memory would be used for the
struct pages needed for standby memory.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduced a timing regression:
-bash-3.2# time ls
real 0m0.006s
user 0m1.754s
sys 0m1.094s
The problem was introduced by an error in cputime_to_timeval.
Cputime is now 1/4096 microsecond, therefore, we have to divide
the remainder with 4096 to get the microseconds.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Handle a malformed hardware response which some versions of the
Support Element (SE) may present during SE restart and which otherwise
would result in an endless loop in function sclp_dispatch_evbufs.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Russell King [Thu, 19 Feb 2009 13:25:16 +0000 (13:25 +0000)]
[ARM] omap: fix clock reparenting in omap2_clk_set_parent()
When changing the parent of a clock, it is necessary to keep the
clock use counts balanced otherwise things the parent state will
get corrupted. Since we already disable and re-enable the clock,
we might as well use the recursive versions instead.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Nicolas Pitre [Wed, 18 Feb 2009 21:29:22 +0000 (22:29 +0100)]
[ARM] 5402/1: fix a case of wrap-around in sanity_check_meminfo()
In the non highmem case, if two memory banks of 1GB each are provided,
the second bank would evade suppression since its virtual base would
be 0. Fix this by disallowing any memory bank which virtual base
address is found to be lower than PAGE_OFFSET.
Reported-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: Nicolas Pitre <nico@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
More user reports show that the overwriting of the EEPROM is not
triggered by using this driver but by installing Linux, and that the
installation of any other operating system (even one without any CMI8788
driver) has the same effect. In other words, the presence of this
driver does not have any effect on the occurrence of the error. (So
far, the available evidence seems to point to a BIOS bug.)
Furthermore, it turns out that the EEPROM chip is protected against
stray write commands by the command format and by requiring a separate
write-enable command, so the error scenario in the previous commit (that
SPI writes can be misinterpreted as an EEPROM write command) is not even
theoretically possible.
The mixer control that was removed as a consequence of the previous
commit can only be partially emulated in userspace, which also means it
cannot be seen be the in-kernel OSS API emulation, so it is better to
revert that change.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Cc: <stable@kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
Steven Rostedt [Thu, 19 Feb 2009 03:50:01 +0000 (22:50 -0500)]
tracing: limit the number of loops the ring buffer self test can make
Impact: prevent deadlock if ring buffer gets corrupted
This patch adds a paranoid check to make sure the ring buffer consumer
does not go into an infinite loop. Since the ring buffer has been set
to read only, the consumer should not loop for more than the ring buffer
size. A check is added to make sure the consumer does not loop more than
the ring buffer size.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Thu, 19 Feb 2009 03:06:18 +0000 (22:06 -0500)]
tracing: have function trace select kallsyms
Impact: fix output of function tracer to be useful
The function tracer is pretty useless if KALLSYMS is not configured.
Unless you are good at reading hex values, the function tracer should
select the KALLSYMS configuration.
Also, the dynamic function tracer will fail its self test if KALLSYMS
is not selected.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Steven Rostedt [Wed, 18 Feb 2009 23:33:57 +0000 (18:33 -0500)]
tracing: disable tracing while testing ring buffer
Impact: fix to prevent hard lockup on self tests
If one of the tracers are broken and is constantly filling the ring
buffer while the test of the ring buffer is running, it will hang
the box. The reason is that the test is a consumer that will not
stop till the ring buffer is empty. But if the tracer is broken and
is constantly producing input to the buffer, this test will never
end. The result is a lockup of the box.
This happened when KALLSYMS was not defined and the dynamic ftrace
test constantly filled the ring buffer, because the filter failed
and all functions were being traced. Something was being called
that constantly filled the buffer.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Linus Torvalds [Thu, 19 Feb 2009 02:33:04 +0000 (18:33 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: fix deadlock in blk_abort_queue() for drivers that readd to timeout list
block: fix booting from partitioned md array
block: revert part of 18ce3751ccd488c78d3827e9f6bf54e6322676fb
cciss: PCI power management reset for kexec
paride/pg.c: xs(): &&/|| confusion
fs/bio: bio_alloc_bioset: pass right object ptr to mempool_free
block: fix bad definition of BIO_RW_SYNC
bsg: Fix sense buffer bug in SG_IO
Linus Torvalds [Thu, 19 Feb 2009 01:55:15 +0000 (17:55 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
omap_hsmmc: Change while(); loops with finite version
omap_hsmmc: recover from transfer failures
omap_hsmmc: only MMC1 allows HCTL.SDVS != 1.8V
omap_hsmmc: card detect irq bugfix
sdhci: fix led naming
mmc_test: fix basic read test
s3cmci: Fix hangup in do_pio_write()
Revert "sdhci: force high speed capability on some controllers"
MMC: fix bug - SDHC card capacity not correct
Randy Dunlap [Wed, 18 Feb 2009 22:48:39 +0000 (14:48 -0800)]
x86: dell-laptop: depends on POWER_SUPPLY
Build breaks when DELL_LAPTOP=y and POWER_SUPPLY=m. DELL_LAPTOP needs to
depend on POWER_SUPPLY.
dell-laptop.c:(.text+0x1ef3c4): undefined reference to `power_supply_is_system_supplied'
dell-laptop.c:(.text+0x1ef45e): undefined reference to `power_supply_is_system_supplied'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Krzysztof Helt [Wed, 18 Feb 2009 22:48:38 +0000 (14:48 -0800)]
fbdev/drm: fix Kconfig submenu mess in "Graphics support"
Submenus of the graphics support "Support for frame buffer devices" and
"Direct Rendering Manager (XFree86 4.1.0 and higher DRI support)" are
broken in half after latest changes for Intel 915 mode setting support.
The DRM subsection is broken because one option is put outside the choice
section it depends on.
The frame buffers part is broken then due to circular dependency. Fix
this by make Intel frame buffers depend on CONFIG_INTEL_AGP.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Dave Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
floppy: request and release only the ports we actually use
The floppy driver requests an I/O port it doesn't need, and sometimes this
causes a conflict with a motherboard device reported by PNPBIOS.
This patch makes the floppy driver request and release only the ports it
actually uses. It also factors out the request/release stuff and the
io-ports list so they're all in one place now.
but it requests 0x3f2-0x3f5 and 0x3f7, which includes the unused port
0x3f3.
Some BIOSes report 0x3f3 as a motherboard resource. The PNP system driver
reserves that, which causes a conflict when the floppy driver requests
0x3f2-0x3f5 later.
Philippe reported that this conflict broke the floppy driver between
2.6.11 and 2.6.22. His PNPBIOS reports these devices:
$ cat 00:03/id 00:03/resources # floppy device
PNP0700
state = active
io 0x3f4-0x3f5
io 0x3f2-0x3f2
Reference:
http://lkml.org/lkml/2009/1/31/162
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Philippe De Muyter <phdm@macqel.be> Reported-by: Philippe De Muyter <phdm@macqel.be> Tested-by: Philippe De Muyter <phdm@macqel.be> Cc: Adam M Belay <abelay@mit.edu> Cc: Robert Hancock <hancockrwd@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I have a Digi Neo 8 PCI card (114f:00b1) Serial controller: Digi
International Digi Neo 8 (rev 05)
that works with the jsm driver after using the following patch.
Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de> Cc: Scott H Kilau <Scott_Kilau@digi.com> Cc: Wendy Xiong <wendyx@us.ibm.com> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
So it's a block move in that 0x81f600-->0x81f7ff region which triggers
the problem.
This patch:
Declaration of early_pfn_to_nid() is scattered over per-arch include
files, and it seems it's complicated to know when the declaration is used.
I think it makes fix-for-memmap-init not easy.
This patch moves all declaration to include/linux/mm.h
After this,
if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
-> Use static definition in include/linux/mm.h
else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
-> Use generic definition in mm/page_alloc.c
else
-> per-arch back end function will be called.
The cause is after alloc_super() and then retry, an old entry in list
fs_supers is found, so grab_super(old) is called, but both functions hold
s_umount lock:
struct super_block *sget(...)
{
...
retry:
spin_lock(&sb_lock);
if (test) {
list_for_each_entry(old, &type->fs_supers, s_instances) {
if (!test(old, data))
continue;
if (!grab_super(old)) <--- 2nd: down_write(&old->s_umount);
goto retry;
if (s)
destroy_super(s);
return old;
}
}
if (!s) {
spin_unlock(&sb_lock);
s = alloc_super(type); <--- 1th: down_write(&s->s_umount)
if (!s)
return ERR_PTR(-ENOMEM);
goto retry;
}
...
}
It seems like a false positive, and seems like VFS but not cgroup needs to
be fixed.
Peter said:
We can simply put the new s_umount instance in a but lockdep doesn't
particularly cares about subclass order.
If there's any issue with the callers of sget() assuming the s_umount lock
being of sublcass 0, then there is another annotation we can use to fix
that, but lets not bother with that if this is sufficient.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Li Zefan <lizf@cn.fujitsu.com> Reported-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Paul Menage <menage@google.com> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Atsushi Nemoto [Wed, 18 Feb 2009 22:48:28 +0000 (14:48 -0800)]
atmel_serial might lose modem status change
I found a problem of handling of modem status of atmel_serial driver.
With the commit 1ecc26 ("atmel_serial: split the interrupt handler"),
handling of modem status signal was splitted into two parts. The
atmel_tasklet_func() compares new status with irq_status_prev, but
irq_status_prev is not correct if signal status was changed while the port
is closed.
Here is a sequence to cause problem:
1. Remote side sets CTS (and DSR).
2. Local side close the port.
3. Local side clears RTS and DTR.
4. Remote side clears CTS and DSR.
5. Local side reopen the port. hw_stopped becomes 1.
6. Local side sets RTS and DTR.
7. Remote side sets CTS and DSR.
Then CTS change interrupt can be received, but since CTS bit in
irq_status_prev and new status is same, uart_handle_cts_change() will not
be called (so hw_stopped will not be cleared, i.e. cannot send any data).
I suppose irq_status_prev should be initialized at somewhere in open
sequence.
Itai Levi pointed out that we need to initialize atmel_port->irq_status
as well here. His analysis is as follows:
> Regarding the second part of the patch (which resets irq_status_prev),
> it turns out that both versions of the patch (mine and Atsushi's)
> still leave enough room for faulty behavior when opening the port.
>
> This is because we are not resetting both irq_status_prev and
> irq_status in atmel_startup() to CSR, which leads faulty behavior in
> the following sequences:
>
> First case:
> 1. closing the port while CTS line = 1 (TX not allowed)
> 2. setting CTS line = 0 (TX allowed)
> 3. opening the port
> 4. transmitting one char
> 5. Cannot transmit more chars, although CTS line is 0
>
> Second case:
> 1. closing the port while CTS line = 0 (TX allowed)
> 2. setting CTS line = 1 (TX not allowed)
> 3. opening the port
> 4. receiving some chars
> 5. Now we can transmit, although CTS line is 1
>
> This reason for this is that the tasklet is scheduled as a result of
> TX or RX interrupts (not a status change!), in steps 4 above. Inside
> the tasklet, the atmel_port->irq_status (which holds the value from
> the previous session) is compared to atmel_port->irq_status_prev.
> Hence, a status-change of the CTS line is faultily detected.
>
> Both cases were verified on 9260 hardware.
[haavard.skinnemoen@atmel.com: folded with patch from Itai Levi] Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> Cc: Remy Bohmer <linux@bohmer.net> Cc: Marc Pignat <marc.pignat@hevs.ch> Cc: Itai Levi <itai.levi.devel@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Wed, 18 Feb 2009 22:48:26 +0000 (14:48 -0800)]
atmel-mci: fix initialization of dma slave data
The conversion of atmel-mci to dma_request_channel missed the
initialization of the channel dma_slave information. The filter_fn passed
to dma_request_channel is responsible for initializing the channel's
private data. This implementation has the additional benefit of enabling
a generic client-channel data passing mechanism.
Reviewed-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Giuseppe Bilotta [Wed, 18 Feb 2009 22:48:25 +0000 (14:48 -0800)]
lis3lv02d: add axes knowledge of HP Pavilion dv5 models
Add support for HP Pavilion dv5.
Since Intel-based models have an inverted x axis, while AMD-based models
have an inverted y axis, we introduce a new macro that special-cases axis
orientation based on two DMI entries: HP dv5 axis configuration is then
based on both the PRODUCT and BOARD name.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Cc: Eric Piel <Eric.Piel@tremplin-utc.net> Cc: Pavel Machek <pavel@suse.cz> Tested-by: Palatis Tseng <palatis@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Giuseppe Bilotta [Wed, 18 Feb 2009 22:48:24 +0000 (14:48 -0800)]
lis3lv02d: support both one- and two-byte sensors
Sensors responding with 0x3B to WHO_AM_I only have one data register per
direction, thus returning a signed byte from the position which is
occupied by the MSB in sensors responding with 0x3A.
Since multiple sensors share the reply to WHO_AM_I, we rename the defines
to better indicate what they identify (family of single and double
precision sensors).
We support both kind of sensors by checking for the sensor type on init
and defining appropriate data-access routines and sensor limits (for the
joystick) depending on what we find.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Acked-by: Eric Piel <Eric.Piel@tremplin-utc.net> Cc: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Machek [Wed, 18 Feb 2009 22:48:23 +0000 (14:48 -0800)]
hp accelerometer: add freefall detection
This adds freefall handling to hp_accel driver. According to HP, it
should just work, without us having to set the chip up by hand.
hpfall.c is example .c program that parks the disk when accelerometer
detects free fall. It should work; for now, it uses fixed 20seconds
protection period.
Signed-off-by: Pavel Machek <pavel@suse.cz> Cc: Thomas Renninger <trenn@suse.de> Cc: Éric Piel <eric.piel@tremplin-utc.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Wed, 18 Feb 2009 22:48:22 +0000 (14:48 -0800)]
eeepc: should depend on INPUT
Otherwise with INPUT=m, EEEPC_LAPTOP=y one gets
drivers/built-in.o: In function `input_sync':
eeepc-laptop.c:(.text+0x18ce51): undefined reference to `input_event'
drivers/built-in.o: In function `input_report_key':
eeepc-laptop.c:(.text+0x18ce73): undefined reference to `input_event'
drivers/built-in.o: In function `eeepc_hotk_check':
eeepc-laptop.c:(.text+0x18d05f): undefined reference to `input_allocate_device'
eeepc-laptop.c:(.text+0x18d10f): undefined reference to `input_register_device'
eeepc-laptop.c:(.text+0x18d131): undefined reference to `input_free_device'
drivers/built-in.o: In function `eeepc_backlight_exit':
eeepc-laptop.c:(.text+0x18d546): undefined reference to `input_unregister_device'
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Len Brown <lenb@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Compilation of kprobes.c with CONFIG_PM unset is broken due to some broken
config dependncies. Fix that.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reported-by: Ingo Molnar <mingo@elte.hu> Tested-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Len Brown <lenb@kernel.org> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zefan [Wed, 18 Feb 2009 22:48:20 +0000 (14:48 -0800)]
cgroups: fix possible use after free
In cgroup_kill_sb(), root is freed before sb is detached from the list, so
another sget() may find this sb and call cgroup_test_super(), which will
access the root that has been freed.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Wed, 18 Feb 2009 22:48:18 +0000 (14:48 -0800)]
mm: task dirty accounting fix
YAMAMOTO-san noticed that task_dirty_inc doesn't seem to be called properly for
cases where set_page_dirty is not used to dirty a page (eg. mark_buffer_dirty).
Additionally, there is some inconsistency about when task_dirty_inc is
called. It is used for dirty balancing, however it even gets called for
__set_page_dirty_no_writeback.
So rather than increment it in a set_page_dirty wrapper, move it down to
exactly where the dirty page accounting stats are incremented.
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davide Libenzi [Wed, 18 Feb 2009 22:48:18 +0000 (14:48 -0800)]
timerfd: add flags check
As requested by Michael, add a missing check for valid flags in
timerfd_settime(), and make it return EINVAL in case some extra bits are
set.
Michael said:
If this is to be any use to userland apps that want to check flag
support (perhaps it is too late already), then the sooner we get it
into the kernel the better: 2.6.29 would be good; earlier stables as
well would be even better.
Pavel Machek [Wed, 18 Feb 2009 22:48:16 +0000 (14:48 -0800)]
Pavel has moved
My @suse.cz address will stop working some day, so put working one into
MAINTAINERS/CREDITS. It would be cool to get this to 2.6.29... it should
not really break anything.
Signed-off-by: Pavel Machek <pavel@ucw.cz> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Biederman [Wed, 18 Feb 2009 22:48:16 +0000 (14:48 -0800)]
seq_file: properly cope with pread
Currently seq_read assumes that the offset passed to it is always the
offset it passed to user space. In the case pread this assumption is
broken and we do the wrong thing when presented with pread.
To solve this I introduce an offset cache inside of struct seq_file so we
know where our logical file position is. Then in seq_read if we try to
read from another offset we reset our data structures and attempt to go to
the offset user space wanted.
[akpm@linux-foundation.org: restore FMODE_PWRITE]
[pjt@google.com: seq_open needs its fmode opened up to take advantage of this] Signed-off-by: Eric Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Paul Turner <pjt@google.com> Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Turner [Wed, 18 Feb 2009 22:48:15 +0000 (14:48 -0800)]
vfs: separate FMODE_PREAD/FMODE_PWRITE into separate flags
Separate FMODE_PREAD and FMODE_PWRITE into separate flags to reflect the
reality that the read and write paths may have independent restrictions.
A git grep verifies that these flags are always cleared together so this
new behavior will only apply to interfaces that change to clear flags
individually.
This is required for "seq_file: properly cope with pread", a post-2.6.25
regression fix.
[akpm@linux-foundation.org: add comment] Signed-off-by: Paul Turner <pjt@google.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Wed, 18 Feb 2009 22:48:13 +0000 (14:48 -0800)]
aoe: ignore vendor extension AoE responses
The Welland ME-747K-SI AoE target generates unsolicited AoE responses that
are marked as vendor extensions. Instead of ignoring these packets, the
aoe driver was generating kernel messages for each unrecognized response
received. This patch corrects the behavior.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Reported-by: <karaluh@karaluh.pl> Tested-by: <karaluh@karaluh.pl> Cc: <stable@kernel.org> Cc: Alex Buell <alex.buell@munted.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have get_vm_area_caller() and __get_vm_area() but not
__get_vm_area_caller()
On powerpc, I use __get_vm_area() to separate the ranges of addresses
given to vmalloc vs. ioremap (various good reasons for that) so in order
to be able to implement the new caller tracking in /proc/vmallocinfo, I
need a "_caller" variant of it.
(akpm: needed for ongoing powerpc development, so merge it early)
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jean Pihet [Wed, 11 Feb 2009 21:11:39 +0000 (13:11 -0800)]
omap_hsmmc: recover from transfer failures
Timeouts during a command that has a data phase can result in the next
command issued after the command that failed not being processed, i.e. no
interrupt ever occurs to indicate the command has completed. This failure
can result in a deadlock.
This patch resets the data state machine to clear the error in case of a
command timeout.
Tested on OMAP3430 chip and intensive MMC/SD device removal while
transferring data.
Signed-off-by: Andy Lowe <alowe@mvista.com> Signed-off-by: Jean Pihet <jpihet@mvista.com> Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
David Brownell [Tue, 17 Feb 2009 22:49:01 +0000 (14:49 -0800)]
omap_hsmmc: only MMC1 allows HCTL.SDVS != 1.8V
Based on a patch from Tony Lindgren ... after initialization,
never change HCTL.SDVS except for MMC1. The other controller
instances only support 1.8V in that field, although they can
suport other card/SDIO/eMMC/... voltages with level shifting
solutions such as external transceivers.
MMC2 behavior sanity tested on Overo/WLAN, OMAP3430 SDP, and
custom hardware. MMC1 also sanity tested on those platforms
plus Beagle. This also fixes a bug preventing MMC2 (and also
presumably MMC3) from powering down when requested.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
David Brownell [Wed, 4 Feb 2009 22:42:03 +0000 (14:42 -0800)]
omap_hsmmc: card detect irq bugfix
Work around lockdep issue when card detect IRQ handlers run in
thread context ... it forces IRQF_DISABLED, which prevents all
access to twl4030 card detect signals.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Yauhen Kharuzhy [Wed, 11 Feb 2009 21:25:52 +0000 (13:25 -0800)]
s3cmci: Fix hangup in do_pio_write()
This commit fixes the regression what was added by commit 088a78af978d0c8e339071a9b2bca1f4cb368f30 "s3cmci: Support transfers
which are not multiple of 32 bits."
fifo_free() now returns amount of available space in FIFO buffer in
bytes. But do_pio_write() writes to FIFO 32-bit words. Condition for
return from cycle is (fifo_free() == 0), but when fifo has 1..3 bytes
of free space then this condition will never be true and system hangs.
This patch changes condition in the while() to (fifo_free() > 3).
Signed-off-by: Yauhen Kharuzhy <jekhor@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Takashi Iwai [Wed, 18 Feb 2009 15:46:27 +0000 (16:46 +0100)]
ALSA: jack - Use card->shortname for input name
Currently the jack layer refers to card->longname as a part of
its input device name string. However, longname is often really long
and way too ugly as an identifier, such as,
"HDA Intel at 0xf8400000 irq 21".
This patch changes the code to use card->shortname instead.
The shortname string contains usually the h/w vendor and product
names but without messy I/O port or IRQ numbers.
Hannes Reinecke [Wed, 18 Feb 2009 09:30:15 +0000 (10:30 +0100)]
block: fix deadlock in blk_abort_queue() for drivers that readd to timeout list
blk_abort_queue() iterates the timeout list and aborts each request on the
list, but if the driver error handling readds a request to the timeout list
during this processing, we could be looping forever. Fix this by splicing
current entries to a local list and run over that list instead.
broke a particular case for booting from partitioned md/raid devices.
That is the second time this has been broken recently. The previous
time was fixed by
Because the data isn't available when an md device is first created
(we add disks and set it up after creation), the initial partition
scan finds nothing. It is not until the device is opened that
another partition scan happens and finds something.
So at the point where the kernel parameter "root=/dev/md_d0p1" is
being parsed, md_d0 exists, but md_d0p1 does not.
However if we let blk_lookup_devt return the correct device number
even though the device doesn't exist, then the attempt to mount it
will successfully find the partition.
I have tried in the past to find a way to get the partition table to
be read as soon as the array is assembled but that proved impossible
(at the time). I don't remember the details, and could possibly
revisit it. However it would be really nice if blk_lookup_devt
could be adjusted to again accept non existant partitions.
The above commit added WRITE_SYNC and switched various places to using
that for committing writes that will be waited upon immediately after
submission. However, this causes a performance regression with AS and CFQ
for ext3 at least, since sync_dirty_buffer() will submit some writes with
WRITE_SYNC while ext3 has sumitted others dependent writes without the sync
flag set. This causes excessive anticipation/idling in the IO scheduler
because sync and async writes get interleaved, causing a big performance
regression for the below test case (which is meant to simulate sqlite
like behaviour).
Chip Coldwell [Mon, 16 Feb 2009 12:11:56 +0000 (13:11 +0100)]
cciss: PCI power management reset for kexec
The kexec kernel resets the CCISS hardware in three steps:
1. Use PCI power management states to reset the controller in the
kexec kernel.
2. Clear the MSI/MSI-X bits in PCI configuration space so that MSI
initialization in the kexec kernel doesn't fail.
3. Use the CCISS "No-op" message to determine when the controller
firmware has recovered from the PCI PM reset.
[akpm@linux-foundation.org: cleanups] Signed-off-by: Mike Miller <mike.miller@hp.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Mon, 16 Feb 2009 09:25:40 +0000 (10:25 +0100)]
block: fix bad definition of BIO_RW_SYNC
We can't OR shift values, so get rid of BIO_RW_SYNC and use BIO_RW_SYNCIO
and BIO_RW_UNPLUG explicitly. This brings back the behaviour from before 213d9417fec62ef4c3675621b9364a667954d4dd.
Boaz Harrosh [Tue, 3 Feb 2009 06:47:29 +0000 (07:47 +0100)]
bsg: Fix sense buffer bug in SG_IO
When submitting requests via SG_IO, which does a sync io, a
bsg_command is not allocated. So an in-Kernel sense_buffer was not
set. However when calling blk_execute_rq() with no sense buffer
one is provided from the stack. Now bsg at blk_complete_sgv4_hdr_rq()
would check if rq->sense_len and a sense was requested by sg_io_v4
the rq->sense was copy_user() back, but by now it is already mangled
stack memory.
I have fixed that by forcing a sense_buffer when calling bsg_map_hdr().
The bsg_command->sense is provided in the write/read path like before,
and on-the-stack buffer is provided when doing SG_IO.
I have also fixed a dprintk message to print rq->errors in hex because
of the scsi bit-field use of this member. For other block devices it
does not matter anyway.