perf: Make the trace events sample period default to 1
Trace events are mostly used for tracing and then require not to
be lost when possible. As opposite to hardware events that really
require to trigger after a given sample period, trace events mostly
need to trigger everytime.
It is a frustrating experience to trace with perf and realize we
lost a lot of events because we forgot the "-c 1" option.
Then default sample_period to 1 for trace events but let the user
override it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de>
perf: Always record tracepoints raw samples from perf record
Trace events are mostly used for tracing rather than simple
counting. Don't bother anymore with adding -R when using them,
just record raw samples of trace events every time.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Wed, 14 Apr 2010 21:58:03 +0000 (23:58 +0200)]
perf: Fix dynamic field detection
Checking if a tracing field is an array with a dynamic length
requires to check the field type and seek the "__data_loc"
string that prepends the actual type, as can be found in a trace
event format file:
hlist helpers need to be available for all software events, not
only trace events.
Pull them out outside the ifdef CONFIG_EVENT_TRACING section.
Fixes:
kernel/perf_event.c:4573: error: implicit declaration of function 'swevent_hlist_put'
kernel/perf_event.c:4614: error: implicit declaration of function 'swevent_hlist_get'
kernel/perf_event.c:5534: error: implicit declaration of function 'swevent_hlist_release
Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1271281338-23491-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf: Make clock software events consistent with general exclusion rules
The cpu/task clock events implement their own version of exclusion
on top of exclude_user and exclude_kernel.
The result is that when the event triggered in the kernel but we
have exclude_kernel set, we try to rewind using task_pt_regs.
There are two side effects of this:
- we call task_pt_regs even on kernel threads, which doesn't give
us the desired result.
- if the event occured in the kernel, we shouldn't rewind to the
user context. We want to actually ignore the event.
get_irq_regs() will always give us the right interrupted context, so
use its result and submit it to perf_exclude_context() that knows
when an event must be ignored.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu>
Each time a software event triggers, we need to walk through
the entire list of events from the current cpu and task contexts
to retrieve a running perf event that matches.
We also need to check a matching perf event is actually counting.
This walk is wasteful and makes the event fast path scaling
down with a growing number of events running on the same
contexts.
To solve this, we store the running perf events in a hashlist to
get an immediate access to them against their type:event_id when
they trigger.
v2: - Fix SWEVENT_HLIST_SIZE definition (and re-learn some basic
maths along the way)
- Only allocate hlist for online cpus, but keep track of the
refcount on offline possible cpus too, so that we allocate it
if needed when it becomes online.
- Drop the kref use as it's not adapted to our tricks anymore.
v3: - Fix bad refcount check (address instead of value). Thanks to
Eric Dumazet who spotted this.
- While exiting cpu, move the hlist release out of the IPI path
to lock the hlist mutex sanely.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu>
In terms of usability, it's not that bad, but it does require
the user to type and remember more than necessary.
This patch allows the user to accomplish the same thing without
specifying the separate record/report steps or the pipe. So the
same command as above can be accomplished more simply as:
$ perf trace rwtop 5
Notice that the '-i -' and '-o -' aren't required in this case -
they're added internally, and that any extra arguments are
passed along to the report script (but not to the record
script).
The overall effect is that any of the scripts listed in 'perf
trace -l' can now be used directly in live mode, with the
expected arguments, by simply specifying the script and args to
'perf trace'.
Tom Zanussi [Fri, 2 Apr 2010 04:59:24 +0000 (23:59 -0500)]
perf trace/scripting: Enable scripting shell scripts for live mode
It should be possible to run any perf trace script in 'live
mode'. This requires being able to pass in e.g. '-i -' or other
args, which the current shell scripts aren't equipped to handle.
In a few cases, there are required or optional args that also
need special handling. This patch makes changes the current set
of shell scripts as necessary.
Tom Zanussi [Fri, 2 Apr 2010 04:59:23 +0000 (23:59 -0500)]
perf trace/scripting: Add rwtop and sctop scripts
A couple of scripts, one in Python and the other in Perl, that
demonstrate 'live mode' tracing. For each, the output of the
perf event stream is fed continuously to the script, which
continuously aggregates the data and reports the current results
every 3 seconds, or at the optionally specified interval. After
the current results are displayed, the aggregations are cleared
and the cycle begins anew.
To run the scripts, simply pipe the output of the 'perf trace
record' step as input to the corresponding 'perf trace report'
step, using '-' as the filename to -o and -i:
Tom Zanussi [Fri, 2 Apr 2010 04:59:22 +0000 (23:59 -0500)]
perf: Convert perf header build_ids into build_id events
Bypasses the build_id perf header code and replaces it with a
synthesized event and processing function that accomplishes the
same thing, used when reading/writing perf data to/from a pipe.
Tom Zanussi [Fri, 2 Apr 2010 04:59:21 +0000 (23:59 -0500)]
perf: Convert perf tracing data into a tracing_data event
Bypasses the tracing_data perf header code and replaces it with
a synthesized event and processing function that accomplishes
the same thing, used when reading/writing perf data to/from a
pipe.
The tracing data is pretty large, and this patch doesn't attempt
to break it down into component events. The tracing_data event
itself doesn't actually contain the tracing data, rather it
arranges for the event processing code to skip over it after
it's read, using the skip return value added to the event
processing loop in a previous patch.
Tom Zanussi [Fri, 2 Apr 2010 04:59:20 +0000 (23:59 -0500)]
perf: Convert perf event types into event type events
Bypasses the event type perf header code and replaces it with a
synthesized event and processing function that accomplishes the
same thing, used when reading/writing perf data to/from a pipe.
Tom Zanussi [Fri, 2 Apr 2010 04:59:19 +0000 (23:59 -0500)]
perf: Convert perf header attrs into attr events
Bypasses the attr perf header code and replaces it with a
synthesized event and processing function that accomplishes the
same thing, used when reading/writing perf data to/from a pipe.
Making the attrs into events allows them to be streamed over a
pipe along with the rest of the header data (in later patches).
It also paves the way to allowing events to be added and removed
from perf sessions dynamically.
Tom Zanussi [Fri, 2 Apr 2010 04:59:18 +0000 (23:59 -0500)]
perf trace: Introduce special handling for pipe input
Adds special treatment for stdin - if the user specifies '-i -'
to perf trace, the intent is that the event stream be read from
stdin rather than from a disk file.
The actual handling of the '-' filename is done by the session;
this just adds a signal handler to stop reporting, and turns off
interference by the pager.
Tom Zanussi [Fri, 2 Apr 2010 04:59:17 +0000 (23:59 -0500)]
perf report: Introduce special handling for pipe input
Adds special treatment for stdin - if the user specifies '-i -'
to perf report, the intent is that the event stream be written
to stdin rather than from a disk file.
The actual handling of the '-' filename is done by the session;
this just adds a signal handler to stop reporting, and turns off
interference by the pager.
Tom Zanussi [Fri, 2 Apr 2010 04:59:16 +0000 (23:59 -0500)]
perf record: Introduce special handling for pipe output
Adds special treatment for stdout - if the user specifies '-o -'
to perf record, the intent is that the event stream be written
to stdout rather than to a disk file.
Also, redirect stdout of forked child to stderr - in pipe mode,
stdout of the forked child interferes with the stdout perf
stream, so redirect it to stderr where it can still be seen but
won't be mixed in with the perf output.
Tom Zanussi [Fri, 2 Apr 2010 04:59:15 +0000 (23:59 -0500)]
perf: Add pipe-specific header read/write and event processing code
This patch makes several changes to allow the perf event stream
to be sent and received over a pipe:
- adds pipe-specific versions of the header read/write code
- adds pipe-specific version of the event processing code
- adds a range of event types to be used for header or other
pseudo events, above the range used by the kernel
- checks the return value of event handlers, which they can use
to skip over large events during event processing rather than actually
reading them into event objects.
- unifies the multiple do_read() functions and updates its
users.
Note that none of these changes affect the existing perf data
file format or processing - this code only comes into play if
perf output is sent to stdout (or is read from stdin).
Ian Munsie [Tue, 13 Apr 2010 08:37:33 +0000 (18:37 +1000)]
perf: Fix endianness argument compatibility with OPT_BOOLEAN() and introduce OPT_INCR()
Parsing an option from the command line with OPT_BOOLEAN on a
bool data type would not work on a big-endian machine due to the
manner in which the boolean was being cast into an int and
incremented. For example, running 'perf probe --list' on a
PowerPC machine would fail to properly set the list_events bool
and would therefore print out the usage information and
terminate.
This patch makes OPT_BOOLEAN work as expected with a bool
datatype. For cases where the original OPT_BOOLEAN was
intentionally being used to increment an int each time it was
passed in on the command line, this patch introduces OPT_INCR
with the old behaviour of OPT_BOOLEAN (the verbose variable is
currently the only such example of this).
I have reviewed every use of OPT_BOOLEAN to verify that a true
C99 bool was passed. Where integers were used, I verified that
they were only being used for boolean logic and changed them to
bools to ensure that they would not be mistakenly used as ints.
The major exception was the verbose variable which now uses
OPT_INCR instead of OPT_BOOLEAN.
Signed-off-by: Ian Munsie <imunsie@au.ibm.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: <stable@kernel.org> # NOTE: wont apply to .3[34].x cleanly, please backport Cc: Git development list <git@vger.kernel.org> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric B Munson <ebmunson@us.ibm.com> Cc: Valdis.Kletnieks@vt.edu Cc: WANG Cong <amwang@redhat.com> Cc: Thiago Farina <tfransosi@gmail.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Mike Galbraith <efault@gmx.de> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: John Kacur <jkacur@redhat.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1271147857-11604-1-git-send-email-imunsie@au.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf trace: Ignore "overwrite" field if present in /events/header_page
That is not used in perf where we have the LOST events.
Without this patch we get:
[root@doppio ~]# perf lock report | head -3
Warning: Error: expected 'data' but read 'overwrite'
So, to make the same perf command work with kernels with and without
this field, introduce variants for the parsing routines to not warn the
user in such case.
Discussed-with: Steven Rostedt <rostedt@goodmis.org> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Randy Dunlap [Wed, 31 Mar 2010 18:31:00 +0000 (11:31 -0700)]
perf: cleanup some Documentation
Correct typos in perf bench & perf sched help text.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>, Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20100331113100.cc898487.randy.dunlap@oracle.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Randy Dunlap [Wed, 31 Mar 2010 18:30:56 +0000 (11:30 -0700)]
perf bench: fix spello
Fix spello in user message.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>, Cc: Paul Mackerra <paulus@samba.org>s
LKML-Reference: <20100331113056.2c7df509.randy.dunlap@oracle.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Using 'pahole --packable' I found some structs that could be reorganized
to eliminate alignment holes, in some cases getting them to be cacheline
multiples.
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
hvc_console: Fix race between hvc_close and hvc_remove
virtio: disable multiport console support.
virtio: console makes incorrect assumption about virtio API
virtio: console: Fix early_put_chars usage
MAINTAINERS: Put the virtio-console entry in correct alphabetical order
Currently early_put_chars is not used by virtio_console because it can
only be used once a port has been found, at which point it's too late
because it is no longer needed. This patch should fix it.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Kevin Hilman [Wed, 7 Apr 2010 18:52:46 +0000 (11:52 -0700)]
rwsem generic spinlock: use IRQ save/restore spinlocks
rwsems can be used with IRQs disabled, particularily in early boot
before IRQs are enabled. Currently the spin_unlock_irq() usage in the
slow-patch will unconditionally enable interrupts and cause problems
since interrupts are not yet initialized or enabled.
This patch uses save/restore versions of IRQ spinlocks in the slowpath
to ensure interrupts are not unintentionally disabled.
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Wed, 7 Apr 2010 23:06:07 +0000 (00:06 +0100)]
Have nfs ->d_revalidate() report errors properly
If nfs atomic open implementation ends up doing open request from
->d_revalidate() codepath and gets an error from server, return that error
to caller explicitly and don't bother with lookup_instantiate_filp() at all.
->d_revalidate() can return an error itself just fine...
See
http://bugzilla.kernel.org/show_bug.cgi?id=15674
http://marc.info/?l=linux-kernel&m=126988782722711&w=2
for original report.
Reported-by: Daniel J Blueman <daniel.blueman@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
x86: Fix double enable_IR_x2apic() call on SMP kernel on !SMP boards
x86: Increase CONFIG_NODES_SHIFT max to 10
ibft, x86: Change reserve_ibft_region() to find_ibft_region()
x86, hpet: Fix bug in RTC emulation
x86, hpet: Erratum workaround for read after write of HPET comparator
bootmem, x86: Fix 32bit numa system without RAM on node 0
nobootmem, x86: Fix 32bit numa system without RAM on node 0
x86: Handle overlapping mptables
x86: Make e820_remove_range to handle all covered case
x86-32, resume: do a global tlb flush in S4 resume
arch/arm/mach-davinci/include/mach/da8xx.h:104: warning: ‘struct platform_device’
declared inside parameter list
arch/arm/mach-davinci/include/mach/da8xx.h:104: warning: its scope is only this
definition or declaration, which is probably not what you want
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: mixart: range checking proc file
ALSA: hda - Fix a wrong array range check in patch_realtek.c
ALSA: ASoC: move dma_data from snd_soc_dai to snd_soc_pcm_stream
ALSA: hda - Enable amplifiers on Acer Inspire 6530G
ASoC: Only do WM8994 bias off transition from standby
ASoC: Don't use DCS_DATAPATH_BUSY for WM hubs devices
ASoC: Don't do runtime wm_hubs DC servo updates if using offset correction
ASoC: Support second DC servo readback method for wm_hubs
ASoC: Avoid wraparound in wm_hubs DC servo correction
ALSA: echoaudio - Eliminate use after free
ALSA: i2c: cleanup: change parameter to pointer
ALSA: hda - Add MSI blacklist for Aopen MZ915-M
ASoC: OMAP: Fix capture pointer handling for OMAP1510 to work correctly with recent ALSA PCM code
ALSA: hda - Update document about MSI and interrupts
ALSA: hda: Fix 0 dB offset for Lenovo Thinkpad models using AD1981
ALSA: hda - Add missing printk argument in previous patch
ASoC: Fix passing platform_data to ac97 bus users and fix a leak
ALSA: hda - Fix ADC/MUX assignment of ALC269 codec
ALSA: hda - Fix invalid bit values passed to snd_hda_codec_amp_stereo()
ASoC: wm8994: playback => capture
David Howells [Tue, 6 Apr 2010 21:35:09 +0000 (14:35 -0700)]
fs-cache: order the debugfs stats correctly
Order the debugfs statistics correctly. The values displayed through a
seq_printf() statement should be in the same order as the names in the
format string.
In the 'Lookups' line, objects created ('crt=') and lookups timed out
('tmo=') have their values transposed.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Tue, 6 Apr 2010 21:35:09 +0000 (14:35 -0700)]
frv: fix kernel/user segment handling in NOMMU mode
In NOMMU mode, the FRV segment handling is broken because KERNEL_DS ==
USER_DS. This causes tests of the following sort:
/* don't pin down non-user-based iovecs */
if (segment_eq(get_fs(), KERNEL_DS))
return NULL;
to malfunction.
To fix this, make USER_DS the top of RAM instead of the top of the non-IO
address space, and make KERNEL_DS one more than the top of the non-IO
address space.
Also get rid of FRV's __addr_ok() as nothing uses it.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On exit paths in mxc_rtc_probe() method some resources are not freed
correctly.
This patch fixes:
* unrequested memory region containing imx RTC registers
* iounmap() isn't called on exit_free_pdata branch
* clock get rate is called for freed clock source
* clock isn't disabled on exit_put_clk branch
To simplify the fix managed device resources are used.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Vladimir Zapolskiy <vzapolskiy@gmail.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Daniel Mack <daniel@caiaq.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Presently, memcg's FILE_MAPPED accounting has following race with
move_account (happens at rmdir()).
increment page->mapcount (rmap.c)
mem_cgroup_update_file_mapped() move_account()
lock_page_cgroup()
check page_mapped() if
page_mapped(page)>1 {
FILE_MAPPED -1 from old memcg
FILE_MAPPED +1 to old memcg
}
.....
overwrite pc->mem_cgroup
unlock_page_cgroup()
lock_page_cgroup()
FILE_MAPPED + 1 to pc->mem_cgroup
unlock_page_cgroup()
Then,
old memcg (-1 file mapped)
new memcg (+2 file mapped)
This happens because move_account see page_mapped() which is not guarded
by lock_page_cgroup(). This patch adds FILE_MAPPED flag to page_cgroup
and move account information based on it. Now, all checks are synchronous
with lock_page_cgroup().
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Balbir Singh <balbir@in.ibm.com> Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Andrea Righi <arighi@develer.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we look into pagemap using page-types with option -p, the value of
pfn for hugepages looks wrong (see below.) This is because pte was
evaluated only once for one vma although it should be updated for each
hugepage. This patch fixes it.
One hugepage contains 1 head page and 511 tail pages in x86_64 and each
two lines represent each hugepage. Voffset and offset mean virtual
address and physical address in the page unit, respectively. The
different hugepages should not have the same offset value.
ratelimit: fix the return value when __ratelimit() fails to acquire the lock
The log of commit edaac8e3167501cda336231d00611bf59c164346 ("ratelimit:
Fix/allow use in atomic contexts"), indicates that we want to suppress the
callback when the trylock fails.
Signed-off-by: Yong Zhang <yong.zhang@windriver.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Tue, 6 Apr 2010 21:35:00 +0000 (14:35 -0700)]
vfs: rename block_fsync() to blkdev_fsync()
Requested by hch, for consistency now it is exported.
Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Anton Blanchard <anton@samba.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We now call through generic_file_aio_write -> generic_write_sync ->
vfs_fsync_range. vfs_fsync_range has:
if (!fop || !fop->fsync) {
ret = -EINVAL;
goto out;
}
But drivers/char/raw.c doesn't set an fsync method.
We have two options: fix it or remove the raw driver completely. I'm
happy to do either, the fact this has been broken for so long suggests it
is rarely used.
The patch below adds an fsync method to the raw driver. My knowledge of
the block layer is pretty sketchy so this could do with a once over.
If we instead decide to remove the raw driver, this patch might still be
useful as a backport to 2.6.33 and 2.6.32.
Signed-off-by: Anton Blanchard <anton@samba.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <jens.axboe@oracle.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Jeff Moyer <jmoyer@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since Valentin's email address @siemens.com is no longer valid, it's time
to change it to the one that actually works so that I don't have to
manually forward patches against mb862xx to him every time.
Signed-off-by: Alexander Shishkin <virtuoso@slind.org> Acked-by: Valentin Sitdikov <v.sitdikov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Tue, 6 Apr 2010 21:34:57 +0000 (14:34 -0700)]
mb862xxfb: fix acceleration module license
mb862xxfb_accel built as a separate module, but it does not have a
MODULE_LICENSE, so it taints the kernel. Add a MODULE_LICENSE to it (same
as mb862xxfb license).
Shaohua Li reported his tmpfs streaming I/O test can lead to make oom.
The test uses a 6G tmpfs in a system with 3G memory. In the tmpfs, there
are 6 copies of kernel source and the test does kbuild for each copy. His
investigation shows the test has a lot of rotated anon pages and quite few
file pages, so get_scan_ratio calculates percent[0] (i.e. scanning
percent for anon) to be zero. Actually the percent[0] shoule be a big
value, but our calculation round it to zero.
Although before commit 84b18490 ("vmscan: get_scan_ratio() cleanup") , we
have the same problem too. But the old logic can rescue percent[0]==0
case only when priority==0. It had hided the real issue. I didn't think
merely streaming io can makes percent[0]==0 && priority==0 situation. but
I was wrong.
So, definitely we have to fix such tmpfs streaming io issue. but anyway I
revert the regression commit at first.
Anton Blanchard [Tue, 6 Apr 2010 21:34:55 +0000 (14:34 -0700)]
devmem: handle class_create() failure
I hit this when we had a bug in IDR for a few days. Basically sysfs would
fail to create new inodes since it uses an IDR and therefore class_create
would fail.
While we are unlikely to see this fail we may as well handle it instead of
oopsing.
Signed-off-by: Anton Blanchard <anton@samba.org> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Prevents further "key xxx not in .data" bug-reports. Although some
attributes could probably be converted to static ones, this is left for
people having hardware to test.
Found by this semantic patch:
@ init @
type T;
identifier A;
@@
T {
...
struct device_attribute A;
...
};
@ main extends init @
expression E;
statement S;
identifier err;
T *name;
@@
... when != sysfs_attr_init(&name->A.attr);
(
+ sysfs_attr_init(&name->A.attr);
if (device_create_file(E, &name->A))
S
|
+ sysfs_attr_init(&name->A.attr);
err = device_create_file(E, &name->A);
)
While reviewing, I put the initialization to apropriate places.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg KH <gregkh@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Mike Isely <isely@pobox.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Sujith Thomas <sujith.thomas@intel.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Krzysztof Helt <krzysztof.h1@wp.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Carpenter [Tue, 6 Apr 2010 21:34:50 +0000 (14:34 -0700)]
mxser: spin_lock() => spin_lock_irq()
This should be spin_lock_irq() to match the spin_unlock_irq(). Originally
it was a lock_kernel() but we switched everything to spin_lock_irq() last
November.
[akpm@linux-foundation.org: fix the MOXA_ASPP_MON case too (per Jiri)] Signed-off-by: Dan Carpenter <error27@gmail.com> Cc: Jiri Slaby <jirislaby@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The reset of data lines when the card is removed from the cage results in
a failure.The failure is seen if the card is removed from the cage when TC
is pending after a CMD with data received CC.The reset logic leaves the
controller in a state where niether a TC is received nor DTO.
The rest code can be safely removed here since it is taken care in the IRQ
handler.
Signed-off-by: Madhusudhan Chikkature <madhu.cr@ti.com> Cc: Adrian Hunter <adrian.hunter@nokia.com> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Beulich [Tue, 6 Apr 2010 21:34:44 +0000 (14:34 -0700)]
vesafb: use platform_driver_probe() instead of platform_driver_register()
Commit c2e13037e6794bd0d9de3f9ecabf5615f15c160b ("platform-drivers: move
probe to .devinit.text in drivers/video") introduced a huge amount of
section mismatch warnings in vesafb code. Rather than converting all of
the annotations, do the obvious and revert the __init -> __devinit change,
and use the recommended (in that patch) alternative to calling
platform_driver_register(): vesafb depends on information obtained from by
kernel at boot time, cannot be a module, and no post-boot devices can ever
show up.
Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Greg KH <greg@kroah.com> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Härdeman [Tue, 6 Apr 2010 21:34:43 +0000 (14:34 -0700)]
include/linux/kfifo.h: fix INIT_KFIFO()
DECLARE_KFIFO creates a union with a struct kfifo and a buffer array with
size [size + sizeof(struct kfifo)].
INIT_KFIFO then sets the buffer pointer in struct kfifo to point to the
beginning of the buffer array which means that the first call to kfifo_in
will overwrite members of the struct kfifo.
Signed-off-by: David Härdeman <david@hardeman.nu> Acked-by: Stefani Seibold <stefani@seibold.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Tue, 6 Apr 2010 21:34:41 +0000 (14:34 -0700)]
bitops: remove temporary for_each_bit()
Migration has been completed so remove this now. There's one straggler in
linux-next's drivers/mtd/sm_ftl.c. A patch has been sent.
Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
before any other tracing-related commands results in a kernel panic:
BUG: failure at arch/microblaze/kernel/ftrace.c:198/ftrace_update_ftrace_func()!
Recoding ftrace_update_ftrace_func() to use &ftrace_caller directly eliminates
the need to capture its address elsewhere (and thus rely on a particular call
sequence).
Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Signed-off-by: Michal Simek <monstr@monstr.eu>
NODEMASK_ALLOC/FREE are mapped to kmalloc/free if NODES_SHIFT > 8.
Among its several users, drivers/base/node.c wasn't including slab.h
leading to build failure if NODES_SHIFT > 8. Include slab.h from
drivers/base/node.c.
This isn't an ideal solution but including slab.h directly from
nodemask.h is not an option because nodemask.h gets included
everywhere. For now, make it work by including slab.h from its users.
Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
genirq: Force MSI irq handlers to run with interrupts disabled
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
[WATCHDOG] hpwdt - fix lower timeout limit
[WATCHDOG] iTCO_wdt: TCO Watchdog patch for additional Intel Cougar Point DeviceIDs
[WATCHDOG] doc: Fix use of WDIOC_SETOPTIONS ioctl.
[WATCHDOG] doc: watchdog simple example: don't fail on fsync()
[WATCHDOG] set max63xx driver as ARM only
[WATCHDOG] powerpc: pika_wdt ident cannot be const
Dan Carpenter [Tue, 6 Apr 2010 16:31:26 +0000 (19:31 +0300)]
ALSA: mixart: range checking proc file
The original code doesn't take into consideration that the value of
MIXART_BA0_SIZE - pos can be less than zero which would lead to a large
unsigned value for "count".
Also I moved the check that read size is a multiple of 4 bytes below
the code that adjusts "count".
According to Intel Software Devel Manual Volume 3B, the
Nehalem-EX PMU is just like regular Nehalem (except for the
uncore support, which is completely different).
Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <alpine.DEB.2.00.1004060956580.1417@cl320.eecs.utk.edu> Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf kmem: Fix breakage introduced by 5a0e3ad slab.h script
Commit 5a0e3ad ("include cleanup: Update gfp.h and slab.h
includes to prepare for breaking implicit slab.h inclusion
from percpu.h") added a '#include <linux/slab.h>' to
tools/perf/builtin-kmem.h because: that tool has lines like
this:
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits)
smc91c92_cs: fix the problem of "Unable to find hardware address"
r8169: clean up my printk uglyness
net: Hook up cxgb4 to Kconfig and Makefile
cxgb4: Add main driver file and driver Makefile
cxgb4: Add remaining driver headers and L2T management
cxgb4: Add packet queues and packet DMA code
cxgb4: Add HW and FW support code
cxgb4: Add register, message, and FW definitions
netlabel: Fix several rcu_dereference() calls used without RCU read locks
bonding: fix potential deadlock in bond_uninit()
net: check the length of the socket address passed to connect(2)
stmmac: add documentation for the driver.
stmmac: fix kconfig for crc32 build error
be2net: fix bug in vlan rx path for big endian architecture
be2net: fix flashing on big endian architectures
be2net: fix a bug in flashing the redboot section
bonding: bond_xmit_roundrobin() fix
drivers/net: Add missing unlock
net: gianfar - align BD ring size console messages
net: gianfar - initialize per-queue statistics
...
Some BIOSes don't configure HPA during boot but do so while resuming.
This causes harddrives to shrink during resume making libata detach
and reattach them. This can be worked around by unlocking HPA if old
size equals native size.
Add ATA_DFLAG_UNLOCK_HPA so that HPA unlocking can be controlled
per-device and update ata_dev_revalidate() such that it sets
ATA_DFLAG_UNLOCK_HPA and fails with -EIO when the above condition is
detected.
Thank you for contacting us. We know that with our M225 line of SSDs
you sometimes need to disable NCQ (native command queuing) to avoid
just the type of errors you're seeing. Our recommendation for the
M225 is to add libata.force=noncq to your Linux kernel boot options,
under the kernel ATA library option.
I have sent your feedback to the engineers working on the C300, and
asked them to please pass it on to the firmware team. I have been
notified that they are in the process of testing and finalizing a
new firmware version, that you can expect to see released around the
end of April. We’ll keep you posted as to when it will be available
for download.
So, turn off NCQ on the drive w/ the current firmware revision.
Reported in the following bug.
https://bugzilla.kernel.org/show_bug.cgi?id=15573
Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: lethalwp@scarlet.be Reported-by: Luke Macken <lmacken@redhat.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Tejun Heo [Wed, 31 Mar 2010 07:41:18 +0000 (16:41 +0900)]
libata: don't whine on spurious IRQ
On configurations where IRQ line is shared with a different
controller, spurious IRQs may happen continuously. The message was
put there primarily for debugging anyway. Kill it.
Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
James Hogan [Mon, 5 Apr 2010 10:31:29 +0000 (11:31 +0100)]
[WATCHDOG] doc: Fix use of WDIOC_SETOPTIONS ioctl.
In the watchdog-test program and watchdog-api.txt, pass the values to
the WDIOC_SETOPTIONS ioctl as a pointer to an integer containing the
values intead of directly in the third ioctl argument. The actual
watchdog drivers in drivers/watchdog don't read the options directly
from the argument but use get_user and copy_from_user.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Nick Piggin [Thu, 1 Apr 2010 08:09:40 +0000 (19:09 +1100)]
Fix up possibly racy module refcounting
Module refcounting is implemented with a per-cpu counter for speed.
However there is a race when tallying the counter where a reference may
be taken by one CPU and released by another. Reference count summation
may then see the decrement without having seen the previous increment,
leading to lower than expected count. A module which never has its
actual reference drop below 1 may return a reference count of 0 due to
this race.
Module removal generally runs under stop_machine, which prevents this
race causing bugs due to removal of in-use modules. However there are
other real bugs in module.c code and driver code (module_refcount is
exported) where the callers do not run under stop_machine.
Fix this by maintaining running per-cpu counters for the number of
module refcount increments and the number of refcount decrements. The
increments are tallied after the decrements, so any decrement seen will
always have its corresponding increment counted. The final refcount is
the difference of the total increments and decrements, preventing a
low-refcount from being returned.
Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
[SCSI] qla1280: retain firmware for error recovery
[SCSI] attirbute_container: Initialize sysfs attributes with sysfs_attr_init
[SCSI] advansys: fix regression with request_firmware change
[SCSI] qla2xxx: Updated version number to 8.03.02-k2.
[SCSI] qla2xxx: Prevent sending mbx commands from sysfs during isp reset.
[SCSI] qla2xxx: Disable MSI on qla24xx chips other than QLA2432.
[SCSI] qla2xxx: Check to make sure multique and CPU affinity support is not enabled at the same time.
[SCSI] qla2xxx: Correct vp_idx checking during PORT_UPDATE processing.
[SCSI] qla2xxx: Honour "Extended BB credits" bit for CNAs.
[SCSI] scsi_transport_fc: Make sure commands are completed when rport is offline
[SCSI] libiscsi: Fix recovery slowdown regression
Brian Niebuhr [Tue, 9 Mar 2010 22:48:03 +0000 (16:48 -0600)]
davinci: edma: clear events in edma_start()
This patch fixes an issue where a DMA channel can erroneously process an
event generated by a previous transfer. A failure case is where DMA is
being used for SPI transmit and receive channels on OMAP L138. In this
case there is a single bit that controls all event generation from the
SPI peripheral. Therefore it is possible that between when edma_stop()
has been called for the transmit channel on a previous transfer and
edma_start() is called for the transmit channel on a subsequent transfer,
that a transmit event has been generated.
The fix is to clear events in edma_start(). This prevents false events
from being processed when events are enabled for that channel.
Signed-off-by: Brian Niebuhr <bniebuhr@efjohnson.com> Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>