Hugh Dickins [Sat, 6 Jun 2009 20:18:09 +0000 (21:18 +0100)]
integrity: fix IMA inode leak
CONFIG_IMA=y inode activity leaks iint_cache and radix_tree_node objects
until the system runs out of memory. Nowhere is calling ima_inode_free()
a.k.a. ima_iint_delete(). Fix that by calling it from destroy_inode().
Linus Torvalds [Sat, 6 Jun 2009 19:18:14 +0000 (12:18 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
ext3/4 with synchronous writes gets wedged by Postfix
Fix nobh_truncate_page() to not pass stack garbage to get_block()
Al Viro [Wed, 13 May 2009 18:13:40 +0000 (19:13 +0100)]
ext3/4 with synchronous writes gets wedged by Postfix
OK, that's probably the easiest way to do that, as much as I don't like it...
Since iget() et.al. will not accept I_FREEING (will wait to go away
and restart), and since we'd better have serialization between new/free
on fs data structures anyway, we can afford simply skipping I_FREEING
et.al. in insert_inode_locked().
We do that from new_inode, so it won't race with free_inode in any interesting
ways and it won't race with iget (of any origin; nfsd or in case of fs
corruption a lookup) since both still will wait for I_LOCK.
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz> Tested-by: David Watson <dbwatson@ukfsn.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Theodore Ts'o [Tue, 12 May 2009 11:37:56 +0000 (07:37 -0400)]
Fix nobh_truncate_page() to not pass stack garbage to get_block()
The nobh_truncate_page() function is used by ext2, exofs, and jfs. Of
these three, only ext2 and jfs's get_block() function pays attention
to bh->b_size --- which is normally always the filesystem blocksize
except when the get_block() function is called by either
mpage_readpage(), mpage_readpages(), or the direct I/O routines in
fs/direct_io.c.
Unfortunately, nobh_truncate_page() does not initialize map_bh before
calling the filesystem-supplied get_block() function. So ext2 and jfs
will try to calculate the number of blocks to map by taking stack
garbage and shifting it left by inode->i_blkbits. This should be
*mostly* harmless (except the filesystem will do some unnneeded work)
unless the stack garbage is less than filesystem's blocksize, in which
case maxblocks will be zero, and the attempt to find out whether or
not the filesystem has a hole at a given logical block will fail, and
the page cache entry might not get zero'ed out.
Also if the stack garbage in in map_bh->state happens to have the
BH_Mapped bit set, there could be an attempt to call readpage() on a
non-existent page, which could cause nobh_truncate_page() to return an
error when it should not.
Fix this by initializing map_bh->state and map_bh->size.
Fortunately, it's probably fairly unlikely that ext2 and jfs users
mount with nobh these days.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Mark Langsdorf [Sun, 5 Jul 2009 20:50:52 +0000 (15:50 -0500)]
x86: enable GART-IOMMU only after setting up protection methods
The current code to set up the GART as an IOMMU enables GART
translations before it removes the aperture from the kernel memory
map, sets the GART PTEs to UC, sets up the guard and scratch
pages, or does a wbinvd(). This leaves the possibility of cache
aliasing open and can cause system crashes.
Re-order the code so as to enable the GART translations only
after all safeguards are in place and the tlb has been flushed.
AMD has tested this patch on both Istanbul systems and 1st
generation Opteron systems with APG enabled and seen no adverse
effects. Istanbul systems with HT Assist enabled sometimes
see MCE errors due to cache artifacts with the unmodified
code.
Alan Cox [Wed, 13 May 2009 14:02:27 +0000 (15:02 +0100)]
[libata] pata_ali: Use IGN_SIMPLEX
Some ALi devices report simplex if they have been disabled and re-enabled, and
restoring the byte does not work. Ignore it - the needed supporting logic is
already present for the SATA ULi ports.
Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: Fix oops and use after free during space balancing
Btrfs: set device->total_disk_bytes when adding new device
Linus Torvalds [Fri, 5 Jun 2009 18:53:44 +0000 (11:53 -0700)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
ata_piix: Add HP Compaq nc6000 to the broken poweroff list
ahci: add warning messages for hp laptops with broken suspend
pata_efar: fix PIO2 underclocking
pata_legacy: wait for async probing
Tejun Heo [Sat, 30 May 2009 11:50:12 +0000 (20:50 +0900)]
ahci: add warning messages for hp laptops with broken suspend
Harddisks on HP dv[4-6] and HDX18 fail to come online after resume on
earlier BIOSen. Fortunately, HP recently released BIOS updates for
all machines to fix the issue. Detect old BIOSen, warn the user to
update BIOS on boot and suspend attempts and fail suspend.
Kudos to all the bug reporters.
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: kernel.org@epperson.homelinux.net Cc: emisca@gmail.com Cc: Gadi Cohen <dragon@wastelands.net> Cc: Paul Swanson <paul@procursa.com> Cc: s@ourada.org Cc: Trevor Davenport <trevor.davenport@gmail.com> Cc: corruptor1972 <steven_tierney@yahoo.co.uk> Cc: Victoria Wilson <mail@vwilson.co.uk> Cc: khiraly <khiraly.list@gmail.com> Cc: Sean <wollombi@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Sergei Shtylyov [Mon, 1 Jun 2009 19:42:10 +0000 (22:42 +0300)]
pata_efar: fix PIO2 underclocking
Fix the PIO mode 2 using mode 0 timings -- this driver should enable the
fast timing bank starting with PIO2, just like the PIIX/ICH drivers do.
Also, fix/rephrase some comments while at it.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
James Bottomley [Fri, 5 Jun 2009 14:41:39 +0000 (10:41 -0400)]
pata_legacy: wait for async probing
The basic problem here that pata_legacy attaches the host, sees if it found
any devices and detaches it if none were found. With async probing, it's not
waiting until discovery is finished before deciding it has no devices and
trying the detach leading to this warning:
One way to fix it would be to put an async_synchronize_full() before looking
for devices, which this patch does. A better way might be to separate libata
into its own domain and only wait for that.
Reported-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Dave Jones [Fri, 5 Jun 2009 16:37:07 +0000 (12:37 -0400)]
[CPUFREQ] powernow-k8: check space_id of _PCT registers to be FFH
The powernow-k8 driver checks to see that the Performance Control/Status
Registers are declared as FFH (functional fixed hardware) by the BIOS.
However, this check got broken in the commit: 0e64a0c982c06a6b8f5e2a7f29eb108fdf257b2f
[CPUFREQ] checkpatch cleanups for powernow-k8
Fix based on an original patch from Naga Chumbalkar.
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Cc: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Dave Jones <davej@redhat.com>
Alan Cox [Fri, 5 Jun 2009 10:56:18 +0000 (11:56 +0100)]
ivtv: Fix PCI DMA direction
The ivtv stream buffers may be for receive or for send but the attached
sg handle is always destined cpu->device. We flush it correctly but the
allocation is wrongly done with the same type as the buffers.
See bug: http://bugzilla.kernel.org/show_bug.cgi?id=13385
(Note this doesn't close the bug - it fixes the ivtv part and in turn
the logging next shows up some rather alarming DMA sg list warnings in
libata)
Signed-off-by: Alan Cox <alan@linux.intel.com> Acked-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Thu, 4 Jun 2009 23:29:09 +0000 (16:29 -0700)]
ptrace: revert "ptrace_detach: the wrong wakeup breaks the ERESTARTxxx logic"
Commit 95a3540da9c81a5987be810e1d9a83640a366bd5 ("ptrace_detach: the wrong
wakeup breaks the ERESTARTxxx logic") removed the "extra"
wake_up_process() from ptrace_detach(), but as Jan pointed out this breaks
the compatibility.
I believe the changelog is right and this wake_up() is wrong in many
ways, but GDB assumes that ptrace(PTRACE_DETACH, child, 0, 0) always
wakes up the tracee.
Despite the fact this breaks SIGNAL_STOP_STOPPED/group_stop_count logic,
and despite the fact this wake_up_process() can break another
assumption: PTRACE_DETACH with SIGSTOP should leave the tracee in
TASK_STOPPED case. Because the untraced child can dequeue SIGSTOP and
call do_signal_stop() before ptrace_detach() calls wake_up_process().
Revert this change for now. We need some fixes even if we we want to keep
the current behaviour, but these fixes are not for 2.6.30.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The "trace || CLONE_PTRACE" check in tracehook_report_clone() is not right,
- If the untraced task does clone(CLONE_PTRACE) the new child is not traced,
we must not queue SIGSTOP.
- If we forked the traced task, but the tracer exits and untraces both the
forking task and the new child (after copy_process() drops tasklist_lock),
we should not queue SIGSTOP too.
Change the code to check task_ptrace() != 0 instead. This is still racy, but
the race is harmless.
We can race with another tracer attaching to this child, or the tracer can
exit and detach in parallel. But giwen that we didn't do wake_up_new_task()
yet, the child must have the pending SIGSTOP anyway.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 4 Jun 2009 22:23:39 +0000 (15:23 -0700)]
Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm: ignore EDID with really tiny modes.
drm: don't associate _DRM_DRIVER maps with a master
drm/i915: intel_lvds.c fix section mismatch
drm: Hook up DPMS property handling in drm_crtc.c. Add drm_helper_connector_dpms.
drm: set permissions on edid file to 0444
drm: add newlines to text sysfs files
drm/radeon: fix ring free alignment calculations
drm: fix irq naming for kms drivers.
Salman Qazi [Thu, 4 Jun 2009 22:20:39 +0000 (15:20 -0700)]
drivers/char/mem.c: avoid OOM lockup during large reads from /dev/zero
While running 20 parallel instances of dd as follows:
#!/bin/bash
for i in `seq 1 20`; do
dd if=/dev/zero of=/export/hda3/dd_$i bs=1073741824 count=1 &
done
wait
on a 16G machine, we noticed that rather than just killing the processes,
the entire kernel went down. Stracing dd reveals that it first does an
mmap2, which makes 1GB worth of zero page mappings. Then it performs a
read on those pages from /dev/zero, and finally it performs a write.
The machine died during the reads. Looking at the code, it was noticed
that /dev/zero's read operation had been changed by 557ed1fa2620dc119adb86b34c614e152a629a80 ("remove ZERO_PAGE") from giving
zero page mappings to actually zeroing the page.
The zeroing of the pages causes physical pages to be allocated to the
process. But, when the process exhausts all the memory that it can, the
kernel cannot kill it, as it is still in the kernel mode allocating more
memory. Consequently, the kernel eventually crashes.
To fix this, I propose that when a fatal signal is pending during
/dev/zero read operation, we simply return and let the user process die.
Signed-off-by: Salman Qazi <sqazi@google.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Modified error return and comment trivially. - Linus] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Mason [Thu, 4 Jun 2009 19:34:51 +0000 (15:34 -0400)]
Btrfs: Fix oops and use after free during space balancing
The btrfs allocator uses list_for_each to walk the available block
groups when searching for free blocks. It starts off with a hint
to help find the best block group for a given allocation.
The hint is resolved into a block group, but we don't properly check
to make sure the block group we find isn't in the middle of being
freed due to filesystem shrinking or balancing. If it is being
freed, the list pointers in it are bogus and can't be trusted. But,
the code happily goes along and uses them in the list_for_each loop,
leading to all kinds of fun.
The fix used here is to check to make sure the block group we find really
is on the list before we use it. list_del_init is used when removing
it from the list, so we can do a proper check.
The allocation clustering code has a similar bug where it will trust
the block group in the current free space cluster. If our allocation
flags have changed (going from single spindle dup to raid1 for example)
because the drives in the FS have changed, we're not allowed to use
the old block group any more.
The fix used here is to check the current cluster against the
current allocation flags.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Rusty Russell [Wed, 3 Jun 2009 05:22:24 +0000 (14:52 +0930)]
lguest: fix 'unhandled trap 13' with CONFIG_CC_STACKPROTECTOR
We don't set up the canary; let's disable stack protector on boot.c so
we can get into lguest_init, then set it up. As a side effect,
switch_to_new_gdt() sets up %fs for us properly too.
Eric Anholt [Thu, 4 Jun 2009 11:18:14 +0000 (11:18 +0000)]
drm/i915: Remove a bad BUG_ON in the fence management code.
This could be triggered by a gtt mapping fault on 965 that decides to
remove the fence from another object that happens to be active currently.
Since the other object doesn't rely on the fence reg for its execution, we
don't wait for it to finish. We'll soon be not waiting on 915 most of the
time as well, so just drop the BUG_ON.
Yinghai Lu [Wed, 3 Jun 2009 07:13:13 +0000 (00:13 -0700)]
x86/pci: fix mmconfig detection with 32bit near 4g
Pascal reported and bisected a commit:
| x86/PCI: don't call e820_all_mapped with -1 in the mmconfig case
which broke one system system.
ACPI: Using IOAPIC for interrupt routing
PCI: MCFG configuration 0: base f0000000 segment 0 buses 0 - 255
PCI: MCFG area at f0000000 reserved in ACPI motherboard resources
PCI: Using MMCONFIG for extended config space
it didn't have
PCI: updated MCFG configuration 0: base f0000000 segment 0 buses 0 - 63
anymore, and try to use 0xf000000 - 0xffffffff for mmconfig
For 32bit, mcfg_res->end could be 32bit only (if 64 resources aren't used)
So use end - 1 to pass the value in mcfg->end to avoid overflow.
We don't need to worry about the e820 path, they are always 64 bit.
Yu Zhao [Wed, 27 May 2009 16:25:05 +0000 (00:25 +0800)]
PCI: use fixed-up device class when configuring device
The device class may be changed after the fixup, so re-read the class
value from pci_dev when configuring the device. Otherwise some devices
such as JMicron SATA controller won't work.
Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Reviewed-by: Grant Grundler <grundler@parisc-linux.org> Tested-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: Yu Zhao <yu.zhao@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Philipp Zabel [Tue, 26 May 2009 20:03:32 +0000 (22:03 +0200)]
[ARM] pxa: fix pxa27x_udc default pullup GPIO
Currently, pxa27x_udc tries to use GPIO 0 as D+ pullup if not
explicitly configured. Default to an invalid GPIO (-1) instead.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Acked-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Eric Miao <eric.miao@marvell.com>
Ben Skeggs [Tue, 26 May 2009 00:35:52 +0000 (10:35 +1000)]
drm: don't associate _DRM_DRIVER maps with a master
A driver will use the _DRM_DRIVER map flag to indicate that it wants
to be responsible for removing the map itself, bypassing the DRM's
automagic cleanup code.
Since the multi-master changes this has been broken, resulting in some
drivers having their registers unmapped before it's finished with them.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Keith Packard [Sun, 31 May 2009 03:42:28 +0000 (20:42 -0700)]
drm: Hook up DPMS property handling in drm_crtc.c. Add drm_helper_connector_dpms.
Making the drm_crtc.c code recognize the DPMS property and invoke the
connector->dpms function doesn't remove any capability from the driver while
reducing code duplication.
That just highlighted the problem with the existing DPMS functions which
could turn off the connector, but failed to turn off any relevant crtcs. The
new drm_helper_connector_dpms function manages all of that, using the
drm_helper-specific crtc and encoder dpms functions, automatically computing
the appropriate DPMS level for each object in the system.
This fixes the current troubles in the i915 driver which left PLLs, pipes
and planes running while in DPMS_OFF mode or even while they were unused.
Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 3 Jun 2009 21:08:13 +0000 (07:08 +1000)]
drm/radeon: fix ring free alignment calculations
fd.o bz#21849
We were aligning to +16 dwords, instead of to the next 16dword
boundary in the ring. Fix the calculation to go to the next 16dword
boundary when space checking.
Dave Liu [Wed, 6 May 2009 10:40:07 +0000 (18:40 +0800)]
sdhci-of: Fix the wrong accessor to HOSTVER register
Freescale eSDHC controller has the special order for
the HOST version register. that is not same as the other's
registers. The address of HOSTVER in spec is 0xFE, and
we need use the in_be16(0xFE) to access it, not in_be16(0xFC).
Signed-off-by: Dave Liu <daveliu@freescale.com> Acked-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Pierre Ossman <pierre@ossman.eu>
Nicolas Pitre [Wed, 27 May 2009 02:35:34 +0000 (22:35 -0400)]
mvsdio: fix config failure with some high speed SDHC cards
Especially with Sandisk SDHC cards, the second SWITCH command was failing
with a timeout and the card was not recognized at all. However if the
system was busy, or debugging was enabled, or a udelay(100) was inserted
before the second SWITCH command in the core code, then the timing was
so that the card started to work.
With some unusual block sizes, the data FIFO status doesn't indicate a
"empty" state right away when the data transfer is done. Queuing
another data transfer in that condition results in a transfer timeout.
The empty FIFO bit eventually get set by itself in less than 50 usecs
when it is not set right away. So let's just poll for that bit before
configuring the controller with a new data transfer.
Signed-off-by: Nicolas Pitre <nico@marvell.com> Signed-off-by: Pierre Ossman <pierre@ossman.eu>
Nicolas Pitre [Fri, 15 May 2009 01:28:05 +0000 (21:28 -0400)]
mvsdio: ignore high speed timing requests from the core
Empirical evidences show that this is causing far more problems than it
solves when this mode is enabled in the host hardware. Amongst those
cards that are known to be non functional when this bit is set are:
Ben Nizette [Thu, 16 Apr 2009 05:55:21 +0000 (15:55 +1000)]
mmc/omap: Use disable_irq_nosync() from within irq handlers.
disable_irq() should wait for all running handlers to complete
before returning. As such, if it's used to disable an interrupt
from that interrupt's handler it will deadlock. This replaces
the dangerous instances with the _nosync() variant which doesn't
have this problem.
Signed-off-by: Ben Nizette <bn@niasdigital.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Pierre Ossman <pierre@ossman.eu>
This looks a bit awkward, and if we have both stack and user stack traces
running, it would be nice to have a title to tell them apart, although
it is easy to tell by the output.
This patch adds an annotation to the start of the stack traces:
tracing: fix multiple use of __print_flags and __print_symbolic
Here is an updated patch to include the extra call to
trace_seq_init() as requested. This is vs. the latest
-tip tree and fixes the use of multiple __print_flags
and __print_symbolic in a single tracer. Also tested
to ensure its working now:
mount.gfs2-2534 [000] 235.850587: gfs2_glock_queue: 8.7 glock 1:2 dequeue PR
mount.gfs2-2534 [000] 235.850591: gfs2_demote_rq: 8.7 glock 1:0 demote EX to NL flags:DI
mount.gfs2-2534 [000] 235.850591: gfs2_glock_queue: 8.7 glock 1:0 dequeue EX
glock_workqueue-2529 [000] 235.850666: gfs2_glock_state_change: 8.7 glock 1:0 state EX => NL tgt:NL dmt:NL flags:lDpI
glock_workqueue-2529 [000] 235.850672: gfs2_glock_put: 8.7 glock 1:0 state NL => IV flags:I
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
LKML-Reference: <1244037123.29604.603.camel@localhost.localdomain> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
walimis [Wed, 3 Jun 2009 08:01:29 +0000 (16:01 +0800)]
tracing/events: fix output format of kernel stack
According to "events/ftrace/kernel_stack/format", output format of
kernel stack should use "=>" instead of "<=".
The second problem is that we shouldn't skip the first entry in the stack,
although it seems to be duplicated when used in the "function" tracer,
but events also use it. If we skip the first one, we will drop the topmost
entry of the stack.
The last problem is that if the last entry is ULONG_MAX(0xffffffff), we should
drop it, otherwise it will print a NULL name line.
walimis [Wed, 3 Jun 2009 08:01:28 +0000 (16:01 +0800)]
tracing/trace_stack: fix the number of entries in the header
The last entry in the stack_dump_trace is ULONG_MAX, which is not
a valid entry, but max_stack_trace.nr_entries has accounted for it.
So when printing the header, we should decrease it by one.
Before fix, print as following, for example:
Steven Rostedt [Wed, 3 Jun 2009 13:30:10 +0000 (09:30 -0400)]
ring-buffer: discard timestamps that are at the start of the buffer
Every buffer page in the ring buffer includes its own time stamp.
When an event is recorded to the ring buffer with a delta time greater
than what can be held in the event header, a time stamp event is created.
If the the create timestamp falls over to the next buffer page, it is
redundant because the buffer page holds a full time stamp. This patch
will try to discard the time stamp when it falls to the start of the
next page.
This change also fixes a issues with disarding events. If most events are
discarded, timestamps will start to creep into the ring buffer. If we
do not discard the timestamps then they can fill up the ring buffer over
time and waste space.
This change will keep time stamps from filling up over another page. If
something is recorded in the buffer page, and the rest is filtered, then
the time stamps can only fill up to the end of the page.
[ Impact: prevent time stamps from filling ring buffer ]
Reported-by: Tim Bird <tim.bird@am.sony.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Wed, 3 Jun 2009 03:00:53 +0000 (23:00 -0400)]
ring-buffer: try to discard unneeded timestamps
There are times that a race may happen that we add a timestamp in a
nested write. This timestamp would just contain a zero delta and serves
no purpose.
Now that we have a way to discard events, this patch will try to discard
the timestamp instead of just wasting the space in the ring buffer.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Tim Bird [Wed, 3 Jun 2009 00:06:54 +0000 (17:06 -0700)]
ring-buffer: fix bug in ring_buffer_discard_commit
There's a bug in ring_buffer_discard_commit. The wrong
pointer is being compared in order to check if the event
can be freed from the buffer rather than discarded
(i.e. marked as PAD).
I noticed this when I was working on duration filtering.
The bug is not deadly - it just results in lots of wasted
space in the buffer. All filtered events are left in
the buffer and marked as discarded, rather than being
removed from the buffer to make space for other events.
Unfortunately, when I fixed this bug, I got errors doing a
filtered function trace. Multiple TIME_EXTEND
events pile up in the buffer, and trigger the
following loop overage warning in rb_iter_peek():
again:
...
if (RB_WARN_ON(cpu_buffer, ++nr_loops > 10))
return NULL;
I'm not sure what the best way is to fix this. I don't
know if I should extend the loop threshhold, or if I should
make the test more complex (ignore TIME_EXTEND
events), or just get rid of this loop check completely.
Note that if I implement a workaround for this, then I
see another problem from rb_advance_iter(). I haven't
tracked that one down yet.
In general, it seems like the case of removing filtered
events has not been working properly, and so some assumptions
about buffer invariant conditions need to be revisited.
Here's the patch for the simple fix:
Compare correct pointer for checking if an event can be
freed rather than left as discarded in the buffer.
Signed-off-by: Tim Bird <tim.bird@am.sony.com>
LKML-Reference: <4A25BE9E.5090909@am.sony.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cliff Wickman [Wed, 20 May 2009 13:10:57 +0000 (08:10 -0500)]
x86: Fix UV BAU activation descriptor init
The UV tlb shootdown code has a serious initialization error.
An array of structures [32*8] is initialized as if it were [32].
The array is indexed by (cpu number on the blade)*8, so the short
initialization works for up to 4 cpus on a blade.
But above that, we provide an invalid opcode to the hub's
broadcast assist unit.
This patch changes the allocation of the array to use its symbolic
dimensions for better clarity. And initializes all 32*8 entries.
Shortened 'UV_ACTIVATION_DESCRIPTOR_SIZE' to 'UV_ADP_SIZE' per Ingo's
recommendation.
Mike Galbraith [Tue, 2 Jun 2009 06:23:58 +0000 (08:23 +0200)]
x86, boot: add new generated files to the appropriate .gitignore files
git status complains of untracked (generated) files in arch/x86/boot..
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# ../../arch/x86/boot/compressed/mkpiggy
# ../../arch/x86/boot/compressed/piggy.S
# ../../arch/x86/boot/compressed/vmlinux.lds
# ../../arch/x86/boot/voffset.h
# ../../arch/x86/boot/zoffset.h
..so adjust .gitignore files accordingly.
Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Russell King [Mon, 1 Jun 2009 11:50:33 +0000 (12:50 +0100)]
[ARM] ARMv7 errata: only apply fixes when running on applicable CPU
Currently, whenever an erratum workaround is enabled, it will be
applied whether or not the erratum is relevent for the CPU. This
patch changes this - we check the variant and revision fields in the
main ID register to determine which errata to apply.
We also avoid re-applying erratum 460075 if it has already been applied.
Applying this fix in non-secure mode results in the kernel failing to
boot (or even do anything.)
This fixes booting on some ARMv7 based platforms which otherwise
silently fail.
Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Steven Rostedt [Tue, 2 Jun 2009 20:51:55 +0000 (16:51 -0400)]
function-graph: always initialize task ret_stack
On creating a new task while running the function graph tracer, if
we fail to allocate the ret_stack, and then fail the fork, the
code will free the parent ret_stack. This is because the child
duplicated the parent and currently points to the parent's ret_stack.
This patch always initializes the task's ret_stack to NULL.
[ Impact: prevent crash of parent on low memory during fork ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Tue, 2 Jun 2009 20:39:48 +0000 (16:39 -0400)]
function-graph: move initialization of new tasks up in fork
When the function graph tracer is enabled, all new tasks must allocate
a ret_stack to place the return address of functions. This is because
the function graph tracer will replace the real return address with a
call to the tracing of the exit function.
This initialization happens in fork, but it happens too late. If fork
fails, then it will call free_task and that calls the freeing of this
ret_stack. But before initialization happens, the new (failed) task
points to its parents ret_stack. If a fork failure happens during
the function trace, it would be catastrophic for the parent.
Also, there's no need to call ftrace_graph_exit_task from fork, since
it is called by free_task which fork calls on failure.
[ Impact: prevent crash during failed fork running function graph tracer ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Tue, 2 Jun 2009 18:01:19 +0000 (14:01 -0400)]
function-graph: add memory barriers for accessing task's ret_stack
The code that handles the tasks ret_stack allocation for every task
assumes that only an interrupt can cause issues (even though interrupts
are disabled).
In reality, the code is allocating the ret_stack for tasks that may be
running on other CPUs and there are not efficient memory barriers to
handle this case.
[ Impact: prevent crash due to using of uninitialized ret_stack variables ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Tue, 2 Jun 2009 16:26:07 +0000 (12:26 -0400)]
function-graph: enable the stack after initialization of other variables
The function graph tracer checks if the task_struct has ret_stack defined
to know if it is OK or not to use it. The initialization is done for
all tasks by one process, but the idle tasks use the same initialization
used by new tasks.
If an interrupt happens on an idle task that just had the ret_stack
created, but before the rest of the initialization took place, then
we can corrupt the return address of the functions.
This patch moves the setting of the task_struct's ret_stack to after
the other variables have been initialized.
[ Impact: prevent kernel panic on idle task when starting function graph ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Alan Cox [Tue, 2 Jun 2009 15:58:10 +0000 (16:58 +0100)]
parport: quickfix the proc registration bug
Ideally we should have a directory of drivers and a link to the 'active'
driver. For now just show the first device which is effectively the existing
semantics without a warning.
This is an update on the original buggy patch that I then forgot to
resubmit. Confusingly it was proposed by Red Hat, written by Etched Pixels
fixed and submitted by Intel ...
Resolves-Bug: http://bugzilla.kernel.org/show_bug.cgi?id=9749 Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 2 Jun 2009 16:47:21 +0000 (09:47 -0700)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: prevent deadlock in xfs_qm_shake()
xfs: fix overflow in xfs_growfs_data_private
xfs: fix double unlock in xfs_swap_extents()
Steven Rostedt [Tue, 2 Jun 2009 16:03:19 +0000 (12:03 -0400)]
function-graph: only allocate init tasks if it was not already done
When the function graph tracer is enabled, it calls the initialization
needed for the init tasks that would be called on all created tasks.
The problem is that this is called every time the function graph tracer
is enabled, and the ret_stack is allocated for the idle tasks each time.
Thus, the old ret_stack is lost and a memory leak is created.
This is also dangerous because if an interrupt happened on another CPU
with the init task and the ret_stack is replaced, we then lose all the
return pointers for the interrupt, and a crash would take place.
[ Impact: fix memory leak and possible crash due to race ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Joerg Roedel [Fri, 22 May 2009 19:23:13 +0000 (21:23 +0200)]
dma-debug: add debugfs file for driver filter
This patch adds the dma-api/driver_filter file to debugfs. The root user
can write a driver name into this file to see only dma-api errors for
that particular driver in the kernel log. Writing an empty string to
that file disables the driver filter.
Joerg Roedel [Fri, 22 May 2009 16:24:20 +0000 (18:24 +0200)]
dma-debug: add variables and checks for driver filter
This patch adds the state variables for the driver filter and a function
to check if the filter is enabled and matches to the current device. The
check is built into the err_printk function.
Minoru Usui [Tue, 2 Jun 2009 09:17:34 +0000 (02:17 -0700)]
net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup
This patch fixes a bug which unconfigured struct tcf_proto keeps
chaining in tc_ctl_tfilter(), and avoids kernel panic in
cls_cgroup_classify() when we use cls_cgroup.
When we execute 'tc filter add', tcf_proto is allocated, initialized
by classifier's init(), and chained. After it's chained,
tc_ctl_tfilter() calls classifier's change(). When classifier's
change() fails, tc_ctl_tfilter() does not free and keeps tcf_proto.
In addition, cls_cgroup is initialized in change() not in init(). It
accesses unconfigured struct tcf_proto which is chained before
change(), then hits Oops.
Signed-off-by: Minoru Usui <usui@mxm.nes.nec.co.jp> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Tested-by: Minoru Usui <usui@mxm.nes.nec.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Tue, 2 Jun 2009 08:29:58 +0000 (01:29 -0700)]
e1000: add missing length check to e1000 receive routine
Patch to fix bad length checking in e1000. E1000 by default does two
things:
1) Spans rx descriptors for packets that don't fit into 1 skb on recieve
2) Strips the crc from a frame by subtracting 4 bytes from the length prior to
doing an skb_put
Since the e1000 driver isn't written to support receiving packets that span
multiple rx buffers, it checks the End of Packet bit of every frame, and
discards it if its not set. This places us in a situation where, if we have a
spanning packet, the first part is discarded, but the second part is not (since
it is the end of packet, and it passes the EOP bit test). If the second part of
the frame is small (4 bytes or less), we subtract 4 from it to remove its crc,
underflow the length, and wind up in skb_over_panic, when we try to skb_put a
huge number of bytes into the skb. This amounts to a remote DOS attack through
careful selection of frame size in relation to interface MTU. The fix for this
is already in the e1000e driver, as well as the e1000 sourceforge driver, but no
one ever pushed it to e1000. This is lifted straight from e1000e, and prevents
small frames from causing the underflow described above
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Tested-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Ed Swierk [Tue, 2 Jun 2009 07:19:52 +0000 (00:19 -0700)]
forcedeth: add phy_power_down parameter, leave phy powered up by default (v2)
Add a phy_power_down parameter to forcedeth: set to 1 to power down the
phy and disable the link when an interface goes down; set to 0 to always
leave the phy powered up.
The phy power state persists across reboots; Windows, some BIOSes, and
older versions of Linux don't bother to power up the phy again, forcing
users to remove all power to get the interface working (see
http://bugzilla.kernel.org/show_bug.cgi?id=13072). Leaving the phy
powered on is the safest default behavior. Users accustomed to seeing
the link state reflect the interface state and/or wanting to minimize
power consumption can set phy_power_down=1 if compatibility with other
OSes is not an issue.
Signed-off-by: Ed Swierk <eswierk@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Blyakher [Mon, 1 Jun 2009 18:13:24 +0000 (13:13 -0500)]
xfs: prevent deadlock in xfs_qm_shake()
It's possible to recurse into filesystem from the memory
allocation, which deadlocks in xfs_qm_shake(). Add check
for __GFP_FS, and bail out if it is not set.
Signed-off-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Hedi Berriche <hedi@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
Eric Sandeen [Sat, 23 May 2009 19:30:12 +0000 (14:30 -0500)]
xfs: fix overflow in xfs_growfs_data_private
In the case where growing a filesystem would leave the last AG
too small, the fixup code has an overflow in the calculation
of the new size with one fewer ag, because "nagcount" is a 32
bit number. If the new filesystem has > 2^32 blocks in it
this causes a problem resulting in an EINVAL return from growfs:
Reported-by: richard.ems@cape-horn-eng.com Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
Felix Blyakher [Fri, 8 May 2009 00:49:45 +0000 (19:49 -0500)]
xfs: fix double unlock in xfs_swap_extents()
Regreesion from commit ef8f7fc, which rearranged the code in
xfs_swap_extents() leading to double unlock of xfs inode ilock.
That resulted in xfs_fsr deadlocking itself on platforms, which
don't handle double unlock of rw_semaphore nicely. It caused the
count go negative, which represents the write holder, without
really having one. ia64 is one of the platforms where deadlock
was easily reproduced and the fix was tested.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Felix Blyakher <felixb@sgi.com>
Steven Rostedt [Tue, 2 Jun 2009 01:51:28 +0000 (21:51 -0400)]
ftrace: do not profile functions when disabled
A race was found that if one were to enable and disable the function
profiler repeatedly, then the system can panic. This was because a profiled
function may be preempted just before disabling interrupts. While
the profiler is disabled and then reenabled, the preempted function
could start again, and access the hash as it is being initialized.
This just adds a check in the irq disabled part to check if the profiler
is enabled, and if it is not then it will just exit.
When the system is disabled, the profile_enabled variable is cleared
before calling the unregistering of the function profiler. This
unregistering calls stop machine which also acts as a synchronize schedule.
[ Impact: fix panic in enabling/disabling function profiler ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Mon, 1 Jun 2009 19:16:05 +0000 (15:16 -0400)]
tracing: make trace pipe recognize latency format flag
The trace_pipe did not recognize the latency format flag and would produce
different output than the trace file. The problem was partly due that
the trace flags in the iterator was not set as well as the trace_pipe
zeros out part of the iterator (including the flags) to be able to use
the same routines as the trace file. trace_flags of the iterator should
not cause any problems when not zeroed out by for trace_pipe.
Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Mon, 1 Jun 2009 16:20:40 +0000 (12:20 -0400)]
tracing: remove redundant SOFTIRQ from softirq event traces
After converting the softirq tracer to use te flags options, this
caused a regression with the name. Since the flag was used directly
it was printed out (i.e. HRTIMER_SOFTIRQ).
This patch only shows the softirq name without the SOFTIRQ part.
[ Impact: fix regression of output from softirq events ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
tracing: add exports to use __print_symbolic and __print_flags from a module
A patch to allow the use of __print_symbolic and __print_flags
from a module. This allows the current GFS2 tracing patch to
build.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
LKML-Reference: <1243868015.29604.542.camel@localhost.localdomain> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Li Zefan [Mon, 1 Jun 2009 07:35:46 +0000 (15:35 +0800)]
tracing/events: introduce __dynamic_array()
__string() is limited:
- it's a char array, but we may want to define array with other types
- a source string should be available, but we may just know the string size
We introduce __dynamic_array() to break those limitations, and __string()
becomes a wrapper of it. As a side effect, now __get_str() can be used
in TP_fast_assign but not only TP_print.
Take XFS for example, we have the string length in the dirent, but the
string itself is not NULL-terminated, so __dynamic_array() can be used:
Steven Rostedt [Thu, 28 May 2009 20:31:21 +0000 (16:31 -0400)]
tracing: combine the default tracers into one config
Both event tracer and sched switch plugin are selected by default
by all generic tracers. But if no generic tracer is enabled, their options
appear. But ether one of them will select the other, thus it only
makes sense to have the default tracers be selected by one option.
[ Impact: clean up kconfig menu ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Thu, 28 May 2009 19:50:13 +0000 (15:50 -0400)]
tracing: fix config options to not show when automatically selected
There are two options that are selected by all tracers, but we want
to have those options available when no tracer is selected. These are
The event tracer and sched switch tracer.
The are enabled by all tracers, but if a tracer is not selected we want
the options to appear. All tracers including them select TRACING.
Thus what we would like to do is:
config EVENT_TRACER
bool "prompt"
depends on TRACING
select TRACING
But that gives us a bug in the kbuild system since we just created a
circular dependency. We only want the prompt to show when TRACING is off.
This patch adds GENERIC_TRACER that all tracers will select instead of
TRACING. The two options (sched switch and event tracer) will select
TRACING directly and depend on !GENERIC_TRACER. This solves the cicular
dependency.
[ Impact: hide options that are selected by default ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>