Kent Overstreet [Wed, 11 Sep 2013 01:46:36 +0000 (18:46 -0700)]
bcache: Add explicit keylist arg to btree_insert()
Some refactoring - better to explicitly pass stuff around instead of
having it all in the "big bag of state", struct btree_op. Going to prune
struct btree_op quite a bit over time.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:22:44 +0000 (17:22 -0700)]
bcache: Insert multiple keys at a time
We'll often end up with a list of adjacent keys to insert -
because bch_data_insert() may have to fragment the data it writes.
Originally, to simplify things and avoid having to deal with corner
cases bch_btree_insert() would pass keys from this list one at a time to
btree_insert_recurse() - mainly because the list of keys might span leaf
nodes, so it was easier this way.
With the btree_insert_node() refactoring, it's now a lot easier to just
pass down the whole list and have btree_insert_recurse() iterate over
leaf nodes until it's done.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Wed, 11 Sep 2013 01:41:15 +0000 (18:41 -0700)]
bcache: Add btree_insert_node()
The flow of control in the old btree insertion code was rather -
backwards; we'd recurse down the btree (in btree_insert_recurse()), and
then if we needed to split the keys to be inserted into the parent node
would be effectively returned up to btree_insert_recurse(), which would
notice there was more work to do and finish the insertion.
The main problem with this was that the full logic for btree insertion
could only be used by calling btree_insert_recurse; if you'd gotten to a
btree leaf some other way and had a key to insert, if it turned out that
node needed to be split you were SOL.
This inverts the flow of control so btree_insert_node() does _full_
btree insertion, including splitting - and takes a (leaf) btree node to
insert into as a parameter.
This means we can now _correctly_ handle cache misses - for cache
misses, we need to insert a fake "check" key into the btree when we
discover we have a cache miss - while we still have the btree locked.
Previously, if the btree node was full inserting a cache miss would just
fail.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:20:19 +0000 (17:20 -0700)]
bcache: Explicitly track btree node's parent
This is prep work for the reworked btree insertion code.
The way we set b->parent is ugly and hacky... the problem is, when
btree_split() or garbage collection splits or rewrites a btree node, the
parent changes for all its (potentially already cached) children.
I may change this later and add some code to look through the btree node
cache and find all our cached child nodes and change the parent pointer
then...
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Sat, 17 Aug 2013 09:13:15 +0000 (02:13 -0700)]
bcache: Stripe size isn't necessarily a power of two
Originally I got this right... except that the divides didn't use
do_div(), which broke 32 bit kernels. When I went to fix that, I forgot
that the raid stripe size usually isn't a power of two... doh
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:16:09 +0000 (17:16 -0700)]
bcache: Use blkdev_issue_discard()
The old asynchronous discard code was really a relic from when all the
allocation code was asynchronous - now that allocation runs out of a
dedicated thread there's no point in keeping around all that complicated
machinery.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 29 Aug 2013 01:01:24 +0000 (18:01 -0700)]
bcache: Fix for handling overlapping extents when reading in a btree node
btree_sort_fixup() was overly clever, because it was trying to avoid
pulling a key off the btree iterator in more than one place.
This led to a really obscure bug where we'd break early from the loop in
btree_sort_fixup() if the current key overlapped with keys in more than
one older set, and the next key it overlapped with was zero size.
Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Kent Overstreet [Wed, 31 Jul 2013 23:29:54 +0000 (16:29 -0700)]
bcache: Fix a writeback performance regression
Background writeback works by scanning the btree for dirty data and
adding those keys into a fixed size buffer, then for each dirty key in
the keybuf writing it to the backing device.
When read_dirty() finishes and it's time to scan for more dirty data, we
need to wait for the outstanding writeback IO to finish - they still
take up slots in the keybuf (so that foreground writes can check for
them to avoid races) - without that wait, we'll continually rescan when
we'll be able to add at most a key or two to the keybuf, and that takes
locks that starves foreground IO. Doh.
Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
bcache: Correct printf()-style format length modifier
drivers/md/bcache/btree.c: In function ‘bch_btree_node_read’:
drivers/md/bcache/btree.c:259: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 3 has type ‘size_t’
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
bcache: Strip endline when writing the label through sysfs
sysfs attributes with unusual characters have crappy failure modes
in Squeeze (udev 164); later versions of udev are unaffected.
This should make these characters more unusual.
Signed-off-by: Gabriel de Perthuis <g2p.code@gmail.com> Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Kent Overstreet [Thu, 11 Jul 2013 01:31:58 +0000 (18:31 -0700)]
bcache: Allocation kthread fixes
The alloc kthread should've been using try_to_freeze() - and also there
was the potential for the alloc kthread to get woken up after it had
shut down, which would have been bad.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Fri, 12 Jul 2013 02:43:21 +0000 (19:43 -0700)]
bcache: Fix GC_SECTORS_USED() calculation
Part of the job of garbage collection is to add up however many sectors
of live data it finds in each bucket, but that doesn't work very well if
it doesn't reset GC_SECTORS_USED() when it starts. Whoops.
This wouldn't have broken anything horribly, but allocation tries to
preferentially reclaim buckets that are mostly empty and that's not
gonna work with an incorrect GC_SECTORS_USED() value.
Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Kent Overstreet [Fri, 12 Jul 2013 05:42:14 +0000 (22:42 -0700)]
bcache: Journal replay fix
The journal replay code starts by finding something that looks like a
valid journal entry, then it does a binary search over the unchecked
region of the journal for the journal entries with the highest sequence
numbers.
Trouble is, the logic was wrong - journal_read_bucket() returns true if
it found journal entries we need, but if the range of journal entries
we're looking for loops around the end of the journal - in that case
journal_read_bucket() could return true when it hadn't found the highest
sequence number we'd seen yet, and in that case the binary search did
the wrong thing. Whoops.
Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Kent Overstreet [Thu, 11 Jul 2013 04:03:25 +0000 (21:03 -0700)]
bcache: Shutdown fix
Stopping a cache set is supposed to make it stop attached backing
devices, but somewhere along the way that code got lost. Fixing this
mainly has the effect of fixing our reboot notifier.
Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Kent Overstreet [Thu, 11 Jul 2013 04:25:02 +0000 (21:25 -0700)]
bcache: Fix a sysfs splat on shutdown
If we stopped a bcache device when we were already detaching (or
something like that), bcache_device_unlink() would try to remove a
symlink from sysfs that was already gone because the bcache dev kobject
had already been removed from sysfs.
So keep track of whether we've removed stuff from sysfs.
Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Kent Overstreet [Thu, 11 Jul 2013 01:04:21 +0000 (18:04 -0700)]
bcache: Fix a dumb race
In the far-too-complicated closure code - closures can have destructors,
for probably dubious reasons; they get run after the closure is no
longer waiting on anything but before dropping the parent ref, intended
just for freeing whatever memory the closure is embedded in.
Trouble is, when remaining goes to 0 and we've got nothing more to run -
we also have to unlock the closure, setting remaining to -1. If there's
a destructor, that unlock isn't doing anything - nobody could be trying
to lock it if we're about to free it - but if the unlock _is needed...
that check for a destructor was racy. Argh.
Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Kent Overstreet [Fri, 7 Jun 2013 01:15:57 +0000 (18:15 -0700)]
bcache: Use standard utility code
Some of bcache's utility code has made it into the rest of the kernel,
so drop the bcache versions.
Bcache used to have a workaround for allocating from a bio set under
generic_make_request() (if you allocated more than once, the bios you
already allocated would get stuck on current->bio_list when you
submitted, and you'd risk deadlock) - bcache would mask out __GFP_WAIT
when allocating bios under generic_make_request() so that allocation
could fail and it could retry from workqueue. But bio_alloc_bioset() has
a workaround now, so we can drop this hack and the associated error
handling.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Kent Overstreet [Wed, 5 Jun 2013 13:21:07 +0000 (06:21 -0700)]
bcache: Track dirty data by stripe
To make background writeback aware of raid5/6 stripes, we first need to
track the amount of dirty data within each stripe - we do this by
breaking up the existing sectors_dirty into per stripe atomic_ts
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Kent Overstreet [Sun, 12 May 2013 00:07:26 +0000 (17:07 -0700)]
bcache: Initialize sectors_dirty when attaching
Previously, dirty_data wouldn't get initialized until the first garbage
collection... which was a bit of a problem for background writeback (as
the PD controller keys off of it) and also confusing for users.
This is also prep work for making background writeback aware of raid5/6
stripes.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Kent Overstreet [Sat, 11 May 2013 22:59:37 +0000 (15:59 -0700)]
bcache: Improve lazy sorting
The old lazy sorting code was kind of hacky - rewrite in a way that
mathematically makes more sense; the idea is that the size of the sets
of keys in a btree node should increase by a more or less fixed ratio
from smallest to biggest.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Kent Overstreet [Wed, 15 May 2013 03:33:16 +0000 (20:33 -0700)]
bcache: Rip out pkey()/pbtree()
Old gcc doesnt like the struct hack, and it is kind of ugly. So finish
off the work to convert pr_debug() statements to tracepoints, and delete
pkey()/pbtree().
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Kent Overstreet [Fri, 26 Apr 2013 22:39:55 +0000 (15:39 -0700)]
bcache: Fix/revamp tracepoints
The tracepoints were reworked to be more sensible, and fixed a null
pointer deref in one of the tracepoints.
Converted some of the pr_debug()s to tracepoints - this is partly a
performance optimization; it used to be that with DEBUG or
CONFIG_DYNAMIC_DEBUG pr_debug() was an empty macro; but at some point it
was changed to an empty inline function.
Some of the pr_debug() statements had rather expensive function calls as
part of the arguments, so this code was getting run unnecessarily even
on non debug kernels - in some fast paths, too.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Kent Overstreet [Thu, 25 Apr 2013 20:58:35 +0000 (13:58 -0700)]
bcache: Refactor btree io
The most significant change is that btree reads are now done
synchronously, instead of asynchronously and doing the post read stuff
from a workqueue.
This was originally done because we can't block on IO under
generic_make_request(). But - we already have a mechanism to punt cache
lookups to workqueue if needed, so if we just use that we don't have to
deal with the complexity of doing things asynchronously.
The main benefit is this makes the locking situation saner; we can hold
our write lock on the btree node until we're finished reading it, and we
don't need that btree_node_read_done() flag anymore.
Also, for writes, btree_write() was broken out into btree_node_write()
and btree_leaf_dirty() - the old code with the boolean argument was dumb
and confusing.
The prio_blocked mechanism was improved a bit too, now the only counter
is in struct btree_write, we don't mess with transfering a count from
struct btree anymore.
This required changing garbage collection to block prios at the start
and unblock when it finishes, which is cleaner than what it was doing
anyways (the old code had mostly the same effect, but was doing it in a
convoluted way)
And the btree iter btree_node_read_done() uses was converted to a real
mempool.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Kent Overstreet [Wed, 29 May 2013 04:53:19 +0000 (21:53 -0700)]
bcache: fix a spurious gcc complaint, use scnprintf
An old version of gcc was complaining about using a const int as the
size of a stack allocated array. Which should be fine - but using
ARRAY_SIZE() is better, anyways.
Also, refactor the code to use scnprintf().
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Linus Torvalds [Sat, 22 Jun 2013 19:44:45 +0000 (09:44 -1000)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"These are two fixes that came in this week, one for a regression we
introduced in 3.10 in the GIC interrupt code, and the other one fixes
a typo in newly introduced code"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
irqchip: gic: call gic_cpu_init() as well in CPU_STARTING_FROZEN case
ARM: dts: Correct the base address of pinctrl_3 on Exynos5250
Linus Torvalds [Sat, 22 Jun 2013 19:02:44 +0000 (09:02 -1000)]
Merge tag 'driver-core-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg Kroah-Hartman:
"Here's a single patch for the firmware core that resolves a reported
oops in the firmware core that people have been hitting."
* tag 'driver-core-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
firmware loader: fix use-after-free by double abort
Linus Torvalds [Sat, 22 Jun 2013 19:01:47 +0000 (09:01 -1000)]
Merge tag 'usb-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg Kroah-Hartman:
"Here are two USB patches for 3.10.
One updates the Kconfig wording for CONFIG_USB_PHY to make it,
hopefully, more obvious what this option is (I know you complained
about this when it hit the tree.) The other is a new device id for a
driver"
* tag 'usb-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: serial: ti_usb_3410_5052: new device id for Abbot strip port cable
usb: phy: Improve Kconfig help for CONFIG_USB_PHY
Linus Torvalds [Sat, 22 Jun 2013 19:00:28 +0000 (09:00 -1000)]
Merge tag 'tty-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pul tty fixes from Greg Kroah-Hartman:
"Here are two tty core fixes that resolve some regressions that have
been reported recently. Both tiny fixes, but needed"
* tag 'tty-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: Fix transient pty write() EIO
tty/vt: Return EBUSY if deallocating VT1 and it is busy
Pull SCSI target fixes from Nicholas Bellinger:
"Included is the recent tcm_qla2xxx residual underrun length fix from
Roland, along with Joern's iscsi-target patch for session_lock
breakage within iscsit_stop_time2retain_timer() code. Both are CC'ed
to stable.
The remaining two are specific to recent iscsi-target + iser
conversion changes. One drops some left-over debug noise, and Andy's
patch fixes configfs attribute handling during an explicit network
portal feature bit disable when iser-target is unsupported."
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
iscsi-target: Remove left over v3.10-rc debug printks
target/iscsi: Fix op=disable + error handling cases in np_store_iser
tcm_qla2xxx: Fix residual for underrun commands that fail
target/iscsi: don't corrupt bh_count in iscsit_stop_time2retain_timer()
Linus Torvalds [Sat, 22 Jun 2013 18:43:17 +0000 (08:43 -1000)]
Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"Another set of fixes for Kernel 3.10.
This series contain:
- two Kbuild fixes for randconfig
- a buffer overflow when using rtl28xuu with r820t tuner
- one clk fixup on exynos4-is driver"
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] Fix build when drivers are builtin and frontend modules
[media] s5p makefiles: don't override other selections on obj-[ym]
[media] exynos4-is: Fix FIMC-IS clocks initialization
[media] rtl28xxu: fix buffer overflow when probing Rafael Micro r820t tuner
Linus Torvalds [Sat, 22 Jun 2013 18:42:20 +0000 (08:42 -1000)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"Several fixes for bugs caught while looking through f_pos (ab)users"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
aout32 coredump compat fix
splice: don't pass the address of ->f_pos to methods
mconsole: we'd better initialize pos before passing it to vfs_read()...
Al Viro [Sat, 22 Jun 2013 07:01:38 +0000 (11:01 +0400)]
aout32 coredump compat fix
dump_seek() does SEEK_CUR, not SEEK_SET; native binfmt_aout
handles it correctly (seeks by PAGE_SIZE - sizeof(struct user),
getting the current position to PAGE_SIZE), compat one seeks
by PAGE_SIZE and ends up at PAGE_SIZE + already written...
Linus Torvalds [Fri, 21 Jun 2013 16:33:48 +0000 (06:33 -1000)]
Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
"This series fixes a couple of build failures, and fixes MTRR cleanup
and memory setup on very specific memory maps.
Finally, it fixes triggering backtraces on all CPUs, which was
inadvertently disabled on x86."
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Fix dummy variable buffer allocation
x86: Fix trigger_all_cpu_backtrace() implementation
x86: Fix section mismatch on load_ucode_ap
x86: fix build error and kconfig for ia32_emulation and binfmt
range: Do not add new blank slot with add_range_with_merge
x86, mtrr: Fix original mtrr range get for mtrr_cleanup
Linus Torvalds [Fri, 21 Jun 2013 16:33:06 +0000 (06:33 -1000)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm radeon fixes from Dave Airlie:
"One core fix, but mostly radeon fixes for s/r and big endian UVD
support, and a fix to stop the GPU being reset for no good reason, and
crashing people's machines."
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: update lockup tracking when scheduling in empty ring
drm/prime: Honor requested file flags when exporting a buffer
drm/radeon: fix UVD on big endian
drm/radeon: fix write back suspend regression with uvd v2
drm/radeon: do not try to uselessly update virtual memory pagetable
Linus Torvalds [Fri, 21 Jun 2013 16:31:10 +0000 (06:31 -1000)]
Merge tag 'acpi-3.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
- Fix for a regression causing a failure to turn on some devices on
some systems during initialization introduced by a recent revert of
an ACPI PM change that broke something else. Fortunately, we know
exactly what devices are affected, so we can add a fix just for them
leaving everyone else alone.
- ACPI power resources initialization fix preventing a NULL pointer
from being dereferenced in the acpi_add_power_resource() error code
path.
- ACPI dock station driver fix that adds missing locking to
write_undock().
- ACPI resources allocation fix changing the scope of an old workaround
so that it doesn't affect systems that aren't actually buggy. This
was reported a couple of days ago to fix DMA problems on some new
platforms so we need it in -stable. From Mika Westerberg.
* tag 'acpi-3.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / LPSS: Power up LPSS devices during enumeration
ACPI / PM: Fix error code path for power resources initialization
ACPI / dock: Take ACPI scan lock in write_undock()
ACPI / resources: call acpi_get_override_irq() only for legacy IRQ resources
Linus Torvalds [Fri, 21 Jun 2013 16:29:22 +0000 (06:29 -1000)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Three one-line fixes for my first pull request; one for x86 host, one
for x86 guest, one for PPC"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
x86: kvmclock: zero initialize pvclock shared memory area
kvm/ppc/booke: Delay kvmppc_lazy_ee_enable
KVM: x86: remove vcpu's CPL check in host-invoked XCR set
Linus Torvalds [Fri, 21 Jun 2013 16:27:40 +0000 (06:27 -1000)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fix from Sage Weil:
"This fixes a problem preventing the kernel and userland librbd
libraries from sharing data with the new format 2 images"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: use the correct length for format 2 object names
H. Peter Anvin [Fri, 21 Jun 2013 10:01:21 +0000 (03:01 -0700)]
Merge tag 'efi-urgent' into x86/urgent
* Don't leak random kernel memory to EFI variable NVRAM when attempting
to initiate garbage collection. Also, free the kernel memory when
we're done with it instead of leaking - Ben Hutchings
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Andy Grover [Wed, 29 May 2013 19:05:59 +0000 (12:05 -0700)]
target/iscsi: Fix op=disable + error handling cases in np_store_iser
Writing 0 when iser was not previously enabled, so succeed but do
nothing so that user-space code doesn't need a try: catch block
when ib_isert logic is not available.
Also, return actual error from add_network_portal using PTR_ERR
during op=enable failure.
Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Dave Airlie [Thu, 20 Jun 2013 22:52:19 +0000 (08:52 +1000)]
Merge branch 'drm-fixes-3.10' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
One user visible fix to stop misreport GPU hangs and subsequent resets.
* 'drm-fixes-3.10' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: update lockup tracking when scheduling in empty ring
Jerome Glisse [Wed, 19 Jun 2013 14:02:28 +0000 (10:02 -0400)]
drm/radeon: update lockup tracking when scheduling in empty ring
There might be issue with lockup detection when scheduling on an
empty ring that have been sitting idle for a while. Thus update
the lockup tracking data when scheduling new work in an empty ring.
Signed-off-by: Jerome Glisse <jglisse@redhat.com> Tested-by: Andy Lutomirski <luto@amacapital.net> Cc: stable@vger.kernel.org Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Linus Torvalds [Thu, 20 Jun 2013 18:18:35 +0000 (08:18 -1000)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Two smaller fixes - plus a context tracking tracing fix that is a bit
bigger"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tracing/context-tracking: Add preempt_schedule_context() for tracing
sched: Fix clear NOHZ_BALANCE_KICK
sched/x86: Construct all sibling maps if smt
Linus Torvalds [Thu, 20 Jun 2013 18:17:36 +0000 (08:17 -1000)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Four fixes. The mmap ones are unfortunately larger than desired -
fuzzing uncovered bugs that needed perf context life time management
changes to fix properly"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EP
perf: Fix mmap() accounting hole
perf: Fix perf mmap bugs
kprobes: Fix to free gone and unused optprobes
Linus Torvalds [Thu, 20 Jun 2013 18:16:07 +0000 (08:16 -1000)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull cpu idle fixes from Thomas Gleixner:
- Add a missing irq enable. Fallout of the idle conversion
- Fix stackprotector wreckage caused by the idle conversion
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
idle: Enable interrupts in the weak arch_cpu_idle() implementation
idle: Add the stack canary init to cpu_startup_entry()
Linus Torvalds [Thu, 20 Jun 2013 18:15:13 +0000 (08:15 -1000)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
- Fix inconstinant clock usage in virtual time accounting
- Fix a build error in KVM caused by the NOHZ work
- Remove a pointless timekeeping duty assignment which breaks NOHZ
- Use a proper notifier return value to avoid random behaviour
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick: Remove useless timekeeping duty attribution to broadcast source
nohz: Fix notifier return val that enforce timekeeping
kvm: Move guest entry/exit APIs to context_tracking
vtime: Use consistent clocks among nohz accounting
Linus Torvalds [Thu, 20 Jun 2013 18:08:46 +0000 (08:08 -1000)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fix fro, Benjamin Herrenschmidt:
"We accidentally broke hugetlbfs on Freescale embedded processors which
use a slightly different page table layout than our server processors"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Fix bad pmd error with book3E config
Linus Torvalds [Thu, 20 Jun 2013 18:07:42 +0000 (08:07 -1000)]
Merge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull tilepro fix from Chris Metcalf:
"This change allows the older tilepro architecture to be correctly
built by newer gccs, despite a change that caused gcc to start trying
to use an out-of-line implementation for __builtin_ffsll().
This should be inline again starting with gcc 4.7.4 and 4.8.2 or so,
but meanwhile this change keeps things from breaking, with the only
cost being a few bytes of code in the kernel to provide __ffsdi2 even
for compilers that do inline it"
* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
tilepro: work around module link error with gcc 4.7
[media] Fix build when drivers are builtin and frontend modules
There are a large number of reports that the media build is
not compiling when some drivers are compiled as builtin, while
the needed frontends are compiled as module.
On the last one of such reports:
From: kbuild test robot <fengguang.wu@intel.com>
Subject: saa7134-dvb.c:undefined reference to `zl10039_attach'
The .config file has:
CONFIG_VIDEO_SAA7134=y
CONFIG_VIDEO_SAA7134_DVB=y
# CONFIG_MEDIA_ATTACH is not set
CONFIG_DVB_ZL10039=m
And it produces all those errors:
drivers/built-in.o: In function `set_type':
tuner-core.c:(.text+0x2f263e): undefined reference to `tea5767_attach'
tuner-core.c:(.text+0x2f273e): undefined reference to `tda9887_attach'
drivers/built-in.o: In function `tuner_probe':
tuner-core.c:(.text+0x2f2d20): undefined reference to `tea5767_autodetection'
drivers/built-in.o: In function `av7110_attach':
av7110.c:(.text+0x330bda): undefined reference to `ves1x93_attach'
av7110.c:(.text+0x330bf7): undefined reference to `stv0299_attach'
av7110.c:(.text+0x330c63): undefined reference to `tda8083_attach'
av7110.c:(.text+0x330d09): undefined reference to `ves1x93_attach'
av7110.c:(.text+0x330d33): undefined reference to `tda8083_attach'
av7110.c:(.text+0x330d5d): undefined reference to `stv0297_attach'
av7110.c:(.text+0x330dbe): undefined reference to `stv0299_attach'
drivers/built-in.o: In function `tuner_attach_dtt7520x':
ngene-cards.c:(.text+0x3381cb): undefined reference to `dvb_pll_attach'
drivers/built-in.o: In function `demod_attach_lg330x':
ngene-cards.c:(.text+0x33828a): undefined reference to `lgdt330x_attach'
drivers/built-in.o: In function `demod_attach_stv0900':
ngene-cards.c:(.text+0x3383d5): undefined reference to `stv090x_attach'
drivers/built-in.o: In function `cineS2_probe':
ngene-cards.c:(.text+0x338b7f): undefined reference to `drxk_attach'
drivers/built-in.o: In function `configure_tda827x_fe':
saa7134-dvb.c:(.text+0x346ae7): undefined reference to `tda10046_attach'
drivers/built-in.o: In function `dvb_init':
saa7134-dvb.c:(.text+0x347283): undefined reference to `mt352_attach'
saa7134-dvb.c:(.text+0x3472cd): undefined reference to `mt352_attach'
saa7134-dvb.c:(.text+0x34731c): undefined reference to `tda10046_attach'
saa7134-dvb.c:(.text+0x34733c): undefined reference to `tda10046_attach'
saa7134-dvb.c:(.text+0x34735c): undefined reference to `tda10046_attach'
saa7134-dvb.c:(.text+0x347378): undefined reference to `tda10046_attach'
saa7134-dvb.c:(.text+0x3473db): undefined reference to `tda10046_attach'
drivers/built-in.o:saa7134-dvb.c:(.text+0x347502): more undefined references to `tda10046_attach' follow
drivers/built-in.o: In function `dvb_init':
saa7134-dvb.c:(.text+0x347812): undefined reference to `mt352_attach'
saa7134-dvb.c:(.text+0x347951): undefined reference to `mt312_attach'
saa7134-dvb.c:(.text+0x3479a9): undefined reference to `mt312_attach'
>> saa7134-dvb.c:(.text+0x3479c1): undefined reference to `zl10039_attach'
This is happening because a builtin module can't use directly a symbol
found on a module. By enabling CONFIG_MEDIA_ATTACH, the configuration
becomes valid, as dvb_attach() macro loads the module if needed, making
the symbol available to the builtin module.
While this bug started to appear after the patches that use IS_DEFINED
macro (like changeset 7b34be71db533f3e0cf93d53cf62d036cdb5418a), this
bug is a way ancient than that.
The thing is that, before the IS_DEFINED() patches, the logic used to be:
The above code, with the .config file used, was evoluting to FALSE
(instead of TRUE as it should be, as CONFIG_DVB_ZL10039 is 'm'),
and were adding the static inline code at saa7134-dvb, instead
of the external call. So, while it weren't producing any compilation
error, the code weren't working either.
So, as the overhead for using CONFIG_MEDIA_ATTACH is minimal, just
enable it, if MODULES is defined.
Shawn Guo [Wed, 12 Jun 2013 11:30:27 +0000 (19:30 +0800)]
irqchip: gic: call gic_cpu_init() as well in CPU_STARTING_FROZEN case
Commit c011470 (irqchip: gic: Perform the gic_secondary_init() call via
CPU notifier) moves gic_secondary_init() that used to be called in
.smp_secondary_init hook into a notifier call. But it changes the
system behavior a little bit. Before the commit, gic_cpu_init()
is called not only when kernel brings up the secondary cores but also
when system resuming procedure hot-plugs the cores back to kernel.
While after the commit, the function will not be called in the latter
case, where the 'action' will not be CPU_STARTING but
CPU_STARTING_FROZEN. This behavior difference at least causes the
following suspend/resume regression on imx6q.
$ echo mem > /sys/power/state
PM: Syncing filesystems ... done.
PM: Preparing system for mem sleep
mmc1: card e624 removed
Freezing user space processes ... (elapsed 0.01 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
PM: Entering mem sleep
PM: suspend of devices complete after 5.930 msecs
PM: suspend devices took 0.010 seconds
PM: late suspend of devices complete after 0.343 msecs
PM: noirq suspend of devices complete after 0.828 msecs
Disabling non-boot CPUs ...
CPU1: shutdown
CPU2: shutdown
CPU3: shutdown
Enabling non-boot CPUs ...
CPU1: Booted secondary processor
INFO: rcu_sched detected stalls on CPUs/tasks: { 1 2 3} (detected by 0, t=2102 jiffies, g=4294967169, c=4294967168, q=17)
Task dump for CPU 1:
swapper/1 R running 0 0 1 0x00000000
Backtrace:
[<bf895ff4>] (0xbf895ff4) from [<00000000>] ( (null))
Backtrace aborted due to bad frame pointer <8007ccdc>
Task dump for CPU 2:
swapper/2 R running 0 0 1 0x00000000
Backtrace:
[<8075dbdc>] (0x8075dbdc) from [<00000000>] ( (null))
Backtrace aborted due to bad frame pointer <00000002>
Task dump for CPU 3:
swapper/3 R running 0 0 1 0x00000000
Backtrace:
[<8075dbdc>] (0x8075dbdc) from [<00000000>] ( (null))
Fix the regression by checking 'action' being CPU_STARTING_FROZEN to
have gic_cpu_init() called for secondary cores when system resumes.
Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Joseph Lo <josephl@nvidia.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The following change fixes the x86 implementation of
trigger_all_cpu_backtrace(), which was previously (accidentally,
as far as I can tell) disabled to always return false as on
architectures that do not implement this function.
trigger_all_cpu_backtrace(), as defined in include/linux/nmi.h,
should call arch_trigger_all_cpu_backtrace() if available, or
return false if the underlying arch doesn't implement this
function.
x86 did provide a suitable arch_trigger_all_cpu_backtrace()
implementation, but it wasn't actually being used because it was
declared in asm/nmi.h, which linux/nmi.h doesn't include. Also,
linux/nmi.h couldn't easily be fixed by including asm/nmi.h,
because that file is not available on all architectures.
I am proposing to fix this by moving the x86 definition of
arch_trigger_all_cpu_backtrace() to asm/irq.h.
Tested via: echo l > /proc/sysrq-trigger
Before the change, this uses a fallback implementation which
shows backtraces on active CPUs (using
smp_call_function_interrupt() )
After the change, this shows NMI backtraces on all CPUs
Jed Davis [Thu, 20 Jun 2013 03:07:14 +0000 (04:07 +0100)]
perf: arm64: Record the user-mode PC in the call chain.
With this change, we no longer lose the innermost entry in the user-mode
part of the call chain. See also the x86 port, which includes the ip,
and the corresponding change in arch/arm.
Signed-off-by: Jed Davis <jld@mozilla.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[media] s5p makefiles: don't override other selections on obj-[ym]
The $obj-m/$obj-y vars should be adding new modules to build, not
overriding it. So, it should never use
$obj-y := foo.o
instead, it should use:
$obj-y += foo.o
Failing to do that is very bad, as it will suppress needed modules.
Book3E uses the hugepd at PMD level and don't encode pte directly
at the pmd level. So it will find the lower bits of pmd set
and the pmd_bad check throws error. Infact the current code
will never take the free_hugepd_range call at all because it will
clear the pmd if it find a hugepd pointer. Fix this by clearing
bad pmd only if it is not a hugepd pointer.
ACPI / LPSS: Power up LPSS devices during enumeration
Commit 7cd8407 (ACPI / PM: Do not execute _PS0 for devices without
_PSC during initialization) introduced a regression on some systems
with Intel Lynxpoint Low-Power Subsystem (LPSS) where some devices
need to be powered up during initialization, but their device objects
in the ACPI namespace have _PS0 and _PS3 only (without _PSC or power
resources).
To work around this problem, make the ACPI LPSS driver power up
devices it knows about by using a new helper function
acpi_device_fix_up_power() that does all of the necessary
sanity checks and calls acpi_dev_pm_explicit_set() to put the
device into D0.
Reported-and-tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPI / PM: Fix error code path for power resources initialization
Commit 781d737 (ACPI: Drop power resources driver) introduced a
bug in the power resources initialization error code path causing
a NULL pointer to be referenced in acpi_release_power_resource()
if there's an error triggering a jump to the 'err' label in
acpi_add_power_resource(). This happens because the list_node
field of struct acpi_power_resource has not been initialized yet
at this point and doing a list_del() on it is a bad idea.
To prevent this problem from occuring, initialize the list_node
field of struct acpi_power_resource upfront.
Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com> Tested-by: Lan Tianyu <tianyu.lan@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: 3.9+ <stable@vger.kernel.org> Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
ACPI / dock: Take ACPI scan lock in write_undock()
Since commit 3757b94 (ACPI / hotplug: Fix concurrency issues and
memory leaks) acpi_bus_scan() and acpi_bus_trim() must always be
called under acpi_scan_lock, but currently the following scenario
violating that requirement is possible:
Fix that by making write_undock() acquire acpi_scan_lock before
calling handle_eject_request() as appropriate (begin_undock() is
under the lock too in analogy with acpi_dock_deferred_cb()).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: 3.9+ <stable@vger.kernel.org> Acked-by: Toshi Kani <toshi.kani@hp.com>
Mika Westerberg [Mon, 20 May 2013 15:41:45 +0000 (15:41 +0000)]
ACPI / resources: call acpi_get_override_irq() only for legacy IRQ resources
acpi_get_override_irq() was added because there was a problem with
buggy BIOSes passing wrong IRQ() resource for the RTC IRQ. The
commit that added the workaround was 61fd47e0c8476 (ACPI: fix two
IRQ8 issues in IOAPIC mode).
With ACPI 5 enumerated devices there are typically one or more
extended IRQ resources per device (and these IRQs can be shared).
However, the acpi_get_override_irq() workaround forces all IRQs in
range 0 - 15 (the legacy ISA IRQs) to be edge triggered, active high
as can be seen from the dmesg below:
ACPI: IRQ 6 override to edge, high
ACPI: IRQ 7 override to edge, high
ACPI: IRQ 7 override to edge, high
ACPI: IRQ 13 override to edge, high
Also /proc/interrupts for the I2C controllers (INT33C2 and INT33C3) shows
the same thing:
7: 4 0 0 0 IO-APIC-edge INT33C2:00, INT33C3:00
The _CSR method for INT33C2 (and INT33C3) device returns following
resource:
which states that this is supposed to be level triggered, active low,
shared IRQ instead.
Fix this by making sure that acpi_get_override_irq() gets only called
when we are dealing with legacy IRQ() or IRQNoFlags() descriptors.
While we are there, correct pr_warning() to print the right triggering
value.
This change turns out to be necessary to make DMA work correctly on
systems based on the Intel Lynxpoint PCH (Platform Controller Hub).
[rjw: Changelog] Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: 3.9+ <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Paul Gortmaker [Wed, 19 Jun 2013 15:15:26 +0000 (11:15 -0400)]
x86: Fix section mismatch on load_ucode_ap
We are in the process of removing all the __cpuinit annotations.
While working on making that change, an existing problem was
made evident:
WARNING: arch/x86/kernel/built-in.o(.text+0x198f2): Section mismatch
in reference from the function cpu_init() to the function
.init.text:load_ucode_ap() The function cpu_init() references
the function __init load_ucode_ap(). This is often because cpu_init
lacks a __init annotation or the annotation of load_ucode_ap is wrong.
This now appears because in my working tree, cpu_init() is no longer
tagged as __cpuinit, and so the audit picks up the mismatch. The 2nd
hypothesis from the audit is the correct one, as there was an incorrect
__init tag on the prototype in the header (but __cpuinit was used on
the function itself.)
The audit is telling us that the prototype's __init annotation took
effect and the function did land in the .init.text section. Checking
with objdump on a mainline tree that still has __cpuinit shows that
the __cpuinit on the function takes precedence over the __init on the
prototype, but that won't be true once we make __cpuinit a no-op.
Even though we are removing __cpuinit, we temporarily align both
the function and the prototype on __cpuinit so that the changeset
can be applied to stable trees if desired.
[ hpa: build fix only, no object code change ]
Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: stable <stable@vger.kernel.org> # 3.9+ Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/1371654926-11729-1-git-send-email-paul.gortmaker@windriver.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
David Daney [Mon, 17 Jun 2013 15:46:07 +0000 (08:46 -0700)]
mn10300: Fix include dependency in irqflags.h et al.
We need to pick up the definition of raw_smp_processor_id() from
asm/smp.h. For the !SMP case, we need to supply a definition of
raw_smp_processor_id().
Because of the include dependencies we cannot use smp_call_func_t in
asm/smp.h, but we do need linux/thread_info.h
Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull sparc fixes from David Miller:
"Various sparc bug fixes, in particular:
1) TSB hashes have to be flushed before TLB on sparc64, from Dave
Kleikamp.
2) LEON timer interrupts can get stuck, from Andreas Larsson.
3) Sparc64 needs to handle lack of address-congruence devicetree
property, from Bob Picco"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: tsb must be flushed before tlb
sparc,leon: Convert to use devm_ioremap_resource
sparc64 address-congruence property
sparc32, leon: Enable interrupts before going idle to avoid getting stuck
sparc32, leon: Remove separate "ticker" timer for SMP
sparc: kernel: using strlcpy() instead of strcpy()
arch: sparc: prom: looping issue, need additional length check in the outside looping
sparc: remove inline marking of EXPORT_SYMBOL functions
sparc: Switch to asm-generic/linkage.h
Linus Torvalds [Wed, 19 Jun 2013 16:23:56 +0000 (06:23 -1000)]
Merge branch 'parisc-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"This contains a kernel segfault fix when reading /proc/kpageflags or
/proc/kpagecount, two fixes for the serial port and PCI graphic card
support on C8000 workstations and a fix to use unshadowed registers
for flushing D- and I-caches."
* 'parisc-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Use unshadowed index register for flush instructions in flush_dcache_page_asm and flush_icache_page_asm
parisc: provide pci_mmap_page_range() for parisc
parisc: fix serial ports on C8000 workstation
parisc: fix kernel BUG at arch/parisc/include/asm/mmzone.h:50 (part 2)
James Hogan [Fri, 14 Jun 2013 09:31:01 +0000 (10:31 +0100)]
metag: fix mm/hugetlb.c build breakage
Commit 106c992a5ebe ("mm/hugetlb: add more arch-defined huge_pte
functions") added an include of <asm-generic/hugetlb.h> to each
architecture's <asm/hugetlb.h> (except s390). Unfortunately metag was
missed which resulted in build errors when hugetlbfs is enabled (see
below).
Add the include for metag too to fix the build errors:
mm/hugetlb.c In function 'make_huge_pte':
mm/hugetlb.c +2250 : error: implicit declaration of function 'huge_pte_mkwrite'
mm/hugetlb.c +2250 : error: implicit declaration of function 'huge_pte_mkdirty'
...
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 19 Jun 2013 16:19:46 +0000 (06:19 -1000)]
Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM fixes from Russell King:
"The larger changes this time are
- "ARM: 7755/1: handle user space mapped pages in flush_kernel_dcache_page"
which fixes more data corruption problems with O_DIRECT
- "ARM: 7759/1: decouple CPU offlining from reboot/shutdown" which
gets us back to working shutdown/reboot on SMP platforms
- "ARM: 7752/1: errata: LoUIS bit field in CLIDR register is incorrect"
which fixes a shutdown regression found in v3.10 on Versatile
Express platforms.
The remainder are the quite small, maybe one or two line changes"
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7759/1: decouple CPU offlining from reboot/shutdown
ARM: 7756/1: zImage/virt: remove hyp-stub.S during distclean
ARM: 7755/1: handle user space mapped pages in flush_kernel_dcache_page
ARM: 7754/1: Fix the CPU ID and the mask associated to the PJ4B
ARM: 7753/1: map_init_section flushes incorrect pmd
ARM: 7752/1: errata: LoUIS bit field in CLIDR register is incorrect
The ISP clock register content is not preserved over the ISP power domain
off/on cycle. Instead of setting the clock frequencies once at probe time
the clock rates set up is moved to the runtime_resume handler, which is
invoked after the related power domain is already enabled, ensuring the
clocks are properly configured when the device is actively used.
This fixes the FIMC-IS malfunctions and STREAM ON timeout errors accuring
on some boards:
[ 59.860000] fimc_is_general_irq_handler:583 ISR_NDONE: 5: 0x800003e8, IS_ERROR_UNKNOWN
[ 59.860000] fimc_is_general_irq_handler:586 IS_ERROR_TIME_OUT
Steven Rostedt [Fri, 24 May 2013 19:23:40 +0000 (15:23 -0400)]
tracing/context-tracking: Add preempt_schedule_context() for tracing
Dave Jones hit the following bug report:
===============================
[ INFO: suspicious RCU usage. ]
3.10.0-rc2+ #1 Not tainted
-------------------------------
include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
other info that might help us debug this:
RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
2 locks held by cc1/63645:
#0: (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0
#1: (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0
What happened was that the function tracer traced the schedule_user() code
that tells RCU that the system is coming back from userspace, and to
add the CPU back to the RCU monitoring.
Because the function tracer does a preempt_disable/enable_notrace() calls
the preempt_enable_notrace() checks the NEED_RESCHED flag. If it is set,
then preempt_schedule() is called. But this is called before the user_exit()
function can inform the kernel that the CPU is no longer in user mode and
needs to be accounted for by RCU.
The fix is to create a new preempt_schedule_context() that checks if
the kernel is still in user mode and if so to switch it to kernel mode
before calling schedule. It also switches back to user mode coming back
from schedule in need be.
The only user of this currently is the preempt_enable_notrace(), which is
only used by the tracing subsystem.
Vincent Guittot [Wed, 5 Jun 2013 08:13:11 +0000 (10:13 +0200)]
sched: Fix clear NOHZ_BALANCE_KICK
I have faced a sequence where the Idle Load Balance was sometime not
triggered for a while on my platform, in the following scenario:
CPU 0 and CPU 1 are running tasks and CPU 2 is idle
CPU 1 kicks the Idle Load Balance
CPU 1 selects CPU 2 as the new Idle Load Balancer
CPU 2 sets NOHZ_BALANCE_KICK for CPU 2
CPU 2 sends a reschedule IPI to CPU 2
While CPU 3 wakes up, CPU 0 or CPU 1 migrates a waking up task A on CPU 2
CPU 2 finally wakes up, runs task A and discards the Idle Load Balance
task A quickly goes back to sleep (before a tick occurs on CPU 2)
CPU 2 goes back to idle with NOHZ_BALANCE_KICK set
Whenever CPU 2 will be selected as the ILB, no reschedule IPI will be sent
because NOHZ_BALANCE_KICK is already set and no Idle Load Balance will be
performed.
We must wait for the sched softirq to be raised on CPU 2 thanks to another
part the kernel to come back to clear NOHZ_BALANCE_KICK.
The proposed solution clears NOHZ_BALANCE_KICK in schedule_ipi if
we can't raise the sched_softirq for the Idle Load Balance.
Change since V1:
- move the clear of NOHZ_BALANCE_KICK in got_nohz_idle_kick if the ILB
can't run on this CPU (as suggested by Peter)
perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EP
This patch fixes broken support of PEBS-LL on SNB-EP/IVB-EP.
For some reason, the LDLAT extra reg definition for snb_ep
showed up as duplicate in the snb table.
This patch moves the definition of LDLAT back into the
snb_ep table.
Peter Zijlstra [Tue, 4 Jun 2013 08:44:21 +0000 (10:44 +0200)]
perf: Fix mmap() accounting hole
Vince's fuzzer once again found holes. This time it spotted a leak in
the locked page accounting.
When an event had redirected output and its close() was the last
reference to the buffer we didn't have a vm context to undo accounting.
Change the code to destroy the buffer on the last munmap() and detach
all redirected events at that time. This provides us the right context
to undo the vm accounting.
Igor Mammedov [Mon, 10 Jun 2013 16:31:11 +0000 (18:31 +0200)]
x86: kvmclock: zero initialize pvclock shared memory area
kernel might hung in pvclock_clocksource_read() due to
uninitialized memory might contain odd version value in
following cycle:
do {
version = __pvclock_read_cycles(src, &ret, &flags);
} while ((src->version & 1) || version != src->version);
if secondary kvmclock is accessed before it's registered with kvm.
Clear garbage in pvclock shared memory area right after it's
allocated to avoid this issue.
Ref: https://bugzilla.kernel.org/show_bug.cgi?id=59521 Signed-off-by: Igor Mammedov <imammedo@redhat.com>
[See BZ for analysis. We may want a different fix for 3.11, but
this is the safest for now - Paolo] Cc: <stable@vger.kernel.org> # 3.8 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Scott Wood [Tue, 11 Jun 2013 16:38:31 +0000 (11:38 -0500)]
kvm/ppc/booke: Delay kvmppc_lazy_ee_enable
kwmppc_lazy_ee_enable() should be called as late as possible,
or else we get things like WARN_ON(preemptible()) in enable_kernel_fp()
in configurations where preemptible() works.
Note that book3s_pr already waits until just before __kvmppc_vcpu_run
to call kvmppc_lazy_ee_enable().
Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>