Kent Overstreet [Wed, 11 Sep 2013 01:48:51 +0000 (18:48 -0700)]
bcache: Add btree_map() functions
Lots of stuff has been open coding its own btree traversal - which is
generally pretty simple code, but there are a few subtleties.
This adds new new functions, bch_btree_map_nodes() and
bch_btree_map_keys(), which do the traversal for you. Everything that's
open coding btree traversal now (with the exception of garbage
collection) is slowly going to be converted to these two functions;
being able to write other code at a higher level of abstraction is a
big improvement w.r.t. overall code quality.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:50:06 +0000 (17:50 -0700)]
bcache: Convert writeback to a kthread
This simplifies the writeback flow control quite a bit - previously, it
was conceptually two coroutines, refill_dirty() and read_dirty(). This
makes the code quite a bit more straightforward.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:50:19 +0000 (17:50 -0700)]
bcache: Convert gc to a kthread
We needed a dedicated rescuer workqueue for gc anyways... and gc was
conceptually a dedicated thread, just one that wasn't running all the
time. Switch it to a dedicated thread to make the code a bit more
straightforward.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:29:09 +0000 (17:29 -0700)]
bcache: Convert bucket_wait to wait_queue_head_t
At one point we did do fancy asynchronous waiting stuff with
bucket_wait, but that's all gone (and bucket_wait is used a lot less
than it used to be). So use the standard primitives.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:26:39 +0000 (17:26 -0700)]
bcache: Refactor journalling flow control
Making things less asynchronous that don't need to be - bch_journal()
only has to block when the journal or journal entry is full, which is
emphatically not a fast path. So make it a normal function that just
returns when it finishes, to make the code and control flow easier to
follow.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Wed, 11 Sep 2013 01:46:36 +0000 (18:46 -0700)]
bcache: Add explicit keylist arg to btree_insert()
Some refactoring - better to explicitly pass stuff around instead of
having it all in the "big bag of state", struct btree_op. Going to prune
struct btree_op quite a bit over time.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:22:44 +0000 (17:22 -0700)]
bcache: Insert multiple keys at a time
We'll often end up with a list of adjacent keys to insert -
because bch_data_insert() may have to fragment the data it writes.
Originally, to simplify things and avoid having to deal with corner
cases bch_btree_insert() would pass keys from this list one at a time to
btree_insert_recurse() - mainly because the list of keys might span leaf
nodes, so it was easier this way.
With the btree_insert_node() refactoring, it's now a lot easier to just
pass down the whole list and have btree_insert_recurse() iterate over
leaf nodes until it's done.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Wed, 11 Sep 2013 01:41:15 +0000 (18:41 -0700)]
bcache: Add btree_insert_node()
The flow of control in the old btree insertion code was rather -
backwards; we'd recurse down the btree (in btree_insert_recurse()), and
then if we needed to split the keys to be inserted into the parent node
would be effectively returned up to btree_insert_recurse(), which would
notice there was more work to do and finish the insertion.
The main problem with this was that the full logic for btree insertion
could only be used by calling btree_insert_recurse; if you'd gotten to a
btree leaf some other way and had a key to insert, if it turned out that
node needed to be split you were SOL.
This inverts the flow of control so btree_insert_node() does _full_
btree insertion, including splitting - and takes a (leaf) btree node to
insert into as a parameter.
This means we can now _correctly_ handle cache misses - for cache
misses, we need to insert a fake "check" key into the btree when we
discover we have a cache miss - while we still have the btree locked.
Previously, if the btree node was full inserting a cache miss would just
fail.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:20:19 +0000 (17:20 -0700)]
bcache: Explicitly track btree node's parent
This is prep work for the reworked btree insertion code.
The way we set b->parent is ugly and hacky... the problem is, when
btree_split() or garbage collection splits or rewrites a btree node, the
parent changes for all its (potentially already cached) children.
I may change this later and add some code to look through the btree node
cache and find all our cached child nodes and change the parent pointer
then...
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Sat, 17 Aug 2013 09:13:15 +0000 (02:13 -0700)]
bcache: Stripe size isn't necessarily a power of two
Originally I got this right... except that the divides didn't use
do_div(), which broke 32 bit kernels. When I went to fix that, I forgot
that the raid stripe size usually isn't a power of two... doh
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 25 Jul 2013 00:16:09 +0000 (17:16 -0700)]
bcache: Use blkdev_issue_discard()
The old asynchronous discard code was really a relic from when all the
allocation code was asynchronous - now that allocation runs out of a
dedicated thread there's no point in keeping around all that complicated
machinery.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 29 Aug 2013 01:01:24 +0000 (18:01 -0700)]
bcache: Fix for handling overlapping extents when reading in a btree node
btree_sort_fixup() was overly clever, because it was trying to avoid
pulling a key off the btree iterator in more than one place.
This led to a really obscure bug where we'd break early from the loop in
btree_sort_fixup() if the current key overlapped with keys in more than
one older set, and the next key it overlapped with was zero size.
Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Kent Overstreet [Wed, 31 Jul 2013 23:29:54 +0000 (16:29 -0700)]
bcache: Fix a writeback performance regression
Background writeback works by scanning the btree for dirty data and
adding those keys into a fixed size buffer, then for each dirty key in
the keybuf writing it to the backing device.
When read_dirty() finishes and it's time to scan for more dirty data, we
need to wait for the outstanding writeback IO to finish - they still
take up slots in the keybuf (so that foreground writes can check for
them to avoid races) - without that wait, we'll continually rescan when
we'll be able to add at most a key or two to the keybuf, and that takes
locks that starves foreground IO. Doh.
Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
bcache: Correct printf()-style format length modifier
drivers/md/bcache/btree.c: In function ‘bch_btree_node_read’:
drivers/md/bcache/btree.c:259: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 3 has type ‘size_t’
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
bcache: Strip endline when writing the label through sysfs
sysfs attributes with unusual characters have crappy failure modes
in Squeeze (udev 164); later versions of udev are unaffected.
This should make these characters more unusual.
Signed-off-by: Gabriel de Perthuis <g2p.code@gmail.com> Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Kent Overstreet [Thu, 11 Jul 2013 01:31:58 +0000 (18:31 -0700)]
bcache: Allocation kthread fixes
The alloc kthread should've been using try_to_freeze() - and also there
was the potential for the alloc kthread to get woken up after it had
shut down, which would have been bad.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Fri, 12 Jul 2013 02:43:21 +0000 (19:43 -0700)]
bcache: Fix GC_SECTORS_USED() calculation
Part of the job of garbage collection is to add up however many sectors
of live data it finds in each bucket, but that doesn't work very well if
it doesn't reset GC_SECTORS_USED() when it starts. Whoops.
This wouldn't have broken anything horribly, but allocation tries to
preferentially reclaim buckets that are mostly empty and that's not
gonna work with an incorrect GC_SECTORS_USED() value.
Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Kent Overstreet [Fri, 12 Jul 2013 05:42:14 +0000 (22:42 -0700)]
bcache: Journal replay fix
The journal replay code starts by finding something that looks like a
valid journal entry, then it does a binary search over the unchecked
region of the journal for the journal entries with the highest sequence
numbers.
Trouble is, the logic was wrong - journal_read_bucket() returns true if
it found journal entries we need, but if the range of journal entries
we're looking for loops around the end of the journal - in that case
journal_read_bucket() could return true when it hadn't found the highest
sequence number we'd seen yet, and in that case the binary search did
the wrong thing. Whoops.
Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Kent Overstreet [Thu, 11 Jul 2013 04:03:25 +0000 (21:03 -0700)]
bcache: Shutdown fix
Stopping a cache set is supposed to make it stop attached backing
devices, but somewhere along the way that code got lost. Fixing this
mainly has the effect of fixing our reboot notifier.
Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Kent Overstreet [Thu, 11 Jul 2013 04:25:02 +0000 (21:25 -0700)]
bcache: Fix a sysfs splat on shutdown
If we stopped a bcache device when we were already detaching (or
something like that), bcache_device_unlink() would try to remove a
symlink from sysfs that was already gone because the bcache dev kobject
had already been removed from sysfs.
So keep track of whether we've removed stuff from sysfs.
Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Kent Overstreet [Thu, 11 Jul 2013 01:04:21 +0000 (18:04 -0700)]
bcache: Fix a dumb race
In the far-too-complicated closure code - closures can have destructors,
for probably dubious reasons; they get run after the closure is no
longer waiting on anything but before dropping the parent ref, intended
just for freeing whatever memory the closure is embedded in.
Trouble is, when remaining goes to 0 and we've got nothing more to run -
we also have to unlock the closure, setting remaining to -1. If there's
a destructor, that unlock isn't doing anything - nobody could be trying
to lock it if we're about to free it - but if the unlock _is needed...
that check for a destructor was racy. Argh.
Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Kent Overstreet [Fri, 7 Jun 2013 01:15:57 +0000 (18:15 -0700)]
bcache: Use standard utility code
Some of bcache's utility code has made it into the rest of the kernel,
so drop the bcache versions.
Bcache used to have a workaround for allocating from a bio set under
generic_make_request() (if you allocated more than once, the bios you
already allocated would get stuck on current->bio_list when you
submitted, and you'd risk deadlock) - bcache would mask out __GFP_WAIT
when allocating bios under generic_make_request() so that allocation
could fail and it could retry from workqueue. But bio_alloc_bioset() has
a workaround now, so we can drop this hack and the associated error
handling.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Kent Overstreet [Wed, 5 Jun 2013 13:21:07 +0000 (06:21 -0700)]
bcache: Track dirty data by stripe
To make background writeback aware of raid5/6 stripes, we first need to
track the amount of dirty data within each stripe - we do this by
breaking up the existing sectors_dirty into per stripe atomic_ts
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Kent Overstreet [Sun, 12 May 2013 00:07:26 +0000 (17:07 -0700)]
bcache: Initialize sectors_dirty when attaching
Previously, dirty_data wouldn't get initialized until the first garbage
collection... which was a bit of a problem for background writeback (as
the PD controller keys off of it) and also confusing for users.
This is also prep work for making background writeback aware of raid5/6
stripes.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Kent Overstreet [Sat, 11 May 2013 22:59:37 +0000 (15:59 -0700)]
bcache: Improve lazy sorting
The old lazy sorting code was kind of hacky - rewrite in a way that
mathematically makes more sense; the idea is that the size of the sets
of keys in a btree node should increase by a more or less fixed ratio
from smallest to biggest.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Kent Overstreet [Wed, 15 May 2013 03:33:16 +0000 (20:33 -0700)]
bcache: Rip out pkey()/pbtree()
Old gcc doesnt like the struct hack, and it is kind of ugly. So finish
off the work to convert pr_debug() statements to tracepoints, and delete
pkey()/pbtree().
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Kent Overstreet [Fri, 26 Apr 2013 22:39:55 +0000 (15:39 -0700)]
bcache: Fix/revamp tracepoints
The tracepoints were reworked to be more sensible, and fixed a null
pointer deref in one of the tracepoints.
Converted some of the pr_debug()s to tracepoints - this is partly a
performance optimization; it used to be that with DEBUG or
CONFIG_DYNAMIC_DEBUG pr_debug() was an empty macro; but at some point it
was changed to an empty inline function.
Some of the pr_debug() statements had rather expensive function calls as
part of the arguments, so this code was getting run unnecessarily even
on non debug kernels - in some fast paths, too.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Kent Overstreet [Thu, 25 Apr 2013 20:58:35 +0000 (13:58 -0700)]
bcache: Refactor btree io
The most significant change is that btree reads are now done
synchronously, instead of asynchronously and doing the post read stuff
from a workqueue.
This was originally done because we can't block on IO under
generic_make_request(). But - we already have a mechanism to punt cache
lookups to workqueue if needed, so if we just use that we don't have to
deal with the complexity of doing things asynchronously.
The main benefit is this makes the locking situation saner; we can hold
our write lock on the btree node until we're finished reading it, and we
don't need that btree_node_read_done() flag anymore.
Also, for writes, btree_write() was broken out into btree_node_write()
and btree_leaf_dirty() - the old code with the boolean argument was dumb
and confusing.
The prio_blocked mechanism was improved a bit too, now the only counter
is in struct btree_write, we don't mess with transfering a count from
struct btree anymore.
This required changing garbage collection to block prios at the start
and unblock when it finishes, which is cleaner than what it was doing
anyways (the old code had mostly the same effect, but was doing it in a
convoluted way)
And the btree iter btree_node_read_done() uses was converted to a real
mempool.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Kent Overstreet [Wed, 29 May 2013 04:53:19 +0000 (21:53 -0700)]
bcache: fix a spurious gcc complaint, use scnprintf
An old version of gcc was complaining about using a const int as the
size of a stack allocated array. Which should be fine - but using
ARRAY_SIZE() is better, anyways.
Also, refactor the code to use scnprintf().
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Linus Torvalds [Sat, 22 Jun 2013 19:44:45 +0000 (09:44 -1000)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"These are two fixes that came in this week, one for a regression we
introduced in 3.10 in the GIC interrupt code, and the other one fixes
a typo in newly introduced code"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
irqchip: gic: call gic_cpu_init() as well in CPU_STARTING_FROZEN case
ARM: dts: Correct the base address of pinctrl_3 on Exynos5250
Linus Torvalds [Sat, 22 Jun 2013 19:02:44 +0000 (09:02 -1000)]
Merge tag 'driver-core-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg Kroah-Hartman:
"Here's a single patch for the firmware core that resolves a reported
oops in the firmware core that people have been hitting."
* tag 'driver-core-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
firmware loader: fix use-after-free by double abort
Linus Torvalds [Sat, 22 Jun 2013 19:01:47 +0000 (09:01 -1000)]
Merge tag 'usb-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg Kroah-Hartman:
"Here are two USB patches for 3.10.
One updates the Kconfig wording for CONFIG_USB_PHY to make it,
hopefully, more obvious what this option is (I know you complained
about this when it hit the tree.) The other is a new device id for a
driver"
* tag 'usb-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: serial: ti_usb_3410_5052: new device id for Abbot strip port cable
usb: phy: Improve Kconfig help for CONFIG_USB_PHY
Linus Torvalds [Sat, 22 Jun 2013 19:00:28 +0000 (09:00 -1000)]
Merge tag 'tty-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pul tty fixes from Greg Kroah-Hartman:
"Here are two tty core fixes that resolve some regressions that have
been reported recently. Both tiny fixes, but needed"
* tag 'tty-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: Fix transient pty write() EIO
tty/vt: Return EBUSY if deallocating VT1 and it is busy
Pull SCSI target fixes from Nicholas Bellinger:
"Included is the recent tcm_qla2xxx residual underrun length fix from
Roland, along with Joern's iscsi-target patch for session_lock
breakage within iscsit_stop_time2retain_timer() code. Both are CC'ed
to stable.
The remaining two are specific to recent iscsi-target + iser
conversion changes. One drops some left-over debug noise, and Andy's
patch fixes configfs attribute handling during an explicit network
portal feature bit disable when iser-target is unsupported."
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
iscsi-target: Remove left over v3.10-rc debug printks
target/iscsi: Fix op=disable + error handling cases in np_store_iser
tcm_qla2xxx: Fix residual for underrun commands that fail
target/iscsi: don't corrupt bh_count in iscsit_stop_time2retain_timer()
Linus Torvalds [Sat, 22 Jun 2013 18:43:17 +0000 (08:43 -1000)]
Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"Another set of fixes for Kernel 3.10.
This series contain:
- two Kbuild fixes for randconfig
- a buffer overflow when using rtl28xuu with r820t tuner
- one clk fixup on exynos4-is driver"
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] Fix build when drivers are builtin and frontend modules
[media] s5p makefiles: don't override other selections on obj-[ym]
[media] exynos4-is: Fix FIMC-IS clocks initialization
[media] rtl28xxu: fix buffer overflow when probing Rafael Micro r820t tuner
Linus Torvalds [Sat, 22 Jun 2013 18:42:20 +0000 (08:42 -1000)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"Several fixes for bugs caught while looking through f_pos (ab)users"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
aout32 coredump compat fix
splice: don't pass the address of ->f_pos to methods
mconsole: we'd better initialize pos before passing it to vfs_read()...
Al Viro [Sat, 22 Jun 2013 07:01:38 +0000 (11:01 +0400)]
aout32 coredump compat fix
dump_seek() does SEEK_CUR, not SEEK_SET; native binfmt_aout
handles it correctly (seeks by PAGE_SIZE - sizeof(struct user),
getting the current position to PAGE_SIZE), compat one seeks
by PAGE_SIZE and ends up at PAGE_SIZE + already written...
Linus Torvalds [Fri, 21 Jun 2013 16:33:48 +0000 (06:33 -1000)]
Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
"This series fixes a couple of build failures, and fixes MTRR cleanup
and memory setup on very specific memory maps.
Finally, it fixes triggering backtraces on all CPUs, which was
inadvertently disabled on x86."
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Fix dummy variable buffer allocation
x86: Fix trigger_all_cpu_backtrace() implementation
x86: Fix section mismatch on load_ucode_ap
x86: fix build error and kconfig for ia32_emulation and binfmt
range: Do not add new blank slot with add_range_with_merge
x86, mtrr: Fix original mtrr range get for mtrr_cleanup
Linus Torvalds [Fri, 21 Jun 2013 16:33:06 +0000 (06:33 -1000)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm radeon fixes from Dave Airlie:
"One core fix, but mostly radeon fixes for s/r and big endian UVD
support, and a fix to stop the GPU being reset for no good reason, and
crashing people's machines."
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: update lockup tracking when scheduling in empty ring
drm/prime: Honor requested file flags when exporting a buffer
drm/radeon: fix UVD on big endian
drm/radeon: fix write back suspend regression with uvd v2
drm/radeon: do not try to uselessly update virtual memory pagetable
Linus Torvalds [Fri, 21 Jun 2013 16:31:10 +0000 (06:31 -1000)]
Merge tag 'acpi-3.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
- Fix for a regression causing a failure to turn on some devices on
some systems during initialization introduced by a recent revert of
an ACPI PM change that broke something else. Fortunately, we know
exactly what devices are affected, so we can add a fix just for them
leaving everyone else alone.
- ACPI power resources initialization fix preventing a NULL pointer
from being dereferenced in the acpi_add_power_resource() error code
path.
- ACPI dock station driver fix that adds missing locking to
write_undock().
- ACPI resources allocation fix changing the scope of an old workaround
so that it doesn't affect systems that aren't actually buggy. This
was reported a couple of days ago to fix DMA problems on some new
platforms so we need it in -stable. From Mika Westerberg.
* tag 'acpi-3.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / LPSS: Power up LPSS devices during enumeration
ACPI / PM: Fix error code path for power resources initialization
ACPI / dock: Take ACPI scan lock in write_undock()
ACPI / resources: call acpi_get_override_irq() only for legacy IRQ resources
Linus Torvalds [Fri, 21 Jun 2013 16:29:22 +0000 (06:29 -1000)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Three one-line fixes for my first pull request; one for x86 host, one
for x86 guest, one for PPC"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
x86: kvmclock: zero initialize pvclock shared memory area
kvm/ppc/booke: Delay kvmppc_lazy_ee_enable
KVM: x86: remove vcpu's CPL check in host-invoked XCR set
Linus Torvalds [Fri, 21 Jun 2013 16:27:40 +0000 (06:27 -1000)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fix from Sage Weil:
"This fixes a problem preventing the kernel and userland librbd
libraries from sharing data with the new format 2 images"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: use the correct length for format 2 object names
H. Peter Anvin [Fri, 21 Jun 2013 10:01:21 +0000 (03:01 -0700)]
Merge tag 'efi-urgent' into x86/urgent
* Don't leak random kernel memory to EFI variable NVRAM when attempting
to initiate garbage collection. Also, free the kernel memory when
we're done with it instead of leaking - Ben Hutchings
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Andy Grover [Wed, 29 May 2013 19:05:59 +0000 (12:05 -0700)]
target/iscsi: Fix op=disable + error handling cases in np_store_iser
Writing 0 when iser was not previously enabled, so succeed but do
nothing so that user-space code doesn't need a try: catch block
when ib_isert logic is not available.
Also, return actual error from add_network_portal using PTR_ERR
during op=enable failure.
Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Dave Airlie [Thu, 20 Jun 2013 22:52:19 +0000 (08:52 +1000)]
Merge branch 'drm-fixes-3.10' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
One user visible fix to stop misreport GPU hangs and subsequent resets.
* 'drm-fixes-3.10' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: update lockup tracking when scheduling in empty ring
Jerome Glisse [Wed, 19 Jun 2013 14:02:28 +0000 (10:02 -0400)]
drm/radeon: update lockup tracking when scheduling in empty ring
There might be issue with lockup detection when scheduling on an
empty ring that have been sitting idle for a while. Thus update
the lockup tracking data when scheduling new work in an empty ring.
Signed-off-by: Jerome Glisse <jglisse@redhat.com> Tested-by: Andy Lutomirski <luto@amacapital.net> Cc: stable@vger.kernel.org Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Linus Torvalds [Thu, 20 Jun 2013 18:18:35 +0000 (08:18 -1000)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Two smaller fixes - plus a context tracking tracing fix that is a bit
bigger"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tracing/context-tracking: Add preempt_schedule_context() for tracing
sched: Fix clear NOHZ_BALANCE_KICK
sched/x86: Construct all sibling maps if smt
Linus Torvalds [Thu, 20 Jun 2013 18:17:36 +0000 (08:17 -1000)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Four fixes. The mmap ones are unfortunately larger than desired -
fuzzing uncovered bugs that needed perf context life time management
changes to fix properly"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EP
perf: Fix mmap() accounting hole
perf: Fix perf mmap bugs
kprobes: Fix to free gone and unused optprobes
Linus Torvalds [Thu, 20 Jun 2013 18:16:07 +0000 (08:16 -1000)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull cpu idle fixes from Thomas Gleixner:
- Add a missing irq enable. Fallout of the idle conversion
- Fix stackprotector wreckage caused by the idle conversion
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
idle: Enable interrupts in the weak arch_cpu_idle() implementation
idle: Add the stack canary init to cpu_startup_entry()
Linus Torvalds [Thu, 20 Jun 2013 18:15:13 +0000 (08:15 -1000)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
- Fix inconstinant clock usage in virtual time accounting
- Fix a build error in KVM caused by the NOHZ work
- Remove a pointless timekeeping duty assignment which breaks NOHZ
- Use a proper notifier return value to avoid random behaviour
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick: Remove useless timekeeping duty attribution to broadcast source
nohz: Fix notifier return val that enforce timekeeping
kvm: Move guest entry/exit APIs to context_tracking
vtime: Use consistent clocks among nohz accounting
Linus Torvalds [Thu, 20 Jun 2013 18:08:46 +0000 (08:08 -1000)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fix fro, Benjamin Herrenschmidt:
"We accidentally broke hugetlbfs on Freescale embedded processors which
use a slightly different page table layout than our server processors"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Fix bad pmd error with book3E config
Linus Torvalds [Thu, 20 Jun 2013 18:07:42 +0000 (08:07 -1000)]
Merge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull tilepro fix from Chris Metcalf:
"This change allows the older tilepro architecture to be correctly
built by newer gccs, despite a change that caused gcc to start trying
to use an out-of-line implementation for __builtin_ffsll().
This should be inline again starting with gcc 4.7.4 and 4.8.2 or so,
but meanwhile this change keeps things from breaking, with the only
cost being a few bytes of code in the kernel to provide __ffsdi2 even
for compilers that do inline it"
* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
tilepro: work around module link error with gcc 4.7
[media] Fix build when drivers are builtin and frontend modules
There are a large number of reports that the media build is
not compiling when some drivers are compiled as builtin, while
the needed frontends are compiled as module.
On the last one of such reports:
From: kbuild test robot <fengguang.wu@intel.com>
Subject: saa7134-dvb.c:undefined reference to `zl10039_attach'
The .config file has:
CONFIG_VIDEO_SAA7134=y
CONFIG_VIDEO_SAA7134_DVB=y
# CONFIG_MEDIA_ATTACH is not set
CONFIG_DVB_ZL10039=m
And it produces all those errors:
drivers/built-in.o: In function `set_type':
tuner-core.c:(.text+0x2f263e): undefined reference to `tea5767_attach'
tuner-core.c:(.text+0x2f273e): undefined reference to `tda9887_attach'
drivers/built-in.o: In function `tuner_probe':
tuner-core.c:(.text+0x2f2d20): undefined reference to `tea5767_autodetection'
drivers/built-in.o: In function `av7110_attach':
av7110.c:(.text+0x330bda): undefined reference to `ves1x93_attach'
av7110.c:(.text+0x330bf7): undefined reference to `stv0299_attach'
av7110.c:(.text+0x330c63): undefined reference to `tda8083_attach'
av7110.c:(.text+0x330d09): undefined reference to `ves1x93_attach'
av7110.c:(.text+0x330d33): undefined reference to `tda8083_attach'
av7110.c:(.text+0x330d5d): undefined reference to `stv0297_attach'
av7110.c:(.text+0x330dbe): undefined reference to `stv0299_attach'
drivers/built-in.o: In function `tuner_attach_dtt7520x':
ngene-cards.c:(.text+0x3381cb): undefined reference to `dvb_pll_attach'
drivers/built-in.o: In function `demod_attach_lg330x':
ngene-cards.c:(.text+0x33828a): undefined reference to `lgdt330x_attach'
drivers/built-in.o: In function `demod_attach_stv0900':
ngene-cards.c:(.text+0x3383d5): undefined reference to `stv090x_attach'
drivers/built-in.o: In function `cineS2_probe':
ngene-cards.c:(.text+0x338b7f): undefined reference to `drxk_attach'
drivers/built-in.o: In function `configure_tda827x_fe':
saa7134-dvb.c:(.text+0x346ae7): undefined reference to `tda10046_attach'
drivers/built-in.o: In function `dvb_init':
saa7134-dvb.c:(.text+0x347283): undefined reference to `mt352_attach'
saa7134-dvb.c:(.text+0x3472cd): undefined reference to `mt352_attach'
saa7134-dvb.c:(.text+0x34731c): undefined reference to `tda10046_attach'
saa7134-dvb.c:(.text+0x34733c): undefined reference to `tda10046_attach'
saa7134-dvb.c:(.text+0x34735c): undefined reference to `tda10046_attach'
saa7134-dvb.c:(.text+0x347378): undefined reference to `tda10046_attach'
saa7134-dvb.c:(.text+0x3473db): undefined reference to `tda10046_attach'
drivers/built-in.o:saa7134-dvb.c:(.text+0x347502): more undefined references to `tda10046_attach' follow
drivers/built-in.o: In function `dvb_init':
saa7134-dvb.c:(.text+0x347812): undefined reference to `mt352_attach'
saa7134-dvb.c:(.text+0x347951): undefined reference to `mt312_attach'
saa7134-dvb.c:(.text+0x3479a9): undefined reference to `mt312_attach'
>> saa7134-dvb.c:(.text+0x3479c1): undefined reference to `zl10039_attach'
This is happening because a builtin module can't use directly a symbol
found on a module. By enabling CONFIG_MEDIA_ATTACH, the configuration
becomes valid, as dvb_attach() macro loads the module if needed, making
the symbol available to the builtin module.
While this bug started to appear after the patches that use IS_DEFINED
macro (like changeset 7b34be71db533f3e0cf93d53cf62d036cdb5418a), this
bug is a way ancient than that.
The thing is that, before the IS_DEFINED() patches, the logic used to be:
The above code, with the .config file used, was evoluting to FALSE
(instead of TRUE as it should be, as CONFIG_DVB_ZL10039 is 'm'),
and were adding the static inline code at saa7134-dvb, instead
of the external call. So, while it weren't producing any compilation
error, the code weren't working either.
So, as the overhead for using CONFIG_MEDIA_ATTACH is minimal, just
enable it, if MODULES is defined.
Shawn Guo [Wed, 12 Jun 2013 11:30:27 +0000 (19:30 +0800)]
irqchip: gic: call gic_cpu_init() as well in CPU_STARTING_FROZEN case
Commit c011470 (irqchip: gic: Perform the gic_secondary_init() call via
CPU notifier) moves gic_secondary_init() that used to be called in
.smp_secondary_init hook into a notifier call. But it changes the
system behavior a little bit. Before the commit, gic_cpu_init()
is called not only when kernel brings up the secondary cores but also
when system resuming procedure hot-plugs the cores back to kernel.
While after the commit, the function will not be called in the latter
case, where the 'action' will not be CPU_STARTING but
CPU_STARTING_FROZEN. This behavior difference at least causes the
following suspend/resume regression on imx6q.
$ echo mem > /sys/power/state
PM: Syncing filesystems ... done.
PM: Preparing system for mem sleep
mmc1: card e624 removed
Freezing user space processes ... (elapsed 0.01 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
PM: Entering mem sleep
PM: suspend of devices complete after 5.930 msecs
PM: suspend devices took 0.010 seconds
PM: late suspend of devices complete after 0.343 msecs
PM: noirq suspend of devices complete after 0.828 msecs
Disabling non-boot CPUs ...
CPU1: shutdown
CPU2: shutdown
CPU3: shutdown
Enabling non-boot CPUs ...
CPU1: Booted secondary processor
INFO: rcu_sched detected stalls on CPUs/tasks: { 1 2 3} (detected by 0, t=2102 jiffies, g=4294967169, c=4294967168, q=17)
Task dump for CPU 1:
swapper/1 R running 0 0 1 0x00000000
Backtrace:
[<bf895ff4>] (0xbf895ff4) from [<00000000>] ( (null))
Backtrace aborted due to bad frame pointer <8007ccdc>
Task dump for CPU 2:
swapper/2 R running 0 0 1 0x00000000
Backtrace:
[<8075dbdc>] (0x8075dbdc) from [<00000000>] ( (null))
Backtrace aborted due to bad frame pointer <00000002>
Task dump for CPU 3:
swapper/3 R running 0 0 1 0x00000000
Backtrace:
[<8075dbdc>] (0x8075dbdc) from [<00000000>] ( (null))
Fix the regression by checking 'action' being CPU_STARTING_FROZEN to
have gic_cpu_init() called for secondary cores when system resumes.
Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Joseph Lo <josephl@nvidia.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The following change fixes the x86 implementation of
trigger_all_cpu_backtrace(), which was previously (accidentally,
as far as I can tell) disabled to always return false as on
architectures that do not implement this function.
trigger_all_cpu_backtrace(), as defined in include/linux/nmi.h,
should call arch_trigger_all_cpu_backtrace() if available, or
return false if the underlying arch doesn't implement this
function.
x86 did provide a suitable arch_trigger_all_cpu_backtrace()
implementation, but it wasn't actually being used because it was
declared in asm/nmi.h, which linux/nmi.h doesn't include. Also,
linux/nmi.h couldn't easily be fixed by including asm/nmi.h,
because that file is not available on all architectures.
I am proposing to fix this by moving the x86 definition of
arch_trigger_all_cpu_backtrace() to asm/irq.h.
Tested via: echo l > /proc/sysrq-trigger
Before the change, this uses a fallback implementation which
shows backtraces on active CPUs (using
smp_call_function_interrupt() )
After the change, this shows NMI backtraces on all CPUs
Jed Davis [Thu, 20 Jun 2013 03:07:14 +0000 (04:07 +0100)]
perf: arm64: Record the user-mode PC in the call chain.
With this change, we no longer lose the innermost entry in the user-mode
part of the call chain. See also the x86 port, which includes the ip,
and the corresponding change in arch/arm.
Signed-off-by: Jed Davis <jld@mozilla.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[media] s5p makefiles: don't override other selections on obj-[ym]
The $obj-m/$obj-y vars should be adding new modules to build, not
overriding it. So, it should never use
$obj-y := foo.o
instead, it should use:
$obj-y += foo.o
Failing to do that is very bad, as it will suppress needed modules.
Book3E uses the hugepd at PMD level and don't encode pte directly
at the pmd level. So it will find the lower bits of pmd set
and the pmd_bad check throws error. Infact the current code
will never take the free_hugepd_range call at all because it will
clear the pmd if it find a hugepd pointer. Fix this by clearing
bad pmd only if it is not a hugepd pointer.
ACPI / LPSS: Power up LPSS devices during enumeration
Commit 7cd8407 (ACPI / PM: Do not execute _PS0 for devices without
_PSC during initialization) introduced a regression on some systems
with Intel Lynxpoint Low-Power Subsystem (LPSS) where some devices
need to be powered up during initialization, but their device objects
in the ACPI namespace have _PS0 and _PS3 only (without _PSC or power
resources).
To work around this problem, make the ACPI LPSS driver power up
devices it knows about by using a new helper function
acpi_device_fix_up_power() that does all of the necessary
sanity checks and calls acpi_dev_pm_explicit_set() to put the
device into D0.
Reported-and-tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPI / PM: Fix error code path for power resources initialization
Commit 781d737 (ACPI: Drop power resources driver) introduced a
bug in the power resources initialization error code path causing
a NULL pointer to be referenced in acpi_release_power_resource()
if there's an error triggering a jump to the 'err' label in
acpi_add_power_resource(). This happens because the list_node
field of struct acpi_power_resource has not been initialized yet
at this point and doing a list_del() on it is a bad idea.
To prevent this problem from occuring, initialize the list_node
field of struct acpi_power_resource upfront.
Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com> Tested-by: Lan Tianyu <tianyu.lan@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: 3.9+ <stable@vger.kernel.org> Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
ACPI / dock: Take ACPI scan lock in write_undock()
Since commit 3757b94 (ACPI / hotplug: Fix concurrency issues and
memory leaks) acpi_bus_scan() and acpi_bus_trim() must always be
called under acpi_scan_lock, but currently the following scenario
violating that requirement is possible:
Fix that by making write_undock() acquire acpi_scan_lock before
calling handle_eject_request() as appropriate (begin_undock() is
under the lock too in analogy with acpi_dock_deferred_cb()).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: 3.9+ <stable@vger.kernel.org> Acked-by: Toshi Kani <toshi.kani@hp.com>
Mika Westerberg [Mon, 20 May 2013 15:41:45 +0000 (15:41 +0000)]
ACPI / resources: call acpi_get_override_irq() only for legacy IRQ resources
acpi_get_override_irq() was added because there was a problem with
buggy BIOSes passing wrong IRQ() resource for the RTC IRQ. The
commit that added the workaround was 61fd47e0c8476 (ACPI: fix two
IRQ8 issues in IOAPIC mode).
With ACPI 5 enumerated devices there are typically one or more
extended IRQ resources per device (and these IRQs can be shared).
However, the acpi_get_override_irq() workaround forces all IRQs in
range 0 - 15 (the legacy ISA IRQs) to be edge triggered, active high
as can be seen from the dmesg below:
ACPI: IRQ 6 override to edge, high
ACPI: IRQ 7 override to edge, high
ACPI: IRQ 7 override to edge, high
ACPI: IRQ 13 override to edge, high
Also /proc/interrupts for the I2C controllers (INT33C2 and INT33C3) shows
the same thing:
7: 4 0 0 0 IO-APIC-edge INT33C2:00, INT33C3:00
The _CSR method for INT33C2 (and INT33C3) device returns following
resource:
which states that this is supposed to be level triggered, active low,
shared IRQ instead.
Fix this by making sure that acpi_get_override_irq() gets only called
when we are dealing with legacy IRQ() or IRQNoFlags() descriptors.
While we are there, correct pr_warning() to print the right triggering
value.
This change turns out to be necessary to make DMA work correctly on
systems based on the Intel Lynxpoint PCH (Platform Controller Hub).
[rjw: Changelog] Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: 3.9+ <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Paul Gortmaker [Wed, 19 Jun 2013 15:15:26 +0000 (11:15 -0400)]
x86: Fix section mismatch on load_ucode_ap
We are in the process of removing all the __cpuinit annotations.
While working on making that change, an existing problem was
made evident:
WARNING: arch/x86/kernel/built-in.o(.text+0x198f2): Section mismatch
in reference from the function cpu_init() to the function
.init.text:load_ucode_ap() The function cpu_init() references
the function __init load_ucode_ap(). This is often because cpu_init
lacks a __init annotation or the annotation of load_ucode_ap is wrong.
This now appears because in my working tree, cpu_init() is no longer
tagged as __cpuinit, and so the audit picks up the mismatch. The 2nd
hypothesis from the audit is the correct one, as there was an incorrect
__init tag on the prototype in the header (but __cpuinit was used on
the function itself.)
The audit is telling us that the prototype's __init annotation took
effect and the function did land in the .init.text section. Checking
with objdump on a mainline tree that still has __cpuinit shows that
the __cpuinit on the function takes precedence over the __init on the
prototype, but that won't be true once we make __cpuinit a no-op.
Even though we are removing __cpuinit, we temporarily align both
the function and the prototype on __cpuinit so that the changeset
can be applied to stable trees if desired.
[ hpa: build fix only, no object code change ]
Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: stable <stable@vger.kernel.org> # 3.9+ Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/1371654926-11729-1-git-send-email-paul.gortmaker@windriver.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
David Daney [Mon, 17 Jun 2013 15:46:07 +0000 (08:46 -0700)]
mn10300: Fix include dependency in irqflags.h et al.
We need to pick up the definition of raw_smp_processor_id() from
asm/smp.h. For the !SMP case, we need to supply a definition of
raw_smp_processor_id().
Because of the include dependencies we cannot use smp_call_func_t in
asm/smp.h, but we do need linux/thread_info.h
Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull sparc fixes from David Miller:
"Various sparc bug fixes, in particular:
1) TSB hashes have to be flushed before TLB on sparc64, from Dave
Kleikamp.
2) LEON timer interrupts can get stuck, from Andreas Larsson.
3) Sparc64 needs to handle lack of address-congruence devicetree
property, from Bob Picco"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: tsb must be flushed before tlb
sparc,leon: Convert to use devm_ioremap_resource
sparc64 address-congruence property
sparc32, leon: Enable interrupts before going idle to avoid getting stuck
sparc32, leon: Remove separate "ticker" timer for SMP
sparc: kernel: using strlcpy() instead of strcpy()
arch: sparc: prom: looping issue, need additional length check in the outside looping
sparc: remove inline marking of EXPORT_SYMBOL functions
sparc: Switch to asm-generic/linkage.h
Linus Torvalds [Wed, 19 Jun 2013 16:23:56 +0000 (06:23 -1000)]
Merge branch 'parisc-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"This contains a kernel segfault fix when reading /proc/kpageflags or
/proc/kpagecount, two fixes for the serial port and PCI graphic card
support on C8000 workstations and a fix to use unshadowed registers
for flushing D- and I-caches."
* 'parisc-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Use unshadowed index register for flush instructions in flush_dcache_page_asm and flush_icache_page_asm
parisc: provide pci_mmap_page_range() for parisc
parisc: fix serial ports on C8000 workstation
parisc: fix kernel BUG at arch/parisc/include/asm/mmzone.h:50 (part 2)