Chris Wilson [Fri, 19 Aug 2016 15:54:28 +0000 (16:54 +0100)]
drm/i915: Use remap_io_mapping() to prefault all PTE in a single pass
Very old numbers indicate this is a 66% improvement when remapping the
entire object for fence contention - due to the elimination of
track_pfn_insert and its strcmp.
Chris Wilson [Fri, 19 Aug 2016 15:54:26 +0000 (16:54 +0100)]
io-mapping: Always create a struct to hold metadata about the io-mapping
Currently, we only allocate a structure to hold metadata if we need to
allocate an ioremap for every access, such as on x86-32. However, it
would be useful to store basic information about the io-mapping, such as
its page protection, on all platforms.
Chris Wilson [Fri, 19 Aug 2016 15:54:24 +0000 (16:54 +0100)]
drm/i915/fbc: Don't set an illegal fence if unfenced
If the frontbuffer doesn't have an associated fence, it will have a
fence reg of -1. If we attempt to OR in this register into the FBC
control register we end up setting all control bits, oops!
Chris Wilson [Fri, 19 Aug 2016 15:54:23 +0000 (16:54 +0100)]
drm/i915: Flush delayed fence releases after reset
What I never hit in testing, but Mika immediately did, was a GPU hang
with a pending fence release (where a tiled object has been changed by
the user to be untiled, and the update has not yet been committed to the
fence register). As the stride/tiling is 0, this causes a divide-by-zero
error when trying to write the new fence parameters:
Reported-by: Mika Kuoppala <mika.kuoppala@intel.com> Fixes: 49ef5294cda2 ("drm/i915: Move fence tracking from object to vma") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-1-chris@chris-wilson.co.uk
Dave Gordon [Fri, 19 Aug 2016 14:23:42 +0000 (15:23 +0100)]
drm/i915: Reattach comment, complete type specification
In the recent patch bc3d674 drm/i915: Allow userspace to request no-error-capture upon ...
the final version moved the flags and the associated #defines around
so they were adjacent; unfortunately, they ended up between a comment
and the thing (hw_id) to which the comment applies :(
So this patch reshuffles the comment and subject back together.
Also, as we're touching 'hw_id', let's change it from just 'unsigned'
to a fully-specified 'unsigned int', because some code checking tools
(including checkpatch) object to plain 'unsigned'.
Chris Wilson [Thu, 18 Aug 2016 16:17:17 +0000 (17:17 +0100)]
drm/i915/cmdparser: Use binary search for faster register lookup
A significant proportion of the cmdparsing time for some batches is the
cost to find the register in the mmiotable. We ensure that those tables
are in ascending order such that we could do a binary search if it was
ever merited. It is.
Chris Wilson [Thu, 18 Aug 2016 16:17:15 +0000 (17:17 +0100)]
drm/i915/cmdparser: Compare against the previous command descriptor
On the blitter (and in test code), we see long sequences of repeated
commands, e.g. XY_PIXEL_BLT, XY_SCANLINE_BLT, or XY_SRC_COPY. For these,
we can skip the hashtable lookup by remembering the previous command
descriptor and doing a straightforward compare of the command header.
The corollary is that we need to do one extra comparison before lookup
up new commands.
v2: Less magic mask (ok, it is still magic, but now you cannot see!)
Chris Wilson [Thu, 18 Aug 2016 16:17:14 +0000 (17:17 +0100)]
drm/i915/cmdparser: Improve hash function
The existing code's hashfunction is very suboptimal (most 3D commands
use the same bucket degrading the hash to a long list). The code even
acknowledge that the issue was known and the fix simple:
/*
* If we attempt to generate a perfect hash, we should be able to look at bits
* 31:29 of a command from a batch buffer and use the full mask for that
* client. The existing INSTR_CLIENT_MASK/SHIFT defines can be used for this.
*/
Chris Wilson [Thu, 18 Aug 2016 16:17:13 +0000 (17:17 +0100)]
drm/i915/cmdparser: Only cache the dst vmap
For simplicity, we want to continue using a contiguous mapping of the
command buffer, but we can reduce the number of vmappings we hold by
switching over to a page-by-page copy from the user batch buffer to the
shadow. The cost for saving one linear mapping is about 5% in trivial
workloads - which is more or less the overhead in calling kmap_atomic().
Chris Wilson [Thu, 18 Aug 2016 16:17:12 +0000 (17:17 +0100)]
drm/i915/cmdparser: Use cached vmappings
The single largest factor in the overhead of parsing the commands is the
setup of the virtual mapping to provide a continuous block for the batch
buffer. If we keep those vmappings around (against the better judgement
of mm/vmalloc.c, which we offset by handwaving and looking suggestively
at the shrinker) we can dramatically improve the performance of the
parser for small batches (such as media workloads). Furthermore, we can
use the prepare shmem read/write functions to determine how best we
need to clflush the range (rather than every page of the object).
The impact of caching both src/dst vmaps is +80% on ivb and +140% on byt
for the throughput on small batches. (Caching just the dst vmap and
iterating over the src, doing a page by page copy is roughly 5% slower
on both platforms. That may be an acceptable trade-off to eliminate one
cached vmapping, and we may be able to reduce the per-page copying overhead
further.) For *this* simple test case, the cmdparser is now within a
factor of 2 of ideal performance.
Chris Wilson [Thu, 18 Aug 2016 16:17:11 +0000 (17:17 +0100)]
drm/i915/cmdparser: Add the TIMESTAMP register for the other engines
Since I have been using the BCS_TIMESTAMP to measure latency of
execution upon the blitter ring, allow regular userspace to also read
from that register. They are already allowed RCS_TIMESTAMP!
Chris Wilson [Thu, 18 Aug 2016 16:17:10 +0000 (17:17 +0100)]
drm/i915/cmdparser: Make initialisation failure non-fatal
If the developer adds a register in the wrong order, we BUG during boot.
That makes development and testing very difficult. Let's be a bit more
friendly and disable the command parser with a big warning if the tables
are invalid.
Chris Wilson [Thu, 18 Aug 2016 16:17:09 +0000 (17:17 +0100)]
drm/i915: Stop discarding GTT cache-domain on unbind vma
Since commit 43566dedde54 ("drm/i915: Broaden application of
set-domain(GTT)") we allowed objects to be in the GTT domain, but unbound.
Therefore removing the GTT cache domain when removing the GGTT vma is no
longer semantically correct.
An unfortunate side-effect is we lose the wondrously named
i915_gem_object_finish_gtt(), not to be confused with
i915_gem_gtt_finish_object()!
Chris Wilson [Thu, 18 Aug 2016 16:17:08 +0000 (17:17 +0100)]
drm/i915: Bump the inactive tracking for all VMA accessed
We track the LRU access for eviction and bump the last access for the
user GGTT on set-to-gtt. When we do so we need to not only bump the
primary GGTT VMA but all partials as well. Similarly we want to
bump the last access tracking for when unpinning an object from the
scanout so that they do not get promptly evicted and hopefully remain
available for reuse on the next frame.
Chris Wilson [Thu, 18 Aug 2016 16:17:07 +0000 (17:17 +0100)]
drm/i915: Track display alignment on VMA
When using the aliasing ppgtt and pageflipping with the shrinker/eviction
active, we note that we often have to rebind the backbuffer before
flipping onto the scanout because it has an invalid alignment. If we
store the worst-case alignment required for a VMA, we can avoid having
to rebind at critical junctures.
Chris Wilson [Thu, 18 Aug 2016 16:17:06 +0000 (17:17 +0100)]
drm/i915: Fallback to using unmappable memory for scanout
The existing ABI says that scanouts are pinned into the mappable region
so that legacy clients (e.g. old Xorg or plymouthd) can write directly
into the scanout through a GTT mapping. However if the surface does not
fit into the mappable region, we are better off just trying to fit it
anywhere and hoping for the best. (Any userspace that is capable of
using ginormous scanouts is also likely not to rely on pure GTT
updates.) With the partial vma fault support, we are no longer
restricted to only using scanouts that we can pin (though it is still
preferred for performance reasons and for powersaving features like
FBC).
v2: Skip fence pinning when not mappable.
v3: Add a comment to explain the possible ramifications of not being
able to use fences for unmappable scanouts.
v4: Rebase to skip over some local patches
v5: Rebase to defer until after we have unmappable GTT fault support
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Deepak S <deepak.s@linux.intel.com> Cc: Damien Lespiau <damien.lespiau@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jani Nikula <jani.nikula@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-27-chris@chris-wilson.co.uk
Chris Wilson [Thu, 18 Aug 2016 16:17:05 +0000 (17:17 +0100)]
drm/i915: Choose not to evict faultable objects from the GGTT
Often times we do not want to evict mapped objects from the GGTT as
these are quite expensive to teardown and frequently reused (causing an
equally, if not more so, expensive setup). In particular, when faulting
in a new object we want to avoid evicting an active object, or else we
may trigger a page-fault-of-doom as we ping-pong between evicting two
objects.
Chris Wilson [Thu, 18 Aug 2016 16:17:04 +0000 (17:17 +0100)]
drm/i915: Drop ORIGIN_GTT for untracked GTT writes
If FBC is set on a framebuffer that is unmapped, all GTT faults will be
from a partial mapping. Writes by the user through the partial VMA are
then untracked by the FBC and so we must use the ORIGIN_CPU when flushing
the I915_GEM_DOMAIN_GTT.
Chris Wilson [Thu, 18 Aug 2016 16:17:03 +0000 (17:17 +0100)]
drm/i915: Convert partial ggtt vma to full ggtt if it spans the entire object
If we want to create a partial vma from a chunk that is the same size as
the object, create a normal ggtt vma instead. The benefit is that it
will match future requests for the normal ggtt.
Chris Wilson [Thu, 18 Aug 2016 16:17:02 +0000 (17:17 +0100)]
drm/i915: Fix partial GGTT faulting
We want to always use the partial VMA as a fallback for a failure to
bind the object into the GGTT. This extends the support partial objects
in the GGTT to cover everything, not just objects too large.
Chris Wilson [Thu, 18 Aug 2016 16:17:01 +0000 (17:17 +0100)]
drm/i915: Choose partial chunksize based on tile row size
In order to support setting up fences for partial mappings of an object,
we have to align those mappings with the fence. The minimum chunksize we
choose is at least the size of a single tile row.
v2: Make minimum chunk size a define for later use
Chris Wilson [Thu, 18 Aug 2016 16:16:59 +0000 (17:16 +0100)]
drm/i915: Rename fence.lru_list to link
Our current practice is to only name the actual list (here
dev_priv->fence_list) using "list", and elements upon that list are
referred to as "link". Further, the lru nature is of the list and not of
the node and including in the name does not disambiguate the link from
anything else.
Chris Wilson [Thu, 18 Aug 2016 16:16:58 +0000 (17:16 +0100)]
drm/i915/userptr: Make gup errors stickier
Keep any error reported by the gup_worker until we are notified that the
arena has changed (via the mmu-notifier). This has the importance of
making two consecutive calls to i915_gem_object_get_pages() reporting
the same error, and curtailing a loop of detecting a fault and requeueing
a gup_worker.
Chris Wilson [Thu, 18 Aug 2016 16:16:57 +0000 (17:16 +0100)]
drm/i915: Allocate rings from stolen
If we have stolen available, make use of it for ringbuffer allocation.
Previously this was restricted to !llc platforms, as writing to stolen
requires a GGTT mapping - but now that we have partial mappable support,
the mappable aperture isn't quite so precious so we can use it more
freely and ringbuffers are a good user for the otherwise wasted stolen.
Chris Wilson [Thu, 18 Aug 2016 16:16:56 +0000 (17:16 +0100)]
drm/i915: Allow ringbuffers to be bound anywhere
Now that we have WC vmapping available, we can bind our rings anywhere
in the GGTT and do not need to restrict them to the mappable region.
Except for stolen objects, for which direct access is verbatim and we
must use the mappable aperture.
Chris Wilson [Thu, 18 Aug 2016 16:16:55 +0000 (17:16 +0100)]
drm/i915: Move map-and-fenceable tracking to the VMA
By moving map-and-fenceable tracking from the object to the VMA, we gain
fine-grained tracking and the ability to track individual fences on the VMA
(subsequent patch).
Chris Wilson [Thu, 18 Aug 2016 16:16:53 +0000 (17:16 +0100)]
drm/i915: Fallback to single page GTT mmappings for relocations
If we cannot pin the entire object into the mappable region of the GTT,
try to pin a single page instead. This is much more likely to succeed,
and prevents us falling back to the clflush slow path.
Chris Wilson [Thu, 18 Aug 2016 16:16:52 +0000 (17:16 +0100)]
drm/i915: Refactor execbuffer relocation writing
With the introduction of the reloc page cache, we are just one step away
from refactoring the relocation write functions into one. Not only does
it tidy the code (slightly), but it greatly simplifies the control logic
much to gcc's satisfaction.
v2: Add selftests to document the relationship between the clflush
flags, the KMAP bit and packing into the page-aligned pointer.
Chris Wilson [Thu, 18 Aug 2016 16:16:50 +0000 (17:16 +0100)]
drm/i915: Pin the pages first in shmem prepare read/write
There is an improbable, but not impossible, case that if we leave the
pages unpin as we operate on the object, then somebody via the shrinker
may steal the lock (which lock? right now, it is struct_mutex, THE lock)
and change the cache domains after we have already inspected them.
(Whilst here, avail ourselves of the opportunity to take a couple of
steps to make the two functions look more similar.)
Chris Wilson [Thu, 18 Aug 2016 16:16:49 +0000 (17:16 +0100)]
drm/i915: Wait for writes through the GTT to land before reading back
If we quickly switch from writing through the GTT to a read of the
physical page directly with the CPU (e.g. performing relocations through
the GTT and then running the command parser), we can observe that the
writes are not visible to the CPU. It is not a coherency problem, as
extensive investigations with clflush have demonstrated, but a mere
timing issue - we have to wait for the GTT to complete it's write before
we start our read from the CPU.
Chris Wilson [Thu, 18 Aug 2016 16:16:48 +0000 (17:16 +0100)]
drm/i915: Before accessing an object via the cpu, flush GTT writes
If we want to read the pages directly via the CPU, we have to be sure
that we have to flush the writes via the GTT (as the CPU can not see
the address aliasing).
This is a companion to i915_gem_obj_prepare_shmem_read() that prepares
the backing storage for direct writes. It first serialises with the GPU,
pins the backing storage and then indicates what clfushes are required in
order for the writes to be coherent.
Whilst here, fix support for ancient CPUs without clflush for which we
cannot do the GTT+clflush tricks.
v2: Add i915_gem_obj_finish_shmem_access() for symmetry
Chris Wilson [Thu, 18 Aug 2016 16:16:46 +0000 (17:16 +0100)]
drm/i915: Cache kmap between relocations
When doing relocations, we have to obtain a mapping to the page
containing the target address. This is either a kmap or iomap depending
on GPU and its cache coherency. Neighbouring relocation entries are
typically within the same page and so we can cache our kmapping between
them and avoid those pesky TLB flushes.
Note that there is some sleight-of-hand in how the slow relocate works
as the reloc_entry_cache implies pagefaults disabled (as we are inside a
kmap_atomic section). However, the slow relocate code is meant to be the
fallback from the atomic fast path failing. Fortunately it works as we
already have performed the copy_from_user for the relocation array (no
more pagefaults there) and the kmap_atomic cache is enabled after we
have waited upon an active buffer (so no more sleeping in atomic).
Magic!
Chris Wilson [Thu, 18 Aug 2016 16:16:45 +0000 (17:16 +0100)]
drm/i915: Fallback to single page pwrite/pread if unable to release fence
If we cannot release the fence (for example if someone is inexplicably
trying to write into a tiled framebuffer that is currently pinned to the
display! *cough* kms_frontbuffer_tracking *cough*) fallback to using the
page-by-page pwrite/pread interface, rather than fail the syscall
entirely.
Since this is triggerable by the user (along pwrite) we have to remove
the WARN_ON(fence->pin_count).
Chris Wilson [Thu, 18 Aug 2016 16:16:44 +0000 (17:16 +0100)]
drm/i915: Mark up the GTT flush following WC writes as ORIGIN_CPU
Similarly to invalidating beforehand, if the object is mmapped via
I915_MMAP_WC we cannot track writes through the I915_GEM_DOMAIN_GTT. At
the conclusion of the write, i915_gem_object_flush_gtt_writes() we also
need to treat the origin carefully in case it may have been untracked.
See also commit aeecc9696aa0 ("drm/i915: use ORIGIN_CPU for frontbuffer
invalidation on WC mmaps").
Chris Wilson [Thu, 18 Aug 2016 16:16:43 +0000 (17:16 +0100)]
drm/i915: Use ORIGIN_CPU for fb invalidation from pwrite
As pwrite does not use the fence for its GTT access, and may even go
through a secondary interface avoiding the main VMA, we cannot treat the
write as automatically invalidated by the hardware and so we require
ORIGIN_CPU frontbufer invalidate/flushes.
Chris Wilson [Thu, 18 Aug 2016 16:16:41 +0000 (17:16 +0100)]
agp/intel: Flush chipset writes after updating a single PTE
After we update one PTE for a page, the caller expects to be able to
immediately use that through a GGTT read/write. To comply with the
callers expectations we therefore need to flush the chipset buffers
before returning.
Reported-by: Matti Hämäläinen <ccr@tnsp.org> Fixes: d6473f566417 ("drm/i915: Add support for mapping an object page...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Tested-by: Matti Hämäläinen <ccr@tnsp.org> Cc: drm-intel-fixes@lists.freedesktop.org Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-2-chris@chris-wilson.co.uk
Chris Wilson [Thu, 18 Aug 2016 16:16:40 +0000 (17:16 +0100)]
drm/i915: Unconditionally flush any chipset buffers before execbuf
If userspace is asynchronously streaming into the batch or other
execobjects, we may not flush those writes along with a change in cache
domain (as there is no change). Therefore those writes may end up in
internal chipset buffers and not visible to the GPU upon execution. We
must issue a flush command or otherwise we encounter incoherency in the
batchbuffers and the GPU executing invalid commands (i.e. hanging) quite
regularly.
v2: Throw a paranoid wmb() into the general flush so that we remain
consistent with before.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90841 Fixes: 1816f9236303 ("drm/i915: Support creation of unbound wc user...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Tested-by: Matti Hämäläinen <ccr@tnsp.org> Cc: stable@vger.kernel.org Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-1-chris@chris-wilson.co.uk
Matt Roper [Fri, 17 Jun 2016 20:42:20 +0000 (13:42 -0700)]
drm/i915/gen9: Drop invalid WARN() during data rate calculation
It's possible to have a non-zero plane mask and still wind up with a
total data rate of zero. There are two cases where this can happen:
* planes are active (from the KMS point of view), but are
all fully clipped (positioned offscreen)
* the only active plane on a CRTC is the cursor (which is handled
independently and not counted into the general data rate computations
These are both valid display setups (although unusual), so we need to
drop the WARN().
Matt Roper [Fri, 17 Jun 2016 20:42:18 +0000 (13:42 -0700)]
drm/i915/gen9: Initialize intel_state->active_crtcs during WM sanitization (v2)
intel_state->active_crtcs is usually only initialized when doing a
modeset. During our first atomic commit after boot, we're effectively
faking a modeset to sanitize the DDB/wm setup, so ensure that this field
gets initialized before use.
v2:
- Don't clobber active_crtcs if our first commit really is a modeset
(Maarten)
- Grab connection_mutex when faking a modeset during sanitization
(Maarten)
Chris Wilson [Wed, 17 Aug 2016 10:33:33 +0000 (11:33 +0100)]
drm/i915: Remember to set vma->pages for the preallocated stolen object
In commit 247177ddd517 ("drm/i915: Always set the vma->pages"), as it
title implies, we always set vma->pages for bound objects. Even before
that, we would set vma->ggtt_view.pages, for globally bound objects.
This was forgotten for the fixup inside the preallocated stolen objects,
which has to recreate a global GTT binding outside of the usual VMA
insertion path
Chris Wilson [Wed, 17 Aug 2016 11:09:06 +0000 (12:09 +0100)]
drm/i915: Mark i915_hpd_poll_init_work as static
Local function with forgotten static declaration.
Fixes: 19625e85c6ec ("drm/i915: Enable polling when we don't have hpd") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Lyude <cpaul@redhat.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471432146-5196-2-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Tvrtko Ursulin [Tue, 16 Aug 2016 16:04:21 +0000 (17:04 +0100)]
drm/i915: Initialize legacy semaphores from engine hw id indexed array
Build the legacy semaphore initialisation array using the engine
hardware ids instead of driver internal ones. This makes the
static array size dependent only on the number of gen6 semaphore
engines.
Also makes the per-engine semaphore wait and signal tables
hardware id indexed saving some more space.
v2: Refactor I915_GEN6_NUM_ENGINES to GEN6_SEMAPHORE_LAST. (Chris Wilson)
v3: More polish. (Chris Wilson)
Tvrtko Ursulin [Tue, 16 Aug 2016 16:04:20 +0000 (17:04 +0100)]
drm/i915: Add enum for hardware engine identifiers
Put the engine hardware id in the common header so they are
not only associated with the GuC since they are needed for
the legacy semaphores implementation.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Chris Wilson [Tue, 16 Aug 2016 08:50:40 +0000 (09:50 +0100)]
drm/i915: Embrace the race in busy-ioctl
Daniel Vetter proposed a new challenge to the serialisation inside the
busy-ioctl that exposed a flaw that could result in us reporting the
wrong engine as being busy. If the request is reallocated as we test
its busyness and then reassigned to this object by another thread, we
would not notice that the test itself was incorrect.
We are faced with a choice of using __i915_gem_active_get_request_rcu()
to first acquire a reference to the request preventing the race, or to
acknowledge the race and accept the limitations upon the accuracy of the
busy flags. Note that we guarantee that we never falsely report the
object as idle (providing userspace itself doesn't race), and so the
most important use of the busy-ioctl and its guarantees are fulfilled.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471337440-16777-1-git-send-email-chris@chris-wilson.co.uk
Chris Wilson [Mon, 15 Aug 2016 09:49:10 +0000 (10:49 +0100)]
drm/i915: Only record active and pending requests upon a GPU hang
There is no other state pertaining to the completed requests in the
hang, other than gleamed through the ringbuffer, so including the
expired requests in the list of outstanding requests simply adds noise.
Chris Wilson [Mon, 15 Aug 2016 09:49:09 +0000 (10:49 +0100)]
drm/i915: Print the batchbuffer offset next to BBADDR in error state
It is useful when looking at captured error states to check the recorded
BBADDR register (the address of the last batchbuffer instruction loaded)
against the expected offset of the batch buffer, and so do a quick check
that (a) the capture is true or (b) HEAD hasn't wandered off into the
badlands.
Chris Wilson [Mon, 15 Aug 2016 09:49:08 +0000 (10:49 +0100)]
drm/i915: Move debug only per-request pid tracking from request to ctx
Since contexts are not currently shared between userspace processes, we
have an exact correspondence between context creator and guilty batch
submitter. Therefore we can save some per-batch work by inspecting the
context->pid upon error instead. Note that we take the context's
creator's pid rather than the file's pid in order to better track fd
passed over sockets.
Chris Wilson [Mon, 15 Aug 2016 09:49:07 +0000 (10:49 +0100)]
drm/i915: Introduce i915_ggtt_offset()
This little helper only exists to safely discard the upper unused 32bits
of the general 64-bit VMA address - as we know that all Global GTT
currently are less than 4GiB in size and so that the upper bits must be
zero. In many places, we use a u32 for the global GTT offset and we want
to document where we are discarding the full VMA offset.
Chris Wilson [Mon, 15 Aug 2016 09:49:06 +0000 (10:49 +0100)]
drm/i915: Track pinned VMA
Treat the VMA as the primary struct responsible for tracking bindings
into the GPU's VM. That is we want to treat the VMA returned after we
pin an object into the VM as the cookie we hold and eventually release
when unpinning. Doing so eliminates the ambiguity in pinning the object
and then searching for the relevant pin later.
In a few places, we repeat a call to clear a pointer to a vma whilst
unpinning and releasing a reference to its owner. Refactor those into a
common function.
Chris Wilson [Mon, 15 Aug 2016 09:49:00 +0000 (10:49 +0100)]
drm/i915: Move common seqno reset to intel_engine_cs.c
Since the intel_engine_init_seqno() is shared by all engine submission
backends, move it out of the legacy intel_ringbuffer.c and
into the new home for common routines, intel_engine_cs.c
Chris Wilson [Mon, 15 Aug 2016 09:48:59 +0000 (10:48 +0100)]
drm/i915: Move common scratch allocation/destroy to intel_engine_cs.c
Since the scratch allocation and cleanup is shared by all engine
submission backends, move it out of the legacy intel_ringbuffer.c and
into the new home for common routines, intel_engine_cs.c
Chris Wilson [Mon, 15 Aug 2016 09:48:57 +0000 (10:48 +0100)]
drm/i915: Use VMA for ringbuffer tracking
Use the GGTT VMA as the primary cookie for handing ring objects as
the most common action upon the ring is mapping and unmapping which act
upon the VMA itself. By restructuring the code to work with the ring
VMA, we can shrink the code and remove a few cycles from context pinning.
v2: Move the flush of the object back to before the first pin. We use
the am-I-bound? query to only have to check the flush on the first
bind and so avoid stalling on active rings.
Lots of little renames and small hoops.
Chris Wilson [Mon, 15 Aug 2016 09:48:56 +0000 (10:48 +0100)]
drm/i915: Move assertion for iomap access to i915_vma_pin_iomap
Access through the GTT requires the device to be awake. Ideally
i915_vma_pin_iomap() is short-lived and the pinning demarcates the
access through the iomap. This is not entirely true, we have a mixture
of long lived pins that exceed the wakelock (such as legacy ringbuffers)
and short lived pin that do live within the wakelock (such as execlist
ringbuffers).
Chris Wilson [Mon, 15 Aug 2016 09:48:55 +0000 (10:48 +0100)]
drm/i915: Only change the context object's domain when binding
We know that the only access to the context object is via the GPU, and
the only time when it can be out of the GPU domain is when it is swapped
out and unbound. Therefore we only need to clflush the object when
binding, thus avoiding any potential stall on touching the domain on an
active context.
Chris Wilson [Mon, 15 Aug 2016 09:48:54 +0000 (10:48 +0100)]
drm/i915: Use VMA as the primary object for context state
When working with contexts, we most frequently want the GGTT VMA for the
context state, first and foremost. Since the object is available via the
VMA, we need only then store the VMA.
v2: Formatting tweaks to debugfs output, restored some comments removed
in the next patch
Chris Wilson [Mon, 15 Aug 2016 09:48:50 +0000 (10:48 +0100)]
drm/i915: Add convenience wrappers for vma's object get/put
The VMA are unreferenced, they belong to the object and live until they
are closed. However, if we want to use the VMA as a cookie and use it to
keep the object alive, we want to hold onto a reference to the object
for the lifetime of the VMA cookie. To facilitate this, add a couple of
simple wrappers for managing the reference count on the object owning the
VMA.
Chris Wilson [Mon, 15 Aug 2016 09:48:49 +0000 (10:48 +0100)]
drm/i915: Add fetch_and_zero() macro
A simple little macro to clear a pointer and return the old value. This
is useful for writing
value = *ptr;
if (!value)
return;
*ptr = 0;
...
free(value);
in a slightly more concise form:
value = fetch_and_zero(ptr);
if (!value)
return;
...
free(value);
with the idea that this establishes a pattern that may be extended for
atomic use (using xchg or cmpxchg) i.e. atomic_fetch_and_zero() and
similar to llist.
Chris Wilson [Mon, 15 Aug 2016 09:48:47 +0000 (10:48 +0100)]
drm/i915: Always set the vma->pages
Previously, we would only set the vma->pages pointer for GGTT entries.
However, if we always set it, we can use it to prettify some code that
may want to access the backing store associated with the VMA (as
assigned to the VMA).
Chris Wilson [Mon, 15 Aug 2016 09:48:45 +0000 (10:48 +0100)]
drm/i915: Reduce i915_gem_objects to only show object information
No longer is knowing how much of the GTT (both mappable aperture and
beyond) relevant, and the output clutters the real information - that is
how many objects are allocated and bound (and by who) so that we can
quickly grasp if there is a leak.
v2: Relent, and rename pinned to indicate display only. Since the
display objects are semi-static and are of variable size, they are the
interesting objects to watch over time for aperture leaking. The other
pins are either static (such as the scratch page) or very short lived
(such as execbuf) and not part of the precious GGTT.
Chris Wilson [Mon, 15 Aug 2016 09:48:44 +0000 (10:48 +0100)]
drm/i915: Focus debugfs/i915_gem_pinned to show only display pins
Only those objects pinned to the display have semi-permanent pins of a
global nature (other pins are transient within their local vm). Simplify
i915_gem_pinned to only show the pertinent information about the pinned
objects within the GGTT.
v2: i915_gem_gtt_info is still shared with debugfs/i915_gem_gtt,
rename i915_gem_pinned to i915_gem_pin_display to better reflect its
contents
Chris Wilson [Mon, 15 Aug 2016 09:48:43 +0000 (10:48 +0100)]
drm/i915: Remove inactive/active list from debugfs
These two files (i915_gem_active, i915_gem_inactive) no longer give
pertinent information since active/inactive tracking is per-vm and so we
need the information per-vm. They are obsolete so remove them.
Chris Wilson [Mon, 15 Aug 2016 09:48:42 +0000 (10:48 +0100)]
drm/i915: Store the active context object on all engines upon error
With execlists, we have context objects everywhere, not just RCS. So
store them for post-mortem debugging. This also has a secondary effect
of removing one more unsafe list iteration with using preserved state
from the hanging request. And now we can cross-reference the request's
context state with that loaded by the GPU.
Chris Wilson [Mon, 15 Aug 2016 09:48:41 +0000 (10:48 +0100)]
drm/i915: Reduce amount of duplicate buffer information captured on error
When capturing the error state, we do not need to know about every
address space - just those that are related to the error. We know which
context is active at the time, therefore we know which VM are implicated
in the error. We can then restrict the VM which we report to the
relevant subset.
v2: s/i/count_active/ (and similar)
Rewrite label generation for "Buffers"
Chris Wilson [Mon, 15 Aug 2016 09:48:40 +0000 (10:48 +0100)]
drm/i915: Record the position of the start of the request
Not only does it make for good documentation and debugging aide, but it is
also vital for when we want to unwind requests - such as when throwing away
an incomplete request.
Dave Airlie [Mon, 15 Aug 2016 06:53:57 +0000 (16:53 +1000)]
Merge tag 'drm-intel-next-2016-08-08' of git://anongit.freedesktop.org/drm-intel into drm-next
- refactor ddi buffer programming a bit (Ville)
- large-scale renaming to untangle naming in the gem code (Chris)
- rework vma/active tracking for accurately reaping idle mappings of shared
objects (Chris)
- misc dp sst/mst probing corner case fixes (Ville)
- tons of cleanup&tunings all around in gem
- lockless (rcu-protected) request lookup, plus use it everywhere for
non(b)locking waits (Chris)
- pipe crc debugfs fixes (Rodrigo)
- random fixes all over
* tag 'drm-intel-next-2016-08-08' of git://anongit.freedesktop.org/drm-intel: (222 commits)
drm/i915: Update DRIVER_DATE to 20160808
drm/i915: fix aliasing_ppgtt leak
drm/i915: Update comment before i915_spin_request
drm/i915: Use drm official vblank_no_hw_counter callback.
drm/i915: Fix copy_to_user usage for pipe_crc
Revert "drm/i915: Track active streams also for DP SST"
drm/i915: fix WaInsertDummyPushConstPs
drm/i915: Assert that the request hasn't been retired
drm/i915: Repack fence tiling mode and stride into a single integer
drm/i915: Document and reject invalid tiling modes
drm/i915: Remove locking for get_tiling
drm/i915: Remove pinned check from madvise ioctl
drm/i915: Reduce locking inside swfinish ioctl
drm/i915: Remove (struct_mutex) locking for busy-ioctl
drm/i915: Remove (struct_mutex) locking for wait-ioctl
drm/i915: Do a nonblocking wait first in pread/pwrite
drm/i915: Remove unused no-shrinker-steal
drm/i915: Tidy generation of the GTT mmap offset
drm/i915/shrinker: Wait before acquiring struct_mutex under oom
drm/i915: Simplify do_idling() (Ironlake vt-d w/a)
...
Dave Airlie [Mon, 15 Aug 2016 06:46:36 +0000 (16:46 +1000)]
Merge tag 'topic/drm-misc-2016-08-12' of git://anongit.freedesktop.org/drm-intel into drm-next
- more fence destaging and cleanup (Gustavo&Sumit)
- DRIVER_LEGACY to untangle from DRIVER_MODESET
- drm_mm refactor (Chris)
- fbdev-less compile fies
- clipped plane src/dst rects (Ville)
- + a few mediatek patches that build on top of that (Bibby+Daniel)
- small stuff all over really
* tag 'topic/drm-misc-2016-08-12' of git://anongit.freedesktop.org/drm-intel: (43 commits)
dma-buf/fence: kerneldoc: remove spurious section header
dma-buf/fence: kerneldoc: remove unused struct members
Revert "gpu: drm: omapdrm: dss-of: add missing of_node_put after calling of_parse_phandle"
drm: Protect fb_defio in drivers with CONFIG_KMS_FBDEV_EMULATION
drm/radeon|amgpu: Make fbdev emulation optional
drm/vmwgfx: select CONFIG_FB
drm: Remove superflous linux/fb.h includes
drm/fb-helper: Add a dummy remove_conflicting_framebuffers
dma-buf/sync_file: only enable fence signalling on poll()
Documentation: add doc for sync_file_get_fence()
dma-buf/sync_file: add sync_file_get_fence()
dma-buf/sync_file: refactor fence storage in struct sync_file
dma-buf/fence-array: add fence_is_array()
drm/dp_helper: Rate limit timeout errors from drm_dp_i2c_do_msg()
drm/dp_helper: Print first error received on failure in drm_dp_dpcd_access()
drm: Add ratelimited versions of the DRM_DEBUG* macros
drm: Make sure drm_vblank_no_hw_counter isn't abused
drm/mediatek: Fix mtk_atomic_complete for runtime_pm
drm/mediatek: plane: Use FB's format's cpp to compute x offset
drm/mediatek: plane: Merge mtk_plane_enable into mtk_plane_atomic_update
...
Dave Airlie [Mon, 15 Aug 2016 06:41:45 +0000 (16:41 +1000)]
Merge tag 'imx-drm-next-2016-08-12' of git://git.pengutronix.de/git/pza/linux into drm-next
imx-drm updates and encoder atomic_mode_set helper callback
- add pixel clock and DE polarity configuration from device tree
using display timing bindings for parallel and LVDS output
- cleanup/remove trivial functions
- cleanup and fixes in preparation for capture support
- add atomic_mode_set helper and use it in imx-ldb - this is an
alternative to the encoder mode_set callback that passes the
crtc and connector state instead of just the mode. It allows
drivers to get information from the attached connector without
having to iterate over all connectors
- add drm_bridge support to imx-ldb, for bridges attached via LVDS
* tag 'imx-drm-next-2016-08-12' of git://git.pengutronix.de/git/pza/linux:
drm/imx-ldb: Add support to drm-bridge
drm/imx: imx-ldb: use encoder atomic_mode_set callback
drm/atomic-helper: Add atomic_mode_set helper callback
drm/imx: Remove imx_drm_handle_vblank()
gpu: ipu-v3: Add missing IDMAC channel names
gpu: ipu-v3: rename CSI client device
gpu: ipu-v3: Fix IRT usage
gpu: ipu-v3: Fix CSI data format for 16-bit media bus formats
gpu: ipu-v3: set correct full sensor frame for PAL/NTSC
gpu: ipu-v3: Add VDI input IDMAC channels
gpu: ipu-v3: Add ipu_get_num()
gpu: ipu-cpmem: Add ipu_cpmem_get_burstsize()
gpu: ipu-cpmem: Add ipu_cpmem_set_uv_offset()
drm/imx: Remove imx_drm_crtc_id()
drm/imx: Remove imx_drm_crtc_vblank_get/_put()
drm/imx: convey the pixelclk-active and de-active flags from DT to the ipu-di driver
drm: add a helper function to extract 'de-active' and 'pixelclk-active' from DT
Dave Airlie [Mon, 15 Aug 2016 06:41:17 +0000 (16:41 +1000)]
Merge tag 'mediatek-drm-next-2016-08-12' of git://git.pengutronix.de/git/pza/linux into drm-next
mediatek-drm maintainers and gamma correction
- add MAINTAINERS entry for mediatek-drm driver
- add support for AAL and GAMMA engines
- hook up gamma correction LUT
- add support for temporal dithering to OD and GAMMA engines
* tag 'mediatek-drm-next-2016-08-12' of git://git.pengutronix.de/git/pza/linux:
drm/mediatek: set mt8173 dithering function
drm/mediatek: Add gamma correction.
drm/mediatek: Add GAMMA engine basic function
drm/mediatek: Add AAL engine basic function
drm: mediatek: add Maintainers entry for Mediatek DRM drivers
Dave Airlie [Mon, 15 Aug 2016 06:39:28 +0000 (16:39 +1000)]
Merge branch 'drm-next-tilcdc-atomic' of https://github.com/jsarha/linux into drm-next
Please pull tilcdc atomic modeset support and some non critical fixes.
* 'drm-next-tilcdc-atomic' of https://github.com/jsarha/linux: (29 commits)
drm/tilcdc: Change tilcdc_crtc_page_flip() to tilcdc_crtc_update_fb()
drm/tilcdc: Remove unnecessary pm_runtime_get() and *_put() calls
drm/tilcdc: Get rid of legacy dpms mechanism
drm/tilcdc: Use drm_atomic_helper_resume/suspend()
drm/tilcdc: Enable and disable interrupts in crtc start() and stop()
drm/tilcdc: tfp410: Add atomic modeset helpers to connector funcs
drm/tilcdc: tfp410: Set crtc panel info at init phase
drm/tilcdc: panel: Add atomic modeset helpers to connector funcs
drm/tilcdc: panel: Set crtc panel info at init phase
drm/tilcdc: Remove tilcdc_verify_fb()
drm/tilcdc: Remove obsolete crtc helper functions
drm/tilcdc: Set DRIVER_ATOMIC and use atomic crtc helpers
drm/tilcdc: Add drm_mode_config_reset() call to tilcdc_load()
drm/tilcdc: Add atomic mode config funcs
drm/tilcdc: Add tilcdc_crtc_atomic_check()
drm/tilcdc: Add tilcdc_crtc_mode_set_nofb()
drm/tilcdc: Initialize dummy primary plane from crtc init
drm/tilcdc: Add dummy primary plane implementation
drm/tilcdc: Make tilcdc_crtc_page_flip() work if crtc is not yet on
drm/tilcdc: Make tilcdc_crtc_page_flip() public
...
Linus Torvalds [Mon, 15 Aug 2016 02:01:31 +0000 (19:01 -0700)]
Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal updates from Zhang Rui:
- Fix a race condition when updating cooling device, which may lead to
a situation where a thermal governor never updates the cooling
device. From Michele Di Giorgio.
- Fix a zero division error when disabling the forced idle injection
from the intel powerclamp. From Petr Mladek.
- Add suspend/resume callback for intel_pch_thermal thermal driver.
From Srinivas Pandruvada.
- Another two fixes for clocking cooling driver and hwmon sysfs I/F.
From Michele Di Giorgio and Kuninori Morimoto.
[ Hmm. That suspend/resume callback for intel_pch_thermal doesn't look
like a fix, but I'm letting it slide.. - Linus ]
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal: clock_cooling: Fix missing mutex_init()
thermal: hwmon: EXPORT_SYMBOL_GPL for thermal hwmon sysfs
thermal: fix race condition when updating cooling device
thermal/powerclamp: Prevent division by zero when counting interval
thermal: intel_pch_thermal: Add suspend/resume callback