Imre Deak [Wed, 10 Aug 2016 11:07:29 +0000 (14:07 +0300)]
drm/i915: Merge the PPS register definitions
The PPS registers are pretty much the same everywhere, the differences
being:
- Register fields appearing and disappearing from one platform to the
next: panel-reset-on-powerdown, backlight-on, panel-port,
register-unlock
- Different register base addresses
- Different number of PPS instances: 2 on VLV/CHV/BXT, 1 everywhere
else.
We can merge the separate set of PPS definitions by extending the PPS
instance argument to all platforms and using instance 0 on platforms
with a single instance. This means we'll need to calculate the register
addresses dynamically based on the given platform and PPS instance.
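Conceptually, the lookup reduces to a base address plus an instance offset. A minimal sketch of the idea, with made-up base and stride values rather than the real i915 definitions:

#include <linux/types.h>

#define EX_PPS_BASE            0x61200 /* illustrative base, not the real one */
#define EX_PPS_INSTANCE_STRIDE 0x100   /* illustrative spacing between instances */

static u32 example_pps_reg(unsigned int pps_idx, u32 reg_offset)
{
        /* pps_idx is always 0, except on VLV/CHV/BXT which have two instances */
        return EX_PPS_BASE + pps_idx * EX_PPS_INSTANCE_STRIDE + reg_offset;
}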
v2:
- Simplify if ladder in intel_pps_get_registers(). (Ville)
Dave Gordon [Tue, 9 Aug 2016 14:19:22 +0000 (15:19 +0100)]
drm/i915/guc: use for_each_engine_id() where appropriate
Now that host structures are indexed by host engine-id rather than
guc_id, we can usefully convert some for_each_engine() loops to use
for_each_engine_id() and avoid multiple dereferences of engine->id.
Also a few related tweaks to cache structure members locally wherever
they're used more than once or twice, hopefully eliminating memory
references.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Dave Gordon [Tue, 9 Aug 2016 14:19:21 +0000 (15:19 +0100)]
drm/i915/guc: add engine mask to GuC client & pass to GuC
The Context Descriptor passed by the kernel to the GuC contains a field
specifying which engine(s) the context will use. Historically, this was
always set to "all of them", but if we had a separate client for each
engine, we could be more precise, and set only the bit for the engine
that the client was associated with. So this patch enables this usage,
in preparation for having multiple clients, though at this point there
is still only a single client used for all supported engines.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Dave Gordon [Tue, 9 Aug 2016 14:19:20 +0000 (15:19 +0100)]
drm/i915/guc: refactor guc_init_doorbell_hw()
We have essentially the same code in each of two different
loops, so we can refactor it into a little helper function.
This also reduces the amount of work done during startup,
as we now only reprogram h/w found to be in a state other
than that expected, and so avoid the overhead of setting
doorbell registers to the state they're already in.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Dave Gordon [Tue, 9 Aug 2016 14:19:19 +0000 (15:19 +0100)]
drm/i915/guc: doorbell reset should avoid used doorbells
guc_init_doorbell_hw() borrows the (currently single) GuC client to use
in reinitialising ALL the doorbell registers (as the hardware doesn't
reset them when the GuC is reset). As a prerequisite for accommodating
multiple clients, it should only reset doorbells that are supposed to be
disabled, avoiding those that are marked as in use by any client.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Chris Wilson [Tue, 9 Aug 2016 16:47:52 +0000 (17:47 +0100)]
drm/i915: Use RCU to annotate and enforce protection for breadcrumb's bh
The bottom-half we use for processing the breadcrumb interrupt is a
task, which is an RCU protected struct. When accessing this struct, we
need to be holding the RCU read lock to prevent it disappearing beneath
us. We can use the RCU annotation to mark our irq_seqno_bh pointer as
being under RCU guard and then use the RCU accessors to provide
correct ordering of access through the pointer.
Most notably, this fixes the access from hard irq context to use the RCU
read lock, which both Daniel and Tvrtko complained about.
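This is the standard pattern for an RCU-annotated pointer; a rough sketch with a simplified structure (only the irq_seqno_bh name is taken from the text above):

#include <linux/rcupdate.h>
#include <linux/sched.h>

struct example_breadcrumbs {
        struct task_struct __rcu *irq_seqno_bh; /* bottom-half task, RCU protected */
};

/* writer side, under whatever lock serialises waiter updates */
static void example_set_bh(struct example_breadcrumbs *b, struct task_struct *tsk)
{
        rcu_assign_pointer(b->irq_seqno_bh, tsk);
}

/* reader side, safe even from hard irq context */
static void example_wake_bh(struct example_breadcrumbs *b)
{
        struct task_struct *tsk;

        rcu_read_lock();
        tsk = rcu_dereference(b->irq_seqno_bh);
        if (tsk)
                wake_up_process(tsk);
        rcu_read_unlock();
}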
Chris Wilson [Tue, 9 Aug 2016 16:47:51 +0000 (17:47 +0100)]
drm/i915: Move missed interrupt detection from hangcheck to breadcrumbs
In commit 2529d57050af ("drm/i915: Drop racy markup of missed-irqs from
idle-worker") the racy detection of missed interrupts was removed when
we went idle. This however opened up the issue that the stuck waiters
were not being reported, causing a test case failure. If we move the
stuck waiter detection out of hangcheck and into the breadcrumb
mechanism (i.e. the waiter) itself, we can avoid this issue entirely.
This leaves hangcheck looking for a stuck GPU (inspecting for request
advancement and HEAD motion), and breadcrumbs looking for a stuck
waiter - hopefully making both easier to understand by their segregation.
v2: Reduce the error message as we now run independently of hangcheck,
and the hanging batch used by igt also counts as a stuck waiter causing
extra warnings in dmesg.
v3: Move the breadcrumb's hangcheck kickstart to the first missed wait.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97104 Fixes: 2529d57050af ("drm/i915: Drop racy markup of missed-irqs from idle-worker")
Testcase: igt/drv_missed_irq Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470761272-1245-2-git-send-email-chris@chris-wilson.co.uk
Chris Wilson [Tue, 9 Aug 2016 17:08:25 +0000 (18:08 +0100)]
drm/i915: Always mark the writer as also a read for busy ioctl
One of the few guarantees we want the busy ioctl to provide is that the
reported busy writer is included in the set of busy read engines. This
should be provided by the ordering of setting and retiring the active
trackers, but we can do better by explicitly setting the busy read
engine flag for the last writer.
v2: More comments inside __busy_write_id() to explain why both fields
are set.
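A sketch of the kind of encoding this implies; the bit layout here is invented for illustration and is not the actual uAPI:

#include <linux/types.h>

/* reads as one flag per engine in the high half of the busy word,
 * the id of the last writer in the low half (illustrative layout) */
static u32 example_busy_read_flag(unsigned int id)
{
        return 0x10000u << id;
}

static u32 example_busy_write_id(unsigned int id)
{
        /* the last writer is by definition also being read, so report both */
        return id | example_busy_read_flag(id);
}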
Ville Syrjälä [Fri, 5 Aug 2016 20:28:30 +0000 (23:28 +0300)]
drm/i915: Use the g4x+ approach on gen2 for handling display stuff around GPU reset
We don't have GPU reset support for gen2, which means the display
hardware is unaffected when a GPU hang is handled. However as the ring
has in fact stopped, any flips still in the ring will never complete,
and thus the display base address updates will never happen. So we
really need to fix that up manually just like we do on g4x+.
In fact, let's just use intel_has_gpu_reset() instead of IS_GEN2()
since that'll also handle cases where someone would disable the GPU
reset support on gen3/4 for whatever reason.
drm/i915: Add a way to test the modeset done during gpu reset, v3.
Add force_reset_modeset_test as a parameter to force the modeset path during gpu reset.
This allows an IGT test to set the knob and trigger a hang to force the gpu reset,
even on platforms that wouldn't otherwise require it.
Changes since v1:
- Split out fix to separate commit.
Changes since v2:
- This commit is purely about force_reset_modeset_test now.
drm/i915: Fix modeset handling during gpu reset, v5.
This function would call drm_modeset_lock_all, while the suspend/resume
functions already have their own locking. Fix this by factoring out
__intel_display_resume, and calling the atomic helpers for duplicating
atomic state and disabling all CRTCs during suspend.
Changes since v1:
- Deal with -EDEADLK right after lock_all and clean up calls
to hw readout.
- Always take all modeset locks so updates during gpu reset are blocked.
Changes since v2:
- Fix deadlock in intel_update_primary_planes.
- Move WARN_ON(EDEADLK) to __intel_display_resume.
- pctx -> ctx
- only call __intel_display_resume on success in intel_display_resume.
Changes since v3:
- Rebase on top of dev_priv -> dev change.
- Use drm_modeset_lock_all_ctx instead of drm_modeset_lock_all.
Changes since v4 [by vsyrjala]:
- Deal with skip_intermediate_wm
- Update comment w.r.t. mode_config.mutex vs. ->detect()
- Rebase due to INTEL_GEN() etc.
Ville Syrjälä [Fri, 5 Aug 2016 17:00:17 +0000 (20:00 +0300)]
drm/i915: Don't mark PCH underrun reporting as disabled for transcoder B/C on LPT-H
Marking PCH transcoder FIFO underrun reporting as disabled for
transcoder B/C on LPT-H will block us from enabling the south error
interrupt. So let's only mark transcoder A underrun reporting as
disabled initially.
This is a little tricky to hit since you need a machine with LPT-H, and
the BIOS must enable either pipe B or C at boot. Then i915 would mark
the "transcoder B/C" underrun reporting as disabled and never enable it
again, meaning south interrupts would never get enabled either. The only
other interrupt in there is actually the poison interrupt which, if we
could ever trigger it, would just result in a little error in dmesg.
Here's the resulting change in SDEIMR on my HSW when I boot it with
multiple displays attached:
- (0x000c4004): 0xf115ffff
+ (0x000c4004): 0xf114ffff
My previous attempt [1] tried to fix this a little differently, but
Daniel requested I do this instead.
Ville Syrjälä [Fri, 5 Aug 2016 17:41:34 +0000 (20:41 +0300)]
drm/i915: Add some curly braces
intel_enable_pipe() looks rather confusing when one side doesn't have
the curly braces, and the other one does. And what's even worse,
there's another if-else inside the braceless side. Let's put braces
around it to make it clear which branch goes where.
Chris Wilson [Tue, 9 Aug 2016 08:23:34 +0000 (09:23 +0100)]
drm/i915: Do not overwrite the request with zero on reallocation
When using RCU lookup for the request, commit 0eafec6d3244 ("drm/i915:
Enable lockless lookup of request tracking via RCU"), we acknowledge that
we may race with another thread that could have reallocated the request.
In order for the first thread not to blow up, the second thread must not
clear the request completed before overwriting it. In the RCU lookup, we
allow for the engine/seqno to be replaced but we do not allow for it to
be zeroed.
The choice we make is to either add extra checking to the RCU lookup, or
embrace the inherent races (as intended). It is more complicated as we
need to manually clear everything we depend upon being zero initialised,
but we benefit from not emitting the memset() to clear the entire
frequently allocated structure (that memset turns up in throughput
profiles). And at the same time, the lookup remains flexible for future
adjustments.
v2: Old style LRC requires another variable to be initialized. (The
danger inherent in not zeroing everything.)
v3: request->batch also needs to be cleared
v4: signaling.tsk is no longer used unset, but pid still exists
Fixes: 0eafec6d3244 ("drm/i915: Enable lockless lookup of request...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Goel, Akash" <akash.goel@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470731014-6894-2-git-send-email-chris@chris-wilson.co.uk
Chris Wilson [Tue, 9 Aug 2016 08:23:33 +0000 (09:23 +0100)]
drm/i915: Add smp_rmb() to busy ioctl's RCU dance
In the debate as to whether the second read of active->request is
ordered after the dependent reads of the first read of active->request,
just give in and throw a smp_rmb() in there so that ordering of loads is
assured.
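Roughly, the sequence being ordered looks like the fragment below; types are simplified and example_request_completed() is a hypothetical stand-in for the completion check:

rcu_read_lock();
req = rcu_dereference(active->request);
busy = req && !example_request_completed(req); /* dependent reads via req */

smp_rmb(); /* order the loads above before the re-read below */

if (req != rcu_dereference(active->request))
        busy = false; /* the tracker moved on underneath us; treat as idle */
rcu_read_unlock();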
Chris Wilson [Tue, 9 Aug 2016 07:37:02 +0000 (08:37 +0100)]
drm/i915: Don't check for idleness before retiring after a GPU hang
When we force the cleanup after a GPU hang, we want to retire all
requests, or else we may leak them if truly wedged (and the GPU never
advances again). Converting to the active request helpers had the issue
of doing the check against busyness before reporting the request, so if
we claim the GPU had hung but this engine hadn't we could potentially skip
the request cleanup - triggering the self-check BUG.
This was originally introduced to be used by the busy-ioctl, but in the
end busy ioctl performed a different dance. Since there are no users,
and no likely users, remove an unwanted chunk of the API.
Rodrigo Vivi [Wed, 3 Aug 2016 15:22:57 +0000 (08:22 -0700)]
drm/i915: Fix copy_to_user usage for pipe_crc
copy_to_user() returns the number of bytes it couldn't write
and zero on success. So any return value other than 0 should
be considered a fault, not only the case where it fails to write
the full size.
v2: fixed the inverted logic. (Ville)
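For reference, the corrected usage boils down to treating any non-zero return as a fault; a sketch with illustrative buffer names:

if (copy_to_user(user_ptr, entries, n_entries * sizeof(*entries))) {
        /* copy_to_user() returns the number of bytes NOT copied, 0 on success */
        ret = -EFAULT;
        goto unlock;
}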
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
active_streams will get totally out of whack with SST unless we
sync up with the hw state at readout, obviously! We don't yet
do that, so now the WARNs fire all the time. Let's revert :(
Matthew Auld [Tue, 2 Aug 2016 08:36:53 +0000 (09:36 +0100)]
drm/i915: fix WaInsertDummyPushConstPs
As pointed out by Chris Harris, we are using the wrong WA name; it
should in fact be WaToEnableHwFixForPushConstHWBug. It should also be
applied from C0 onwards for both BXT and KBL.
Fixes: 7b9005cd45f3 ("drm/i915: Add WaInsertDummyPushConstP for bxt and kbl") Cc: Chris Harris <chris.harris@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reported-by: Chris Harris <chris.harris@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470127013-29653-1-git-send-email-matthew.auld@intel.com
Chris Wilson [Fri, 5 Aug 2016 09:14:24 +0000 (10:14 +0100)]
drm/i915: Assert that the request hasn't been retired
With all callers now not playing tricks with dropping the struct_mutex
between waiting and retiring, we can assert that the request is ready to
be retired.
Chris Wilson [Fri, 5 Aug 2016 09:14:23 +0000 (10:14 +0100)]
drm/i915: Repack fence tiling mode and stride into a single integer
In the previous commit, we moved the obj->tiling_mode out of a bitfield
and into its own integer so that we could safely use READ_ONCE(). Let us
now repair some of that damage by sharing the tiling_mode with its
companion, the fence stride.
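One way to picture the sharing; the packing below is illustrative only, not the actual field format:

#define EX_TILING_MASK 0x3u /* linear, X or Y fits in two bits */

/* fence strides are multiples of the tile width, so the low bits are free
 * to carry the tiling mode (illustrative packing) */
static inline unsigned int ex_pack_tiling_stride(unsigned int stride,
                                                 unsigned int tiling)
{
        return stride | (tiling & EX_TILING_MASK);
}

static inline unsigned int ex_tiling(unsigned int packed)
{
        return packed & EX_TILING_MASK;
}

static inline unsigned int ex_stride(unsigned int packed)
{
        return packed & ~EX_TILING_MASK;
}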
Chris Wilson [Fri, 5 Aug 2016 09:14:22 +0000 (10:14 +0100)]
drm/i915: Document and reject invalid tiling modes
Through the GTT interface to the fence registers, we can only handle
linear, X and Y tiling. The more esoteric tiling patterns are ignored.
Document that the tiling ABI only supports up to Y tiling, and reject any
attempts to set a tiling mode other than NONE, X or Y.
Chris Wilson [Fri, 5 Aug 2016 09:14:21 +0000 (10:14 +0100)]
drm/i915: Remove locking for get_tiling
Since we are not concerned with userspace racing itself with set-tiling
(the order is indeterminate even if we take a lock), then we can safely
read back the single obj->tiling_mode and do the static lookup of
swizzle mode without having to take a lock.
get-tiling is reasonably frequent due to the back-channel passing around
of tiling parameters in DRI2/DRI3.
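The lockless read then reduces to something like the fragment below, where example_swizzle_for() is a hypothetical stand-in for the static swizzle lookup:

/* a single integral field gives a consistent snapshot without the lock */
unsigned int tiling = READ_ONCE(obj->tiling_mode);

args->tiling_mode = tiling;
args->swizzle_mode = example_swizzle_for(dev_priv, tiling);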
v2: Make tiling_mode a full unsigned int so that we can trivially use it
with READ_ONCE(). Separating it out into manual control over the flags
field was too noisy for a simple patch. Note that we could use the lower
bits of obj->stride for the tiling mode.
Chris Wilson [Fri, 5 Aug 2016 09:14:20 +0000 (10:14 +0100)]
drm/i915: Remove pinned check from madvise ioctl
We don't need to incur the overhead of checking whether the object is
pinned prior to changing its madvise. If the object is pinned, the
madvise will not take effect until it is unpinned and so we cannot free
the pages being pointed at by hardware. Marking a pinned object with
allocated pages as DONTNEED will not trigger any undue warnings. The check
is therefore superfluous, and by removing it we can remove a linear walk
over all the vma the object has.
Still, despite it being an overzealous check, that error code is part of
the current ABI and so we must proceed with caution.
Chris Wilson [Fri, 5 Aug 2016 09:14:19 +0000 (10:14 +0100)]
drm/i915: Reduce locking inside swfinish ioctl
We only need to take the struct_mutex if the object is pinned to the
display engine and so requires checking for clflush. (The race with
userspace pinning the object to a framebuffer is irrelevant.)
v2: Use access once for compiler hints (or not as it is a bitfield)
v3: READ_ONCE, obj->pin_display is not a bitfield anymore
v4: Don't be creative with goto.
Chris Wilson [Fri, 5 Aug 2016 09:14:18 +0000 (10:14 +0100)]
drm/i915: Remove (struct_mutex) locking for busy-ioctl
By applying the same logic as for wait-ioctl, we can query whether a
request has completed without holding struct_mutex. The biggest impact
system-wide is removing the flush_active and the contention that causes.
Chris Wilson [Fri, 5 Aug 2016 09:14:17 +0000 (10:14 +0100)]
drm/i915: Remove (struct_mutex) locking for wait-ioctl
With a bit of care (and leniency) we can iterate over the object and
wait for previous rendering to complete with judicious use of atomic
reference counting. The ABI requires us to ensure that an active object
is eventually flushed (like the busy-ioctl) which is guaranteed by our
management of requests (i.e. everything that is submitted to hardware is
flushed in the same request). All we have to do is ensure that we can
detect when the requests are complete for reporting when the object is
idle (without triggering ETIME), locklessly - this is handled by
i915_gem_active_wait_unlocked().
The impact of this is actually quite small - the return to userspace
following the wait was already lockless and so we don't see much gain in
latency improvement upon completing the wait. What we do achieve here is
completing an already finished wait without hitting the struct_mutex,
our hold is quite short and so we are typically just a victim of
contention rather than a cause - but it is still one less contention
point!
Chris Wilson [Fri, 5 Aug 2016 09:14:16 +0000 (10:14 +0100)]
drm/i915: Do a nonblocking wait first in pread/pwrite
If we try and read or write to an active request, we first must wait
upon the GPU completing that request. Let's do that without holding the
mutex (and so allow someone else to access the GPU whilst we wait). Upon
completion, we will acquire the mutex and only then start the operation
(i.e. we do not rely on state from before the initial wait).
v2: Repaint the goto labels
v3: Move the tracepoints back to the start of the ioctls
Chris Wilson [Fri, 5 Aug 2016 09:14:14 +0000 (10:14 +0100)]
drm/i915: Tidy generation of the GTT mmap offset
If we make the observation that mmap-offsets are only released when we
free an object, we can then deduce that the shrinker only creates free
space in the mmap arena indirectly by flushing the request list and
freeing expired objects. If we combine this with the lockless
vma-manager and lockless idling, we can avoid taking our big struct_mutex
until we need to actually free the requests.
One side-effect is that we defer the madvise checking until we need the
pages (i.e. the fault handler). This brings us into line with the other
delayed checks (and madvise in general).
v2: s/ret/err/ and use if (!err) rather than if (ret == 0)
Chris Wilson [Fri, 5 Aug 2016 09:14:13 +0000 (10:14 +0100)]
drm/i915/shrinker: Wait before acquiring struct_mutex under oom
We can now wait for the GPU (all engines) to become idle without
requiring the struct_mutex. Inside the shrinker, we currently need to
take the struct_mutex in order to purge objects and to purge the objects
we need the GPU to be idle - causing a stall whilst we hold the
struct_mutex. We can hide most of that stall by performing the wait
before taking the struct_mutex and only doing essential waits for
new rendering on objects to be freed.
Now that we pass along the expected interruptible nature for the
wait-for-idle, we do not need to modify the global
i915->mm.interruptible for this single call. (Only the immediate call to
i915_gem_wait_for_idle() takes the interruptible status as the other
action, dma_map_sg(), is independent of i915.ko)
Chris Wilson [Fri, 5 Aug 2016 09:14:11 +0000 (10:14 +0100)]
drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex
The principal motivation for this was to try and eliminate the
struct_mutex from i915_gem_suspend - but for now we still need to hold
the mutex for i915_gem_context_lost(). (The issue there is that there
may be an indirect lockdep cycle between cpu_hotplug (i.e. suspend) and
struct_mutex via the stop_machine().) For the moment, enabling last
request tracking for the engine allows us to do busyness checking and
waiting without requiring the struct_mutex - which is useful in its own
right.
As a side-effect of having a robust means for tracking engine busyness,
we can replace our other busyness heuristic, that of comparing against
the last submitted seqno. For paranoid reasons, we have a semi-ordered
check of that seqno inside the hangchecker, which we can now improve to
an ordered check of the engine's busyness (removing a locked xchg in the
process).
v2: Pass along "bool interruptible" as being unlocked we cannot rely on
i915->mm.interruptible being stable or even under our control.
v3: Replace the Ironlake i915_gpu_busy() check with the common precalculated value
Chris Wilson [Fri, 5 Aug 2016 09:14:10 +0000 (10:14 +0100)]
drm/i915: Remove forced stop ring on suspend/unload
Before suspending (or unloading), we would first wait upon all rendering
to be completed and then disable the rings. This latter step is a remnant
from DRI1 days when we did not use request tracking for all operations
upon the ring. Now that we are sure we are waiting upon the very last
operation by the engine, we can forgo clobbering the ring registers,
though we do keep the assert that the engine is indeed idle before
sleeping.
Chris Wilson [Fri, 5 Aug 2016 09:14:09 +0000 (10:14 +0100)]
drm/i915/userptr: Remove superfluous interruptible=false on waiting
Inside the kthread context, we can't be interrupted by signals so
touching the mm.interruptible flag is pointless and wait-request now
consumes EIO itself.
Chris Wilson [Fri, 5 Aug 2016 09:14:08 +0000 (10:14 +0100)]
drm/i915: Convert non-blocking userptr waits for requests over to using RCU
We can completely avoid taking the struct_mutex around the non-blocking
waits by switching over to the RCU request management (trading the mutex
for a RCU read lock and some complex atomic operations). The improvement
is that we gain further contention reduction, and overall the code
becomes simpler due to the reduced mutex dancing.
Chris Wilson [Fri, 5 Aug 2016 09:14:07 +0000 (10:14 +0100)]
drm/i915: Convert non-blocking waits for requests over to using RCU
We can completely avoid taking the struct_mutex around the non-blocking
waits by switching over to the RCU request management (trading the mutex
for a RCU read lock and some complex atomic operations). The improvement
is that we gain further contention reduction, and overall the code
becomes simpler due to the reduced mutex dancing.
v2: Move i915_gem_fault tracepoint back to the start of the function,
before the unlocked wait.
It is useful to be able to wait on pending rendering without grabbing
the struct_mutex. We can do this by using the i915_gem_active_get_rcu()
primitive to acquire a reference to the pending request without
requiring struct_mutex, just the RCU read lock, and then call
i915_wait_request().
v2: Rebase onto new i915_gem_active_get_unlocked() semantics that take
the RCU read lock on behalf of the caller.
Daniel Vetter [Fri, 5 Aug 2016 08:36:15 +0000 (10:36 +0200)]
Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued
Backmerge the 4.8 pull request state from Dave - conflicts were
getting out of hand, and Chris has some patches which outright don't
apply without everything merged together again.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Matt Roper [Thu, 4 Aug 2016 21:08:00 +0000 (14:08 -0700)]
drm/i915/gen9: Give one extra block per line for SKL plane WM calculations
The bspec was updated a couple weeks ago to add an extra block per line
to plane watermark calculations for linear pixel formats.
Bspec update 115327 description:
"Gen9+ - Updated the plane blocks per line calculation for linear
cases. Adds +1 for all linear cases to handle the non-block aligned
stride cases."
Chris Wilson [Thu, 4 Aug 2016 15:32:42 +0000 (16:32 +0100)]
drm/i915: Export our request as a dma-buf fence on the reservation object
If the GEM objects being used for rendering in this request have been
exported via dma-buf to a third party, hook ourselves into the dma-buf
reservation object so that the third party can serialise with our
rendering via the dma-buf fences.
Chris Wilson [Thu, 4 Aug 2016 15:32:41 +0000 (16:32 +0100)]
drm/i915: Enable lockless lookup of request tracking via RCU
If we enable RCU for the requests (providing a grace period where we can
inspect a "dead" request before it is freed), we can allow callers to
carefully perform lockless lookup of an active request.
However, by enabling deferred freeing of requests, we can potentially
hog a lot of memory when dealing with tens of thousands of requests per
second - with a quick insertion of a synchronize_rcu() inside our
shrinker callback, that issue disappears.
v2: Currently, it is our responsibility to handle reclaim i.e. to avoid
hogging memory with the delayed slab frees. At the moment, we wait for a
grace period in the shrinker, and block for all RCU callbacks on oom.
Suggested alternatives focus on flushing our RCU callback when we have a
certain number of outstanding request frees, and blocking on that flush
after a second high watermark. (So rather than wait for the system to
run out of memory, we stop issuing requests - both are nondeterministic.)
Paul E. McKenney wrote:
Another approach is synchronize_rcu() after some largish number of
requests. The advantage of this approach is that it throttles the
production of callbacks at the source. The corresponding disadvantage
is that it slows things up.
Another approach is to use call_rcu(), but if the previous call_rcu()
is still in flight, block waiting for it. Yet another approach is
the get_state_synchronize_rcu() / cond_synchronize_rcu() pair. The
idea is to do something like this:
You would of course do an initial get_state_synchronize_rcu() to
get things going. This would not block unless there was less than
one grace period's worth of time between invocations. But this
assumes a busy system, where there is almost always a grace period
in flight. But you can make that happen as follows:
Note that you need additional code to make sure that the old callback
has completed before doing a new one. Setting and clearing a flag
with appropriate memory ordering control suffices (e.g. smp_load_acquire()
and smp_store_release()).
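A minimal sketch of the get_state_synchronize_rcu()/cond_synchronize_rcu() pair described above; illustrative only, not taken from the actual patch:

#include <linux/rcupdate.h>

static unsigned long ex_rcu_checkpoint;

static void ex_throttle_deferred_frees(void)
{
        /* blocks only if a full grace period has not already elapsed since
         * the last checkpoint - usually free on a busy system */
        cond_synchronize_rcu(ex_rcu_checkpoint);

        /* start the clock for the next batch of call_rcu() frees */
        ex_rcu_checkpoint = get_state_synchronize_rcu();
}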
v3: More comments on compiler and processor order of operations within
the RCU lookup and discover we can use rcu_access_pointer() here instead.
v4: Wrap i915_gem_active_get_rcu() to take the rcu_read_lock itself.
Chris Wilson [Thu, 4 Aug 2016 15:32:39 +0000 (16:32 +0100)]
drm/i915: Move obj->active:5 to obj->flags
We are motivated to avoid using a bitfield for obj->active for a couple
of reasons. Firstly, we wish to document our lockless read of obj->active
using READ_ONCE inside i915_gem_busy_ioctl() and that requires an
integral type (i.e. not a bitfield). Secondly, gcc produces abysmal code
when presented with a bitfield and that shows up high on the profiles of
request tracking (mainly due to excess memory traffic as it converts
the bitfield to a register and back and generates frequent AGI in the
process).
v2: BIT, break up a long line when computing the other engines, new paint
for i915_gem_object_is_active (now i915_gem_object_get_active).
Chris Wilson [Thu, 4 Aug 2016 15:32:38 +0000 (16:32 +0100)]
drm/i915: Use dev_priv consistently through the intel_frontbuffer interface
Rather than a mishmash of struct drm_device *dev and struct
drm_i915_private *dev_priv being used freely within a function, be
consistent and only pass along dev_priv.
Chris Wilson [Thu, 4 Aug 2016 15:32:37 +0000 (16:32 +0100)]
drm/i915: Use atomics to manipulate obj->frontbuffer_bits
The individual bits inside obj->frontbuffer_bits are protected by each
plane->mutex, but the whole bitfield may be accessed by multiple KMS
operations simultaneously and so the RMWs need to be under atomics.
However, for updating the single field we do not need to mandate that it
be under the struct_mutex, one more step towards its removal as the de
facto BKL.
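A sketch of the resulting access pattern, with a simplified structure name:

#include <linux/atomic.h>

struct example_object {
        atomic_t frontbuffer_bits; /* one bit per plane, RMW via atomics */
};

static void example_set_frontbuffer_bits(struct example_object *obj,
                                         unsigned int bits)
{
        atomic_or(bits, &obj->frontbuffer_bits);
}

static void example_clear_frontbuffer_bits(struct example_object *obj,
                                           unsigned int bits)
{
        atomic_andnot(bits, &obj->frontbuffer_bits);
}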
Chris Wilson [Thu, 4 Aug 2016 15:32:36 +0000 (16:32 +0100)]
drm/i915: Make fb_tracking.lock a spinlock
We only need a very lightweight mechanism here as the locking is only
used for co-ordinating a bitfield.
v2: Move the cheap unlikely tests into the caller
v3: Move the kerneldoc into the header (now separated out into
intel_frontbuffer.h for better kerneldoc and readability)
Chris Wilson [Thu, 4 Aug 2016 15:32:35 +0000 (16:32 +0100)]
drm/i915: Separate intel_frontbuffer into its own header
In view of adding inline functions into the intel_frontbuffer section,
we first split the header into its own file so that we can integrate it
more easily with kerneldoc.
Since i915_gem_obj_ggtt_pin() is an idiom breaking curry function for
i915_gem_object_ggtt_pin(), spare us the confusion and remove it.
Removing it now simplifies later patches to change the i915_vma_pin()
(and friends) interface.
v2: Add a redundant GEM_BUG_ON(!view) to
i915_gem_obj_lookup_or_create_ggtt_vma()
Chris Wilson [Thu, 4 Aug 2016 15:32:33 +0000 (16:32 +0100)]
drm/i915: Make i915_vma_pin() small and inline
Not only is i915_vma_pin() called for every single object on every single
execbuf, it is usually a simple increment as the VMA is already bound for
execution by the GPU. Rearrange the tests for unbound and pin_count
overflow so that we can do the increment and test very cheaply and
compact enough to inline the operation into execbuf. The trick used is
to note that we can check for an overflow bit (keeping space available
for it inside the flags) at the same time as checking the binding bits.
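The shape of the trick, with an invented flag layout (the real bits differ):

#include <linux/types.h>

#define EX_PIN_OVERFLOW 0x10u /* kept free just above the pin-count bits */
#define EX_BIND_MASK    0x60u /* e.g. global/aliasing binding bits */

static inline bool example_vma_pin_fast(unsigned int *flags)
{
        *flags += 1; /* bump the pin count held in the low bits */

        /* one test covers both reasons to take the slow path: the count just
         * overflowed into EX_PIN_OVERFLOW, or no binding bit is set yet */
        if ((*flags & EX_PIN_OVERFLOW) || !(*flags & EX_BIND_MASK))
                return false; /* fall back to the out-of-line path */

        return true; /* already bound and counted: nothing more to do */
}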
Chris Wilson [Thu, 4 Aug 2016 15:32:32 +0000 (16:32 +0100)]
drm/i915: Combine all i915_vma bitfields into a single set of flags
In preparation to perform some magic to speed up i915_vma_pin(), which
is among the hottest of hot paths in execbuf, refactor all the bitfields
accessed by i915_vma_pin() into a single unified set of flags.
Chris Wilson [Thu, 4 Aug 2016 15:32:31 +0000 (16:32 +0100)]
drm/i915: Start passing around i915_vma from execbuffer
During execbuffer we look up the i915_vma in order to reserve them in
the VM. However, we then do a double lookup of the vma in order to then
pin them, all because we lack the necessary interfaces to operate on
i915_vma - so introduce i915_vma_pin()!
v2: Tidy parameter lists to remove one level of redirection in the hot
path.
Chris Wilson [Thu, 4 Aug 2016 15:32:29 +0000 (16:32 +0100)]
drm/i915: Record allocated vma size
Tracking the size of the VMA as allocated allows us to dramatically
reduce the complexity of later functions (like inserting the VMA in to
the drm_mm range manager).
Chris Wilson [Thu, 4 Aug 2016 15:32:28 +0000 (16:32 +0100)]
drm/i915: Update i915_gem_get_ggtt_size/_alignment to use drm_i915_private
For consistency, internal functions should take drm_i915_private rather
than drm_device. Now that we are subclassing drm_device, there are no
more size wins, but being consistent is its own blessing.
Chris Wilson [Thu, 4 Aug 2016 15:32:27 +0000 (16:32 +0100)]
drm/i915: Update the GGTT size/alignment query functions
In order to be consistent with other address space functions, we want to
pass around 64-bit sizes, even though all known global GTT are limited
to 4GiB. Similarly, we are trying to be consistent in using the _ggtt_
nomenclature when referring to the special global GTT.
v2: Update docs to consistently state "global GTT".
Chris Wilson [Thu, 4 Aug 2016 15:32:26 +0000 (16:32 +0100)]
drm/i915: Convert 4096 alignment request to 0 for drm_mm allocations
As we always allocate in chunks of 4096 (that being both the PAGE_SIZE
and our own GTT_PAGE_SIZE), we know that all results from the drm_mm are
aligned to at least 4096. The drm_mm allocator itself is optimised for
alignment == 0, and so by converting alignments of 4096 to 0 we can
satisfy our own requirements and still hit the faster path.
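The conversion amounts to something like this ahead of the drm_mm call (sketch):

/* every GTT allocation is already 4096-byte aligned, so an explicit 4096
 * alignment adds nothing - pass 0 and let drm_mm use its fastest path */
if (alignment == 4096)
        alignment = 0;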
Chris Wilson [Thu, 4 Aug 2016 15:32:24 +0000 (16:32 +0100)]
drm/i915: Reduce WARN(i915_gem_valid_gtt_space) to a debug-only check
i915_gem_valid_gtt_space() is used after inserting the VMA to double
check the list - the location should have been chosen to pass all the
restrictions.
Chris Wilson [Thu, 4 Aug 2016 15:32:23 +0000 (16:32 +0100)]
drm/i915: Pad GTT views of exec objects up to user specified size
Our GPUs impose certain requirements upon buffers that depend upon how
exactly they are used. Typically this is expressed as that they require
a larger surface than would be naively computed by pitch * height.
Normally such requirements are hidden away in the userspace driver, but
when we accept pointers from strangers and later impose extra conditions
on them, the original client allocator has no idea about the
monstrosities in the GPU and we require the userspace driver to inform
the kernel how many padding pages are required beyond the client
allocation.
v2: Long time, no see
v3: Try an anonymous union for uapi struct compatibility
Chris Wilson [Thu, 4 Aug 2016 15:32:22 +0000 (16:32 +0100)]
drm/i915: Fix up vma alignment to be u64
This is not the full fix, as we are required to percolate the u64 nature
down through the drm_mm stack, but this is required now to prevent
explosions due to mismatch between execbuf (eb_vma_misplaced) and vma
binding (i915_vma_misplaced) - and reduces the risk of spurious changes
as we adjust the vma interface in the next patches.
v2: long long casts not required for u64 printk (%llx)
Move the single line to the callsite as the name is now misleading, and
the purpose is solely to add the request to the execution queue. Here,
we can see that if we failed to dispatch the batch from the request, we
can forgo flushing the GPU when closing the request.
Chris Wilson [Thu, 4 Aug 2016 15:32:20 +0000 (16:32 +0100)]
drm/i915: Remove request retirement before each batch
This reimplements the denial-of-service protection against igt from
commit 227f782e4667 ("drm/i915: Retire requests before creating a new
one") and transfers the stall from before each batch into get_pages().
The issue is that the stall is increasing latency between batches which
is detrimental in some cases (especially coupled with execlists) to
keeping the GPU well fed. Also we have made the observation that retiring
requests can of itself free objects (and requests) and therefore makes
a good first step when shrinking.
v2: Recycle objects prior to i915_gem_object_get_pages()
v3: Remove the reference to the ring from i915_gem_requests_ring() as it
operates on an intel_engine_cs.
v4: Since commit 9b5f4e5ed6fd ("drm/i915: Retire oldest completed request
before allocating next") we no longer need the safeguard to retire
requests before get_pages(). We no longer see the huge latencies when
hitting the shrinker between allocations.
Chris Wilson [Thu, 4 Aug 2016 15:32:19 +0000 (16:32 +0100)]
drm/i915: Double check the active status on the batch pool
We should not rely on obj->active being uptodate unless we manually
flush it. Instead, we can verify that the next available batch object is
idle by looking at its last active request (and checking it for
completion).
v2: remove the struct drm_device forward declaration added in the
process of removing its necessity
Chris Wilson [Thu, 4 Aug 2016 15:32:17 +0000 (16:32 +0100)]
drm/i915: Combine loops within i915_gem_evict_something
A slight micro-optimisation to combine the loops so that gcc is able to
optimise the inner loops concisely. Since we are reviewing the loops, we
can update the comments to describe the current state of affairs, in
particular the distinction between evicting from the global GTT (which
may contain untracked items and transient global pins) and the
per-process GTT.
Chris Wilson [Wed, 3 Aug 2016 16:09:00 +0000 (17:09 +0100)]
drm/i915: Acquire audio powerwell for HD-Audio registers
On Haswell/Broadwell, the HD-Audio block is inside the HDMI/display
power well and so the sna-hda audio codec acquires the display power
well while it is operational. However, Skylake separates the powerwells
again, but yet we still need the audio powerwell to setup the registers.
(But then the hardware uses those registers even while powered off???)
Acquiring the powerwell around setting the chicken bits when setting up
the audio channel does at least silence the WARNs from touching our
registers whilst unpowered. We silence our own test cases, but maybe
there is a latent bug in using the audio channel?
v2: Grab both rpm wakelock and audio wakelock
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96214 Fixes: 03b135cebc47 ("ALSA: hda - remove dependency on i915 power well for SKL") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Libin Yang <libin.yang@intel.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Marius Vlad <marius.c.vlad@intel.com> Tested-by: Hans de Goede <hdegoede@redhat.com> Cc: stable@vger.kernel.org Link: http://patchwork.freedesktop.org/patch/msgid/1470240540-29004-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ville Syrjälä [Thu, 28 Jul 2016 14:50:47 +0000 (17:50 +0300)]
drm/i915: Don't try to ack sink irqs when there are none
My ASUS PB278 at least doesn't seem to appreciate when you try to
ack sink irqs when there are none. Results in this sort of dmesg spam
[drm:drm_dp_dpcd_access] too many retries, giving up
Let's skip the ack if there are no pending irqs. I have no clue why we
do this in two places. One of them likely should just go away. Oh, and
MST has its own sink irq handler too...
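The check amounts to reading the IRQ vector and only writing the ack when something is actually set; roughly (not the exact i915 code):

u8 irq_vector;

if (drm_dp_dpcd_readb(&intel_dp->aux, DP_DEVICE_SERVICE_IRQ_VECTOR,
                      &irq_vector) != 1)
        return;

if (!irq_vector)
        return; /* nothing pending: don't poke the sink with a pointless ack */

/* ack only the bits that were reported */
drm_dp_dpcd_writeb(&intel_dp->aux, DP_DEVICE_SERVICE_IRQ_VECTOR, irq_vector);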
Ville Syrjälä [Fri, 29 Jul 2016 13:51:16 +0000 (16:51 +0300)]
drm/i915: Allow MST sinks to work even if drm_probe_ddc() fails
With HSW + Dell UP2414Q (at least) drm_probe_ddc() occasionally fails,
and then we'll assume that the entire display has been disconnected.
We don't need the EDID from the main link, so we can simply check if
the sink is MST capable, and if so treat it as connected.
v2: Skip drm_probe_ddc() entirely for MST (Daniel)
Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Cc: Jim Bride <jim.bride@linux.intel.com> Cc: Manasi D Navare <manasi.d.navare@intel.com> Cc: Durgadoss R <durgadoss.r@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1469800276-6979-1-git-send-email-ville.syrjala@linux.intel.com
Ville Syrjälä [Thu, 28 Jul 2016 14:50:41 +0000 (17:50 +0300)]
drm/i915: Track active streams also for DP SST
s/active_mst_links/active_streams/ and use it also for SST. We can then
use this information in the hpd handling to see if the link is active
or not, and thus whether we may need to retrain.
Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Cc: Jim Bride <jim.bride@linux.intel.com> Cc: Manasi D Navare <manasi.d.navare@intel.com> Cc: Durgadoss R <durgadoss.r@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469717448-4297-6-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ville Syrjälä [Thu, 28 Jul 2016 14:50:40 +0000 (17:50 +0300)]
drm/i915: Reject mixing MST and SST/HDMI on the same digital port
We can't mix MST with SST/HDMI on the same physical port, so we'll need
to reject such configurations in check_digital_port_conflicts(). Nothing
else will prevent this as MST has its fake encoders and its own connectors
so the cloning checks won't catch this.
The same digital port can be used multiple times, but only if all the
encoders involved are MST encoders, so we only want to check MST vs.
SST/HDMI, not MST vs. MST. And SST/HDMI vs. SST/HDMI we already check.
Ville Syrjälä [Thu, 28 Jul 2016 14:50:39 +0000 (17:50 +0300)]
drm/i915: Avoid mixing up SST and MST in DDI setup
The MST vs. SST selection should depend purely on the choice of the
connector/encoder. So don't try to determine the correct DDI mode
based on the intel_dp->is_mst, which simply tells us whether the sink
is in MST mode or not. Instead derive the information from the encoder
type. Since the link training code deals in non-fake encoders, we'll
also need to keep a second copy of that information around, which we'll
now designate as 'link_mst'.
Ville Syrjälä [Fri, 29 Jul 2016 13:52:39 +0000 (16:52 +0300)]
drm/i915: Read PSR caps/intermediate freqs/etc. only once on eDP
Currently we re-read a bunch of static eDP panel caps from the DPCD
over and over again. Let's do it only once to save some time and effort.
v2: Make thing less confusing with intel_edp_init_dpcd() (Chris)
Move no_aux_handshake setup in there as well
v3: Move tps3/rate printout to intel_dp_long_pulse() so that
we'll still get them on eDP as well
Chris Wilson [Thu, 4 Aug 2016 08:09:53 +0000 (09:09 +0100)]
drm/i915: Add missing rpm wakelock to GGTT pread
Joonas spotted a discrepancy between the pwrite and pread ioctls, in
that pwrite takes the rpm wakelock around its GGTT access while pread
does not. The wakelock is required in order for the GTT to function. In
disregard for the
current convention, we take the rpm wakelock around the access itself
rather than around the struct_mutex as the nesting is not strictly
required and such ordering will one day be fixed by explicitly noting
the barrier dependencies between the GGTT and rpm.
Fixes: b50a53715f09 ("drm/i915: Support for pread/pwrite ...") Reported-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: drm-intel-fixes@lists.freedesktop.org Link: http://patchwork.freedesktop.org/patch/msgid/1470298193-21765-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Chris Wilson [Thu, 4 Aug 2016 07:43:53 +0000 (08:43 +0100)]
drm/i915/fbc: FBC causes display flicker when VT-d is enabled on Skylake
Erratum SKL075: Display Flicker May Occur When Both VT-d And FBC Are Enabled
"Display flickering may occur when both FBC (Frame Buffer Compression)
and VT - d (Intel® Virtualization Technology for Directed I/O) are enabled
and in use by the display controller."
Ville found the w/a name in the database:
WaFbcTurnOffFbcWhenHyperVisorIsUsed:skl,bxt and also dug out that it
affects Broxton.
v2: Log when the quirk is applied.
v3: Ensure i915.enable_fbc is false when !HAS_FBC()
v4: Fix function name after rebase
v5: Add Broxton to the workaround
Note for backporting to stable, we need to add
#define mkwrite_device_info(ptr) \
((struct intel_device_info *)INTEL_INFO(ptr))
Chris Wilson [Wed, 27 Jul 2016 18:11:17 +0000 (19:11 +0100)]
drm/i915: Fix use of engine->index for register offset
Since commit de1add360522 ("drm/i915: Decouple execbuf uAPI from internal
implementation") the index of the engine (its engine->id) in the
internal list no longer matches the hardware id. However, in a couple of
locations we missed fixing up the difference. In this case,
RING_FAULT_REG() refers to engine->id which is now not what the register
offset actually should be. Fortunately, in both cases we should be more
or less looping over 0..I915_NUM_ENGINES.
Fixes: de1add360522 ("drm/i915: Decouple execbuf uAPI from internal...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469643077-2523-2-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: stable@vger.kernel.org
The patch was only a stop-gap measure that fixed half the problem - the
leak of the fbcon when restarting X. A complete solution required
releasing the VMA when the object itself was closed rather than rely on
file/process exit. The previous patches add the VMA tracking necessary
to close them along with the object, context or file, and so the time
has come to remove the partial fix.
Chris Wilson [Thu, 4 Aug 2016 06:52:46 +0000 (07:52 +0100)]
drm/i915: Mark the context and address space as closed
When the user closes the context mark it and the dependent address space
as closed. As we use an asynchronous destruct method, this has two
purposes. First it allows us to flag the closed context and detect
internal errors if we try to create any new objects for it (as it is removed
from the user's namespace, these should be internal bugs only). And
secondly, it allows us to immediately reap stale vma.
Chris Wilson [Thu, 4 Aug 2016 06:52:45 +0000 (07:52 +0100)]
drm/i915: Release vma when the handle is closed
In order to prevent a leak of the vma on shared objects, we need to
hook into the object_close callback to destroy the vma on the object for
this file. However, if we destroyed that vma immediately we may cause
unexpected application stalls as we try to unbind a busy vma - hence we
defer the unbind to when we retire the vma.
v2: Keep vma allocated until closed. This is useful for a later
optimisation, but it is required now in order to handle potential
recursion of i915_vma_unbind() by retiring itself.
v3: Comments are important.
Chris Wilson [Thu, 4 Aug 2016 06:52:44 +0000 (07:52 +0100)]
drm/i915: Track active vma requests
Hook the vma itself into the i915_gem_request_retire() so that we can
accurately track when a solitary vma is inactive (as opposed to having
to wait for the entire object to be idle). This improves the interaction
when using multiple contexts (with full-ppgtt) and eliminates some
frequent list walking when retiring objects after a completed request.
A side-effect is that we get an active vma reference for free. The
consequence of this is shown in the next patch...
v2: Update inline names to be consistent with
i915_gem_object_get_active()
Chris Wilson [Thu, 4 Aug 2016 06:52:43 +0000 (07:52 +0100)]
drm/i915: i915_vma_move_to_active prep patch
This patch is broken out of the next just to remove the code motion from
that patch and make it more readable. What we do here is move the
i915_vma_move_to_active() to i915_gem_execbuffer.c and put the three
stages (read, write, fenced) together so that future modifications to
active handling are all located in the same spot. The importance of this
is so that we can more simply control the order in which the requests
are placed in the retirement list (i.e. control the order in which we
retire and so control the lifetimes to avoid having to hold onto
references).
Chris Wilson [Thu, 4 Aug 2016 06:52:41 +0000 (07:52 +0100)]
drm/i915: Double check activity before relocations
If the object is active and we need to perform a relocation upon it, we
need to take the slow relocation path. Before we do, double check the
active requests to see if they have completed.
Chris Wilson [Thu, 4 Aug 2016 06:52:39 +0000 (07:52 +0100)]
drm/i915: Disable waitboosting for a saturated engine
If the user floods the GPU with so many requests that the engine stalls
waiting for free space, don't automatically promote the GPU to maximum
frequencies. If the GPU really is saturated with work, it will migrate
to high clocks by itself, otherwise it is merely a user flooding us with
busy-work.
Chris Wilson [Thu, 4 Aug 2016 06:52:37 +0000 (07:52 +0100)]
drm/i915: Convert intel_overlay to request tracking
intel_overlay already tracks its last flip request, along with action to
take after its completion. Refactor intel_overlay to reuse the common
i915_gem_active tracker.
Chris Wilson [Thu, 4 Aug 2016 06:52:36 +0000 (07:52 +0100)]
drm/i915: Track requests inside each intel_ring
By tracking each request occupying space inside an individual
intel_ring, we can greatly simplify the logic of tracking available
space and not worry about other timelines. (Each ring is an ordered
timeline of committed requests.)