Tvrtko Ursulin [Thu, 17 Nov 2016 09:02:43 +0000 (09:02 +0000)]
drm/i915: Fix gen9 forcewake range table
Commit 0dd356bb6ff5 ("drm/i915: Eliminate Gen9 special case")
accidentaly dropped a MMIO range between 0xc000 to 0xcfff out
of the blitter forcewake domain. Fix it.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: 0dd356bb6ff5 ("drm/i915: Eliminate Gen9 special case") Reported-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1479373363-16528-1-git-send-email-tvrtko.ursulin@linux.intel.com
drm/i915: set DIDL using the ACPI video output device _ADR method return.
which went out of its way to cater to a specific BIOS, setting up DIDL
based on _ADR method. Perhaps that approach worked on that specific
machine, but on the machines I checked the _ADR method invents the
device identifiers out of thin air if DIDL has not been set. The source
for _ADR is also supposed to be the DIDL set by the driver, not the
other way around.
With this, we'll also limit the number of outputs to what the driver
actually has.
A side effect of this change is that the DIDL, and by proxy CADL, will
be initialized in the order of the connector list. That, in turn, has
internal panels in front, ensuring they're included in the DIDL and CADL
lists. Hopefully this ensures the BIOS does not block backlight hotkey
events, thinking the internal panel is off.
v2: do not set ACPI_DEVICE_ID_SCHEME in the device id (Peter Wu)
v3: Rebase
Cc: Peter Wu <peter@lekensteyn.nl> Cc: Rainer Koenig <Rainer.Koenig@ts.fujitsu.com> Cc: Jan-Marek Glogowski <glogow@fbihome.de> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Marcos Paulo de Souza <marcos.souza.org@gmail.com> Cc: Paolo Stivanin <paolostivanin@fastmail.fm> Tested-by: Rainer Koenig <Rainer.Koenig@ts.fujitsu.com> Tested-by: Paolo Stivanin <paolostivanin@fastmail.fm> Tested-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com> Reviewed-and-tested-by: Peter Wu <peter@lekensteyn.nl> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/9660d29cf310c17bbf4d58c0e09d5b047446e2d5.1479295490.git.jani.nikula@intel.com
Min He [Wed, 16 Nov 2016 14:05:04 +0000 (22:05 +0800)]
drm/i915: fix the dequeue logic for single_port_submission context
For a single_port_submission context, GVT expects that it can only be
submitted to port 0, and there shouldn't be any other context in port 1
at the same time. This is required by GVT-g context to have an opportunity
to save/restore some non-hw context render registers.
This patch is to workaround GVT-g.
v2: optimized code by following Chris's advice, and added more comments to
explain the patch.
v3: followed the coding style.
Chris Wilson [Wed, 16 Nov 2016 15:27:21 +0000 (15:27 +0000)]
drm/i915/execlists: Use a local lock for dfs_link access
Avoid requiring struct_mutex for exclusive access to the temporary
dfs_link inside the i915_dependency as not all callers may want to touch
struct_mutex. So rather than force them to take a highly contended
lock, introduce a local lock for the execlists schedule operation.
Reported-by: David Weinehall <david.weinehall@linux.intel.com> Fixes: 9a151987d709 ("drm/i915: Add execution priority boosting for mmioflips") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: David Weinehall <david.weinehall@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161116152721.11053-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Praveen Paneri [Tue, 15 Nov 2016 17:19:20 +0000 (22:49 +0530)]
drm/i915/bxt: Broxton decoupled MMIO
Decoupled MMIO is an alternative way to access forcewake domain
registers, which requires less cycles for a single read/write and
avoids frequent software forcewake.
This certainly gives advantage over the forcewake as this new
mechanism “decouples” CPU cycles and allow them to complete even
when GT is in a CPD (frequency change) or C6 state.
This can co-exist with forcewake and we will continue to use forcewake
as appropriate. E.g. 64-bit register writes to avoid writing 2 dwords
separately and land into funny situations.
v2:
- Moved platform check out of the function and got rid of duplicate
functions to find out decoupled power domain (Chris)
- Added a check for forcewake already held and skipped decoupled
access (Chris)
- Skipped writing 64 bit registers through decoupled MMIO (Chris)
v3:
- Improved commit message with more info on decoupled mmio (Tvrtko)
- Changed decoupled operation to enum and used u32 instead of
uint_32 data type for register offset (Tvrtko)
- Moved HAS_DECOUPLED_MMIO to device info (Tvrtko)
- Added lookup table for converting fw_engine to pd_engine (Tvrtko)
- Improved __gen9_decoupled_read and __gen9_decoupled_write
routines (Tvrtko)
v4:
- Fixed alignment and variable names (Chris)
- Write GEN9_DECOUPLED_REG0_DW1 register in just one go (Zhe Wang)
v5:
- Changed HAS_DECOUPLED_MMIO() argument name to dev_priv (Tvrtko)
- Sanitize info->had_decoupled_mmio at init (Chris)
dev_priv->hw_ddb is only used by skl_update_crtcs, but the ddb
allocation for each pipe is calculated in crtc_state.
We can rid of the global member by looking at crtc_state.
Do this by saving all active old ddb allocations from the old crtc_state
in an array, and then point them to the new allocation every time we update
a crtc.
This will allow us to keep track of the intermediate ddb allocations,
which is what hw_ddb was previously used for. With hw_ddb gone all
SKL-style watermark values are properly maintained only in crtc_state.
drm/i915/gen9+: Preserve old allocation from crtc_state.
This is the last bit required for making nonblocking modesets work
correctly. The state in intel_crtc->hw_ddb is updated in the
nonblocking part of a nonblocking commit.
This means that even attempting a commit before a nonblocking modeset
completes will fail, because intel_crtc->hw_ddb still has stale values.
The stale values are 0 if the crtc is being enabled resulting in a
failure during atomic check, but it may also result in double use of
ddb allocations.
Fix this by explicitly copying the ddb allocation from the old state.
This has to be done explicitly, because a modeset that doesn't change
active pipes, or a modeset converted to a fastset will will clear the
current state.
drm/i915/gen9+: Program watermarks as a separate step during evasion, v3.
The watermark updates for SKL style watermarks are no longer done
in the plane callbacks, but are now called in a separate watermark
update function that's called during the same vblank evasion,
before the plane updates.
This also gets rid of the global skl_results, which was required for
keeping track of the current atomic commit.
Changes since v1:
- Move line unwrap to correct patch. (Lyude)
- Make sure we don't regress ILK watermarks. (Matt)
- Rephrase commit message. (Matt)
Changes since v2:
- Fix disable watermark check to use the correct way to determine single
step watermark support.
drm/i915: Add an atomic evasion step to watermark programming, v4.
Allow the driver to write watermarks during atomic evasion.
This will make it possible to write the watermarks in a cleaner
way on gen9+.
intel_atomic_state is not used here yet, but will be used when
we program all watermarks as a separate step during evasion.
This also writes linetime all the time, while before it was only
done during plane updates. This looks like this could be a bugfix,
but I'm not sure what it affects.
Changes since v1:
- Add comment about atomic evasion to commit message.
- Unwrap I915_WRITE call. (Lyude)
Changes since v2:
- Rename atomic_evade_watermarks to atomic_update_watermarks. (Ville)
- Add line wraps where appropriate, fix grammar in commit message. (Matt)
Changes since v3:
- Actually fix commit message. (Matt)
- Line wrap calls to watermark update functions. (Matt)
Chris Wilson [Tue, 15 Nov 2016 09:22:49 +0000 (09:22 +0000)]
drm/i915: Add execution priority boosting for mmioflips
Commit 6b5e90f58c56 ("drm/i915/scheduler: Boost priorities for flips")
added priority boosting for the modern atomic pageflips (and modesets),
but we should do the same for existing users of mmioflips (we don't yet
need to consider csflips as they are not used by execlists and so do not
have any support for a scheduler).
Chris Wilson [Mon, 14 Nov 2016 20:41:05 +0000 (20:41 +0000)]
drm/i915/scheduler: Boost priorities for flips
Boost the priority of any rendering required to show the next pageflip
as we want to avoid missing the vblank by being delayed by invisible
workload. We prioritise avoiding jank and jitter in the GUI over
starving background tasks.
v2: Descend dma_fence_array when boosting priorities.
Chris Wilson [Mon, 14 Nov 2016 20:41:04 +0000 (20:41 +0000)]
drm/i915: Store the execution priority on the context
In order to support userspace defining different levels of importance to
different contexts, and in particular the preferred order of execution,
store a priority value on each context. By default, the kernel's
context, which is used for idling and other background tasks, is given
minimum priority (i.e. all user contexts will execute before the kernel).
Chris Wilson [Mon, 14 Nov 2016 20:41:03 +0000 (20:41 +0000)]
drm/i915/scheduler: Execute requests in order of priorities
Track the priority of each request and use it to determine the order in
which we submit requests to the hardware via execlists.
The priority of the request is determined by the user (eventually via
the context) but may be overridden at any time by the driver. When we set
the priority of the request, we bump the priority of all of its
dependencies to match - so that a high priority drawing operation is not
stuck behind a background task.
When the request is ready to execute (i.e. we have signaled the submit
fence following completion of all its dependencies, including third
party fences), we put the request into a priority sorted rbtree to be
submitted to the hardware. If the request is higher priority than all
pending requests, it will be submitted on the next context-switch
interrupt as soon as the hardware has completed the current request. We
do not currently preempt any current execution to immediately run a very
high priority request, at least not yet.
One more limitation, is that this is first implementation is for
execlists only so currently limited to gen8/gen9.
v2: Replace recursive priority inheritance bumping with an iterative
depth-first search list.
v3: list_next_entry() for walking lists
v4: Explain how the dfs solves the recursion problem with PI.
Chris Wilson [Mon, 14 Nov 2016 20:41:02 +0000 (20:41 +0000)]
drm/i915/scheduler: Record all dependencies upon request construction
The scheduler needs to know the dependencies of each request for the
lifetime of the request, as it may choose to reschedule the requests at
any time and must ensure the dependency tree is not broken. This is in
additional to using the fence to only allow execution after all
dependencies have been completed.
One option was to extend the fence to support the bidirectional
dependency tracking required by the scheduler. However the mismatch in
lifetimes between the submit fence and the request essentially meant
that we had to build a completely separate struct (and we could not
simply reuse the existing waitqueue in the fence for one half of the
dependency tracking). The extra dependency tracking simply did not mesh
well with the fence, and keeping it separate both keeps the fence
implementation simpler and allows us to extend the dependency tracking
into a priority tree (whilst maintaining support for reordering the
tree).
To avoid the additional allocations and list manipulations, the use of
the priotree is disabled when there are no schedulers to use it.
v2: Create a dedicated slab for i915_dependency.
Rename the lists.
Chris Wilson [Mon, 14 Nov 2016 20:40:59 +0000 (20:40 +0000)]
drm/i915: Defer transfer onto execution timeline to actual hw submission
Defer the transfer from the client's timeline onto the execution
timeline from the point of readiness to the point of actual submission.
For example, in execlists, a request is finally submitted to hardware
when the hardware is ready, and only put onto the hardware queue when
the request is ready. By deferring the transfer, we ensure that the
timeline is maintained in retirement order if we decide to queue the
requests onto the hardware in a different order than fifo.
v2: Rebased onto distinct global/user timeline lock classes.
v3: Play with the position of the spin_lock().
v4: Nesting finally resolved with distinct sw_fence lock classes.
Chris Wilson [Mon, 14 Nov 2016 20:40:58 +0000 (20:40 +0000)]
drm/i915: Split request submit/execute phase into two
In order to support deferred scheduling, we need to differentiate
between when the request is ready to run (i.e. the submit fence is
signaled) and when the request is actually run (a new execute fence).
This is typically split between the request itself wanting to wait upon
others (for which we use the submit fence) and the CPU wanting to wait
upon the request, for which we use the execute fence to be sure the
hardware is ready to signal completion.
Chris Wilson [Mon, 14 Nov 2016 20:40:57 +0000 (20:40 +0000)]
drm/i915: Create distinct lockclasses for execution vs user timelines
In order to simplify the lockdep annotation, as they become more complex
in the future with deferred execution and multiple paths through the
same functions, create a separate lockclass for the user timeline and
the hardware execution timeline.
We should only ever be locking the user timeline and the execution
timeline in parallel so we only need to create two lock classes, rather
than a separate class for every timeline.
v2: Rename the lock classes to be more consistent with other lockdep.
Chris Wilson [Mon, 14 Nov 2016 20:40:56 +0000 (20:40 +0000)]
drm/i915: Give each sw_fence its own lockclass
Localise the static struct lock_class_key to the caller of
i915_sw_fence_init() so that we create a lock_class instance for each
unique sw_fence rather than all sw_fences sharing the same
lock_class. This eliminate some lockdep false positive when using fences
from within fence callbacks.
For the relatively small number of fences currently in use [2], this adds
160 bytes of unused text/code when lockdep is disabled. This seems
quite high, but fully reducing it via ifdeffery is also quite ugly.
Removing the #fence strings saves 72 bytes with just a single #ifdef.
Ville Syrjälä [Tue, 8 Nov 2016 14:47:11 +0000 (16:47 +0200)]
drm/i915: Remove some duplicated plane swapping logic
On pre-gen4 we connect plane A to pipe B and vice versa to get an FBC
capable plane feeding the LVDS port by default. We have the logic for
the plane swapping duplicated in many places. Let's remove a bit of the
duplication by having the crtc look up the thing from the primary plane.
Ville Syrjälä [Mon, 14 Nov 2016 17:44:07 +0000 (19:44 +0200)]
drm/i915: Simplify DP port limited color range bit platform checks
Instead of checking for everything not supporting the limited color
range bit in the DP port register, let's check for the one thing
that does have it (g4x).
Ville Syrjälä [Mon, 14 Nov 2016 16:53:59 +0000 (18:53 +0200)]
drm/i915: Clean up rotation DSPCNTR/DVSCNTR/etc. setup
Move the plane control register rotation setup away from the
coordinate munging code. This will result in neater looking
code once we add reflection support for CHV.
Ville Syrjälä [Mon, 14 Nov 2016 16:53:58 +0000 (18:53 +0200)]
drm/i915: Use & instead if == to check for rotations
Using == to check for 180 degree rotation only works as long as the
reflection bits aren't set. That will change soon enough for CHV, so
let's stop doing things the wrong way.
Paulo Zanoni [Fri, 11 Nov 2016 16:57:41 +0000 (14:57 -0200)]
drm/i915/fbc: convert intel_fbc.c to use INTEL_GEN()
Because it's shorter, easier to read, newer and cooler. And I don't
think anybody else has pending FBC patches right now, so the conflicts
should be minimal.
Paulo Zanoni [Fri, 11 Nov 2016 16:57:38 +0000 (14:57 -0200)]
drm/i915/fbc: inline intel_fbc_can_choose()
It only has two checks now, so it makes sense to just move the code to
the caller.
Also take this opportunity to make no_fbc_reason make more sense: now
we'll only list "no suitable CRTC for FBC" instead of maybe giving a
reason why the last CRTC we checked was not selected, and we'll more
consistently set the reason (e.g., if no primary planes are visible).
Paulo Zanoni [Fri, 11 Nov 2016 16:57:37 +0000 (14:57 -0200)]
drm/i915/fbc: extract intel_fbc_can_enable()
Extract that part of the code to a new function and call this function
only once during intel_fbc_choose_crtc() instead of once per plane.
Those checks are independent from planes/CRTCs.
Paulo Zanoni [Fri, 11 Nov 2016 16:57:35 +0000 (14:57 -0200)]
drm/i915/fbc: move the intel_fbc_can_choose() call out of the loop
We can just call it earlier, so do it. The goal of the loop is to get
the plane's CRTC state, and we don't need it in order to call
intel_fbc_can_choose().
Chris Wilson [Thu, 3 Nov 2016 20:08:52 +0000 (20:08 +0000)]
drm/i915: Fix test on inputs for vma_compare()
When supplying a view to vma_compare() it is required that the supplied
i915_address_space is the global GTT. I tested the VMA instead (which is
the current position in the rbtree and maybe from any address space).
(This reapplies commit a44342acde30 ("drm/i915: Fix test on inputs for
vma_compare()") as it was lost in the vma split)
Reported-by: Matthew Auld <matthew.auld@intel.com> Tested-by: Matthew Auld <matthew.auld@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98579 Fixes: db6c2b4151f2 ("drm/i915: Store the vma in an rbtree...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161103200852.23431-1-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Fixes: b42fe9ca0a1e ("drm/i915: Split out i915_vma.c") Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Paulo Zanoni [Tue, 8 Nov 2016 20:22:11 +0000 (18:22 -0200)]
drm/i915/gen9: fix the WM memory bandwidth WA for Y tiling cases
The previous spec version said "double Ytile planes minimum lines",
and I interpreted this as referring to what the spec calls "Y tile
minimum", but in fact it was referring to what the spec calls "Minimum
Scanlines for Y tile". I noticed that Mahesh Kumar had a different
interpretation, so I sent and email to the spec authors and got
clarification on the correct meaning. Also, BSpec was updated and
should be clear now.
Ville Syrjälä [Fri, 11 Nov 2016 17:14:24 +0000 (19:14 +0200)]
drm/i915: Assume non-DP++ port if dvo_port is HDMI and there's no AUX ch specified in the VBT
My heuristic for detecting type 1 DVI DP++ adaptors based on the VBT
port information apparently didn't survive the reality of buggy VBTs.
In this particular case we have a machine with a natice HDMI port, but
the VBT tells us it's a DP++ port based on its capabilities.
The dvo_port information in VBT does claim that we're dealing with a
HDMI port though, but we have other machines which do the same even
when they actually have DP++ ports. So that piece of information alone
isn't sufficient to tell the two apart.
After staring at a bunch of VBTs from various machines, I have to
conclude that the only other semi-reliable clue we can use is the
presence of the AUX channel in the VBT. On this particular machine
AUX channel is specified as zero, whereas on some of the other machines
which listed the DP++ port as HDMI have a non-zero AUX channel.
I've also seen VBTs which have dvo_port a DP but have a zero AUX
channel. I believe those we need to treat as DP ports, so we'll limit
the AUX channel check to just the cases where dvo_port is HDMI.
If we encounter any more serious failures with this heuristic I think
we'll have to have to throw it out entirely. But that could mean that
there is a risk of type 1 DVI dongle users getting greeted by a
black screen, so I'd rather not go there unless absolutely necessary.
v2: Remove the duplicate PORT_A check (Daniel)
Fix some typos in the commit message
Cc: Daniel Otero <daniel.otero@outlook.com> Cc: stable@vger.kernel.org Tested-by: Daniel Otero <daniel.otero@outlook.com> Fixes: d61992565bd3 ("drm/i915: Determine DP++ type 1 DVI adaptor presence based on VBT")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97994 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1478884464-14251-1-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Jani Nikula [Mon, 31 Oct 2016 10:18:28 +0000 (12:18 +0200)]
drm/i915: rename preliminary_hw_support to alpha_support
The term "preliminary hardware support" has always caused confusion both
among users and developers. It has always been about preliminary driver
support for new hardware, and not so much about preliminary hardware. Of
course, initially both the software and hardware are in early stages,
but the distinction becomes more clear when the user picks up production
hardware and an older kernel to go with it, with just the early support
we had for the hardware at the time the kernel was released. The user
has to specifically enable the alpha quality *driver* support for the
hardware in that specific kernel version.
Rename preliminary_hw_support to alpha_support to emphasize that the
module parameter, config option, and flag are about software, not about
hardware. Improve the language in help texts and debug logging as well.
This appears to be a good time to do the change, as there are currently
no platforms with preliminary^W alpha support.
Chris Wilson [Fri, 11 Nov 2016 14:58:09 +0000 (14:58 +0000)]
drm/i915: Stop skipping the final clflush back to system pages
When we release the shmem backing storage, we make sure that the pages
are coherent with the cpu cache. However, our clflush routine was
skipping the flush as the object had no pages at release time. Fix this by
explicitly flushing the sg_table we are decoupling.
Chris Wilson [Fri, 11 Nov 2016 14:58:08 +0000 (14:58 +0000)]
drm/i915: Only wait upon the execution timeline when unlocked
In order to walk the list of all timelines, we currently require the
struct_mutex. We are sometimes called prior to the struct_mutex being
taken by the caller (i.e !I915_WAIT_LOCKED) in which case we can only
trust the global execution timelines (as these are owned by the device).
This means in the unlocked phase we can only wait upon the currently
executing requests and not all queued.
Tvrtko Ursulin [Wed, 9 Nov 2016 15:13:43 +0000 (15:13 +0000)]
drm/i915: Trim the object sg table
At the moment we allocate enough sg table entries assuming we
will not be able to do any coalescing. But since in practice
we most often can, and more so very effectively, this ends up
wasting a lot of memory.
A simple and effective way of trimming the over-allocated
entries is to copy the table over to a new one allocated to the
exact size.
Experiments on my freshly logged and idle desktop (KDE) showed
that by doing this we can save approximately 1 MiB of RAM, or
when running a typical benchmark like gl_manhattan I have
even seen a 6 MiB saving.
More complicated techniques such as only copying the last used
page and freeing the rest are left to the reader.
v2:
* Update commit message.
* Use temporary sg_table on stack. (Chris Wilson)
Jike Song [Wed, 9 Nov 2016 12:30:59 +0000 (20:30 +0800)]
drm/i915/gvt: add KVMGT support
KVMGT is the MPT implementation based on VFIO/KVM. It provides
a kvmgt_mpt ops to gvt for vGPU access mediation, e.g. to
mediate and emulate the MMIO accesses, to inject interrupts
to vGPU user, to intercept the GTT writing and replace it with
DMA-able address, to write-protect guest PPGTT table for
shadowing synchronization, etc. This patch provides the MPT
implementation for GVT, not yet functional due to theabsence
of mdev.
It's built as kvmgt.ko, depends on vfio.ko, kvm.ko and mdev.ko,
and being required by i915.ko. To not introduce hard dependency
in i915.ko, we used indirect symbol reference. But that means
users have to include kvmgt.ko into init ramdisk if their
i915.ko is included.
Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Xiaoguang Chen <xiaoguang.chen@intel.com> Signed-off-by: Jike Song <jike.song@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Jike Song [Thu, 3 Nov 2016 10:38:35 +0000 (18:38 +0800)]
drm/i915/gvt: refactor intel_gvt_io_emulation_ops to be intel_gvt_ops
There are currently 4 methods in intel_gvt_io_emulation_ops
to emulate CFG/MMIO reading/writing for intel vGPU. A possibly
better scope is: add 3 more methods for vgpu create/destroy/reset
respectively, and rename the ops to 'intel_gvt_ops', then pass
it to the MPT module (say the future kvmgt) to use: they are
all methods for external usage.
Signed-off-by: Jike Song <jike.song@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Jike Song [Thu, 3 Nov 2016 10:38:34 +0000 (18:38 +0800)]
drm/i915/gvt: allow several MPT methods to be NULL
Hypervisors are different, the MPT ops is a only superset of
all possibly supported hypervisors. There might be other way
out of the MPT to achieve same target. e.g. vfio-based kvmgt
won't provide map_gfn_to_mfn method to establish guest EPT
mapping for aperture, since it will be done in QEMU/KVM, MMIO
is also trapped elsewhere, etc.
Signed-off-by: Jike Song <jike.song@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Jike Song [Thu, 3 Nov 2016 10:38:32 +0000 (18:38 +0800)]
drm/i915/gvt: remove obsolete code for old kvmgt opregion
Current GVT contains some obsolete logic originally cooked to
support the old, non-vfio kvmgt, which is actually workarounds.
We don't support that anymore, so it's safe to remove it and
make a better framework.
Signed-off-by: Jike Song <jike.song@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Bing Niu [Mon, 7 Nov 2016 02:44:36 +0000 (10:44 +0800)]
drm/i915/gvt: don't rely on guest PPGTT entry to free old shadow data
On guest writing a PPGTT entry, if it contains value and the old
entry is valid, gvt will read it and find & free the corresponding
old data for it. However, with the KVM write protection provided
by page_track, the guest entry will be written with new value
before gvt handling. To avoid that, we should use the shadow
entry instead.
Signed-off-by: Bing Niu <bing.niu@intel.com> Signed-off-by: Jike Song <jike.song@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Chris Wilson [Tue, 8 Nov 2016 14:37:19 +0000 (14:37 +0000)]
drm/i915: Spin until breadcrumb threads are complete
When we need to reset the global seqno on wraparound, we have to wait
until the current rbtrees are drained (or otherwise the next waiter will
be out of sequence). The current mechanism to kick and spin until
complete, may exit too early as it would break if the target thread was
currently running. Instead, we must wake up the threads, but keep
spinning until the trees have been deleted.
In order to appease Tvrtko, busy spin rather than yield().
Arnd Bergmann [Tue, 8 Nov 2016 13:58:17 +0000 (14:58 +0100)]
drm/i915: avoid harmless empty-body warning
The newly added assert_kernel_context_is_current introduces a warning
when built with W=1:
drivers/gpu/drm/i915/i915_gem.c: In function ‘assert_kernel_context_is_current’:
drivers/gpu/drm/i915/i915_gem.c:4417:63: error: suggest braces around empty body in an ‘else’ statement [-Werror=empty-body]
Changing the GEM_BUG_ON() macro from an empty definition to "do { } while (0)"
makes the macro more robust to use and avoids the warning.
Fixes: 3033acab07f9 ("drm/i915: Queue the idling context switch after all other timelines") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161108135834.2166677-1-arnd@arndb.de
Ville Syrjälä [Mon, 7 Nov 2016 20:20:57 +0000 (22:20 +0200)]
drm/i915: Use intel_fb_gtt_offset() also for gen2/3 primary plane
The code to determine the primary plane offset for gen2/3 looks
different than the code for gen4+, but in fact it's doing the same
thing. Let's make it uniform. Allows us to eliminate the 'obj' from
the list of local variables as well.
Ville Syrjälä [Mon, 7 Nov 2016 20:20:56 +0000 (22:20 +0200)]
drm/i915: Fix error handling for cursor/sprite plane create failure
intel_cursor_plane_create() and intel_sprite_plane_create() return
an error pointer, so let's not mistakenly look for a NULL pointer.
Cc: Chris Wilson <chris@chris-wilson.co.uk> Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
References: https://lists.freedesktop.org/archives/intel-gfx/2016-November/110690.html Fixes: b079bd17e301 ("drm/i915: Bail if plane/crtc init fails") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1478550057-24864-5-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Ville Syrjälä [Mon, 7 Nov 2016 20:20:55 +0000 (22:20 +0200)]
drm/i915: Grab the rotation from the passed plane state for VLV sprites
Use the passed in plane_state instead of plane->state in
vlv_update_plane(). Currently the two are one and the same, but if we
start queuing up multiple plane updates they might not be.
Chris Wilson [Sun, 6 Nov 2016 12:59:59 +0000 (12:59 +0000)]
drm/i915: Remove chipset flush after cache flush
We always flush the chipset prior to executing with the GPU, so we can
skip the flush during ordinary domain management.
This should help mitigate some of the potential performance regressions,
but likely trivial, from doing the flush unconditionally before execbuf
introduced in commit dcd79934b0dd ("drm/i915: Unconditionally flush any
chipset buffers before execbuf")
Chris Wilson [Mon, 7 Nov 2016 16:52:04 +0000 (16:52 +0000)]
drm/i915: Mark CPU cache as dirty when used for rendering
On LLC, or even snooped, machines rendering via the GPU ends up in the CPU
cache. This cacheline dirt also needs to be flushed to main memory when
moving to an incoherent domain, such as the display's scanout engine.
Mostly, this happens because either the object is marked as dirty from
its first use or is avoided by setting the object into the display
domain from the start.
v2: Treat WT as not requiring a clflush prior to use on the display
engine as well.
Fixes: 0f71979ab7fb ("drm/i915: Performed deferred clflush inside set-cache-level")
References: https://bugs.freedesktop.org/show_bug.cgi?id=95414 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: <stable@vger.kernel.org> # v4.0+ Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161107165204.7008-1-chris@chris-wilson.co.uk
Imre Deak [Mon, 7 Nov 2016 09:20:05 +0000 (11:20 +0200)]
drm/i915: Add assert for no pending GPU requests during suspend/resume in LR mode
During resume we will reset the SW/HW tracking for each ring head/tail
pointers and so are not prepared to replay any pending requests (as
opposed to GPU reset time). Add an assert for this both to the suspend
and the resume code.
v2:
- Check for ELSP port idle already during suspend and check !gt.awake
during resume. (Chris)
v3:
- Move the !gt.awake check to i915_gem_resume().
v4:
- s/intel_lr_engines_idle/intel_execlists_idle/ (Chris)
Imre Deak [Mon, 7 Nov 2016 09:20:04 +0000 (11:20 +0200)]
drm/i915: Make sure engines are idle during GPU idling in LR mode
We assume that the GPU is idle once receiving the seqno via the last
request's user interrupt. In execlist mode the corresponding context
completed interrupt can be delayed though and until this latter
interrupt arrives we consider the request to be pending on the ELSP
submit port. This can cause a problem during system suspend where this
last request will be seen by the resume code as still pending. Such
pending requests are normally replayed after a GPU reset, but during
resume we reset both SW and HW tracking of the ring head/tail pointers,
so replaying the pending request with its stale tail pointer will leave
the ring in an inconsistent state. A subsequent request submission can
lead then to the GPU executing from uninitialized area in the ring
behind the above stale tail pointer.
Fix this by making sure any pending request on the ELSP port is
completed before suspending. I used a polling wait since the completion
time I measured was <1ms and since normally we only need to wait during
system suspend. GPU idling during runtime suspend is scheduled with a
delay (currently 50-100ms) after the retirement of the last request at
which point the context completed interrupt must have arrived already.
drm/i915/hsw: Fix GPU hang during resume from S3-devices state
but it could happen even without the explicit GPU reset, since we
disable interrupts afterwards during the suspend sequence.
v2:
- Do an unlocked poll-wait first. (Chris)
v3-4:
- s/intel_lr_engines_idle/intel_execlists_idle/ and move
i915.enable_execlists check to the new helper. (Chris)
Imre Deak [Mon, 7 Nov 2016 09:20:03 +0000 (11:20 +0200)]
drm/i915: Avoid early GPU idling due to race with new request
There is a small race where a new request can be submitted and retired
after the idle worker started to run which leads to idling the GPU too
early. Fix this by deferring the idling to the pending instance of the
worker.
Imre Deak [Mon, 7 Nov 2016 09:20:02 +0000 (11:20 +0200)]
drm/i915: Avoid early GPU idling due to already pending idle work
Atm, in case an idle work handler is already pending but haven't yet
started to run, retiring a new request will not extend the active period
as required, rather simply leaves the pending idle work to be scheduled
at the original expiration time. This may lead to idling the GPU too
early. Fix this by using the delayed-work scheduler alternative which
makes sure the handler's expiration time is extended in this case.
Chris Wilson [Mon, 7 Nov 2016 11:01:28 +0000 (11:01 +0000)]
drm/i915: Limit Valleyview and earlier to only using mappable scanout
Valleyview appears to be limited to only scanning out from the first 512MiB
of the Global GTT. Lets presume that this behaviour was inherited from the
display block copied from g4x (not Ironlake) and all earlier generations
are similarly affected, though testing suggests different symptoms. For
simplicity, impose that these platforms must scanout from the mappable
region. (For extra simplicity, use HAS_GMCH_DISPLAY even though this
catches Cherryview which does not appear to be limited to the low
aperture for its scanout.)
v2: Use HAS_GMCH_DISPLAY() to more clearly convey my intent about
limiting this workaround to the old style of display engine.
v3: Update changelog to reflect testing by Ville Syrjälä
v4: Include the changes to the comments as well
Reported-by: Luis Botello <luis.botello.ortega@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98036 Fixes: 2efb813d5388 ("drm/i915: Fallback to using unmappable memory for scanout") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.9-rc1+ Link: http://patchwork.freedesktop.org/patch/msgid/20161107110128.28762-1-chris@chris-wilson.co.uk Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com
Chris Wilson [Mon, 7 Nov 2016 10:54:43 +0000 (10:54 +0000)]
drm/i915: Round tile chunks up for constructing partial VMAs
When we split a large object up into chunks for GTT faulting (because we
can't fit the whole object into the aperture) we have to align our cuts
with the fence registers. Each partial VMA must cover a complete set of
tile rows or the offset into each partial VMA is not aligned with the
whole image. Currently we enforce a minimum size on each partial VMA,
but this minimum size itself was not aligned to the tile row causing
distortion.
Reported-by: Andreas Reis <andreas.reis@gmail.com> Reported-by: Chris Clayton <chris2553@googlemail.com> Reported-by: Norbert Preining <preining@logic.at> Tested-by: Norbert Preining <preining@logic.at> Tested-by: Chris Clayton <chris2553@googlemail.com> Fixes: 03af84fe7f48 ("drm/i915: Choose partial chunksize based on tile row size") Fixes: a61007a83a46 ("drm/i915: Fix partial GGTT faulting") # enabling patch
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98402
Testcase: igt/gem_mmap_gtt/medium-copy-odd Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.9-rc1+ Link: http://patchwork.freedesktop.org/patch/msgid/20161107105443.27855-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Chris Wilson [Fri, 4 Nov 2016 16:12:41 +0000 (16:12 +0000)]
drm/i915: Remove the vma from the object list upon close
Currently, the vma is being unlink from the object lookup on destroy.
However, we are meant to be decoupling it upon close so that the user
cannot access the closed vma whilst it remains active on the GPU.
Ping Gao [Fri, 4 Nov 2016 05:47:35 +0000 (13:47 +0800)]
drm/i915/gvt: implement scratch page table tree for shadow PPGTT
All the unused entries in the page table tree(PML4E->PDPE->PDE->PTE)
should point to scratch page table/scratch page to avoid page walk error
due to the page prefetching.
When removing an entry in shadow PPGTT, it need map to scratch page
also, the older implementation use single scratch page to assign to all
level entries, it doesn't align the page walk behavior when removed
entry is in PML, PDP, PD. To avoid potential page walk error this patch
implement a scratch page tree to replace the single scratch page.
v2: more details in commit message address Kevin's comments.
Signed-off-by: Ping Gao <ping.a.gao@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Du, Changbin [Fri, 4 Nov 2016 04:21:37 +0000 (12:21 +0800)]
drm/i915/gvt: emulate vgpu engine reset control behavior
When SW wishes to reset the render engine, it will program
engine's reset control register and wait response from HW.
We need emulate the behavior of this register so guest i915
driver could walk through the engine reset flow. The registers
are not emulated in gvt yet, this patch add the emulation
logic.
v2: add more desc info in commit message.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com> Signed-off-by: Du, Changbin <changbin.du@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Zhenyu Wang [Wed, 2 Nov 2016 07:00:15 +0000 (15:00 +0800)]
drm/i915/gvt: Fix workload status after wait
From commit e95433c73a11759203af1cae5958f998c9673370, workload status setting
was changed to only capture on error path, but we need to set it properly in
normal path too, otherwise we'll fail to complete workload which could lead
guest VM vGPU reset.
v2: uses braces and add Fixes tag.
Fixes: e95433c73a11 ("drm/i915: Rearrange i915_wait_request() accounting with callers") Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Lyude [Wed, 2 Nov 2016 01:06:30 +0000 (21:06 -0400)]
drm/i915: Reinit polling before hpd when resuming
Now that we don't run the connector reprobing from i915_drm_resume(), we
need to make it so we don't have to wait for reprobing to finish so that
we actually speed things up. In order to do this, we need to make sure
that i915_drm_resume() doesn't get blocked by i915_hpd_poll_init_work()
while trying to acquire the mode_config lock that
drm_kms_helper_poll_enable() needs to acquire.
The easiest way to do this is to just enable polling before hpd. This
shouldn't break anything since at that point we have everything else we
need for polling enabled.
As well, this should result in a rather significant improvement in how
quickly we can resume the system.
Signed-off-by: Lyude <lyude@redhat.com> Tested-by: David Weinehall <david.weinehall@linux.intel.com> Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Testcase: analyze_suspend.py -config config/suspend-callgraph.cfg -filter i915
Lyude [Tue, 1 Nov 2016 21:31:32 +0000 (17:31 -0400)]
drm/i915: Remove redundant reprobe in i915_drm_resume
Weine's investigation on benchmarking the suspend/resume process pointed
out a lot of the time in suspend/resume is being spent reprobing. While
the reprobing process is a lengthy one for good reason, we don't need to
hold up the entire suspend/resume process while we wait for it to
finish. Luckily as it turns out, we already trigger a full connector
reprobe in i915_hpd_poll_init_work(), so we can just ditch reprobing in
i915_drm_resume() entirely.
This won't lead to less time spent resuming just yet since now the
bottleneck will be waiting for the mode_config lock in
drm_kms_helper_poll_enable(), since that will be held as long as
i915_hpd_poll_init_work() is reprobing all of the connectors. But we'll
address that in the next patch.
Signed-off-by: Lyude <lyude@redhat.com> Tested-by: David Weinehall <david.weinehall@linux.intel.com> Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Testcase: analyze_suspend.py -config config/suspend-callgraph.cfg -filter i915
drm/i915/dp: Extend BDW DP audio workaround to GEN9 platforms
According to BSpec, cdclk for BDW has to be not less than 432 MHz with DP
audio enabled, port width x4, and link rate HBR2 (5.4 GHz). With cdclk less
than 432 MHz, enabling audio leads to pipe FIFO underruns and displays
cycling on/off.
Let's apply this work around to GEN9 platforms too, as it fixes the same
issue.
v2: Move drm_device to drm_i915_private conversion
According to BSpec, cdclk for BDW has to be not less than 432 MHz with DP
audio enabled, port width x4, and link rate HBR2 (5.4 GHz). With cdclk less
than 432 MHz, enabling audio leads to pipe FIFO underruns and displays
cycling on/off.
From BSpec:
"Display» BDW-SKL» dpr» [Register] DP_TP_CTL [BDW+,EXCLUDE(CHV)]
Workaround : Do not use DisplayPort with CDCLK less than 432 MHz, audio
enabled, port width x4, and link rate HBR2 (5.4 GHz), or else there may
be audio corruption or screen corruption."
Since, some DP configurations (e.g., MST) use port width x4 and HBR2
link rate, let's increase the cdclk to >= 432 MHz to enable audio for those
cases.
v4: Changed commit message
v3: Combine BDW pixel rate adjustments into a function (Jani)
v2: Restrict fix to BDW
Retain the set cdclk across modesets (Ville) Cc: stable@vger.kernel.org Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1478026080-2925-1-git-send-email-dhinakaran.pandiyan@intel.com
Chris Wilson [Fri, 4 Nov 2016 10:30:01 +0000 (10:30 +0000)]
drm/i915: Fix pages pin counting around swizzle quirk
commit bc0629a76726 ("drm/i915: Track pages pinned due to swizzling
quirk") fixed one problem, but revealed a whole lot more. The root cause
of the pin count mismatch for the swizzle quirk (for L-shaped memory on
gen3/4) was that we were incrementing the pages_pin_count upon getting
the backing pages but then overwriting the pages_pin_count to set it to
1 afterwards. With a little bit of adjustment to satisfy the GEM_BUG_ON
sanitychecks, the fix is to replace the explicit atomic_set with an
atomic_inc.
v2: Consistently use atomics (not mix atomics and helpers) within the
lowlevel get_pages routines. This makes the atomic operations much
clearer.
When a memory slot is being moved or removed users of page track
can be notified. So users can drop write-protection for the pages
in that memory slot.
This notifier type is needed by KVMGT to sync up its shadow page
table when memory slot is being moved or removed.
Register the notifier type track_flush_slot to receive memslot move
and remove event.
Reviewed-by: Xiao Guangrong <guangrong.xiao@intel.com> Signed-off-by: Chen Xiaoguang <xiaoguang.chen@intel.com>
[Squashed commits to avoid bisection breakage and reworded the subject.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Chris Wilson [Thu, 3 Nov 2016 20:08:52 +0000 (20:08 +0000)]
drm/i915: Fix test on inputs for vma_compare()
When supplying a view to vma_compare() it is required that the supplied
i915_address_space is the global GTT. I tested the VMA instead (which is
the current position in the rbtree and maybe from any address space).
Reported-by: Matthew Auld <matthew.auld@intel.com> Tested-by: Matthew Auld <matthew.auld@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98579 Fixes: db6c2b4151f2 ("drm/i915: Store the vma in an rbtree...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161103200852.23431-1-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Chris Wilson [Wed, 2 Nov 2016 17:50:47 +0000 (17:50 +0000)]
drm/i915/guc: Cache the client mapping
Use i915_gem_object_pin_map() for the guc client's lifetime to replace
the peristent kmap + frequent kmap_atomic with a permanent vmapping.
This avoids taking the obj->mm.lock mutex whilst inside irq context
later.
Owen Hofmann [Thu, 27 Oct 2016 18:25:52 +0000 (11:25 -0700)]
kvm: x86: Check memopp before dereference (CVE-2016-8630)
Commit 41061cdb98 ("KVM: emulate: do not initialize memopp") removes a
check for non-NULL under incorrect assumptions. An undefined instruction
with a ModR/M byte with Mod=0 and R/M-5 (e.g. 0xc7 0x15) will attempt
to dereference a null pointer here.