Ville Syrjälä [Thu, 1 Jun 2017 14:36:14 +0000 (17:36 +0300)]
drm/i915: Plumb the correct acquire ctx into intel_crtc_disable_noatomic()
If intel_crtc_disable_noatomic() were to ever get called during resume
we'd end up deadlocking since resume has its own acqcuire_ctx but
intel_crtc_disable_noatomic() still tries to use the
mode_config.acquire_ctx. Pass down the correct acquire ctx from the top.
Chris Wilson [Thu, 15 Jun 2017 08:14:35 +0000 (09:14 +0100)]
drm/i915: Split vma exec_link/evict_link
Currently the vma has one link member that is used for both holding its
place in the execbuf reservation list, and in any eviction list. This
dual property is quite tricky and error prone.
Chris Wilson [Thu, 15 Jun 2017 08:14:34 +0000 (09:14 +0100)]
drm/i915: Use vma->exec_entry as our double-entry placeholder
This has the benefit of not requiring us to manipulate the
vma->exec_link list when tearing down the execbuffer, and is a
marginally cheaper test to detect the user error.
Robert Bragg [Tue, 13 Jun 2017 11:23:06 +0000 (12:23 +0100)]
drm/i915/perf: remove perf.hook_lock
In earlier iterations of the i915-perf driver we had a number of
callbacks/hooks from other parts of the i915 driver to e.g. notify us
when a legacy context was pinned and these could run asynchronously with
respect to the stream file operations and might also run in atomic
context.
dev_priv->perf.hook_lock had been for serialising access to state needed
within these callbacks, but as the code has evolved some of the hooks
have gone away or are implemented to avoid needing to lock any state.
The remaining use of this lock was actually redundant considering how
the gen7 oacontrol state used to be updated as part of a context pin
hook.
Signed-off-by: Robert Bragg <robert@sixbynine.org> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Robert Bragg [Tue, 13 Jun 2017 11:23:05 +0000 (12:23 +0100)]
drm/i915/perf: per-gen timebase for checking sample freq
An oa_exponent_to_ns() utility and per-gen timebase constants where
recently removed when updating the tail pointer race condition WA, and
this restores those so we can update the _PROP_OA_EXPONENT validation
done in read_properties_unlocked() to not assume we have a 12.5MHz
timebase as we did for Haswell.
Accordingly the oa_sample_rate_hard_limit value that's referenced by
proc_dointvec_minmax defining the absolute limit for the OA sampling
frequency is now initialized to (timestamp_frequency / 2) instead of the
6.25MHz constant for Haswell.
v2:
Specify frequency of 19.2MHz for BXT (Ville)
Initialize oa_sample_rate_hard_limit per-gen too (Lionel)
Signed-off-by: Robert Bragg <robert@sixbynine.org> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Robert Bragg [Tue, 13 Jun 2017 11:23:03 +0000 (12:23 +0100)]
drm/i915/perf: Add OA unit support for Gen 8+
Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
share (more-or-less) the same OA unit design.
Of particular note in comparison to Haswell: some OA unit HW config
state has become per-context state and as a consequence it is somewhat
more complicated to manage synchronous state changes from the cpu while
there's no guarantee of what context (if any) is currently actively
running on the gpu.
The periodic sampling frequency which can be particularly useful for
system-wide analysis (as opposed to command stream synchronised
MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
have become per-context save and restored (while the OABUFFER
destination is still a shared, system-wide resource).
This support for gen8+ takes care to consider a number of timing
challenges involved in synchronously updating per-context state
primarily by programming all config state from the cpu and updating all
current and saved contexts synchronously while the OA unit is still
disabled.
The driver intentionally avoids depending on command streamer
programming to update OA state considering the lack of synchronization
between the automatic loading of OACTXCONTROL state (that includes the
periodic sampling state and enable state) on context restore and the
parsing of any general purpose BB the driver can control. I.e. this
implementation is careful to avoid the possibility of a context restore
temporarily enabling any out-of-date periodic sampling state. In
addition to the risk of transiently-out-of-date state being loaded
automatically; there are also internal HW latencies involved in the
loading of MUX configurations which would be difficult to account for
from the command streamer (and we only want to enable the unit when once
the MUX configuration is complete).
Since the Gen8+ OA unit design no longer supports clock gating the unit
off for a single given context (which effectively stopped any progress
of counters while any other context was running) and instead supports
tagging OA reports with a context ID for filtering on the CPU, it means
we can no longer hide the system-wide progress of counters from a
non-privileged application only interested in metrics for its own
context. Although we could theoretically try and subtract the progress
of other contexts before forwarding reports via read() we aren't in a
position to filter reports captured via MI_REPORT_PERF_COUNT commands.
As a result, for Gen8+, we always require the
dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
if not root.
v5: Drain submitted requests when enabling metric set to ensure no
lite-restore erases the context image we just updated (Lionel)
v6: In addition to drain, switch to kernel context & update all
context in place (Chris)
v7: Add missing mutex_unlock() if switching to kernel context fails
(Matthew)
v8: Simplify OA period/flex-eu-counters programming by using the
batchbuffer instead of modifying ctx-image (Lionel)
v9: Back to updating the context image (due to erroneous testing,
batchbuffer programming the OA unit doesn't actually work)
(Lionel)
Pin context before updating context image (Chris)
Drop MMIO programming now that we switch to a kernel context with
right values in initial context image (Chris)
v10: Just pin_map the contexts we want to modify or let the
configuration happen on first use (Chris)
v11: Update kernel context OA config through the batchbuffer rather
than on the fly ctx-image update (Lionel)
v12: Rework OA context registers update again by swithing away from
user contexts and reconfiguring the kernel context through the
batchbuffer and updating all the other contexts' context image.
Also take care to lock slice/subslice configuration when OA is
on. (Lionel)
v13: Request rpcs updates on all engine when updating the OA config
(Lionel)
v14: Drop any kind of rpcs management now that we monitor sseu
configuration changes in a later patch (Lionel)
Remove usleep after programming the NOA configs on Gen8+, this
doesn't seem to be needed (Lionel)
v15: Respect coding style for block comments (Chris)
v16: Add missing i915_add_request() in case we fail to emit OA
configuration (Matthew)
Signed-off-by: Robert Bragg <robert@sixbynine.org> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> \o/ Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Robert Bragg [Tue, 13 Jun 2017 11:23:02 +0000 (12:23 +0100)]
drm/i915/perf: Add 'render basic' Gen8+ OA unit configs
Adds a static OA unit, MUX, B Counter + Flex EU configurations for basic
render metrics on Broadwell, Cherryview, Skylake and Broxton. These are
auto generated from an XML description of metric sets, currently
maintained in gputop, ref:
Gen8+ might have mux configurations per slices/subslices. Depending on
whether slices/subslices have been fused off, only part of the
configuration needs to be applied. This change reworks the mux
configurations query mechanism to allow more than one set of registers
to be programmed.
v2: s/n_mux_regs/n_mux_configs/ (Matthew)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Robert Bragg [Tue, 13 Jun 2017 11:23:00 +0000 (12:23 +0100)]
drm/i915: expose _SUBSLICE_MASK GETPARM
Assuming a uniform mask across all slices, this enables userspace to
determine the specific sub slices can be enabled. This information is
required, for example, to be able to analyse some OA counter reports
where the counter configuration depends on the HW sub slice
configuration.
Signed-off-by: Robert Bragg <robert@sixbynine.org> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Robert Bragg [Tue, 13 Jun 2017 11:22:59 +0000 (12:22 +0100)]
drm/i915: expose _SLICE_MASK GETPARM
Enables userspace to determine the maximum number of slices that can
be enabled on the device and also know what specific slices can be
enabled. This information is required, for example, to be able to
analyse some OA counter reports where the counter configuration
depends on the HW slice configuration.
Signed-off-by: Robert Bragg <robert@sixbynine.org> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Chris Wilson [Thu, 18 May 2017 09:46:18 +0000 (10:46 +0100)]
drm/i915: Reinstate reservation_object zapping for batch_pool objects
I removed the zapping of the reservation_object->fence array of shared
fences prematurely. We don't yet have the code to zap that array when
retiring the object, and so currently it remains possible to continually
grow the shared array trapping requests when reusing the batch_pool
object across many timelines.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170518094638.5469-4-chris@chris-wilson.co.uk
Chris Wilson [Fri, 9 Jun 2017 11:03:49 +0000 (12:03 +0100)]
drm/i915: Spin for struct_mutex inside shrinker
Having resolved whether or not we would deadlock upon a call to
mutex_lock(&dev->struct_mutex), we can then spin for the contended
struct_mutex if we are not the owner. We cannot afford to simply block
and wait for the mutex, as the owner may itself be waiting for the
allocator -- i.e. a cyclic deadlock. This should significantly improve
the chance of running the shrinker for other processes whilst the GPU is
busy.
A more balanced approach would be to optimistically spin whilst the
mutex owner was on the cpu and there was an opportunity to acquire the
mutex for ourselves quickly. However, that requires support from
kernel/locking/ and a new mutex_spin_trylock() primitive.
Chris Wilson [Fri, 9 Jun 2017 11:03:48 +0000 (12:03 +0100)]
drm/i915: Only restrict noreclaim in the early shrink passes
In our first pass, we do not want to use reclaim at all as we want to
solely reap the i915 buffer caches (its purgeable pages). But we don't
mind it initiates IO or pulls via the FS (but it shouldn't anyway as we
say no to reclaim!). Just drop the GFP_IO constraint for simplicity.
Chris Wilson [Fri, 9 Jun 2017 11:03:47 +0000 (12:03 +0100)]
drm/i915: Remove __GFP_NORETRY from our buffer allocator
I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
struggles with handling reclaim of our dirty buffers and relies on
reclaim via kswapd. As a result, a single pass of direct reclaim is
unreliable when i915 occupies the majority of available memory, and the
only means of effectively waiting on kswapd to amke progress is by not
setting the __GFP_NORETRY flag and lopping. That leaves us with the
dilemma of invoking the oomkiller instead of propagating the allocation
failure back to userspace where it can be handled more gracefully (one
hopes). In the future we may have __GFP_MAYFAIL to allow repeats up until
we genuinely run out of memory and the oomkiller would have been invoked.
Until then, let the oomkiller wreck havoc.
v2: Stop playing with side-effects of gfp flags and await __GFP_MAYFAIL
v3: Update comments that direct reclaim only appears to be ignoring our
dirty buffers!
Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Testcase: igt/gem_tiled_swapping Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Michal Hocko <mhocko@suse.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-2-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Chris Wilson [Fri, 9 Jun 2017 11:03:46 +0000 (12:03 +0100)]
drm/i915: Encourage our shrinker more when our shmemfs allocations fails
Commit 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than
incur the oom for gfx allocations") made the bold decision to try and
avoid the oomkiller by reporting -ENOMEM to userspace if our allocation
failed after attempting to free enough buffer objects. In short, it
appears we were giving up too easily (even before we start wondering if
one pass of reclaim is as strong as we would like). Part of the problem
is that if we only shrink just enough pages for our expected allocation,
the likelihood of those pages becoming available to us is less than 100%
To counter-act that we ask for twice the number of pages to be made
available. Furthermore, we allow the shrinker to pull pages from the
active list in later passes.
v2: Be a little more cautious in paging out gfx buffers, and leave that
to a more balanced approach from shrink_slab(). Important when combined
with "drm/i915: Start writeback from the shrinker" as anything shrunk is
immediately swapped out and so should be more conservative.
Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-1-chris@chris-wilson.co.uk
Kahola, Mika [Fri, 9 Jun 2017 22:26:15 +0000 (15:26 -0700)]
drm/i915/cnl: Enable wrpll computation for CNL
Enable wrpll computation for Cannonlake platform to support
pll's required for HDMI output. The patch contains the following features
- compute Cannonlake port clock programming
dividers P, Q, and K.
- compute PLL parameters for Cannonlake. These parameters
set the values on DPLL registers.
- find the register values to program wrpll for Cannonlake.
The reference clock can be either 19.2MHz or 24MHz.
v2: rebase
v3: squash wrpll patches into one (Rodrigo)
v4: switch order of getting even dividers (Paulo)
update divider register values for PDiv and KDiv (Paulo)
update wrpll computation algorithm (Paulo)
v5: Remove ref clock division by 1000. (Rodrigo)
v6: Rodrigo rebasing on top of latest code.
Clint Taylor [Fri, 9 Jun 2017 22:26:09 +0000 (15:26 -0700)]
drm/i915/cnl: Enable loadgen_select bit for vswing sequence
vswing programming sequence step 2 requires the Loadgen_select bit to
be set in PORT_TX_DW4 lane reigsters per table defined by Bit rate and
lane width. Implemented the change that was marked as FIXME in the
driver.
Rodrigo Vivi [Fri, 9 Jun 2017 22:26:08 +0000 (15:26 -0700)]
drm/i915/cnl: Implement voltage swing sequence.
This is an important part of the DDI initalization as well as
for changing the voltage during DisplayPort link training.
This new sequence for Cannonlake is more like Broxton style
but still with different registers, different table and
different steps.
v2: Do not write to DW4_GRP to avoid overwrite individual loadgen.
Fix PORT_CL_DW5 SUS Clock Config set.
v3: As previous platforms use only eDP table if low voltage was
requested.
v4: fix Werror:maybe uninitialized (Paulo)
v5: Rebase on top of dw2_swing_sel changes
on previous patches.
v6: Using flexible SCALING_MODE_SEL(x).
Rodrigo Vivi [Fri, 9 Jun 2017 22:26:07 +0000 (15:26 -0700)]
drm/i915/cnl: Add DDI Buffer translation tables for Cannonlake.
These tables are used on voltage wswing sequence initialization
on Cannonlake.
It is a complete new format now in use by the voltage swing team,
not following any other standard in use by any other platform.
Also the registers are different as well. So let's redefine
the translation table for Cannonlake.
The table is huge. So we minimized with the fields that are
different or might be different anytime soon. The common
values will be hardcoded on the voltage swing sequence.
v2: Merge the lower and the upper bits to match the spec table
and make review easier. This was possible with the good
idea for Manasi with a better way to handle it on the bit
macro definition presented on previous patch. Credits-to: Manasi Cc: Manasi Navare <manasi.d.navare@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Manasi Navare <manasi.d.navare@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1497047175-27250-10-git-send-email-rodrigo.vivi@intel.com
Rodrigo Vivi [Fri, 9 Jun 2017 22:26:06 +0000 (15:26 -0700)]
drm/i915/cnl: Add registers related to voltage swing sequences.
This are the registers and bits needed for the voltage swing
sequence on Cannonlake.
v2: Remove CL_DW5 that was wrongly defined.
v3: Use (1 << 1) instead of (1<<1) as Paulo suggested
Change DW2 swing sel upper and lower macros to do the
bit selection instead of definint a table that doesn't
match the spec. It is based on a Manasi version of it. Credits-to: Manasi.
v4: Let SCALING_MODE_SEL flexible. (Manasi)
Rodrigo Vivi [Fri, 9 Jun 2017 22:26:05 +0000 (15:26 -0700)]
drm/i915: Add MMIO helper for 6 ports with different offsets.
Also new registers can have different mmio offsets
per different lane per port.
v2: Use _PICK as PORT3 instead of creating a new
macro with if per port.
v3: Use _PICK directly on MMIO_PORT6. While MMIO_PORT
isn't flexible enough let's continue with MMIO_PORT6
as we have MMIO_PORT3.
Rodrigo Vivi [Fri, 9 Jun 2017 22:26:04 +0000 (15:26 -0700)]
drm/i915/cnl: Initialize PLLs
Although CNL follows PLL initialization more like Skylake
than Broxton we have a completely different initialization
sequence and registers used.
One big difference from SKL is that CDCLK PLL is now
exclusive (ADPLL) and for DDIs and MIPI we need to use
DFGPLLs 0, 1 or 2.
v2: Accept all Ander's suggestions and fixes:
- Registers and bits names prefix
- Group pll functions
- bits masks fixes
- remove read and modify on cfgcr1
- fix cfgcr0 setup
v3: Set SSC_ENABLE for DP.
Fix HDMI_MODE cfgcr0.
Avoid touch cfgcr0 on DP.
Add missed else on dpll_mgr definition so we use cnl one, not hsw.
v3: Centra freq should be always set to default and change bits
definitions to (1 << 1) instead of (1<<1). (by Paulo)
v4: Rebased.
Rodrigo Vivi [Fri, 9 Jun 2017 22:26:02 +0000 (15:26 -0700)]
drm/i915/cnl: DDI - PLL mapping
One of the steps for PLL (un)initialization is to (un)map
the correspondent DDI that is actually using that PLL.
So, let's do this step following the places already stablished
and used so far, although spec put this as part of PLL
initialization sequences.
v2: Use proper prefix on bits names as suggested by Ander.
v3: Add missed "~". Without that the logic was inverted
so we were disabling interrupts. Credits-to: Clinton Credits-to: Art
v4: Spec is getting updated to do DDI -> PLL mapping
and clock on in 2 separated reg writes. (Paulo)
Also update bits definitions to use space
(1 << 1) instead of (1<<1). (Paulo)
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Art Runyan <arthur.j.runyan@intel.com> Cc: Clint Taylor <clinton.a.taylor@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Kahola, Mika <mika.kahola@intel.com> Cc: Ander Conselvan De Oliveira <ander.conselvan.de.oliveira@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Kahola, Mika <mika.kahola@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1497047175-27250-5-git-send-email-rodrigo.vivi@intel.com
Rodrigo Vivi [Fri, 9 Jun 2017 22:26:01 +0000 (15:26 -0700)]
drm/i915/cnl: Allow dynamic cdclk changes on CNL
All the low level cdclk bits are present, so let's add the required
hooks to reconfigure cdclk on the fly.
Cannonlake also needs to adjust the minimal pixel rate
as gen9 platforms. Specially for the Azalia audio case.
v2: Rebase due to cnl_sanitize_cdclk()
v3: Rebased by Rodrigo on top of Ville's cdclk rework.
v4: Rebase moving cnl_calc_cdclk up to follow same order
as previous platforms.
v2: Squash drm/i915/cnl: Adjust min pixel rate. to address
the current limitation where CDCLK cannot be set to 168MHz
if audio is used with 96MHz. (Imre)
v3: adjust some of the clock limits within
bdw_adjust_min_pipe_pixel_rate. (Ville/DK/Imre).
Fix commit message messed by squash.
Implement the CNL display init/uninit sequence as outlined in Bspec.
Quite similar to SKL/BXT. The main complicaiton is probably the extra
procmon setup we must do based on the process/voltage information we
can read out from some register.
v2: s/skl_dbuf/gen9_dbuf/ to follow upstream
bxt needed a cdclk sanitize step, so let's add it for cnl too
v3: s/CHICKEN_MISC_1/CHICKEN_MISC_2/ (Ander)
v4: Rebased by Rodrigo after Ville's cdclk rework
v5: Removed unecessary Aux IO forced enable/disable, Fix DW10 setup
Fix procpon Mask. (Credits-to Paulo and Clint)
Remove A0 workaround.
v6: Rebased on top of recent code (Rodrigo).
v7: Respect the order of sanitize_ after set_
(Done by Rodrigo, Requested by Ville)
v8: Commit message updated to matvh v5 changes besides
Remove unused DW8 and an extra blank line. (all noticed
by Imre).
v9: Remove __attribute__((unused)) added on latest version
of drm/i915/cnl: Implement .set_cdclk() for CNL.
Ville Syrjälä [Fri, 9 Jun 2017 22:25:59 +0000 (15:25 -0700)]
drm/i915/cnl: Implement .set_cdclk() for CNL
Add support for changing the cdclk frequency on CNL. Again, quite
similar to BXT, but there are some annoying differences which means
trying to share more code might not be feasible:
* PLL ratio now lives in the PLL enable register
* pcode came from SKL, not from BXT
We support three cdclk frequencies: 168,336,528 Mhz. The first two
use the same PLL frequency, the last one uses a different one meaning
we once again may need to toggle the PLL off and on when changing
cdclk.
v2: Rebased by Rodrigo on top of Ville's cdclk rework.
v3: Respect order of set_ bellow get_ (Ville)
v4: Added __attribute__((unused)) to avoid broken compilation with Werror.
Ville Syrjälä [Fri, 9 Jun 2017 22:25:58 +0000 (15:25 -0700)]
drm/i915/cnl: Implement .get_display_clock_speed() for CNL
Add support for reading out the cdclk frequency from the hardware on
CNL. Very similar to BXT, with a few new twists and turns:
* the PLL is now called CDCLK PLL, not DE PLL
* reference clock can be 24 MHz in addition to the 19.2 MHz BXT had
* the ratio now lives in the PLL enable register
* Only 1x and 2x CD2X dividers are supported
v2: Deal with PLL lock bit the same way as BXT/SKL do now
v3: DSSM refclk indicator is bit 31 not 24 (Ander)
v4: Rebased by Rodrigo after Ville's cdclk rework.
v5: Set cdclk to the ref clock as previous platforms. (Imre)
drm/i915: Pass atomic state to backlight enable/disable/set callbacks.
Pass crtc_state to the enable callback, and connector_state to all callbacks.
This will eliminate the need to guess for the correct pipe in these
callbacks.
The crtc state is required for pch_enable_backlight to obtain the correct
cpu_transcoder.
intel_dp_aux_backlight's setup function is called before hw readout, so
crtc_state and connector_state->best_encoder are NULL in the enable()
and set() callbacks.
drm/i915: Pass crtc_state and connector state to backlight enable/disable functions
The backlight functions need to determine the pipe and the transcoder the
backlight will be enabled on, so pass crtc_state instead of trying to
dereference the state without holding locks.
Zhenyu Wang [Fri, 9 Jun 2017 07:48:05 +0000 (15:48 +0800)]
drm/i915: Fix GVT-g PVINFO version compatibility check
Current it's strictly checked if PVINFO version matches 1.0
for GVT-g i915 guest which doesn't help for compatibility at
all and forces GVT-g host can't extend PVINFO easily with version
bump for real compatibility check.
This fixes that to check minimal required PVINFO version instead.
v2:
- drop unneeded version macro
- use only major version for sanity check
v3:
- fix up PVInfo value with kernel type
- one indent fix
Geminilake is now included in CI, making it part of the pre-merge
criteria. The support should be in good enough shape, so let's remove
the alpha_support flag.
Rodrigo Vivi [Thu, 8 Jun 2017 15:50:00 +0000 (08:50 -0700)]
drm/i915/cfl: Introduce Display workarounds for Coffee Lake.
The whole Display engine for Coffee Lake is pretty much
identical to the Kabylake. For this reason let's reuse
all display related production workardounds here even though
CFL is not explicit listed at Display workarounds page at Spec.
v2: moved intel_pm.c chunck to this patch in order to address
all display related w/a in a single place.
Rodrigo Vivi [Thu, 8 Jun 2017 15:49:58 +0000 (08:49 -0700)]
drm/i915/cfl: Introduce Coffee Lake platform definition.
Coffee Lake is a Intel® Processor containing Intel® HD Graphics
following Kabylake.
It is Gen9 graphics based platform on top of CNP PCH.
Let's start by adding the platform definition based on previous
platforms but yet as preliminary_hw_support.
On following patches we will start adding PCI IDs and the
platform specific changes.
v2: Also add BS2 ring that is present on GT3. As on KBL, according
spec: "GT3 also has additional media blocks with second instance
of VEBox and VDBox each", i.e. BSD2 ring in our case. Noticed
when reviewing PCI ID patches.
v3: CFL_PLATFORM instead for CFL_FEATURES because it contains
Platform information and no new features when compared to
BDW_FEATURES definition.
Chris Wilson [Thu, 8 Jun 2017 11:14:05 +0000 (12:14 +0100)]
drm/i915: Remove the spin-request during execbuf await_request
Originally we would enable and disable the breadcrumb interrupt
immediately on demand. This was slow enough to have a large impact
(>30%) on tasks that hopped between engines. However, by using a shadow
to keep the irq alive for an extra interrupt (see commit 67b807a89230
("drm/i915: Delay disabling the user interrupt for breadcrumbs")) and
by recently reducing the cost in adding ourselves to the signal tree, we
no longer need to spin-request during await_request to avoid delays in
throughput tests. Without the earlier patches to stop the wakeup when
signaling if the irq was already active, we saw no improvement in
execbuf overhead (and corresponding contention in other clients) despite
the removal of the spinner in a simple test like glxgears. This means
there will be scenarios where now we spend longer enabling the interrupt
than we would have spent spinning, but these are not likely to have as
noticeable an impact as the high frequency test cases (where there
should not be any regression).
Ulterior motive: generalising the engine->sync_to to handle different
types of semaphores and non-semaphores.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170608111405.16466-4-chris@chris-wilson.co.uk
Chris Wilson [Thu, 8 Jun 2017 11:14:04 +0000 (12:14 +0100)]
drm/i915: Skip adding the request to the signal tree is complete
Enabling the interrupt for the signaler takes a finite amount of time (a
few microseconds) during which it is possible for the request to
complete. Check afterwards and skip adding the request to the signal
rbtree if it complete.
Chris Wilson [Thu, 8 Jun 2017 11:14:03 +0000 (12:14 +0100)]
drm/i915: Report back whether the irq was armed when adding the waiter
The important condition that we need to check after enabling the
interrupt for signaling is whether the request completed in the process
(and so we missed that interrupt). A large cost in enabling the
signaling (rather than waiters) is in waking up the auxiliary signaling
thread, but we only need to do so to catch that missed interrupt. If we
know we didn't miss any interrupts (because we didn't arm the interrupt)
then we can skip waking the auxiliary thread.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170608111405.16466-2-chris@chris-wilson.co.uk
Chris Wilson [Thu, 8 Jun 2017 11:14:02 +0000 (12:14 +0100)]
drm/i915: Check signaled state after enabling signaling
Setting up the irq to signal the request completion takes a finite
amount of time, during which it is possible that the request already
completed. Check afterwards, just in case, so that we can respond
immediately.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170608111405.16466-1-chris@chris-wilson.co.uk
Michel Thierry [Mon, 5 Jun 2017 17:12:51 +0000 (10:12 -0700)]
drm/i915/guc: Clear enable_guc_loading in case of init failure
And prevent calling i915_ggtt_disable_guc twice (the first when GuC init
failed, and the second time during driver unload / intel_uc_fini_hw),
and hitting the GEM_BUG_ON.
v2: Clear enable_guc_loading unconditionally (Michal)
Make sure guc_free_load_err_log is still called (Daniele)
Don't shoot the messenger (Chris)
Fixes: 3950bf3dbff10 ("drm/i915/guc: Add onion teardown to the GuC
setup") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170605171251.9905-1-michel.thierry@intel.com
Chris Wilson [Thu, 4 May 2017 11:55:08 +0000 (12:55 +0100)]
drm/i915: Move the unclaimed mmio detection into the powerwell for KMS
Replace the large comment about requiring the powerwell for
intel_uncore_arm_unclaimed_mmio_detection() by moving the arming of the
mmio error detection into the powerwell held for modesetting. Thereby
also accomplishing the goal of only arming the mmio detection after a
full modeset.
Rodrigo Vivi [Tue, 6 Jun 2017 16:06:06 +0000 (09:06 -0700)]
drm/i915: Unify GT* and GT3 definitions
This patch clean up a bit the platform definition block in
a way to avoid duplications and to let clear that GT3 for
the current platform only have the extra Media engine (BSD2).
v2: Kabylake IS_KABYLAKE as Anusha noticed.
v3: Avoid EXTRA_ENGINE_MASK and list rings out on GT3 to
make it more clear.
Rodrigo Vivi [Tue, 6 Jun 2017 20:30:40 +0000 (13:30 -0700)]
drm/i915/cnl: Also need power well sanitize.
The workaround added in
commit c6782b76d31a ("drm/i915/gen9: Reset secondary power well
equests left on by DMC/KVMR")
needs to be applied on Cannonlake as well.
So let's assume any platform using this power well setup
will also need and let's just go ahead and remove if condition.
Ville Syrjälä [Tue, 6 Jun 2017 20:30:39 +0000 (13:30 -0700)]
drm/i915/cnl: Add power wells for CNL
CNL power wells are very similar to SKL, with the exception that the
misc IO well has been split into separate AUX IO wells.
Not sure if DMC is supposed to manage the AUX wells for us or not.
Let's assume so for now.
v2: DDI A power well wants DDI A domains, not DDI B domains
v3: s/BIT/BIT_ULL and add proper Aux IO domains. (Rodrigo)
v4: Remove PW_DDI_E. Not supported on Current CNL SKUs. (Rodrigo).
v5: Removed DDI_E_IO_DOMAINS and moved PORT_DDI_E_IO to DDI_A_IO
for the same reasons as v4 when we found out that current CNL
SKUs don't have the full port E split.
Rodrigo Vivi [Tue, 6 Jun 2017 20:30:36 +0000 (13:30 -0700)]
drm/i915/cnl: Configure EU slice power gating.
Cannonlake also supports slice power gating on devices with more
than one slice as SKL. Let's assume that this is the same for SKL+
and exclude BXT only.
Rodrigo Vivi [Tue, 6 Jun 2017 20:30:32 +0000 (13:30 -0700)]
drm/i915/cnl: Add Cannonlake PCI IDs for U-skus.
Platform enabling and its power-on are organized in different
skus (U x Y x S x H, etc). So instead of organizing it in
GT1 x GT2 x GT3 let's also use the platform sku.
This is also the new Spec style what makes the review much
more easy and straightforward.
v2: Really include the PCI IDs to the picidlist[];
v3: Remove PCI IDs not present in spec.
v4: Rebase.
Ville Syrjälä [Fri, 31 Mar 2017 18:00:56 +0000 (21:00 +0300)]
drm/i915: Fix 90/270 rotated coordinates for FBC
The clipped src coordinates have already been rotated by 270 degrees for
when the plane rotation is 90/270 degrees, hence the FBC code should no
longer swap the width and height.
Cc: stable@vger.kernel.org Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Fixes: b63a16f6cd89 ("drm/i915: Compute display surface offset in the plane check hook for SKL+") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170331180056.14086-4-ville.syrjala@linux.intel.com Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Tested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Ville Syrjälä [Fri, 31 Mar 2017 18:00:55 +0000 (21:00 +0300)]
drm/i915: Fix SKL+ watermarks for 90/270 rotation
skl_check_plane_surface() already rotates the clipped plane source
coordinates to match the scanout direction because that's the way
the GTT mapping is set up. Thus we no longer need to rotate the
coordinates in the watermark code.
For cursors we use the non-clipped coordinates which are not rotated
appropriately, but that doesn't actually matter since cursors don't
even support 90/270 degree rotation.
v2: Resolve conflicts from SKL+ wm rework
Cc: stable@vger.kernel.org Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: b63a16f6cd89 ("drm/i915: Compute display surface offset in the plane check hook for SKL+") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170331180056.14086-3-ville.syrjala@linux.intel.com Tested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Ville Syrjälä [Fri, 31 Mar 2017 18:00:54 +0000 (21:00 +0300)]
drm/i915: Fix scaling check for 90/270 degree plane rotation
Starting from commit b63a16f6cd89 ("drm/i915: Compute display surface
offset in the plane check hook for SKL+") we've already rotated the src
coordinates by 270 degrees by the time we check if a scaler is needed
or not, so we must not account for the rotation a second time.
Previously we did these steps in the opposite order and hence the
scaler check had to deal with rotation itself. The double rotation
handling causes us to enable a scaler pretty much every time 90/270
degree plane rotation is requested, leading to fuzzier fonts and whatnot.
v2: s/unsigned/unsigned int/ to appease checkpatch
v3: s/DRM_ROTATE_0/DRM_MODE_ROTATE_0/
Cc: stable@vger.kernel.org Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Tested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: b63a16f6cd89 ("drm/i915: Compute display surface offset in the plane check hook for SKL+") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170331180056.14086-2-ville.syrjala@linux.intel.com Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Ville Syrjälä [Tue, 6 Jun 2017 12:43:18 +0000 (15:43 +0300)]
drm/i915: Implement fbc_status "Compressing" info for all platforms
The number of compressed segments has been available ever since
FBC2 was introduced in g4x, it just moved from the STATUS register
into STATUS2 on IVB.
For FBC1 if we really wanted the number of compressed segments we'd
have to trawl through the tags, but in this case since the code just
uses the number of compressed segments as an indicator whether
compression has occurred we can just check the state of the
COMPRESSING and COMPRESSED bits. IIRC the hardware will try to
periodically recompress all uncompressed lines even if they haven't
changed and the COMPRESSED bit will be cleared while the compressor
is running, so just checking the COMPRESSED bit might not give us
the right answer. Hence it seems better to check for both
COMPRESSED and COMPRESSING as that should tell us that the
compressor is at least trying to do something.
While at it move the IVB+ register define to the right place, unify
the naming convention of the compressed segment count masks, and
fix up the mask for g4x.
v2: s/ILK_DPFC_STATUS2/IVB_FBC_STATUS2/ (Paulo)
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Gabriel Krisman Bertazi <krisman@collabora.co.uk> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk> # SNB Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> # ilk+ Acked-by: Paulo Zanoni <paulo.r.zanoni@intel.com> # pre-ilk Link: http://patchwork.freedesktop.org/patch/msgid/20170606124318.31755-1-ville.syrjala@linux.intel.com
The scanline counter is bonkers on VLV/CHV DSI. The scanline counter
increment is not lined up with the start of vblank like it is on
every other platform and output type. This causes problems for
both the vblank timestamping and atomic update vblank evasion.
On my FFRD8 machine at least, the scanline counter increment
happens about 1/3 of a scanline ahead of the start of vblank (which
is where all register latching happens still). That means we can't
trust the scanline counter to tell us whether we're in vblank or not
while we're on that particular line. In order to keep vblank
timestamping in working condition when called from the vblank irq,
we'll leave scanline_offset at one, which means that the entire
line containing the start of vblank is considered to be inside
the vblank.
For the vblank evasion we'll need to consider that entire line
to be bad, since we can't tell whether the registers already
got latched or not. And we can't actually use the start of vblank
interrupt to get us past that line as the interrupt would fire
too soon, and then we'd up waiting for the next start of vblank
instead. One way around that would using the frame start
interrupt instead since that wouldn't fire until the next
scanline, but that would require some bigger changes in the
interrupt code. So for simplicity we'll just poll until we get
past the bad line.
v2: Adjust the comments a bit
Cc: stable@vger.kernel.org Cc: Jonas Aaberg <cja@gmx.net> Tested-by: Jonas Aaberg <cja@gmx.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99086 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161215174734.28779-1-ville.syrjala@linux.intel.com Tested-by: Mika Kahola <mika.kahola@intel.com> Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Ville Syrjälä [Thu, 1 Jun 2017 18:30:43 +0000 (21:30 +0300)]
drm/i915: Remove dead code from runtime resume handler
Remove the SNB PCH refclock init call from the runtime resume handler.
I don't think it was actually needed even when we had SNB runtime PM,
and if definitely isn't needed ever since SNB runtime PM was nuked in
commit d4c5636e7447 ("drm/i915: Remove runtime PM for SNB").
Rodrigo Vivi [Fri, 2 Jun 2017 20:06:43 +0000 (13:06 -0700)]
drm/i915/cnp: add CNP gmbus support
On CNP PCH based platforms the gmbus is on the south display that
is on PCH. The existing implementation for previous platforms
already covers the need for CNP expect for the pin pair configuration
that follows similar definitions that we had on BXT.
v2: Don't drop "_BXT" as the indicator of the first platform
supporting this pin numbers. Suggested by Daniel.
v3: Add missing else and fix register table since CNP GPIO_CTL
starts on 0xC5014.
v4: Fix pin number and map according to the current available VBT.
Re-add pin 4 for port D. Lost during some rebase.
v5: Use table as spec. If VBT is wrong it should be ignored.
Rodrigo Vivi [Fri, 2 Jun 2017 20:06:42 +0000 (13:06 -0700)]
drm/i915/cnp: Backlight support for CNP.
Split out BXT and CNP's setup_backlight(),enable_backlight(),
disable_backlight() and hz_to_pwm() into
two separate functions instead of reusing BXT function.
Reuse set_backlight() and get_backlight() since they have
no reference to the utility pin.
v2: Reuse BXT functions with controller 0 instead of
redefining it. (Jani).
Use dev_priv->rawclk_freq instead of getting the value
from SFUSE_STRAP.
v3: Avoid setup backligh controller along with hooks and
fully reuse hooks setup as suggested by Jani.
v4: Clean up commit message.
v5: Implement per PCH instead per platform.
v6: Introduce a new function for CNP.(Jani and Ville)
v7: Squash the all CNP Backlight support patches into a
single patch. (Jani)
v8: Correct indentation, remove unneeded blank lines and
correct mail address (Jani).
v9: Remove unused enum pipe. (by CI)
v10: Remove comment mentioning SFUSE_STRAP in a part of
the code that we don't use it. (Jani)
Make controller = 0 since current CNP has only one
controller and put a comment mentioning why we
reuse the BXT definitions and are keeping the
controller = 0. (DK)
v11: Remove spurious line. (DK)
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Suggested-by: Jani Nikula <jani.nikula@intel.com> Suggested-by: Ville Syrjala <ville.syrjala@intel.com> Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1496434004-29812-4-git-send-email-rodrigo.vivi@intel.com
Rodrigo Vivi [Fri, 2 Jun 2017 20:06:41 +0000 (13:06 -0700)]
drm/i915/cnp: Get/set proper Raw clock frequency on CNP.
RAWCLK_FREQ register has changed for platforms with CNP+.
[29:26] This field provides the denominator for the fractional
part of the microsecond counter divider. The numerator
is fixed at 1. Program this field to the denominator of
the fractional portion of reference frequency minus one.
If the fraction is 0, program to 0.
0100b = Fraction .2 MHz = Fraction 1/5.
0000b = Fraction .0 MHz.
[25:16] This field provides the integer part of the microsecond
counter divider. Program this field to the integer portion
of the reference frequenct minus one.
Also this register tells us that proper raw clock should be read
from SFUSE_STRAP and programmed to this register. Up to this point
on other platforms we are reading instead of programming it so
probably relying on whatever BIOS had configured here.
Now on let's follow the spec and also program this register
fetching the right value from SFUSE_STRAP as Spec tells us to do.
v2: Read from SFUSE_STRAP and Program RAWCLK_FREQ instead of
reading the value relying someone else will program that
for us.
v3: Add missing else. (Jani)
v4: Addressing all Ville's catches:
Use macro for shift bits instead of defining shift.
Remove shift from the cleaning bits with mask that already
has it.
Add missing I915_WRITE to actually write the reg.
Stop using useless DIV_ROUND_* on divider that is exact
dividion and use DIV_ROUND_CLOSEST for the fraction part.
v5: Remove useless Read-Modify-Write on raclk_freq reg. (Ville).
v6: Change is per PCH instead of per platform.
The first two bytes of PCI ID for CNP_LP PCH are the same as that of
SPT_LP. We should really be looking at the first 9 bits instead of the
first 8 to identify platforms, although this seems to have not caused any
problems on earlier platforms. Introduce a 9 bit extended mask for SPT and
CNP while not touching the code for any of the other platforms.
v2: (Rodrigo) Make platform agnostic and fix commit message.
Rodrigo Vivi [Fri, 2 Jun 2017 20:06:39 +0000 (13:06 -0700)]
drm/i915/cnp: Introduce Cannonpoint PCH.
Most of south engine display that is in PCH is still the
same as SPT and KBP, except for this key differences:
- Backlight: Backlight programming changed in CNP PCH.
- Panel Power: Sligh programming changed in CNP PCH.
- GMBUS and GPIO: The pin mapping has changed in CNP PCH.
All of these changes follow more the BXT style.
v2: Update definition to use dev_priv isntead of dev (Tvrtko).
Chris Wilson [Thu, 1 Jun 2017 13:33:29 +0000 (14:33 +0100)]
drm/i915: Allow kswapd to pause the device whilst reaping
In commit 5763ff04dc4e ("drm/i915: Avoid GPU stalls from kswapd") we
stopped direct reclaim and kswapd from triggering GPU/client stalls
whilst running (by restricting the objects they could reap to be idle).
However with abusive GPU usage, it becomes quite easy to starve kswapd
of memory and prevent it from making forward progress towards obtaining
enough free memory (thus driving the system closer to swap exhaustion).
Relax the previous restriction to allow kswapd (but not direct reclaim)
to stall the device whilst reaping purgeable pages.
v2: Also acquire the rpm wakelock to allow kswapd to unbind buffers.
Weinan Li [Wed, 31 May 2017 02:35:52 +0000 (10:35 +0800)]
drm/i915: return the correct usable aperture size under gvt environment
I915_GEM_GET_APERTURE ioctl is used to probe aperture size from userspace.
In gvt environment, each vm only use the ballooned part of aperture, so we
should return the correct available aperture size exclude the reserved part
by balloon.
v2: add 'reserved' in struct i915_address_space to record the reserved size
in ggtt (Chris)
v3: remain aper_size as total, adjust aper_available_size exclude reserved
and pinned. UMD driver need to adjust the max allocation size according to
the available aperture size but not total size. KMD return the correct
usable aperture size any time (Chris, Joonas)
v4: decrease reserved in deballoon (Joonas)
v5: add onion teardown in balloon, add vgt_deballoon_space (Joonas)
v6: change title name (Zhenyu)
v7: code style refine (Joonas)
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Weinan Li <weinan.z.li@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1496198152-14175-1-git-send-email-weinan.z.li@intel.com Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Chris Wilson [Sun, 21 May 2017 12:40:14 +0000 (13:40 +0100)]
drm/i915: Fix logical inversion for gen4 quirking
The assertion that we want to make before disabling the pin of the pages
for the unknown swizzling quirk is that the quirk is indeed active, and
that the quirk is disabled before we do apply it to the pages.
Fixes: 2c3a3f44dc13 ("drm/i915: Fix pages pin counting around swizzle quirk") Fixes: 957870f93412 ("drm/i915: Split out i915_gem_object_set_tiling()") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170521124014.27678-1-chris@chris-wilson.co.uk
Reviewed-bhy: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Chris Wilson [Tue, 30 May 2017 12:13:34 +0000 (13:13 +0100)]
drm/i915: Check the ring is empty when declaring the engines are idle
As another precaution when testing whether the CS engine is actually
idle, also inspect the ring's HEAD/TAIL registers, which should be equal
when there are no commands left to execute by the GPU.
Chris Wilson [Thu, 1 Jun 2017 09:04:46 +0000 (10:04 +0100)]
drm/i915/guc: Assert that we switch between known ggtt->invalidate functions
When we enable the GuC, we enable an alternative mechanism for doing
post-GGTT update invalidation. Likewise, when we disable the GuC, we
restore the previous method. Assert that we change between known
endpoints, so that we can catch if we accidentally clobber some other
gen and if we change the invalidate routine without updating guc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170601090446.1334-1-chris@chris-wilson.co.uk Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Jani Nikula [Wed, 31 May 2017 10:16:31 +0000 (13:16 +0300)]
drm/i915/dvo: fix debug logging on unknown DID
Print DID not VID on the DID error path. Looks like a copy-paste error
from the VID error path. Clarify and clean up error logging, making them
distinguishable from each other, while at it.
Imre Deak [Wed, 31 May 2017 17:05:35 +0000 (20:05 +0300)]
drm/i915/ddi: Avoid long delays during system suspend / eDP disabling
Atm disabling either DP or eDP outputs can generate a spurious short
pulse interrupt. The reason is that after disabling the port the source
will stop sending a valid stream data, while the sink expects either a
valid stream or the idle pattern. Since neither of this is sent the sink
assumes (after an arbitrary delay) that the link is lost and requests
for link retraining with a short pulse.
The spurious pulse is a real problem at least for eDP panels with long
power-off / power-cycle delays: as part of disabling the output we
disable the panel power. The subsequent spurious short pulse handling
will have to turn the power back on, which means the driver has to do a
redundant wait for the power-off and power-cycle delays. During system
suspend this leads to an unnecessary delay up to ~1s on systems with
such panels as reported by Rui.
To fix this put the sink to DPMS D3 state before turning off the port.
According to the DP spec in this state the sink should not request
retraining. This is also what we do already on pre-ddi platforms.
As an alternative I also tried configuring the port to send idle pattern
- which is against BSPec - and leave the port in normal mode before
turning off the port. Neither of these resolved the problem.
Cc: Zhang Rui <rui.zhang@intel.com> Cc: David Weinehall <david.weinehall@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reported-and-tested-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1496250335-7627-1-git-send-email-imre.deak@intel.com
Chris Wilson [Wed, 31 May 2017 19:05:14 +0000 (20:05 +0100)]
drm/i915: Guard against i915_ggtt_disable_guc() being invoked unconditionally
Commit 7c3f86b6dc51 ("drm/i915: Invalidate the guc ggtt TLB upon
insertion") added the restoration of the invalidation routine after the
GuC was disabled, but missed that the GuC was unconditionally disabled
when not used. This then overwrites the invalidate routine for the older
chipsets, causing havoc and breaking resume as the most obvious victim.
We place the guard inside i915_ggtt_disable_guc() to be backport
friendly (the bug was introduced into v4.11) but it would be preferred
to be in more control over when this was guard (i.e. do not try and
teardown the data structures before we have enabled them). That should
be true with the reorganisation of the guc loaders.
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Fixes: 7c3f86b6dc51 ("drm/i915: Invalidate the guc ggtt TLB upon insertion") Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com> Cc: <stable@vger.kernel.org> # v4.11+ Link: http://patchwork.freedesktop.org/patch/msgid/20170531190514.3691-1-chris@chris-wilson.co.uk Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Mahesh Kumar [Fri, 26 May 2017 15:15:46 +0000 (20:45 +0530)]
drm/i915/skl+: consider max supported plane pixel rate while scaling
A display resolution is only supported if it meets all the restrictions
below for Maximum Pipe Pixel Rate.
The display resolution must fit within the maximum pixel rate output
from the pipe. Make sure that the display pipe is able to feed pixels at
a rate required to support the desired resolution.
For each enabled plane on the pipe {
If plane scaling enabled {
Horizontal down scale amount = Maximum[1, plane horizontal size /
scaler horizontal window size]
Vertical down scale amount = Maximum[1, plane vertical size /
scaler vertical window size]
Plane down scale amount = Horizontal down scale amount *
Vertical down scale amount
Plane Ratio = 1 / Plane down scale amount
}
Else {
Plane Ratio = 1
}
If plane source pixel format is 64 bits per pixel {
Plane Ratio = Plane Ratio * 8/9
}
}
Pipe Ratio = Minimum Plane Ratio of all enabled planes on the pipe
If pipe scaling is enabled {
Horizontal down scale amount = Maximum[1, pipe horizontal source size /
scaler horizontal window size]
Vertical down scale amount = Maximum[1, pipe vertical source size /
scaler vertical window size]
Note: The progressive fetch - interlace display mode is equivalent to a
2.0 vertical down scale
Pipe down scale amount = Horizontal down scale amount *
Vertical down scale amount
Pipe Ratio = Pipe Ratio / Pipe down scale amount
}
Pipe maximum pixel rate = CDCLK frequency * Pipe Ratio
In this patch our calculation is based on pipe downscale amount
(plane max downscale amount * pipe downscale amount) instead of Pipe
Ratio. So,
max supported crtc clock with given scaling = CDCLK / pipe downscale.
Flip will fail if,
current crtc clock > max supported crct clock with given scaling.
Changes since V1:
- separate out fixed_16_16 wrapper API definition
Changes since V2:
- Fix buggy crtc !active condition (Maarten)
- use intel_wm_plane_visible wrapper as per Maarten's suggestion
Changes since V3:
- Change failure return from ERANGE to EINVAL
Changes since V4:
- Rebase based on previous patch changes
Changes since V5:
- return EINVAL instead of continue (Maarten)
Changes since V6:
- Improve commit message
- Address review comment
Changes since V7:
- use !enable instead of !active
- rename config variable for consistency (Maarten)
Kumar, Mahesh [Thu, 1 Jun 2017 05:59:18 +0000 (11:29 +0530)]
drm/i915/skl: New ddb allocation algorithm
This patch implements new DDB allocation algorithm as per HW team
recommendation. This algo takecare of scenario where we allocate less DDB
for the planes with lower relative pixel rate, but they require more DDB
to work.
It also takes care of enabling same watermark level for each
plane in crtc, for efficient power saving.
Changes since v1:
- Rebase on top of Paulo's patch series
Changes since v2:
- Fix the for loop condition to enable WM
Changes since v3:
- Fix crash in cursor i-g-t reported by Maarten
- Rebase after addressing Paulo's comments
- Few other ULT fixes
Changes since v4:
- Rebase on drm-tip
- Added separate function to enable WM levels
Changes since v5:
- Fix a crash identified in skl-6770HQ system
Changes since v6:
- Address review comments from Matt
Changes since v7:
- Fix failure return in skl_compute_plane_wm (Matt)
- fix typo
Changes since v8:
- Always check cursor wm enable irrespective of total_data_rate
Changes since v9:
- fix typo
drm/i915: Always recompute watermarks when distrust_bios_wm is set, v2.
On some systems there can be a race condition in which no crtc state is
added to the first atomic commit. This results in all crtc's having a
null DDB allocation, causing a FIFO underrun on any update until the
first modeset.
Changes since v1:
- Do not take the connection_mutex, this is already done below.
PCI / PM: Avoid resuming PCI devices during system suspend
PCI devices will default to allowing the system suspend complete
optimization where devices are not woken up during system suspend if
they were already runtime suspended. This however breaks the i915/HDA
drivers for two reasons:
- The i915 driver has system suspend specific steps that it needs to
run, that bring the device to a different state than its runtime
suspended state.
- The HDA driver's suspend handler requires power that it will request
from the i915 driver's power domain handler. This in turn requires the
i915 driver to runtime resume itself, but this won't be possible if the
suspend complete optimization is in effect: in this case the i915
runtime PM is disabled and trying to get an RPM reference returns
-EACCESS.
Solve this by requiring the PCI/PM core to resume the device during
system suspend which in effect disables the suspend complete optimization.
Regardless of the above commit the optimization stayed disabled for DRM
devices until
so this patch is in practice a fix for this commit. Another reason for
the bug staying hidden for so long is that the optimization for a device
is disabled if it's disabled for any of its children devices. i915 may
have a backlight device as its child which doesn't support runtime PM
and so doesn't allow the optimization either. So if this backlight
device got registered the bug stayed hidden.
Credits to Marta, Tomi and David who enabled pstore logging,
that caught one instance of this issue across a suspend/
resume-to-ram and Ville who rememberd that the optimization was enabled
for some devices at one point.