Chris Wilson [Thu, 18 May 2017 20:48:11 +0000 (21:48 +0100)]
drm/i915: Reorder media/render reset on g4x
Ville found a reference to WaMediaResetBeforeFullReset which we presume
means that we should simply do the media reset first.
References: https://bugs.freedesktop.org/show_bug.cgi?id=100942 Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170518204811.7408-2-chris@chris-wilson.co.uk
Hans de Goede [Thu, 18 May 2017 11:06:44 +0000 (13:06 +0200)]
drm/i915: Fix new -Wint-in-bool-context gcc compiler warning
This commit fixes the following compiler warning:
drivers/gpu/drm/i915/intel_dsi.c: In function ‘intel_dsi_prepare’:
drivers/gpu/drm/i915/intel_dsi.c:1487:23: warning:
?: using integer constants in boolean context [-Wint-in-bool-context]
PORT_A ? PORT_C : PORT_A),
Kumar, Mahesh [Wed, 17 May 2017 11:58:29 +0000 (17:28 +0530)]
drm/i915/skl+: use linetime latency if ddb size is not available
This patch make changes to use linetime latency if allocated
DDB size during plane watermark calculation is not available.
linetime is the time, display engine takes to fetch one line worth of
pixels with given pixel clock rate.
This is required to implement new DDB allocation algorithm.
In New Algorithm DDB is allocated based on WM values, because of which
number of DDB blocks will not be available during WM calculation,
So this "linetime latency" is suggested by SV/HW team to be used during
switch-case for WM blocks selection.
linetime latency us = pipe horizontal total pixels/adjusted pixel rate MHz
Changes since v1:
- Rebase on top of Paulo's patch series
Changes since v2:
- Fix if-else condition (pointed by Maarten)
Changes since v3:
- Use common function for timetime_us calculation (Paulo)
- rebase on drm-tip
Changes since v4:
- Use consistent name for fixed_point operation
Changes since v5:
- Improve commit message
- rename skl_get_linetime_us to intel_get_linetime_us
- fix watermark result selection (Matt)
Kumar, Mahesh [Wed, 17 May 2017 11:58:28 +0000 (17:28 +0530)]
drm/i915/skl+: Perform wm level calculations in separate function
Instead of iterating over planes & wm levels in a single function use
skl_compute_wm_level function to interate over WM levels.
Change name of function to skl_compute_wm_levels (Matt).
These changes are to clean-up WM code & will help in making only new
ddb algorithm related changes in later patch in series.
Kumar, Mahesh [Wed, 17 May 2017 11:58:27 +0000 (17:28 +0530)]
drm/i915/skl+: Watermark calculation cleanup
This patch cleanup/reorganises the watermark calculation functions.
This patch make use of already available macro
"drm_atomic_crtc_state_for_each_plane_state" to walk through
plane_state list instead of calculating plane_state in function itself.
This restructuring will help later patch for new DDB allocation
algorithm to do only algo related changes.
Changes from V1:
- split the patch in two parts as per Matt's comment
Kumar, Mahesh [Wed, 17 May 2017 11:58:26 +0000 (17:28 +0530)]
drm/i915/skl+: Fail the flip if ddb min requirement exceeds pipe allocation
DDB minimum requirement of crtc configuration (cumulative of all the
enabled planes in crtc) may exceed the allocated DDB for crtc/pipe.
This patch make changes to fail the flip/ioctl if minimum requirement
for pipe exceeds the total ddb allocated to the pipe.
Previously it succeeded but making alloc_size a negative value. Which
will make subsequent calculations for plane ddb allocation bogus & may
lead to screen corruption or system hang.
Changes from V1:
- Improve commit message as per Ander's comment
- Remove extra parentheses (Ander)
Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170517115831.13830-8-mahesh1.kumar@intel.com
Kumar, Mahesh [Wed, 17 May 2017 11:58:23 +0000 (17:28 +0530)]
drm/i915/skl+: calculate pixel_rate & relative_data_rate in fixed point
This patch make changes to calculate adjusted plane pixel rate &
plane downscale amount using fixed_point functions available.
This patch will give uniformity in code, & will help to avoid mixing of
32bit uint32_t variable for fixed-16.16 with fixed_16_16_t variables in
later patch in the series.
Changes from V1:
- Rebase based on wrapper name change
- Remove unnecessary comment
Kumar, Mahesh [Wed, 17 May 2017 11:58:21 +0000 (17:28 +0530)]
drm/i915: Add more wrapper for fixed_point_16_16 operations
This patch adds few wrapper to perform fixed_point_16_16 operations
mul_round_up_u32_fixed16 : Multiplies u32 and fixed_16_16_t variables
& returns u32 result with rounding-up.
mul_fixed16 : Multiplies two fixed_16_16_t variable & returns fixed_16_16
div_round_up_fixed16 : Perform division operation on fixed_16_16_t
variables & return u32 result with round-off
div_round_up_u32_fixed16 : devide uint32_t variable by fixed_16_16 variable
and round_up the result to uint32_t.
These wrappers will be used by later patches in the series.
Changes from V1:
- Rename wrapper as per Matt's comment
Changes from V2:
- Fix indentation
Kumar, Mahesh [Wed, 17 May 2017 11:58:20 +0000 (17:28 +0530)]
drm/i915: fix naming of fixed_16_16 wrapper.
fixed_16_16_div_round_up(_u64), wrapper for fixed_16_16 division
operation don't really round_up the result. Wrapper round_up only the
fraction part of the result to make it 16-bit.
This patch eliminates round_up keyword from the wrapper.
Later patch will introduce the new wrapper to do rounding-off the result
and give unt32_t output to cleanup mix use of fixed_16_16_t & uint32_t
variables.
Chris Wilson [Wed, 17 May 2017 12:10:07 +0000 (13:10 +0100)]
drm/i915: Don't force serialisation on marking up execlists irq posted
Since we coordinate with the execlists tasklet using a locked schedule
operation that ensures that after we set the engine->irq_posted we
always have an invocation of the tasklet, we do not need to use a locked
operation to set the engine->irq_posted itself.
Chris Wilson [Wed, 17 May 2017 12:10:06 +0000 (13:10 +0100)]
drm/i915: Stop inlining the execlists IRQ handler
As the handler is now quite complex, involving a few atomics, the cost
of the function preamble is negligible in comparison and so we should
leave the function out-of-line for better I$.
Chris Wilson [Wed, 17 May 2017 12:10:05 +0000 (13:10 +0100)]
drm/i915/execlists: Reduce lock contention between schedule/submit_request
If we do not require to perform priority bumping, and we haven't yet
submitted the request, we can update its priority in situ and skip
acquiring the engine locks -- thus avoiding any contention between us
and submit/execute.
v2: Remove the stack element from the list if we can do the early
assignment.
Chris Wilson [Wed, 17 May 2017 12:10:04 +0000 (13:10 +0100)]
drm/i915: Create a kmem_cache to allocate struct i915_priolist from
The i915_priolist are allocated within an atomic context on a path where
we wish to minimise latency. If we use a dedicated kmem_cache, we have
the advantage of a local freelist from which to service new requests
that should keep the latency impact of an allocation small. Though
currently we expect the majority of requests to be at default priority
(and so hit the preallocate priolist), once userspace starts using
priorities they are likely to use many fine grained policies improving
the utilisation of a private slab.
Chris Wilson [Wed, 17 May 2017 12:10:03 +0000 (13:10 +0100)]
drm/i915: Split execlist priority queue into rbtree + linked list
All the requests at the same priority are executed in FIFO order. They
do not need to be stored in the rbtree themselves, as they are a simple
list within a level. If we move the requests at one priority into a list,
we can then reduce the rbtree to the set of priorities. This should keep
the height of the rbtree small, as the number of active priorities can not
exceed the number of active requests and should be typically only a few.
Currently, we have ~2k possible different priority levels, that may
increase to allow even more fine grained selection. Allocating those in
advance seems a waste (and may be impossible), so we opt for allocating
upon first use, and freeing after its requests are depleted. To avoid
the possibility of an allocation failure causing us to lose a request,
we preallocate the default priority (0) and bump any request to that
priority if we fail to allocate it the appropriate plist. Having a
request (that is ready to run, so not leading to corruption) execute
out-of-order is better than leaking the request (and its dependency
tree) entirely.
There should be a benefit to reducing execlists_dequeue() to principally
using a simple list (and reducing the frequency of both rbtree iteration
and balancing on erase) but for typical workloads, request coalescing
should be small enough that we don't notice any change. The main gain is
from improving PI calls to schedule, and the explicit list within a
level should make request unwinding simpler (we just need to insert at
the head of the list rather than the tail and not have to make the
rbtree search more complicated).
v2: Avoid use-after-free when deleting a depleted priolist
v3: Michał found the solution to handling the allocation failure
gracefully. If we disable all priority scheduling following the
allocation failure, those requests will be executed in fifo and we will
ensure that this request and its dependencies are in strict fifo (even
when it doesn't realise it is only a single list). Normal scheduling is
restored once we know the device is idle, until the next failure! Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk
Chris Wilson [Wed, 17 May 2017 12:10:01 +0000 (13:10 +0100)]
drm/i915: Don't mark an execlists context-switch when idle
If we *know* that the engine is idle, i.e. we have not more contexts in
flight, we can skip any spurious CSB idle interrupts. These spurious
interrupts seem to arrive long after we assert that the engines are
completely idle, triggering later assertions:
On the other hand, by ignoring the interrupt do we risk running out of
space in CSB ring? Testing for a few hours suggests not, i.e. that we
only seem to get the odd delayed CSB idle notification.
The most important change there is the improve to the
intel_lrc_irq_handler and excclist_submit_ports (net improvement since
execlists_update_context is now inlined).
v2: Use the port_api() for guc as well (even though currently we do not
pack any counters in there, yet) and hide all port->request_count inside
the helpers.
Chris Wilson [Wed, 17 May 2017 12:09:59 +0000 (13:09 +0100)]
drm/i915: Redefine ptr_pack_bits() and friends
Rebrand the current (pointer | bits) pack/unpack utility macros as
explicit bit twiddling for PAGE_SIZE so that we can use the more
flexible underlying macros for different bits.
Chris Wilson [Wed, 17 May 2017 12:09:58 +0000 (13:09 +0100)]
drm/i915: Make ptr_unpack_bits() more function-like
ptr_unpack_bits() is a function-like macro, as such it is meant to be
replaceable by a function. In this case, we should be passing in the
out-param as a pointer.
Bizarrely this does affect code generation:
function old new delta
i915_gem_object_pin_map 409 389 -20
An improvement(?) in this case, but one can't help wonder what
strict-aliasing optimisations we are preventing.
The generated code looks identical in using ptr_unpack_bits (no extra
motions to stack, the pointer and bits appear to be kept in registers),
the difference appears to be code ordering and with a reorder it is able
to use smaller forward jumps.
Chris Wilson [Wed, 17 May 2017 12:09:57 +0000 (13:09 +0100)]
drm/i915: Import the kfence selftests for i915_sw_fence
A long time ago, I wrote some selftests for the struct kfence idea. Now
that we have infrastructure in i915/igt for running kselftests, include
some for i915_sw_fence.
Chris Wilson [Wed, 17 May 2017 12:09:56 +0000 (13:09 +0100)]
drm/i915: Remove kref from i915_sw_fence
My original intention was for i915_sw_fence to be the base class and
provide the reference count for the container. This was from starting
with a design to handle async_work. In practice, for i915 we embed
fences into structs which have their own independent reference counting,
making the i915_sw_fence.kref duplicitous. If we remove the kref, we
remove the i915_sw_fence's ability to free itself and its independence,
it can only exist within a container and must be supplied with a
callback to handle its release.
This basically reverts commit 465418c6064c
("drm/i915/gen9: Remove WaEnableYV12BugFixInHalfSliceChicken7")
with small addition - marking it as affecting GLK as well.
It was incorrectly considered fixed in production steppings.
Matthew Auld [Tue, 16 May 2017 08:55:14 +0000 (09:55 +0100)]
drm/i915: use vma->size for appgtt allocate_va_range
For the aliasing ppgtt we clear the va range up to vma->size, but seem
to allocate up to vma->node.size, which is a little inconsistent given
that vma->node.size >= vma->size. Not that is really matters all that
much since we preallocate anyway, but for consistency just use
vma->size.
Fixes: ff685975d97f ("drm/i915: Move allocate_va_range to GTT") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170516085514.5853-1-matthew.auld@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Madhav Chauhan [Tue, 9 May 2017 13:29:24 +0000 (18:59 +0530)]
drm/i915/glk: Calculate high/low switch count for GLK
As per BSPEC, high/low switch count to be programmed in
terms of byteclock using exit_zero_count and prep_count.
For Geminilake exit/prep counts are already calculated
in terms of byteclock. This patch calculates high/low
switch count using counts value in byteclock, old calculation
leads to screen flicker/shift issue while resuming from S3/S4.
Robert Bragg [Thu, 11 May 2017 15:43:31 +0000 (16:43 +0100)]
drm/i915/perf: rate limit spurious oa report notice
This change is pre-emptively aiming to avoid a potential cause of kernel
logging noise in case some condition were to result in us seeing invalid
OA reports.
The workaround for the OA unit's tail pointer race condition is what
avoids the primary known cause of invalid reports being seen and with
that in place we aren't expecting to see this notice but it can't be
entirely ruled out.
Just in case some condition does lead to the notice then it's likely
that it will be triggered repeatedly while attempting to append a
sequence of reports and depending on the configured OA sampling
frequency that might be a large number of repeat notices.
v2: (Chris) avoid inconsistent warning on throttle with
printk_ratelimit()
v3: (Matt) init and summarise with stream init/close not driver init/fini
This updates the tail pointer race workaround handling to updating the
'aged' pointer before looking to start aging a new one. There's the
possibility that there is already new data available and so we can
immediately start aging a new pointer without having to first wait for a
later hrtimer callback (and then another to age).
Robert Bragg [Thu, 11 May 2017 15:43:28 +0000 (16:43 +0100)]
drm/i915/perf: improve tail race workaround
There's a HW race condition between OA unit tail pointer register
updates and writes to memory whereby the tail pointer can sometimes get
ahead of what's been written out to the OA buffer so far (in terms of
what's visible to the CPU).
Although this can be observed explicitly while copying reports to
userspace by checking for a zeroed report-id field in tail reports, we
want to account for this earlier, as part of the _oa_buffer_check to
avoid lots of redundant read() attempts.
Previously the driver used to define an effective tail pointer that
lagged the real pointer by a 'tail margin' measured in bytes derived
from OA_TAIL_MARGIN_NSEC and the configured sampling frequency.
Unfortunately this was flawed considering that the OA unit may also
automatically generate non-periodic reports (such as on context switch)
or the OA unit may be enabled without any periodic sampling.
This improves how we define a tail pointer for reading that lags the
real tail pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which
gives enough time for the corresponding reports to become visible to the
CPU.
The driver now maintains two tail pointers:
1) An 'aging' tail with an associated timestamp that is tracked until we
can trust the corresponding data is visible to the CPU; at which point
it is considered 'aged'.
2) An 'aged' tail that can be used for read()ing.
The two separate pointers let us decouple read()s from tail pointer aging.
The tail pointers are checked and updated at a limited rate within a
hrtimer callback (the same callback that is used for delivering POLLIN
events) and since we're now measuring the wall clock time elapsed since
a given tail pointer was read the mechanism no longer cares about
the OA unit's periodic sampling frequency.
The natural place to handle the tail pointer updates was in
gen7_oa_buffer_is_empty() which is called as part of blocking reads and
the hrtimer callback used for polling, and so this was renamed to
oa_buffer_check() considering the added side effect while checking
whether the buffer contains data.
Robert Bragg [Thu, 11 May 2017 15:43:27 +0000 (16:43 +0100)]
drm/i915/perf: no head/tail ref in gen7_oa_read
This avoids redundantly passing an (inout) head and tail pointer to
gen7_append_oa_reports() from gen7_oa_read which doesn't need to
reference either itself.
Moving the head/tail reads and writes into gen7_append_oa_reports should
have no functional effect except to avoid some redundant head pointer
writes in cases where nothing was copied to userspace.
This is a stepping stone towards updating how the head and tail pointer
state is managed to improve the workaround for the OA unit's tail
pointer race. It reduces the number of places we need to read/write the
head and tail pointers.
Robert Bragg [Thu, 11 May 2017 15:43:26 +0000 (16:43 +0100)]
drm/i915/perf: avoid read back of head register
There's no need for the driver to keep reading back the head pointer
from hardware since the hardware doesn't update it automatically. This
way we can treat any invalid head pointer value as a software/driver
bug instead of spurious hardware behaviour.
This change is also a small stepping stone towards re-working how
the head and tail state is managed as part of an improved workaround
for the tail register race condition.
If the function for checking whether there is OA buffer data available
(during a poll or blocking read) has false positives then we want to
avoid a situation where the subsequent read() returns EAGAIN (after
a more accurate check) followed by a poll() immediately reporting
the same false positive POLLIN event and effectively maintaining a
busy loop until there really is data.
This makes sure that we clear the .pollin event status whenever we
return EAGAIN to userspace which will throttle subsequent POLLIN events
and repeated attempts to read to the 5ms intervals of the hrtimer
callback we have.
drm/i915: Restore brightness level in aux backlight driver
Some panel will default to zero brightness when turning the
panel off and on again. This patch restores last brightness
level back when panel is turning back on.
drm/i915: Correctly enable backlight brightness adjustment via DPCD
intel_dp_aux_enable_backlight() assumed that the register
BACKLIGHT_BRIGHTNESS_CONTROL_MODE can only has value 01
(DP_EDP_BACKLIGHT_CONTROL_MODE_PRESET) when initialize.
This patch fixed that by handling all cases of that register.
Matthew Auld [Fri, 12 May 2017 09:14:23 +0000 (10:14 +0100)]
drm/i915: don't do allocate_va_range again on PIN_UPDATE
If a vma is already bound to a ppgtt, we incorrectly call
allocate_va_range again when doing a PIN_UPDATE, which will result in
over accounting within our paging structures, such that when we do
unbind something we don't actually destroy the structures and end up
inadvertently recycling them. In reality this probably isn't too bad,
but once we start touching PDEs and PDPEs for 64K/2M/1G pages this
apparent recycling will manifest into lots of really, really subtle
bugs.
v2: Fix the testing of vma->flags for aliasing_ppgtt_bind_vma
Chuanxiao Dong [Thu, 11 May 2017 10:07:42 +0000 (18:07 +0800)]
drm/i915: set initialised only when init_context callback is NULL
During execlist_context_deferred_alloc() we presumed that the context is
uninitialised (we only just allocated the state object for it!) and
chose to optimise away the later call to engine->init_context() if
engine->init_context were NULL. This breaks with GVT's contexts that are
marked as pre-initialised to avoid us annoyingly calling
engine->init_context(). The fix is to not override ce->initialised if it
is already true.
Joonas Lahtinen [Wed, 10 May 2017 11:00:40 +0000 (14:00 +0300)]
drm/i915: Do not sync RCU during shrinking
Due to the complex dependencies between workqueues and RCU, which
are not easily detected by lockdep, do not synchronize RCU during
shrinking.
On low-on-memory systems (mem=1G for example), the RCU sync leads
to all system workqueus freezing and unrelated lockdep splats are
displayed according to reports. GIT bisecting done by J. R.
Okajima points to the commit where RCU syncing was extended.
RCU sync gains us very little benefit in real life scenarios
where the amount of memory used by object backing storage is
dominant over the metadata under RCU, so drop it altogether.
" Yeeeaah, if core could just, go ahead and reclaim RCU
queues, that'd be great. "
v2: More information to commit message.
v3: Remove "grep _rcu_" escapee from i915_gem_shrink_all (Andrea)
Fixes: c053b5a506d3 ("drm/i915: Don't call synchronize_rcu_expedited under struct_mutex") Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Reported-by: J. R. Okajima <hooanon05g@gmail.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Hugh Dickins <hughd@google.com> Tested-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: J. R. Okajima <hooanon05g@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: <stable@vger.kernel.org> # v4.11+ Link: http://patchwork.freedesktop.org/patch/msgid/1494414040-11160-1-git-send-email-joonas.lahtinen@linux.intel.com
Michal Wajdeczko [Wed, 10 May 2017 12:59:27 +0000 (12:59 +0000)]
drm/i915/guc: Make scratch register base and count flexible
We are using some scratch registers in MMIO based send function.
Make their base and count flexible in preparation of upcoming
GuC firmware/hardware changes. While around, change cmd len
parameter verification from WARN_ON to GEM_BUG_ON as we don't
need this all the time.
v2: call out WARN/GEM_BUG change in the commit msg (Daniele)
v3: don't overqualify the ints (Chris)
v4: rebase and use proper enum
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
drm/i915: Fix hw state verifier access to crtc->state.
We shouldn't inspect crtc->state, instead grab the crtc state.
At this point the hw state verifier should be able to run even if
crtc->state has been updated (which cannot currently happen).
Daniel Vetter [Wed, 10 May 2017 15:19:32 +0000 (17:19 +0200)]
drm/i915: Fix __intel_wait_for_register_fw to not sleep in atomic
The unconditionally fallback to the blocking wait_for resulted in
impressive fireworks at boot-up on my snb here. Make sure if we set
the slow timeout to 0 that we never ever sleep. The tail of the
callchain was
It blew up in intel_crt_detect load detection code on the
ADPA_CRT_HOTPLUG_FORCE_TRIGGER in the ADPA register.
v2: Shut up gcc.
v3: Use uninitialized_var() (Chris).
Fixes: 0564654340e2 ("drm/i915: Acquire uncore.lock over intel_uncore_wait_for_register()") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1494429572-15118-1-git-send-email-daniel.vetter@ffwll.ch
Ville Syrjälä [Mon, 27 Mar 2017 18:55:46 +0000 (21:55 +0300)]
drm/i915: Simplify cursor register write sequence
It looks like simply writing all the cursor register every single
time might be slightly faster than checking to see of each of
them need to be written. So if any other register apart from
CURPOS needs to be written let's just write all the registers.
CURPOS is left as a special case mainly for 845/865 where we have to
disable the cursor to change many of the cursor parameters. This
introduces a slight chance of the cursor flickering when things get
updated (since we're not currently doing the vblank evade for cursor
updates). If we write CURPOS alone then that obviously can't happen.
And let's follow the same pattern in the i9xx code just for symmetry.
I wasn't able to see a singificant performance difference between
this and just writing all the registers unconditionally.
Ville Syrjälä [Mon, 27 Mar 2017 18:55:45 +0000 (21:55 +0300)]
drm/i915: Relax 845/865 CURBASE alignemnt requirement to 32 bytes
Supposedly 845/865 require only 32 byte alignment for CURBASE. Let's
relax the checks to allow that instead of demanding 4KiB alignment.
This will allow cursor panning in 8 pixel units.
Ville Syrjälä [Mon, 27 Mar 2017 18:55:44 +0000 (21:55 +0300)]
drm/i915: Handle fb offset and src coordinates for cursors
The cursor plane doesn't have any kind of source offset register, so
the only form of panning possible is via a the base address register.
The alignment required by CURBASE ranges from 32B to 16KiB depending
on the platform. Let's make sure the user didn't ask for something
we can't do.
Obviously this is impossible to hit via the legacy cursor ioctl since
the src offsets are always 0, but via the plane/atomic ioctls the user
can ask for pretty much anything so we have to deal with this.
Bspec tells us that gen3 platforms need 4KiB alignment for CURBASE
rather than the 256 byte alignment required by i85x. Let's fix that
and pull the code to determine the correct alignment to a helper
function.
Ville Syrjälä [Mon, 27 Mar 2017 18:55:42 +0000 (21:55 +0300)]
drm/i915: Support variable cursor height on ivb+
IVB introduced the CUR_FBC_CTL register which allows reducing the cursor
height down to 8 lines from the otherwise square cursor dimensions.
Implement support for it. CUR_FBC_CTL can't be used when the cursor
is rotated.
Commandeer the otherwise unused cursor->cursor.size to track the
current value of CUR_FBC_CTL to optimize away redundant CUR_FBC_CTL
writes, and to notice when we need to arm the update via CURBASE if
just CUR_FBC_CTL changes.
v2: Reverse the gen check to make it sane
v3: Only enable CUR_FBC_CTL when cursor is enabled, adapt to
earlier code changes which means we now actually turn off
the cursor when we're supposed to unlike v2
v4: Add a comment about rotation vs. CUR_FBC_CTL,
rebase due to 'dirty' (Chris)
v5: Rebase to the atomic world
Handle 180 degree rotation
Add HAS_CUR_FBC()
v6: Rebase
v7: Rebase due to I915_WRITE_FW/uncore.lock
s/size/fbc_ctl/
Ville Syrjälä [Mon, 27 Mar 2017 18:55:41 +0000 (21:55 +0300)]
drm/i915: Use fb->pitches[0] in cursor code
The cursor code currently ignores fb->pitches[0] (except when creating
the fb itself), and just uses the cursor_width*4 as the stride. Let's
make sure fb->pitches[0] actually matches what we expect it to be.
We can also relax the stride vs. cursor width relationship on 845/865
since the stride is programmed separately. The only constraint is that
width*cpp doesn't exceed the stride, and that's already been checked
by the core since it makes sure the entire plane fits within the fb.
We can also drop the bo size check as that's already checked when
we create the fb. That is the fb is guaranteed to fit within the bo.
v2: Rebase due to i845_cursor_ctl() and i9xx_cursor_ctl()
Ville Syrjälä [Mon, 27 Mar 2017 18:55:40 +0000 (21:55 +0300)]
drm/i915: Generalize cursor size checks a bit
We have the maximum cursor dimensions stored in the mode_config, so
let's just consult that information instead of hardcoding the same
information in multiple places.
We still need to keep some per-platform checks as the limitations are
quite diverse.
Ville Syrjälä [Mon, 27 Mar 2017 18:55:39 +0000 (21:55 +0300)]
drm/i915: Split cursor check_plane into i845 and i9xx variants
The 845/865 and 830/855/9xx+ style cursor don't have that
much in common with each other, so let's just split the
.check_plane() hook into two variants as well.
v2: Keep the common stuff in one place (Chris)
v3: s/DRM_FORMAT_MOD_NONE/DRM_FORMAT_MOD_LINEAR/
Ville Syrjälä [Mon, 27 Mar 2017 18:55:37 +0000 (21:55 +0300)]
drm/i915: Move cursor position and base handling into the platform specific functions
Supposedly on some platforms we can get extra atomicity guarantees for
CURPOS if we write it between the CURCNTR and CURBASE. Let's move the
CURPOS handling into the platform specific hooks to make the possible
without having to pass the calculated CURPOS around. And while at it,
do the same for the CURBASE to avoid passing that either.
Ville Syrjälä [Mon, 27 Mar 2017 18:55:36 +0000 (21:55 +0300)]
drm/i915: Refactor CURPOS calculation
Move the CURPOS calculations to seprate function. This will allow
sharing the code between the 845/865 vs. others codepaths when we
otherwise split them apart.
Ville Syrjälä [Mon, 27 Mar 2017 18:55:35 +0000 (21:55 +0300)]
drm/i915: Clean up cursor junk from intel_crtc
Move cursor_base, cursor_cntl, and cursor_size from intel_crtc
into intel_plane so that we don't need the crtc for cursor stuff
so much.
Also entirely nuke cursor_addr which IMO doesn't provide any benefit
since it's not actually used by the cursor code itself. I'm not 100%
sure what the SKL+ DDB is code is after by looking at cursor_addr so
I just make it do its checks unconditionally. If that's not correct
then we should likely replace it with somehting like
plane_state->visible.
Ville Syrjälä [Fri, 21 Apr 2017 18:14:32 +0000 (21:14 +0300)]
drm/i915: Add support for sprites on g4x
Now that the watermarks are in order, it should be safe to enable sprite
planes on g4x. We alreday have the code in fact, we just call it ilk_.
Let's rename to g4x_ and let it loose.
Ville Syrjälä [Fri, 21 Apr 2017 18:14:30 +0000 (21:14 +0300)]
drm/i915: Enable HPLL watermarks on g4x
I don't see why we couldn't use the HPLL watermarks on g4x. So let's
enable them. Let's assume a 35 usec memory latency for the HPLL mode.
That's roughly what PNV uses.
Based on the behaviour of the ELK box I have 35 usec is probably
overkill. Actually all the current latency values used seem overkill as
I can reduce them pretty drastically before I start to see underruns.
But let's play things a bit safe for now.
Ville Syrjälä [Fri, 21 Apr 2017 18:14:29 +0000 (21:14 +0300)]
drm/i915: Two stage watermarks for g4x
Implement proper two stage watermark programming for g4x. As with
other pre-SKL platforms, the watermark registers aren't double
buffered on g4x. Hence we must sequence the watermark update
carefully around plane updates.
The code is quite heavily modelled on the VLV/CHV code, with some
fairly significant differences due to the different hardware
architecture:
* g4x doesn't use inverted watermark values
* CxSR actually affects the watermarks since it controls memory self
refresh in addition to the max FIFO mode
* A further HPLL SR mode is possible with higher memory wakeup
latency
* g4x has FBC2 and so it also has FBC watermarks
* max FIFO mode for primary plane only (cursor is allowed, sprite is not)
* g4x has no manual FIFO repartitioning
* some TLB miss related workarounds are needed for the watermarks
Actually the hardware is quite similar to ILK+ in many ways. The
most visible differences are in the actual watermakr register
layout. ILK revamped that part quite heavily whereas g4x is still
using the layout inherited from earlier platforms.
Note that we didn't previously enable the HPLL SR on g4x. So in order
to not introduce too many functional changes in this patch I've not
actually enabled it here either, even though the code is now fully
ready for it. We'll enable it separately later on.
Ville Syrjälä [Fri, 21 Apr 2017 18:14:28 +0000 (21:14 +0300)]
drm/i915: Apply the g4x TLB miss w/a to SR watermarks as well
The documentation I've seen doesn't actually specify which watermarks
need the TLB miss w/a. Currently we only apply the w/a to the normal
watermarks for both primary and cursor planes. Since the documentation
doesn't explicitly say anything I'm going to assume that the w/a should
equally apply to the SR/HPLL watermarks. So let's do that.
Ville Syrjälä [Fri, 21 Apr 2017 18:14:27 +0000 (21:14 +0300)]
drm/i915: Refactor wm calculations
All platforms until SKL compute their watermarks essentially
using the same method1/small buffer and method2/large buffer
formulas. Most just open code it in slightly different ways.
Let's pull it all into common helpers. This makes it a little
easier to spot the actual differences.
While at it try to add some docs explainign what the formulas
are trying to do.
Ville Syrjälä [Fri, 21 Apr 2017 18:14:25 +0000 (21:14 +0300)]
drm/i915: Fix the g4x watermark TLB miss workaround
The g4x watermark TLB miss workaround requires that we bump up the
watermark by the difference between 8 full lines and the FIFO size.
Unfortunately the way we compute it at the moment ignores the size
of the pixels. The code also used the primary plane width as the
cursor width when computing the TLB miss w/a for the cursor.
Let's fix both problems.
Ville Syrjälä [Fri, 21 Apr 2017 18:14:24 +0000 (21:14 +0300)]
drm/i915: Fix cursor 'cpp' in watermark calculatins for old platforms
The watermark code for the old platforms (g4x and older) uses the
primary plane cpp when computing cursor watermarks. To keep the fix
simple let's just hardcode cpp=4 for the cursor on those platforms
since that's all we support.
Ville Syrjälä [Fri, 21 Apr 2017 18:14:22 +0000 (21:14 +0300)]
drm/i915: Make vlv/chv watermark debug print less cryptic
The magic numbers 0,1,2 aren't all that interesting for users perhaps.
Since we know what these watermark levels mean for VLV/CHV let's print
their names.
Rename the VLV/CHV max_level->num_levels helper to have an intel_
prefix since it's not VLV/CHV specific and I'll want to use it on
other platforms as well.
Ville Syrjälä [Fri, 21 Apr 2017 18:14:19 +0000 (21:14 +0300)]
drm/i915: Drop the debug message from vlv_get_fifo_size()
Seeing the display FIFO sizes at driver load time doesn't really provide
anything useful for us, so let's just drop the debug message. One can
always use eg. intel_watermarks to dump out the hardware settings prior
to loading the driver.
Imre Deak [Wed, 10 May 2017 09:21:53 +0000 (12:21 +0300)]
drm/i915/lvds: Remove magic from PLL programming
This looks like a left-over from enabling work. The spec defines
CH7017_LVDS_PLL_FEEDBACK_DEFAULT_RESERVED as reserved set, so follow
this in the programming.
v2:
- Follow the spec to set CH7017_LVDS_PLL_FEEDBACK_DEFAULT_RESERVED.
(Ville)
Imre Deak [Wed, 10 May 2017 09:21:52 +0000 (12:21 +0300)]
drm/i915: Sanitize stolen memory size calculation
On GEN8+ (not counting CHV) the calculation can in theory result in an
incorrect sign extension with all upper bits set. In practice this is
unlikely to happen since it would require 4GB of stolen memory set
aside. For consistency still prevent the sign extension explicitly
everywhere.
Imre Deak [Wed, 10 May 2017 09:21:51 +0000 (12:21 +0300)]
drm/i915: Check error return when converting pipe to connector
An error from intel_get_pipe_from_connector() would mean a bug somewhere
else, but we still should check for it to prevent some other more
obscure bug later.
v2:
- Fall back to a reasonable default instead of bailing out in case of
error. (Jani)
v3:
- Fix s/PIPE_INVALID/INVALID_PIPE/ typo. (Jani)
v4:
- Fix bogus bracing around WARN() condition. (Ville)
Imre Deak [Wed, 10 May 2017 09:21:49 +0000 (12:21 +0300)]
drm/i915/sdvo: Check error return from intel_sdvo_get_value()
The current code assumes that 'enhancements' won't change in case of an
error, but this isn't guaranteed. Fix things by treating any error as a
lack of the given capability.
v2:
- Remove the now redundant init of enhancements. (Ville)
Imre Deak [Wed, 10 May 2017 09:21:48 +0000 (12:21 +0300)]
drm/i915/dp: Check error return during DPCD capability queries
The assumptions of these users of drm_dp_dpcd_readb() is that the passed
in output buffer won't change in case of error, but this isn't
guaranteed. Fix this by treating any error as the lack of the given
capability.
In case of DP_SINK_DEVICE_AUX_FRAME_SYNC_CAP an error would leave the
buffer uninitialized even with the above assumption.
Mika Kuoppala [Tue, 9 May 2017 10:05:22 +0000 (13:05 +0300)]
drm/i915: Show dmc debug registers on Kabylake
The assumption is that the registers offsets are
identical as with skl. Also all the published
kbl firmwares support the debug registers. So
let kbl show the debug counts.
drm/i915: Move uncore definitions into a separate header
In order to allow use of e.g. forcewake_domains in a other feature headers
included from the top of i915_drv.h, move all uncore related definitions
into their own header.
v2: move __mask_next_bit macro to utils header (Mika)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Ville Syrjälä [Thu, 4 May 2017 18:15:30 +0000 (21:15 +0300)]
drm/i915: Fix rawclk readout for g4x
Turns out our skills in decoding the CLKCFG register weren't good
enough. On this particular elk the answer we got was 400 MHz when
in reality the clock was running at 266 MHz, which then caused us
to program a bogus AUX clock divider that caused all AUX communication
to fail.
Sadly the docs are now in bit heaven, so the fix will have to be based
on empirical evidence. Using another elk machine I was able to frob
the FSB frequency from the BIOS and see how it affects the CLKCFG
register. The machine seesm to use a frequency of 266 MHz by default,
and fortunately it still boot even with the 50% CPU overclock that
we get when we bump the FSB up to 400 MHz.
It turns out the actual FSB frequency and the register have no real
link whatsoever. The register value is based on some straps or something,
but fortunately those too can be configured from the BIOS on this board,
although it doesn't seem to respect the settings 100%. In the end I was
able to derive the following relationship:
So only the 200 and 400 MHz cases actually match how we're currently
decoding that register. But as the comment next to some of the defines
says, we have been just guessing anyway.
So let's fix things up so that at least the 266 MHz case will work
correctly as that is actually the setting used by both the buggy
machine and my test machine.
The fact that 333 and 400 MHz BIOS settings result in the same register
value is a little disappointing, as that means we can't tell them apart.
However, according to the gmch datasheet for both elk and ctg 400 Mhz is
not even a supported FSB frequency, so I'm going to make the assumption
that we should decode it as 333 MHz instead.
Cc: stable@vger.kernel.org Cc: Tomi Sarvela <tomi.p.sarvela@intel.com> Reported-by: Tomi Sarvela <tomi.p.sarvela@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100926 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170504181530.6908-1-ville.syrjala@linux.intel.com Acked-by: Jani Nikula <jani.nikula@intel.com> Tested-by: Tomi Sarvela <tomi.p.sarvela@intel.com>
Chris Wilson [Thu, 4 May 2017 13:08:46 +0000 (14:08 +0100)]
drm/i915: Micro-optimise hotpath through intel_ring_begin()
Typically, there is space available within the ring and if not we have
to wait (by definition a slow path). Rearrange the code to reduce the
number of branches and stack size for the hotpath, accomodating a slight
growth for the wait.
v2: Fix the new assert that packets are not larger than the actual ring.
v3: Make the parameters unsigned as well to make usage.
Chris Wilson [Thu, 4 May 2017 13:08:45 +0000 (14:08 +0100)]
drm/i915: Report the ring->space from intel_ring_update_space()
Some callers immediately want to know the current ring->space after
calling intel_ring_update_space(), which we can freely provide via the
return parameter.
Chris Wilson [Thu, 4 May 2017 13:08:44 +0000 (14:08 +0100)]
drm/i915: Avoid the branch in computing intel_ring_space()
Exploit the power-of-two ring size to compute the space across the
wraparound using a mask rather than a if. Convert to unsigned integers
so the operation is well defined.