Yair Shachar [Wed, 20 May 2015 10:43:04 +0000 (13:43 +0300)]
drm/amdkfd: Add static user-mode queues support
This patch adds support for static user-mode queues in QCM.
Queues which are designated as static can NOT be preempted by
the CP microcode when it is executing its scheduling algorithm.
This is needed for supporting the debugger feature, because we
can't allow the CP to preempt queues which are currently being debugged.
The number of queues that can be designated as static is limited by the
number of HQDs (Hardware Queue Descriptors).
Yair Shachar [Sun, 7 Dec 2014 15:05:22 +0000 (17:05 +0200)]
drm/amdkfd: add H/W debugger IOCTL set definitions
This patch adds four new IOCTLs to amdkfd. These IOCTLs expose H/W
debugger functionality to userspace.
The IOCTLs are:
- AMDKFD_IOC_DBG_REGISTER:
The purpose of this IOCTL is to notify amdkfd that a process wants to use
GPU debugging facilities on itself only.
It is expected that this IOCTL would be called before any other H/W
debugger requests are sent to amdkfd and for each GPU where the H/W
debugging needs to be enabled. The use of this IOCTL ensures that only
one instance of a debugger is active in the system.
- AMDKFD_IOC_DBG_UNREGISTER:
This IOCTL detaches the debugger/debugged process from the H/W
Debug which was established by the AMDKFD_IOC_DBG_REGISTER IOCTL.
- AMDKFD_IOC_DBG_ADDRESS_WATCH:
This IOCTL allows setting different watchpoints with various conditions as
indicated by the IOCTL's arguments. The available number of watchpoints
is retrieved from topology. This operation is confined to the current
debugged process, which was registered through AMDKFD_IOC_DBG_REGISTER.
- AMDKFD_IOC_DBG_WAVE_CONTROL:
This IOCTL allows controlling a wavefront as indicated by the IOCTL's
arguments. For example, you can halt/resume or kill either a
single wavefront or a set of wavefronts. This operation is confined to
the current debugged process, which was registered through
AMDKFD_IOC_DBG_REGISTER.
Because the arguments for the address watch IOCTL and wave control IOCTL
are dynamic, meaning that they could vary in size, the userspace passes a
pointer to a structure (in userspace) that contains the value of the
arguments. The kernel driver is responsible for parsing this structure and
validating its contents.
v2: change void* to uint64_t inside ioctl arguments
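A minimal sketch of the argument-passing scheme described above, assuming a hypothetical struct dbg_ioctl_args (the field names, the max_size limit and both helpers are illustrative, not the amdkfd definitions). It shows why carrying the user pointer as a uint64_t keeps the layout identical for 32-bit and 64-bit callers, and the kind of size check the driver would do before parsing:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument block; field names are illustrative only. */
struct dbg_ioctl_args {
	uint64_t content_ptr;       /* user pointer carried as a fixed-width integer */
	uint32_t gpu_id;
	uint32_t buf_size_in_bytes; /* size of the variable-length payload */
};

/* Userspace side: store the payload pointer in the 64-bit field. */
void dbg_args_set(struct dbg_ioctl_args *args, uint32_t gpu_id,
		  const void *payload, uint32_t size)
{
	assert(args != NULL);
	args->content_ptr = (uint64_t)(uintptr_t)payload;
	args->gpu_id = gpu_id;
	args->buf_size_in_bytes = size;
}

/*
 * Driver-side style check: reject missing, empty or oversized payloads
 * before any parsing happens. max_size is an assumed policy limit.
 */
int dbg_args_validate(const struct dbg_ioctl_args *args, uint32_t max_size)
{
	if (args->content_ptr == 0)
		return -1;
	if (args->buf_size_in_bytes == 0 || args->buf_size_in_bytes > max_size)
		return -1;
	return 0;
}
```

With a void* member the structure size would differ between 32-bit and 64-bit userspace; the fixed-width field avoids that.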
This patch adds new interface functions to the kfd2kgd interface file. The
new functions allow performing H/W debugger operations by writing to GPU
registers.
Dave Airlie [Tue, 2 Jun 2015 08:10:50 +0000 (18:10 +1000)]
Merge tag 'drm-intel-next-fixes-2015-05-29' of git://anongit.freedesktop.org/drm-intel into drm-next
Fixes for 4.2. Nothing too serious (given that it's still pre merge
window). With that it's off for 2 weeks of vacation for me and taking care
of 4.2 fixes for Jani.
* tag 'drm-intel-next-fixes-2015-05-29' of git://anongit.freedesktop.org/drm-intel:
drm/i915: limit PPGTT size to 2GB in 32-bit platforms
drm/i915: Another fbdev hack to avoid PSR on fbcon.
drm/i915: Return the frontbuffer flip to enable intel_crtc_enable_planes.
drm/i915: disable IPS while getting the sink CRCs
drm/i915: Disable 12bpc hdmi for now
drm/i915: Adjust sideband locking a bit for CHV/VLV
drm/i915: s/dpio_lock/sb_lock/
drm/i915: Kill intel_flush_primary_plane()
drm/i915: Throw out WIP CHV power well definitions
drm/i915: Use the default 600ns LDO programming sequence delay
drm/i915: Remove unnecessary null check in execlists_context_unqueue
drm/i915: Use spinlocks for checking when to waitboost
drm/i915: Fix the confusing comment about the ioctl limits
Revert "drm/i915: Force clean compilation with -Werror"
drm/ttm: dma: Don't crash on memory in the vmalloc range
dma_alloc_coherent() can return memory in the vmalloc range.
virt_to_page() cannot handle such addresses and crashes. This
patch detects such cases and obtains the struct page * using
vmalloc_to_page() instead.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
drm/i915: Use complete address space in true PPGTT
v2: Prettify code and explain why this is needed. (Chris)
v3: Don't hide the compilation warning in 32-bit. (Chris)
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Rodrigo Vivi [Thu, 28 May 2015 17:26:58 +0000 (10:26 -0700)]
drm/i915: Another fbdev hack to avoid PSR on fbcon.
With the unified modeset and flip paths introduced recently, when switching
to fbcon PSR was being disabled on the fb_set_par path but re-enabled on
the fb_pan_display one, causing missed screen updates and an unusable
console.
Regression introduced with:
commit bb54662350662815b4bfc2ff4464330a2dbd7041
Author: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Date: Tue Apr 21 17:13:13 2015 +0300
drm/i915: Unify modeset and flip paths of intel_crtc_set_config()
Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Rodrigo Vivi [Thu, 28 May 2015 17:21:16 +0000 (10:21 -0700)]
drm/i915: Return the frontbuffer flip to enable intel_crtc_enable_planes.
Without this frontbuffer flip when enabling planes, PSR got compromised
and wasn't being enabled, waiting forever on the flush that never
arrived.
Another solution would be to create an enable_cursor function and split this
frontbuffer flip among the different plane enable and disable functions.
But if necessary this can be done in a follow up work. For now let's
just fix the regression.
Russell King [Thu, 28 May 2015 09:36:27 +0000 (10:36 +0100)]
drm: clean up drm_mm debugfs output
The drm_mm debugfs output is difficult to read as two different formats
are used for the addresses:
0x00000080000000-0x0000008000b000: 45056: used
0x8000b000-0x80016000: 45056: free
0x00000080016000-0x0000008001b000: 20480: used
0x8001b000-0x817a1000: 24666112: free
0x000000817a1000-0x000000817a8000: 28672: used
0x000000817a8000-0x00000081ba8000: 4194304: used
Fix this by using %#018llx for all addresses, thus making the output:
0x0000000080000000-0x000000008000b000: 45056: used
0x000000008000b000-0x0000000080016000: 45056: free
0x0000000080016000-0x000000008001b000: 20480: used
0x000000008001b000-0x00000000817a1000: 24666112: free
0x00000000817a1000-0x00000000817a8000: 28672: used
0x00000000817a8000-0x0000000081ba8000: 4194304: used
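The effect of %#018llx can be reproduced in userspace. This is a sketch only: format_mm_range is a hypothetical helper and the unpadded size field is an assumption, not the drm_mm code. The '#' flag adds the 0x prefix and the zero-padded width of 18 leaves 16 hex digits for every address:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one range line in the style shown above. "%#018llx" means:
 * '#' adds the "0x" prefix, '0' pads with zeros, and 18 is the total
 * width including the prefix, so all addresses line up.
 */
int format_mm_range(char *buf, size_t len, unsigned long long start,
		    unsigned long long end, bool used)
{
	assert(buf != NULL && end >= start);
	return snprintf(buf, len, "%#018llx-%#018llx: %llu: %s",
			start, end, end - start, used ? "used" : "free");
}
```

Calling it with 0x80000000 and 0x8000b000 as a used range yields the first line of the fixed output above.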
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 28 May 2015 23:11:49 +0000 (09:11 +1000)]
Merge tag 'drm-intel-next-2015-05-22' of git://anongit.freedesktop.org/drm-intel into drm-next
- cpt modeset sequence fixes from Ville
- more rps boosting tuning from Chris
- S3 support for skl (Damien)
- a pile of w/a for bxt from various people
- cleanup of primary plane pixel formats (Damien)
- a big pile of small patches with fixes and cleanups all over
* tag 'drm-intel-next-2015-05-22' of git://anongit.freedesktop.org/drm-intel: (90 commits)
drm/i915: Update DRIVER_DATE to 20150522
drm/i915: Introduce DRM_I915_THROTTLE_JIFFIES
drm/i915: Use the correct destructor for freeing requests on error
drm/i915/skl: don't fail colorkey + scaler request
drm/i915: Enable GTT caching on gen8
drm/i915: Move WaProgramL3SqcReg1Default:bdw to init_clock_gating()
drm/i915: Use ilk_init_lp_watermarks() on BDW
drm/i915: Disable FDI RX/TX before the ports
drm/i915: Disable CRT port after pipe on PCH platforms
drm/i915: Disable SDVO port after the pipe on PCH platforms
drm/i915: Disable HDMI port after the pipe on PCH platforms
drm/i915: Fix the IBX transcoder B workarounds
drm/i915: Write the SDVO reg twice on IBX
drm/i915: Fix DP enhanced framing for CPT
drm/i915: Clean up the CPT DP .get_hw_state() port readout
drm/i915: Clarfify the DP code platform checks
drm/i915: Remove the double register write from intel_disable_hdmi()
drm/i915: Remove a bogus 12bpc "toggle" from intel_disable_hdmi()
drm/i915/skl: Deinit/init the display at suspend/resume
drm/i915: Free RPS boosts for all laggards
...
Dave Airlie [Thu, 28 May 2015 23:10:54 +0000 (09:10 +1000)]
Merge branch 'drm-next-4.2' of git://people.freedesktop.org/~agd5f/linux into drm-next
for amdgpu separately next week. Highlights for radeon:
- VCE1 support
- Bug fixes and misc cleanups
* 'drm-next-4.2' of git://people.freedesktop.org/~agd5f/linux:
radeon: Deinline indirect register accessor functions
drm/radeon: Fix max_vblank_count value for current display engines
drm/radeon: stop using addr to check for BO move
drm/radeon: clean up radeon_audio_enable
drm/radeon: take the mode_config mutex when dealing with hpds (v2)
drm/radeon: make dpcd parameters const
drm/radeon: Use DECLARE_BITMAP
drm/radeon/tn/si: enable/disable vce cg when encoding v2
drm/radeon: add support for vce 1.0 clock gating
drm/radeon: add VCE 1.0 support v4
drm/radeon/dpm: add vce support for SI
drm/radeon/dpm: add vce dpm support for TN
drm/radeon: implement tn_set_vce_clocks
drm/radeon: implement si_set_vce_clocks v2
drm/radeon: allow some more VCE firmware versions
drm/radeon: rework VCE FW size calculation
drm/radeon: add a GPU reset counter queryable by userspace
This patch deinlines indirect register accessor functions.
These functions perform two mmio accesses, framed by spin lock/unlock.
Spin lock/unlock by itself takes more than 50 cycles in ideal case
(if lock is exclusively cached on current CPU).
With this .config: http://busybox.net/~vda/kernel_config,
after uninlining these functions have sizes and callsite counts
as follows:
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Paulo Zanoni [Mon, 25 May 2015 21:52:29 +0000 (18:52 -0300)]
drm/i915: disable IPS while getting the sink CRCs
This commit is the "sink CRC" version of:
commit 8c740dcea254a1472df2c0ac5ac585412a2507ec
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date: Fri Oct 17 18:42:03 2014 -0300
drm/i915: disable IPS while getting the pipe CRCs.
For some unknown reason, when IPS gets enabled, the sink CRC changes.
Since hsw_enable_ips() doesn't really guarantee to enable IPS (it
depends on package C-states), we can't really predict if IPS is
enabled or disabled while running our CRC tests, so let's just
completely disable IPS while sink CRCs are being used.
If we find a way to make IPS not change the pipe CRC result, we may
want to fix IPS and then revert this patch (and 8c740dcea too). Until
that happens, let's merge this patch, so the IGT tests relying
on sink CRCs can work properly.
This was discovered while developing a new IGT test, which will
probably be called kms_frontbuffer_tracking.
Testcase: igt/kms_frontbuffer_tracking (not on upstream IGT yet) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
drm/i915: Select starting pipe bpp irrespective or the primary plane
the kernel will try to use it even for the common rgb888 framebuffers.
Ville has patches to fix it all up properly, but unfortunately they're
stuck in review limbo. And since the 4.2 feature cutoff has passed we
need to somehow handle this regression.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Paulo Zanoni <przanoni@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Ville Syrjälä [Tue, 26 May 2015 17:42:31 +0000 (20:42 +0300)]
drm/i915: Adjust sideband locking a bit for CHV/VLV
chv_enable_pll() doesn't need to hold sb_lock for the entire duration of
the function. Drop the lock as soon as possible.
valleyview_set_cdclk() does a potential lock+unlock+lock+unlock cycle
with sb_lock. Grab the lock a few lines earlier so we can make do
with a single lock+unlock cycle always.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ville Syrjälä [Tue, 26 May 2015 17:42:30 +0000 (20:42 +0300)]
drm/i915: s/dpio_lock/sb_lock/
Rename dpio_lock to sb_lock to inform the reader that its primary
purpose is to protect the sideband mailbox rather than some DPIO
state.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ville Syrjälä [Tue, 26 May 2015 17:27:23 +0000 (20:27 +0300)]
drm/i915: Kill intel_flush_primary_plane()
The primary plane frobbing was removed from the sprite code in
commit ecce87ea3ab55ad0dc64460e6422c357d158a55e
Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Date: Tue Apr 21 17:12:50 2015 +0300
drm/i915: Remove implicitly disabling primary plane for now
but the intel_flush_primary_plane() calls were left behind. Replace them
with a straightforward POSTING_READ() of the sprite surface address
register.
The other user of intel_flush_primary_plane() is g4x_disable_trickle_feed()
where we can just inline the steps directly.
This allows intel_flush_primary_plane() to be killed off.
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ville Syrjälä [Tue, 26 May 2015 17:22:39 +0000 (20:22 +0300)]
drm/i915: Throw out WIP CHV power well definitions
Expecting CHV power wells to be just an extended version of the VLV
power wells, a bunch of commented out power wells were added in
anticipation when Punit folks would implement it all. Turns out they
never did, and instead CHV has fewer power wells than VLV. Rip out all
the #if 0'ed junk that's not needed.
v2: Rename the "pipe-a" well to "display" to match VLV
Clarify the pipe A power well relationship to pipes B and C (Deepak)
Reviewed-by: Deepak S <deepak.s@linux.intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ville Syrjälä [Tue, 26 May 2015 17:22:38 +0000 (20:22 +0300)]
drm/i915: Use the default 600ns LDO programming sequence delay
Not sure which LDO programming sequence delay should be used for the CHV
PHY, but the spec says that 600ns is "Used by default for initial
bringup", and the BIOS seems to use that, so let's do the same.
Reviewed-by: Deepak S <deepak.s@linux.intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Michel Dänzer [Tue, 26 May 2015 08:53:39 +0000 (17:53 +0900)]
drm/radeon: Fix max_vblank_count value for current display engines
The value was much too low, which could cause the userspace visible
vblank counter to move backwards when the hardware counter wrapped
around.
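A simplified model of why a too-small maximum makes the counter go backwards. This is an illustrative sketch, not the DRM core's vblank code, and the counter widths in the usage note are made-up examples rather than values from this patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of vblanks elapsed between two reads of a wrapping hardware
 * counter, given the maximum value the software believes the counter
 * can reach. If the hardware really wraps later than max_count
 * suggests, the wrap correction is too small and the delta comes out
 * negative.
 */
int64_t vblank_delta(uint32_t last, uint32_t cur, uint32_t max_count)
{
	int64_t diff = (int64_t)cur - (int64_t)last;

	if (cur < last) /* hardware counter wrapped between the two reads */
		diff += (int64_t)max_count + 1;
	return diff;
}
```

For a counter that really wraps at 0xffffff, going from 0xfffffe to 0x000001 gives a delta of 3 when max_count is 0xffffff, but a large negative delta when max_count is assumed to be 0x1fffff.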
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Michel Thierry [Mon, 27 Apr 2015 11:31:44 +0000 (12:31 +0100)]
drm/i915: Remove unnecessary null check in execlists_context_unqueue
commit 53292cdb066950611e5bc2e0eb109c7edb42af78 ("drm/i915: Workaround
to avoid lite restore with HEAD==TAIL") added a check for req0 != null
which is unnecessary.
The only way req0 could be null is if the list was empty, and this is
already addressed at the beginning of execlists_context_unqueue().
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
drm/i915: Deminish contribution of wait-boosting from clients
we removed an atomic timer based check for allowing waitboosting and
moved it below the mutex taken during RPS. However, that mutex can be
held for long periods of time on Valleyview/Cherryview as communication
with the PCU is slow. As clients may frequently wait for results (e.g.
transform feedback) we introduced contention between the client
and the RPS worker. We can take advantage of the RPS worker, by
switching the wait boost decision to use spin locks and defer the
actual reclocking to the worker.
Fixes a regression of up to 45% on Baytrail and Braswell!
v2 (Daniel):
- Use max_freq_softlimit instead of the not-yet-merged boost
frequency.
- Don't inject a fake irq into the boost work, instead treat
client_boost as just another legit waker.
v3: Drop the now unused mask (Chris).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90112 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Alex Deucher [Mon, 11 May 2015 20:01:55 +0000 (22:01 +0200)]
drm/radeon/tn/si: enable/disable vce cg when encoding v2
Some of the vce clocks are automatic, others need to
be manually enabled. For ease, just disable cg when
vce is active.
v2: rebased, call vce_v1_0_enable_mgcg directly
Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Marek Olšák [Wed, 29 Apr 2015 17:40:33 +0000 (19:40 +0200)]
drm/radeon: add a GPU reset counter queryable by userspace
Userspace will be able to tell whether a GPU reset occurred by comparing
an old reference value of the counter with a new value.
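The comparison on the userspace side can be sketched as follows. struct reset_tracker and gpu_reset_occurred are hypothetical names, and how the current counter value is obtained from the driver is left out on purpose:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace bookkeeping for the reset counter. */
struct reset_tracker {
	uint32_t last_seen; /* reference value saved at the previous check */
};

/*
 * Returns true if the counter changed since the last check and updates
 * the reference value. Comparing with != instead of > keeps the check
 * valid if the counter ever wraps.
 */
bool gpu_reset_occurred(struct reset_tracker *t, uint32_t current)
{
	bool reset;

	assert(t != NULL);
	reset = current != t->last_seen;
	t->last_seen = current;
	return reset;
}
```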
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dave Airlie [Tue, 26 May 2015 06:19:53 +0000 (16:19 +1000)]
Merge branch 'drm/next/rcar-du' of git://linuxtv.org/pinchartl/fbdev into drm-next
rcar-du fixes
* 'drm/next/rcar-du' of git://linuxtv.org/pinchartl/fbdev:
drm: rcar-du: Split planes pre-association 4/4 between CRTCs
drm: rcar-du: Store the number of CRTCs per group in the group structure
drm: rcar-du: Consider plane to CRTC associations in the plane allocator
drm: rcar-du: Keep plane to CRTC associations when disabling a plane
drm: rcar-du: Add plane allocation debugging
drm: rcar-du: Rename to_rcar_du_plane_state to to_rcar_plane_state
drm: rcar-du: Embed rcar_du_planes structure into rcar_du_group
drm: rcar-du: Move properties from rcar_du_planes to rcar_du_device
drm: rcar-du: Document the rcar_du_plane_state structure
drm: rcar-du: Document the rcar_du_crtc structure
It's causing too much trouble when compile-testing for non-i915 folks.
Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
drm: rcar-du: Split planes pre-association 4/4 between CRTCs
If we have more than one CRTC in a group, pre-associate planes 0-3 with
CRTC 0 and planes 4-7 with CRTC 1 to minimize flicker occurring when the
association is changed. The pre-association could be controlled by a
module parameter if needed.
drm: rcar-du: Store the number of CRTCs per group in the group structure
The number of CRTCs in a group is only used to implement plane
initialization for now, but is also needed to implement pre-association
of planes to CRTCs. Store it in the group structure instead of computing
it on demand.
drm: rcar-du: Consider plane to CRTC associations in the plane allocator
Hardware planes are driven by the timing generator of the CRTC they are
associated to. Changing the association requires restarting the CRTC
group that the plane belongs to, resulting in flicker on the other CRTC.
To avoid flicker as much as possible, try to allocate planes first from
the free planes already associated with the target CRTC. If allocation
fails then fall back to allocation from all free planes.
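The two-pass allocation, together with the 4/4 pre-association, can be modelled in a few lines. This is a sketch under assumptions: struct plane, planes_init, plane_alloc and plane_free are illustrative names and do not mirror the rcar-du driver structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_PLANES 8

/* Illustrative plane state, not the driver's structure. */
struct plane {
	bool in_use;
	int crtc; /* CRTC whose timing generator currently drives the plane */
};

/* Pre-associate the first half with CRTC 0 and the second half with CRTC 1. */
void planes_init(struct plane *planes, size_t count, int num_crtcs)
{
	for (size_t i = 0; i < count; i++) {
		planes[i].in_use = false;
		planes[i].crtc = (num_crtcs > 1 && i >= count / 2) ? 1 : 0;
	}
}

/* Returns the allocated plane index, or -1 if no plane is free. */
int plane_alloc(struct plane *planes, size_t count, int crtc)
{
	/* Pass 1: free planes already associated with the target CRTC. */
	for (size_t i = 0; i < count; i++) {
		if (!planes[i].in_use && planes[i].crtc == crtc) {
			planes[i].in_use = true;
			return (int)i;
		}
	}
	/* Pass 2: any free plane; on hardware this re-association costs a group restart. */
	for (size_t i = 0; i < count; i++) {
		if (!planes[i].in_use) {
			planes[i].in_use = true;
			planes[i].crtc = crtc;
			return (int)i;
		}
	}
	return -1;
}

/* Disabling a plane keeps its CRTC association. */
void plane_free(struct plane *planes, int index)
{
	assert(index >= 0);
	planes[index].in_use = false;
}
```

Only the second pass changes an association, so flicker is confined to the case where the target CRTC has run out of its own free planes.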
drm: rcar-du: Keep plane to CRTC associations when disabling a plane
Changing the plane to CRTC associations requires restarting the CRTC
group, creating visible flicker. Mitigate the issue by changing plane
association only when a plane becomes enabled, not when it gets disabled.
drm: rcar-du: Embed rcar_du_planes structure into rcar_du_group
The rcar_du_planes structure contains a single field and is only
instantiated in the rcar_du_group structure. Embed it directly and
remove the rcar_du_planes structure.
drm: rcar-du: Move properties from rcar_du_planes to rcar_du_device
The plane property objects are instantiated once per CRTC group, while
they should be instantiated once globally for the device. Fix this and
move them to the rcar_du_device structure.
Laurent Pinchart [Wed, 13 May 2015 21:31:07 +0000 (00:31 +0300)]
drm: adv7511: Fix crash in IRQ handler when no encoder is associated
The ADV7511 is probed before its slave encoder init function associates
it with an encoder. This creates a time window during which hot plug
detection interrupts can occur without an encoder, resulting in a crash in
the IRQ handler.
Fix this by ignoring hot plug detection IRQs when no encoder is
associated yet.
it is better to be explicit when sharing hardcoded values such as
throttle/boost timeouts. Make it so!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Chris Wilson [Thu, 21 May 2015 20:01:45 +0000 (21:01 +0100)]
drm/i915: Use the correct destructor for freeing requests on error
After allocating from the slab cache, we then need to free the request
back into the slab cache upon error (and not call kfree as that leads
to eventual memory corruption).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There is a mplayer video failure reported with xv.
This is because there is a request to do both plane scaling
and colorkey. Because skl hw doesn't support plane scaling
and colorkey at the same time, the request fails, which is expected
behavior.
To make xv operate, this patch allows colorkey to continue to work
without using the scaler. The behavior is then similar to platforms
without plane scaler support.
Signed-off-by: Chandra Konduru <chandra.konduru@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90449
[danvet: change can_scale to bool as requested by Ville.] Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ville Syrjälä [Tue, 19 May 2015 17:32:57 +0000 (20:32 +0300)]
drm/i915: Enable GTT caching on gen8
GTT caching was disabled by default on gen8 due to not working with
big pages. Some information suggests that it got fixed, but still
GTT caching has been left disabled by default. Or it could be that it
just meant that the default was changed to off, and hence the problem
got solved.
Enable GTT caching in the hopes of some performance increase.
Whether or not the big pages issue has been fixed is irrelevant
at this stage since we don't use big pages.
This gives me a 1-2% improvement in xonotic on my BSW. Haven't tried
BDW, but supposedly it has larger TLBs so might not benefit as much.
On HSW GTT caching is enabled by default.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ville Syrjälä [Tue, 19 May 2015 17:32:56 +0000 (20:32 +0300)]
drm/i915: Move WaProgramL3SqcReg1Default:bdw to init_clock_gating()
GEN8_L3SQCREG1 isn't saved in the context (verified by going through
a context dump), and so we shouldn't be using the ring w/a code to
initialize it. Also Bspec explicitly talks about MMIO and writing it
with the CPU.
Additionally there's another w/a WaTempDisableDOPClkGating:bdw which
tells us to disable DOP clock gating around the GEN8_L3SQCREG1 write
to make sure everyone notices the change. So let's do that as well.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ville Syrjälä [Tue, 19 May 2015 17:32:55 +0000 (20:32 +0300)]
drm/i915: Use ilk_init_lp_watermarks() on BDW
We're not using ilk_init_lp_watermarks() on BDW for some reason.
Probably due to the BDW patches and the relevant WM patches landing
roughly at the same time. Fix it up.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ville Syrjälä [Tue, 5 May 2015 14:17:36 +0000 (17:17 +0300)]
drm/i915: Disable SDVO port after the pipe on PCH platforms
While at it also remove the redundant/unneeded w/a like done for hdmi
already.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: Mention that this also removes the unneeded w/a, as suggested
by Jesse.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ville Syrjälä [Tue, 5 May 2015 14:17:35 +0000 (17:17 +0300)]
drm/i915: Disable HDMI port after the pipe on PCH platforms
BSpec says we should disable all ports after the pipe on PCH
platforms. Do so. Fixes a pipe off timeout on ILK now caused by
the transcoder B workaround.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ville Syrjälä [Tue, 5 May 2015 14:17:34 +0000 (17:17 +0300)]
drm/i915: Fix the IBX transcoder B workarounds
Currently the IBX transcoder B workarounds are not working correctly.
Well, the HDMI one seems to be working somewhat, but the DP one is
definitely busted.
After a bit of experimentation it looks like the best way to make this
work is to first disable the port on transcoder B, and then re-enable it
on transcoder A, and immediately disable it again.
We can also clean up the code by noting that we can't be called without
a valid crtc. And also note that port A on ILK does not need the
workaround, so let's check for that one too.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ville Syrjälä [Tue, 5 May 2015 14:17:33 +0000 (17:17 +0300)]
drm/i915: Write the SDVO reg twice on IBX
On IBX the SDVO/HDMI register write may be masked when enabling the
port, so it may need to be written twice. The HDMI code does this, but
the SDVO code does not. Add the workaround to the SDVO code as well.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ville Syrjälä [Tue, 5 May 2015 14:17:31 +0000 (17:17 +0300)]
drm/i915: Fix DP enhanced framing for CPT
Currently we're always enabling enhanced framing on CPT even if the sink
doesn't support it. Fix this up by actually looking at what the sink
tells us.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ville Syrjälä [Tue, 5 May 2015 14:17:29 +0000 (17:17 +0300)]
drm/i915: Clarfify the DP code platform checks
intel_dp.c is a mess with all the checks for different
platform/PCH variants and ports. Try to clean it up by recognizing
the following facts:
- IVB port A, and CPT port B/C/D are always the special cases
- VLV/CHV don't have port A
- Using the same kind of logic everywhere makes things much easier to
parse
So let's move the IVB port A and PCH port B/C/D checks to be done first,
and let the other cases fall through, and always check for these things
using the same logic.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ville Syrjälä [Tue, 5 May 2015 14:17:28 +0000 (17:17 +0300)]
drm/i915: Remove the double register write from intel_disable_hdmi()
IBX can have problems with the first write to the port register getting
masked when enabling the port. We are trying to apply the workaround
also when disabling the port where it's not needed, and we also try
to apply it for CPT/PPT as well which don't need it. Just kill it.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: Resolve conflict with the remove CHV if block.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ville Syrjälä [Tue, 5 May 2015 14:17:27 +0000 (17:17 +0300)]
drm/i915: Remove a bogus 12bpc "toggle" from intel_disable_hdmi()
The IBX 12bpc port enable toggle is only relevant when enabling
the port, not when disabling it. Also this code doesn't actually
toggle anything, and essentially just writes the port register
one extra time. Furthermore CPT/PPT don't need such workarounds
and yet we include them. Just kill it.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Damien Lespiau [Thu, 21 May 2015 15:37:48 +0000 (16:37 +0100)]
drm/i915/skl: Deinit/init the display at suspend/resume
We need to re-init the display hardware when going out of suspend. This
includes:
- Hooking the PCH to the reset logic
- Restoring CDCLK
- Enabling the DDB power
Among those, only the CDCLK one is a bit tricky. There's some
complexity in that:
- DPLL0 (which is the source for CDCLK) has two VCOs, each with a set
of supported frequencies. As eDP also uses DPLL0 for its link rate,
once DPLL0 is on, we restrict the possible eDP link rates to the chosen
VCO.
- CDCLK also limits the bandwidth available to push pixels.
So, as a first step, this commit restores what the BIOS set, until I can
do more testing.
In case that's of interest for the reviewer, I've unit tested the
function that derives the decimal frequency field:
for (i = 0; i < ARRAY_SIZE(freqs); i++)
test_freq(&freqs[i]);
return 0;
}
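For reference, a self-contained sketch of the kind of check described here, using the (freq - 1000) / 500 formula from the v2 notes. It assumes the frequency is given in kHz; cdclk_decimal is an illustrative name and this is not the test program from the original mail:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Derive the decimal frequency field from a frequency in kHz (assumed
 * unit): the frequency minus 1 MHz, expressed in 0.5 MHz steps.
 */
uint32_t cdclk_decimal(uint32_t freq_khz)
{
	assert(freq_khz >= 1000);
	return (freq_khz - 1000) / 500;
}
```

Under that assumption 450000 kHz maps to 898 and 337500 kHz maps to 673.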
v2:
- Rebase on top of -nightly
- Use (freq - 1000) / 500 for the decimal frequency (Ville)
- Fix setting the enable bit of HSW_NDE_RSTWRN_OPT (Ville)
- Rename skl_display_{resume,suspend} to skl_{init,uninit}_cdclk to
be consistent with the BXT code (Ville)
- Store boot CDCLK in ddi_pll_init (Ville)
- Merge dev_priv's skl_boot_cdclk into cdclk_freq
- Use LCPLL_PLL_LOCK instead of (1 << 30) (Ville)
- Replace various '0' by SKL_DPLL0 to be a bit more explicit that
we're programming DPLL0
- Busy poll the PCU before doing the frequency change. It takes about
3/4 cycles, each separated by 10us, to get the ACK from the CPU
(Ville)
v3:
- Restore dev_priv->skl_boot_cdclk, leaving unification with
dev_priv->cdclk_freq for a later patch (Daniel, Ville)
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Chris Wilson [Mon, 27 Apr 2015 12:41:24 +0000 (13:41 +0100)]
drm/i915: Free RPS boosts for all laggards
If the client stalls on a congested request, chosen to be 20ms old to
match throttling, allow the client a free RPS boost.
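The "20ms old" test can be sketched with a single shared constant. THROTTLE_MS and request_is_congested are hypothetical names, and the sketch works in milliseconds rather than jiffies:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed shared constant so throttling and boosting use the same age. */
#define THROTTLE_MS 20u

/*
 * A request counts as congested once it is older than the throttle
 * window. The unsigned subtraction stays correct when the millisecond
 * timestamp wraps around.
 */
bool request_is_congested(uint32_t now_ms, uint32_t emitted_ms)
{
	return (uint32_t)(now_ms - emitted_ms) > THROTTLE_MS;
}
```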
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: s/rq/req/]
[danvet: s/0/NULL/ reported by 0-day build] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Chris Wilson [Mon, 27 Apr 2015 12:41:23 +0000 (13:41 +0100)]
drm/i915: Don't downclock whilst we have clients waiting for GPU results
If we have clients stalled waiting for requests, ignore the GPU if it
signals that it should downclock due to low load. This helps prevent
the automatic timeout from causing extremely long running batches to
take even longer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Chris Wilson [Mon, 27 Apr 2015 12:41:22 +0000 (13:41 +0100)]
drm/i915: Convert RPS tracking to an intel_rps_client struct
Now that we have internal clients, rather than faking a whole
drm_i915_file_private just for tracking RPS boosts, create a new struct
intel_rps_client and pass it along when waiting.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: s/rq/req/]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Chris Wilson [Mon, 27 Apr 2015 12:41:21 +0000 (13:41 +0100)]
drm/i915: Limit mmio flip RPS boosts
Since we will often pageflip to an active surface, we will often have to
wait for the surface to be written before issuing the flip. Also we are
likely to wait on that surface in plenty of time before the vblank.
Since we have a mechanism for boosting when a flip misses the expected
vblank, curtail the number of times we RPS boost when simply waiting for
mmioflip.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: s/rq/req/]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Chris Wilson [Mon, 27 Apr 2015 12:41:20 +0000 (13:41 +0100)]
drm/i915: Limit ring synchronisation (sw semaphores) RPS boosts
Ring switches can occur many times per frame, and are often out of
control, causing frequent RPS boosting for no practical benefit. Treat
the sw semaphore synchronisation as a separate client and only allow it
to boost once per busy/idle cycle.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: s/rq/req/]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Currently, we only track the last request globally across all engines.
This prevents us from issuing concurrent read requests on e.g. the RCS
and BCS engines (or more likely the render and media engines). Without
semaphores, we incur costly stalls as we synchronise between rings -
greatly impacting the current performance of Broadwell versus Haswell in
certain workloads (like video decode). With the introduction of
reference counted requests, it is much easier to track the last request
per ring, as well as the last global write request so that we can
optimise inter-engine read-read requests (as well as better optimise
certain CPU waits).
v2: Fix inverted readonly condition for nonblocking waits.
v3: Handle non-contiguous engine array after waits
v4: Rebase, tidy, rewrite ring list debugging
v5: Use obj->active as a bitfield, it looks cool
v6: Micro-optimise, mostly involving moving code around
v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
v8: Rebase
v9: Refactor i915_gem_object_sync() to allow the compiler to better
optimise it.
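As a rough illustration of the tracking described above (last read per
engine, one global last write, and obj->active used as a bitfield), here
is a simplified standalone sketch. All names (struct object,
object_sync_mask, NUM_ENGINES and so on) are hypothetical and do not
reflect the driver's real structures:

```c
#include <assert.h>
#include <stddef.h>

#define NUM_ENGINES 5	/* hypothetical engine count */

struct request {
	int engine;		/* engine the request was submitted on */
	unsigned int seqno;
};

struct object {
	unsigned int active;	/* bit n set: engine n has an outstanding read */
	struct request *last_read[NUM_ENGINES];
	struct request *last_write;	/* single, global last write */
};

static void object_mark_read(struct object *obj, struct request *req)
{
	assert(req->engine >= 0 && req->engine < NUM_ENGINES);
	obj->last_read[req->engine] = req;
	obj->active |= 1u << req->engine;
}

static void object_mark_write(struct object *obj, struct request *req)
{
	/* A write also counts as the engine's most recent access. */
	object_mark_read(obj, req);
	obj->last_write = req;
}

static void object_retire(struct object *obj, int engine)
{
	if (obj->last_write && obj->last_write->engine == engine)
		obj->last_write = NULL;
	obj->last_read[engine] = NULL;
	obj->active &= ~(1u << engine);
}

/* Bitmask of the other engines a new access on 'engine' must wait for. */
static unsigned int object_sync_mask(const struct object *obj, int engine,
				     int readonly)
{
	if (readonly) {
		/* Read after read needs no sync, only read after write. */
		if (obj->last_write && obj->last_write->engine != engine)
			return 1u << obj->last_write->engine;
		return 0;
	}
	/* A write must wait for every outstanding access elsewhere. */
	return obj->active & ~(1u << engine);
}
```

With only reads outstanding on engines 0 and 1, a new read on engine 2
yields an empty mask, which is the read-read case the benchmark below
exercises.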
Benchmark: igt/gem_read_read_speed
hsw:gt3e (with semaphores):
Before: Time to read-read 1024k: 275.794µs
After: Time to read-read 1024k: 123.260µs
hsw:gt3e (w/o semaphores):
Before: Time to read-read 1024k: 230.433µs
After: Time to read-read 1024k: 124.593µs
bdw-u (w/o semaphores):            Before        After
Time to read-read 1x1:           26.274µs     10.350µs
Time to read-read 128x128:       40.097µs     21.366µs
Time to read-read 256x256:       77.087µs     42.608µs
Time to read-read 512x512:      281.999µs    181.155µs
Time to read-read 1024x1024:   1196.141µs   1118.223µs
Time to read-read 2048x2048:   5639.072µs   5225.837µs
Time to read-read 4096x4096:  22401.662µs  21137.067µs
Time to read-read 8192x8192:  89617.735µs  85637.681µs
Testcase: igt/gem_concurrent_blit (read-read and friends)
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8]
[danvet: s/\<rq\>/req/g]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Daniel Vetter [Thu, 21 May 2015 12:21:25 +0000 (14:21 +0200)]
drm/i915: s/\<rq\>/req/g
The merged seqno->request conversion from John called request
variables req, but some (not all) of Chris' recent patches changed
those to just rq. We've had a lengthy (and inconclusive) discussion on
irc which is the more meaningful name with maybe at most a slight bias
towards req.
Given that the "don't change names without good reason to avoid
conflicts" rule applies, let's go back to req everywhere for
consistency. I'll sed any patches for which this will cause conflicts
before applying.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
[danvet: s/origina/merged/ as pointed out by Chris - the first
mass-conversion patch was from Chris, the merged one from John.]
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
v2:
- set the override disable flag too on stepping F0 (mika)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Spotted-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Chris Wilson [Wed, 20 May 2015 13:12:47 +0000 (14:12 +0100)]
drm/i915: Force clean compilation with -Werror
Our driver compiles clean (nowadays thanks to 0day) but for me, at least,
it would be beneficial if the compiler threw an error rather than a
warning when it found a piece of suspect code. (I use this to
compile-check patch series and want to break on the first compiler error
in order to fix the patch.)
v2: Kick off a new "Debugging" submenu for i915.ko
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Add "DRM i915" to the menu name as requested by Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Russell King [Wed, 20 May 2015 19:51:41 +0000 (20:51 +0100)]
drm/i2c: tda998x: fix compiler warning for ssize_t
Stephen Rothwell reports that he sees a compiler warning on x86_64:
drivers/gpu/drm/i2c/tda998x_drv.c: In function 'tda998x_write_avi':
drivers/gpu/drm/i2c/tda998x_drv.c:647:3: warning: format '%d' expects argument of type 'int', but argument 3 has type 'ssize_t' [-Wformat=]
dev_err(&priv->hdmi->dev, "hdmi_avi_infoframe_pack() failed: %d\n", len);
^
Fix this by using the appropriate length modifier.
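A minimal userspace sketch of the same fix, using snprintf in place of
dev_err so that it can be built standalone. The helper name
format_pack_error is hypothetical; the point is only the %zd length
modifier for an ssize_t argument:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>	/* ssize_t (POSIX) */

/* Format the error message with a length modifier that matches ssize_t. */
static int format_pack_error(char *buf, size_t size, ssize_t len)
{
	assert(buf != NULL && size > 0);

	/*
	 * %zd matches ssize_t on both 32-bit and 64-bit targets, whereas
	 * %d only matches where ssize_t happens to be int.
	 */
	return snprintf(buf, size, "hdmi_avi_infoframe_pack() failed: %zd\n",
			len);
}
```

With the z modifier the same format string builds without a -Wformat
warning on x86_64.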
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Sonika Jindal [Wed, 20 May 2015 08:10:48 +0000 (13:40 +0530)]
drm/i915/skl: Swapping 90 and 270 to be compliant with Xrandr
DRM_ROTATE is counter-clockwise (which is compliant with Xrandr),
whereas HW rotation is clockwise, so swap 90/270 to make rotation work
as expected from userspace.
v2: Rebased
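The swap amounts to converting a counter-clockwise angle into the
equivalent clockwise one. A small sketch, working in plain degrees
rather than the driver's rotation bits (the function name ccw_to_cw is
hypothetical):

```c
#include <assert.h>

/*
 * Convert a counter-clockwise rotation (the userspace convention) into
 * the clockwise angle that produces the same result in hardware.
 * 0 and 180 map to themselves; 90 and 270 swap.
 */
static unsigned int ccw_to_cw(unsigned int ccw_deg)
{
	/* Only the four right-angle rotations are meaningful here. */
	assert(ccw_deg < 360 && ccw_deg % 90 == 0);

	return (360 - ccw_deg) % 360;
}
```

So a 90 degree request from userspace programs the hardware's 270
degree rotation, and vice versa.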
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Sonika Jindal <sonika.jindal@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Damien Lespiau [Tue, 19 May 2015 11:29:16 +0000 (12:29 +0100)]
drm/i915: Tighten the exposure of ARGB/ABGR 8888 formats
ARGB8888 is used for cursors on all platforms so we need to allow it
everywhere.
ABGR8888 is currently only honoured:
- on VLV/CHV in sprite planes
- on SKL+ for primary and sprite planes
so only allow it for those platforms.
Note that we only support ARGB8888/ABGR8888 on the primary plane for
SKL/BXT because we have in line of sight the pipe bottom color on those
platforms and because the primary plane programming on VLV/CHV doesn't
do anything different for those formats today.
v2: Fix the logic to forbid the creation of ABGR2101010 fbs (Ville)
v3: Still allow the creation of ARGB8888 fbs now that cursor planes use
real fb objects (found by PRTS).
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Vandana Kannan [Wed, 13 May 2015 06:48:52 +0000 (12:18 +0530)]
drm/i915/bxt: Port PLL programming BUN
BUN 1: prop_coeff, int_coeff, tdctargetcnt programming updated and tied to
VCO frequencies. Program i_lockthresh in PORT_PLL_9.
VCO calculated based on the formula:
Desired Output = Port bit rate in MHz (DisplayPort HBR2 is 5400 MHz)
Fast Clock = Desired Output / 2
VCO = Fast Clock * P1 * P2
Prop_coeff, int_coeff, and tdctargetcnt modified according to above
calculation.
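As a worked example of the formula above, a small sketch in integer
MHz. The function name port_pll_vco_mhz is hypothetical, and the P1/P2
values used below are for illustration only, not values taken from the
BUN:

```c
#include <assert.h>

/*
 * VCO = Fast Clock * P1 * P2, where Fast Clock = Desired Output / 2
 * and Desired Output is the port bit rate in MHz.
 */
static unsigned int port_pll_vco_mhz(unsigned int bit_rate_mhz,
				     unsigned int p1, unsigned int p2)
{
	unsigned int fast_clock_mhz;

	/* Dividers of zero would make the result meaningless. */
	assert(p1 > 0 && p2 > 0);

	fast_clock_mhz = bit_rate_mhz / 2;
	return fast_clock_mhz * p1 * p2;
}
```

For DisplayPort HBR2 (5400 MHz) the fast clock is 2700 MHz, so with the
illustrative dividers P1 = 2 and P2 = 1 the VCO comes out at 5400 MHz.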
BUN 2: Port PLLs require additional programming at certain frequencies -
DCO amplitude in PORT_PLL_10
Review comments from Siva addressed in the initial version of the
patch:
- Change PORT_PLL_LOCK_THRESHOLD to PORT_PLL_LOCK_THRESHOLD_MASK
- Calculate for HDMI
- Correct values for vco = 5.4
- return in case of invalid vco range
v2: Imre's review comments addressed
- change dcoampovr_en to dcoampovr_en_h
- change PORT_PLL_DCO_AMP_OVR_EN to PORT_PLL_DCO_AMP_OVR_EN_H
- Correct lane stagger value for 324MHz
- Make coef common for HDMI and DP
- remove superfluous comments
v3: Imre's comments addressed
- Remove Prop_coeff, int_coeff, tdctargetcnt, dcoampovr_en, gain_ctl,
dcoampovr_en_h from bxt_clk_div and make them local variables.