git.karo-electronics.de Git - linux-beck.git/log

drm/i915: fix forcewake related hangs on snb

... by adding seemingly redudant posting reads.

This little dragon lair exploded the first time around when we've
refactored the code a bit to use the common wait_for_atomic_us in
"drm/i915: Group the GT routines together in both code and vtable",
which caused QA to file fdo bug #51738.

Chris Wilson entertained a few approaches to fixing #51738: Replacing
the udelay(1) with the previously-used udelay(10) (or any other
"sufficiently larger" delay), adding a posting read, or ditching the
delay completely and using cpu_relax. We went with the cpu_relax and
"915: Workaround hang with BSD and forcewake on SandyBridge". Which
blew up in fdo bug #52424, but adding the posting read while still
using cpu_relax seems to also fix that, it looks like the
posting read is the important ingriedient to fix these rc6 related
hangs on snb.

Popular theories as to why this is like it is include:
- A herd of pink elephants got royally angered somehow.

- The gpu has internally different functional units and judging by the
  register offsets, the forcewake request register and the forcewake
  ack registers are _not_ in the same functional unit (or at least
  aren't reached through the same routes). Hence the posting read
  syncs up with the wrong block and gets the entire gpu confused.

- ...

As a minimal ducttape fix for 3.6, let's just put these posting reads
into place again. We can try fancier approaches (like adding back the
cpu_relax instead of the udelay) in -next.

This (re-)fixes a regression introduced in

commit 990bbdadabaa51828e475eda86ee5720a4910cc3
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jul 2 11:51:02 2012 -0300

    drm/i915: Group the GT routines together in both code and vtable

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Du Yan <yanx.du@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52424
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51738u
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

i915: Remove silly test

drv_priv->gmbus is an array. Comparing it with NULL is somewhat less useful
than a chocolate teapot.

Possibly we should be testing bus != NULL each iteration of the loop
instead ?

gcc could help by warning too!

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

i915: fix error path leak in intel_sdvo_write_cmd

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

vlv: it might be wise if we initialised the flag value...

Otherwise our initial behaviour is "randomly save a bogus PLL
choice" as far as I can see.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: unbreak lastclose for failed driver init

We now refuse to load on gen6+ if kms is not enabled:

commit 26394d9251879231b85e6c8cf899fa43e75c68f1
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Mar 26 21:33:18 2012 +0200

    drm/i915: refuse to load on gen6+ without kms

Which results in the drm core calling our lastclose function to clean
up the mess, but that one is neatly broken for such failure cases
since kms has been introduced in

commit 79e539453b34e35f39299a899d263b0a1f1670bd
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Fri Nov 7 14:24:08 2008 -0800

    DRM: i915: add mode setting support

Reported-and-tested-by: Paulo Zanoni <przanoni@gmail.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Set the context before setting up regs for the context.

Fixes failures in transform feedback on gen7 because our SOL_RESET
flag was setting the transform feedback offsets in the old context
(occasionally happened to be ours) instead of the new context.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: constify mode in crtc_mode_fixup

Laurent Pinchart missed this when sending in is giant constify patch:

commit e811f5ae19043b2ac2c28e147a4274038e655598
Author: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Date: Tue Jul 17 17:56:50 2012 +0200

drm: Make the .mode_fixup() operations mode argument a const pointer

Acked-by; Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915/lvds: ditch ->prepare special case

LVDS is the first output where dpms on/off and prepare/commit don't
perfectly match. Now the idea behind this special case seems to be
that for simple resolution changes on the LVDS we don't need to stop
the pipe, because (at least on newer chips) we can adjust the panel
fitter on the fly.

There are a few problems with the current code though:
- We still stop and restart the pipe unconditionally, because the crtc
  helper code isn't flexible enough.
- We show some ugly flickering, especially when changing crtcs (this
  the crtc helper would actually take into account, but we don't
  implement the encoder->get_crtc callback required to make this work
  properly).

So it doesn't even work as advertised. I agree that it would be nice
to do resolution changes on LVDS (and also eDP) whithout blacking the
screen where the panel fitter allows to do that. But imo we should
implement this as a special case a few layers up in the mode set code,
akin to how we already detect simple framebuffer changes (and only
update the required registers with ->mode_set_base).

Until this is all in place, make our lives easier and just rip it out.

Also note that this seems to fix actual bugs with enabling the lvds
output, see:

http://lists.freedesktop.org/archives/intel-gfx/2012-July/018614.html

Cc: Takashi Iwai <tiwai@suse.de>
Cc: Giacomo Comes <comes@naic.edu>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Takashi Iwai <tiwai@suse.de>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: dereferencing an error pointer

We need to check that "ctx" is a valid pointer before dereferencing it.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: fix invalid reference handling of the default ctx obj

Otherwise we end up trying to unpin a freed object and BUG.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Add -EIO to the list of known errors for __wait_seqno

This prevents a WARN introduced with

  commit de2b998552c1534e87bfbc51ec5734b02bc89020
  Author: Daniel Vetter <daniel.vetter@ffwll.ch>
  Date:   Wed Jul 4 22:52:50 2012 +0200

      drm/i915: don't return a spurious -EIO from intel_ring_begin

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Flush the context object from the CPU caches upon switching

The issue is that we stale data in the CPU caches, when we come to
swap-out the object, the CPU may short-circuit the reads from those
cacheline and so corrupt the context object.

Secondary, leaving the context object as being marked in the CPU write
domain whilst on the GPU active list is a bad idea and will throw
warnings later.

Note: Thanks to calling set_to_gtt_domain with write = false and not
setting any gpu write domain when putting a context object onto the
active list (when we switch away from it) the set_to_gtt_domain call
won't block.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Added a note to the commit message and a comment in the code
to explain the clever non-blocking trick.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Make the lock for pageflips interruptible

As we take the struct_mutex lock to access the command-stream, there is
a possibility that we may need to wait for a GPU hang and so should make
the lock both interruptible and error-checking.

References: https://bugs.freedesktop.org/show_bug.cgi?id=50069
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: don't forget the PCH backlight registers

When we enable/disable the CPU backlight registers we can't forget to
enable/disable the PCH backlight registers. Since we're using the CPU
registers we should also unset the override bit.

Fixes a regression on the following commit:
drm/i915: properly enable the blc controller on the right pipe

The commit just deleted the code that sets the PCH registers, so it
was relying on the values set by the BIOS. I told my BIOS to boot on
the DVI monitor instead of the LVDS panel, so I noticed the bug.

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Insert a flush between batches if the breadcrumb was dropped

If we drop the breadcrumb request after a batch due to a signal for
example we aim to fix it up at the next opportunity. In this case we
emit a second batchbuffer with no waits upon the first and so no
opportunity to insert the missing request, so we need to emit the
missing flush for coherency. (Note that that invalidating the render
cache is the same as flushing it, so there should have been no
observable corruption.)

Note that beside simply adding the missing flush, avoiding potential
render corruption, this will also fix at least parts of the problem
introduced by some funny interaction of these two commits:

commit de2b998552c1534e87bfbc51ec5734b02bc89020
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jul 4 22:52:50 2012 +0200

    drm/i915: don't return a spurious -EIO from intel_ring_begin

which allowed intel_ring_begin to return -ERESTARTSYS and

commit cc889e0f6ce6a63c62db17d702ecfed86d58083f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jun 13 20:45:19 2012 +0200

    drm/i915: disable flushing_list/gpu_write_list

which essentially disabled the flushing list.

The issue happens when we submit a batch & emit it, but get
interrupted (thanks to the first patch) while trying to emit the
flush. On the next batch we still assume that the full gpu domain
handling is in effect and hence compute the invalidate&flushing
domains. But thanks to the 2nd patch we totally ignore these and only
invalidate all gpu domains, presuming that any required flushes have
been issued already.  Which is wrong and eventually results in us
updating the new write_domain values with the computed
pending_write_domain values, which leaves an object with write_domain
== 0 on the gpu_write_list.

As soon as we try to unbind that object, things blow up.

Fix this by emitting the missing flush according to the new
ring->gpu_caches_dirty flag.

Note that this does _not_ fix all the current cases where we end up
with an object on the flushing_list that can't be flushed.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52040
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add bug explanation to commit message.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: missing error case in init status page

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: mask tiled bit when updating ILK sprites

Or going from tiled to untiled may break.

Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: try to train DP even harder

While debugging Haswell link train failures I observed that we never
try the maximum voltage configuration more than once consecutively. We
start the training, the monitor keeps telling us to increase the
voltage, then when we reach the maximum we just go back to the start
(because of the "memset" above "voltage_tries = 0"). When we reach
this point, we keep alternating between the maximum and the minimum
voltages until we give up.

The DP spec suggests that we should try the same voltage 5 times
before giving up. This patch makes us try the maximum voltage at
least 5 times before going back to the minimum voltages.

This patch does not fix any particular bug I'm aware of.

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: kill intel_ddc_probe

We have way too much lying hardware to rely on a simple "does someone
answer on the ddc i2c address?" check. And now it's unused, so just
kill it.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: check whether we actually received an edid in detect_ddc

Somehow detect_ddc manages to fall through all checks when we think
that something responds on the ddc i2c address, but the edid read
failed. Fix this up by explicitly checking for this case.

This fixes a regression on newer chips because since

commit aaa377302b2994fcc2c66741b47da33feb489dca
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Sat Jun 16 15:30:32 2012 +0200

drm/i915/crt: Do not rely upon the HPD presence pin

we use ddc detection also on hotplug capable platforms. And one of
these reads all 0s for any i2c transaction if nothing is connected to
the vga port.

v2: Implement Chris Wilson's review:
- simplify logic, default to "nothing detected"
- kill stale comment
- BUG_ON(!crt->type != ANALOG)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51900
Tested-by: Yang Guang <guang.a.yang@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: fix up PCH backlight #define mixup

I so totally suck.

This can cause a black screen if (for whatever reason) the bios
hasn't set this bit itself.

This regression has been introduced in

commit 7cf4160148136deb31ee5f2802857dd935a38529
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Tue Jun 5 10:07:09 2012 +0200

drm/i915: clear up backlight #define confusion on gen4+

Tested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Add comments to explain the BSD tail write workaround

Having had to dive into the bspec to understand what each stage of the
workaround meant, and how that the ring broadcasting IDLE corresponded
with the GT powering down the ring (i.e. rc6) add comments to aide
the next reader.

And since the register "is used to control all aspects of PSMI and power
saving functions" that makes it quite interesting to inspect with
regards to RC6 hangs, so add it to the error-state.

v2: Rediscover the piece of magic, set the RNCID to 0 before waiting for
the ring to wake up.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Disable the BLT on pre-production SNB hardware

It never quite worked despite the numerous workarounds, yet I still see
people trying to use this hardware and filing bug reports. As we no
longer even try to implement the workarounds, since 6a233c78878
(drm/i915/ringbuffer: kill snb blt workaround), simply disable the ring.

v2: Add a message to inform the user about the limited capabilities of
their pre-production hardware.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: initialize power wells in modeset_init_hw

This initializes power wells within the modeset_init_hw routine.
Testing has shown that this works for both driver load time and for
suspend-resume code paths.

Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Only request PM interrupts for the events we handled

There is little point waking up every 10ms to service an interrupt which
we then promptly ignore. So only program the the PMIER to enable
interrupts for those events which we do handle, not all of them!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Eugeni Dodonov <eugeni.dodonov@intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915/context: Add missing IVB context sizes

There were some fields missed. Daniel pointed this out in review, and I
know I fixed it, but something happened somehow and some time.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915/context/: s/CTX/CXT

*sigh* the docs had it spelled wrong, corrected it, and then proceeded
to re-do the original error. The original code preserved this history,
and this patch attempts to keep in sync with the current docs.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/sis: fixup sis_mm ioctl structs

Userspace uses long in quite a few places more than the kernel. Which
gives me neat proof that I'm the only guy on this side of the galaxy
who ever tried to run glxgears on a 64bit machine with sis graphics on
linux.

Note that the longs in drm_sis_mem_t aren't aligned properly, so this
won't even work with 32bit userspace on 64bit kernel as-is. Hence the
patch can't break that, either.

Nope, I'm not nuts enough to write the 32bit ioctl compat layer for
this and test it with some wine app. Even though hunting the ebay
dungeons for a sis card actually supported by the mesa drivers casts
some doubts on this ...

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: kill i915/i830 ids from drm_pciids.h

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: unconditionally clean up dma buffers of closing clients

With the last patch to ditch DMA_QUEUE support, we should be able
to call the dma cleanup uncoditionally, even when the master has
disappeared.

Do so because it just makes more sense.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: kill dma queue support

Absolutely unused. All the values are only ever initialized and
then used at most in some debug printout functions.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: ditch strange DRIVER_DMA_QUEUE only error bail-out

Only one driver (i810) even sets that flag. Now the actual locking
code uncoditionally promotes lock->context to an unsigned int.

Closer inspection of the userspace reveals that the drm lock context
is defined as an unsigned int (at least on linux). I suspect we just
have a strange case of signedness confusion going on.

Tested on my i815, doesn't seem to break anything.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: kill reclaim_buffers callback

All leftover users either haven't set DRIVER_HAVE_DMA, in which
case this will never be called, or use the drm_core implementation.

Call that directly in the only callsite.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/savage: clean up reclaim_buffers

The reclaim_buffers function of the savage driver actually wants to run
with the hw_lock held - at least there are printks in the call-chain
to that effect. But the drm core only calls reclaim_buffers as used
by savage _after_ forcefully dropping the hwlock (in case it's still
hold by the closing fd).

So do the same idlelock dance as for the other dma drivers and hope
that papers over any issues.

v2: Don't let the idlelock linger around.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Tormod Volden <debian.tormod@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: kill reclaim_buffers_locked

i810 was the last user of this code, with that gone, kill it.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Revert "Revert "drm/i810: cleanup reclaim_buffers""

This reverts commit 6e877b576ddf7cde5db2e9a6dcb56fef0ea77e64,
reinstating the original commit:

commit 87499ffdcb1c70f66988cd8febc4ead0ba2f9118
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Oct 25 23:51:24 2011 +0200

    drm/i810: cleanup reclaim_buffers

    My dear old i815 always hits the deadlocked on reclaim_buffers
    warning. Switch over to the idlelock duct-tape on hope that
    works better. I've fired up my i815 and now closing glxgears doesn't
    take 5 seconds anymore. \o/

The original problem with that was that I've moved it ahead in the
series so that it could be included despite some patches not being
ready quite yet. The little problem is that this patch required some
of the previous rework to work correctly.

Now that everything is in the right order again, this actually works
on my i810 and does speed up closing gl apps as the original commit
claimed. Without hanging the machine, as the revert says.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: kill reclaim_buffers_idlelocked functions

The only two users are now folded into the drivers preclose functions,
so this is unused.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/sis: clean up reclaim_buffers

Like for via.

v2: Actually drop the idlelock again if taken.

v3: Fixup.

v4: Fixup the "has master" vs. "is master" confusion the refactor
introduced.

v5: Drop the idlelock in the early return path.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/via: clean up reclaim_buffers

A few things
- kill reclaim_buffers, it's never ever called because via does not set
  DRIVER_HAVE_DMA
- inline the idlelock dance into the buffer reclaim logic and make it
  a simple preclose cleanup function
- directly call the the dma_quiescent function and kill the needless
  if check.

v2: Actually drop the idlelock when we take it. Reported by James
Simmons.

v3: Rebased onto latest drm-next.

v4: Fixup the refactor.

v5: More fixup the refactor - I've accidentally changed the check for
any master to checking whether the closing fd is the master.

v6: Don't forget to drop the idlelock in the early return path, too.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/udl: port over blanking code from udlfb.

This ports over the dpms code from udlfb, and should mean
a better chance of turning on some udl devices.

Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon/kms: auto detect pcie link speed from root port

This check the root ports supported link speeds and enables
GEN2 mode if the 5.0 GT link speed is available.

The first 3.0 cards are SI so they will probably need more investigation.

Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/pci: add support for getting the supported link bw.

This should work for PCIE3.0 as well.

Signed-off-by: Dave Airlie <airlied@redhat.com>

pci_regs: define LNKSTA2 pcie cap + bits.

We need these for detecting the max link speed for drm drivers.

Acked-by: Bjorn Helgaas <bhelgass@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon: improve GPU lockup debugging info on r6xx/r7xx/r8xx/r9xx

Print various CP register that have valuable informations regarding
GPU lockup.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/mgag200: fix null pointer dereference

we are referencing the pointer after doing alloc_apertures,
as alloc_apertures kzallocs, the kzalloc may fail and we get a NULL.

so we need to check for NULL before we dereference this pointer

Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon: Try harder to avoid HW cursor ending on a multiple of 128 columns.

This could previously fail if either of the enabled displays was using a
horizontal resolution that is a multiple of 128, and only the leftmost column
of the cursor was (supposed to be) visible at the right edge of that display.

The solution is to move the cursor one pixel to the left in that case.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33183

Cc: stable@vger.kernel.org
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: Make the .mode_fixup() operations mode argument a const pointer

The passed mode must not be modified by the operation, make it const.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: remove the list_head from drm_mode_set

It's unused. At it confused me quite a bit until I've discovered that.

Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/fb helper: don't call drm_crtc_helper_set_config

Go through the interface vtable instead, because not everyone might be
using the crtc helper code.

Cc: dri-devel@lists.freedesktop.org
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/fb-helper: delay hotplug handling when partially bound

Ok, this requires quite a dance to actually hit:
1) We plug in a 2nd screen, enable it in both X and (by vt-switching)
in the fbcon.
2) We disable that screen again in with xrandr.
3) We vt-switch again, so that fbcon displays on the 2nd screen, but X
on the first screen. This obviously needs a driver that doesn't switch
off unused functions when regaining the VT.
3) When X controls the vt, we unplug that screen.

Now drm_fb_helper_hotplug_event we noticed that that some crtcs are
bound, but because we still have the fbcon on the 2nd screeen we also
have bound set. Which means the fbcon wrongly assumes it's in control
of everything an happily disables the output on the 2nd screen, but
enables its fb on the first screen.

Work around this issue by counting how many crtcs are bound and how
many are bound to fbcon and assuming that when fbcon isn't bound to
all of them, it better not touch the output configuration.

Conceptually this is the same as only restoring the fbcon output
configuration on the driver's ->lastclose, when we're sure that no one
else is using kms. So this should be consistent with existing kms
drivers.

Chris has created a separate patch for the intel ddx, but I think we
should fix this issue here regardless - the fbcon messing with the
output config while it's not fully in control simply isn't a too
polite behaviour.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50772
Tested-by: Maxim A. Nikulin <M.A.Nikulin@gmail.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge branch 'next' of git://people.freedesktop.org/~deathsimple/linux into drm-next

This contains all the radeon documentation rebased on top of the ib fixes.

* 'next' of git://people.freedesktop.org/~deathsimple/linux:
  drm/radeon: fix SS setup for DCPLL
  drm/radeon: fix up pll selection on DCE5/6
  drm/radeon: start to document evergreen.c
  drm/radeon: start to document the functions r100.c
  drm/radeon: document VM functions in radeon_gart.c (v3)
  drm/radeon: document non-VM functions in radeon_gart.c (v2)
  drm/radeon: document radeon_ring.c (v4)
  drm/radeon: document radeon_fence.c (v2)
  drm/radeon: document radeon_asic.c
  drm/radeon: document radeon_irq_kms.c
  drm/radeon: document radeon_kms.c
  drm/radeon: document radeon_device.c (v2)
  drm/radeon: add rptr save support for r1xx-r5xx
  drm/radeon: update rptr saving logic for memory buffers
  drm/radeon: remove radeon_ring_index()
  drm/radeon: update ib_execute for SI (v2)
  drm/radeon: fix const IB handling v2
  drm/radeon: let sa manager block for fences to wait for v2
  drm/radeon: return an error if there is nothing to wait for

drm/radeon: fix SS setup for DCPLL

Need to actually set the SS parameters rather than just 0.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

drm/radeon: fix up pll selection on DCE5/6

Selecting ATOM_PPLL_INVALID should be equivalent as the
DCPLL or PPLL0 are already programmed for the DISPCLK, but
the preferred method is to always specify the PLL selected.
SetPixelClock will check the parameters and skip the
programming if the PLL is already set up.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

drm/radeon: start to document evergreen.c

Still a lot to do.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

drm/radeon: start to document the functions r100.c

Still a lot more to do.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

drm/radeon: document VM functions in radeon_gart.c (v3)

Document the VM functions in radeon_gart.c

v2: adjust per Christian's suggestions
v3: adjust to Christians's latest changes

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

drm/radeon: document non-VM functions in radeon_gart.c (v2)

Document the non-VM functions in radeon_gart.c

v2: adjust per Christian's suggestions

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

drm/radeon: document radeon_ring.c (v4)

Adds documentation to most of the functions in
radeon_ring.c

v2: adjust per Christian's suggestions
v3: adjust per Christian's latest patches
v4: adjust per my latest changes

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

drm/radeon: document radeon_fence.c (v2)

Adds documentation to most of the functions in
radeon_fence.c

v2: address Christian's comments:
- split common concept description into it's own comment
- fix description of intr parameter
- Improve description of -EDEADLK error

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

drm/radeon: document radeon_asic.c

Adds documentation to most of the functions in
radeon_asic.c

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

drm/radeon: document radeon_irq_kms.c

Adds documentation to most of the functions in
radeon_irq_kms.c

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

drm/radeon: document radeon_kms.c

Adds documentation to most of the functions in
radeon_kms.c

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

drm/radeon: document radeon_device.c (v2)

Adds documentation to most of the functions in
radeon_device.c

v2: split out general descriptions as per Christian's
comments.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

drm/radeon: add rptr save support for r1xx-r5xx

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

drm/radeon: update rptr saving logic for memory buffers

Add support for using memory buffers rather than
scratch registers. Some rings may not be able to
write to scratch registers.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

drm/radeon: remove radeon_ring_index()

Just store the index in the ring structure.
Idea taken from one of Jerome's wip rptr patches.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

drm/radeon: update ib_execute for SI (v2)

When submitting a CONST_IB, emit a SWITCH_BUFFER
packet before the CONST_IB.  This isn't strictly necessary
(the driver will work fine without it), but is good practice
and allows for more flexible DE/CE sychronization options
in the future.  Current userspace drivers do not take
advantage of the CE yet.

v2: - clean up code flow a bit
    - no need to flush caches for CONST IB

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

drm/radeon: fix const IB handling v2

Const IBs are executed on the CE not the CP, so we can't
fence them in the normal way.

So submit them directly before the IB instead, just as
the documentation says.

v2: keep the extra documentation

Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: let sa manager block for fences to wait for v2

Otherwise we can encounter out of memory situations under extreme load.

v2: add documentation for the new function

Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: return an error if there is nothing to wait for

Otherwise the sa managers out of memory
handling doesn't work.

Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

drm: Disallow DRM_IOCTL_MODESET_CTL for KMS drivers

DRM_IOCTL_MODESET_CTL must only be used for UMS drivers. Make it a no-op
for KMS drivers.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Dave Airlie <airlied@gmail.com>

Merge branch 'next' of git://people.freedesktop.org/~deathsimple/linux into drm-next

This merges Christian work that has been hanging around on the list.

drm/radeon: implement ring saving on reset v4

Try to save whatever is on the rings when
we encounter an lockup.

v2: Fix spelling error. Free saved ring data if reset fails.
Add documentation for the new functions.
v3: Some more spelling fixes
v4: It doesn't make sense to save anything if all fences
are signaled

Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: record what is next valid wptr for each ring v4

Before emitting any indirect buffer, emit the offset of the next
valid ring content if any. This allow code that want to resume
ring to resume ring right after ib that caused GPU lockup.

v2: use scratch registers instead of storing it into memory
v3: skip over the surface sync for ni and si as well
v4: use SET_CONFIG_REG instead of PACKET0

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: move radeon_ib_ring_tests out of chipset code

Making it easier to control when it is executed.

Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: remove vm_manager start/suspend

Just restore the page table instead. Addressing three
problem with this change:

1. Calling vm_manager_suspend in the suspend path is
   problematic cause it wants to wait for the VM use
   to end, which in case of a lockup never happens.

2. In case of a locked up memory controller
   unbinding the VM seems to make it even more
   unstable, creating an unrecoverable lockup
   in the end.

3. If we want to backup/restore the leftover ring
   content we must not unbind VMs in between.

Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: remove r600_blit_suspend

Just reinitialize the shader content on resume instead.

Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: remove ip_pool start/suspend

The IB pool is in gart memory, so it is completely
superfluous to unpin / repin it on suspend / resume.

Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: make cp init on cayman more robust

It's not critical, but the current code isn't
100% correct.

Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: remove FIXME comment from chipset suspend

For a normal suspend/resume we allready wait for
the rings to be empty, and for a suspend/reasume
in case of a lockup we REALLY don't want to wait
for anything.

Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: fix fence init after resume

Start with last signaled fence number instead
of last emitted one.

Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: fix fence value access

It is possible that radeon_fence_process is called
after writeback is disabled for suspend, leading
to an invalid read of register 0x0.

This fixes a problem for me where the fence value
is temporary incremented by 0x100000000 on
suspend/resume.

Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: fix ring commit padding

We don't need to pad anything if the number of dwords
written to the ring already matches the requirements.

Fixes some "writting more dword to ring than expected"
warnings.

Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: add an exclusive lock for GPU reset v2

GPU reset need to be exclusive, one happening at a time. For this
add a rw semaphore so that any path that trigger GPU activities
have to take the semaphore as a reader thus allowing concurency.

The GPU reset path take the semaphore as a writer ensuring that
no concurrent reset take place.

v2: init rw semaphore

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: fix fence related segfault in CS

Don't return success if scheduling the IB fails, otherwise
we end up with an oops in ttm_eu_fence_buffer_objects.

Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

drm/radeon: add error handling to radeon_vm_unbind_locked

Waiting for a fence can fail for different reasons,
the most common is a deadlock.

Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: add error handling to fence_wait_empty_locked

Instead of returning the error handle it directly
and while at it fix the comments about the ring lock.

Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

drm: Add colouring to the range allocator

In order to support snoopable memory on non-LLC architectures (so that
we can bind vgem objects into the i915 GATT for example), we have to
avoid the prefetcher on the GPU from crossing memory domains and so
prevent allocation of a snoopable PTE immediately following an uncached
PTE. To do that, we need to extend the range allocator with support for
tracking and segregating different node colours.

This will be used by i915 to segregate memory domains within the GTT.

v2: Now with more drm_mm helpers and less driver interference.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@redhat.com
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@gmail.com>

drm: fail gracefully when proc isn't setup.

If drm can't find proc it should fail more gracefully, than just
oopsing, this tests drm_class is NULL, and sets it to NULL in the
fail paths.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge tag 'drm-intel-next-2012-07-06' of git://people.freedesktop.org/~danvet/drm-intel into drm-next

Daniel writes:
New pull for -next. Highlights:
- rc6/turbo support for hsw (Eugeni)
- improve corner-case of the reset handling code - gpu reset handling
  should be rock-solid now
- support for fb offset > 4096 pixels on gen4+ (yeah, you need some fairly
  big screens to hit that)
- the "Flush Me Harder" patch to fix the gen6+ fallout from disabling the
  flushing_list
- no more /dev/agpgart on gen6+!
- HAS_PCH_xxx improvements from Paulo
- a few minor bits&pieces all over, most of it in thew hsw code

* tag 'drm-intel-next-2012-07-06' of git://people.freedesktop.org/~danvet/drm-intel: (40 commits)
  drm/i915: program FDI_RX TP and FDI delays
  drm/i915: introduce for_each_encoder_on_crtc
  drm/i915: adjust framebuffer base address on gen4+
  drm/i915: introduce crtc->dspaddr_offset
  drm/i915: Reject page flips with changed format/offset/pitch
  drm/i915: Zero initialize mode_cmd
  drm/i915: don't return a spurious -EIO from intel_ring_begin
  drm/i915: properly SIGBUS on I/O errors
  drm/i915: don't hang userspace when the gpu reset is stuck
  drm/i915: non-interruptible sleeps can't handle -EAGAIN
  drm/i915: don't trylock in the gpu reset code
  drm/i915: fix PIPE_DDI_PORT_MASK
  drm/i915: prevent bogus intel_update_fbc notifications
  drm/i915: re-initialize DDI buffer translations after resume
  drm/i915: don't ironlake_init_pch_refclk() on LPT
  drm/i915: get rid of dev_priv->info->has_pch_split
  drm/i915: add PCH_NONE to enum intel_pch
  drm/i915: prefer wide & slow to fast & narrow in DP configs
  drm/i915: fix up ilk rc6 disabling confusion
  drm/i915: move force wake support into intel_pm
  ...

drm/i915: program FDI_RX TP and FDI delays

This is required for a stable FDI connection.

v2: fix and simplify the FDI_RX_MISC bits as noticed by Paulo Zanoni.

CC: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: introduce for_each_encoder_on_crtc

We already have this pattern at quite a few places, and moving part of
the modeset helper stuff into the driver will add more.

v2: Don't clobber the crtc struct name with the macro parameter ...

v3: Convert two more places noticed by Paulo Zanoni.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: adjust framebuffer base address on gen4+

The tileoffset register only supports a limited offset in x/y of 4096,
so for giant screen configuration with a shared fb we wrap around.

Fix this by computing a linear offset in tiles (pages) and only use
the tileoffset register to offset within the tile.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: introduce crtc->dspaddr_offset

To avoid recomputing the display framebuffer offset on gen2/3
pageflips. This is also prep work to do similar trickery on gen4+

Also:
- kill "Start", such upper-case remnants from the ddx must surely die.
- rename "Offset" to linear_offset, to make it clearer that on gen4+
  this is only used by the hw for linear buffers, for tiled buffers it
  uses the TILEOFF register.
- call DSAPADDR DSPLINOFF on gen4+ for the same reason (and because
  the documentation really renamed the register).

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Reject page flips with changed format/offset/pitch

MI display flips can't handle some changes in the framebuffer
format or layout. Return an error in such cases.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Zero initialize mode_cmd

Zero initialize the mode_cmd structure when creating the kernel
framebuffer. Avoids having uninitialized data in offsets[0] for
instance.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: don't return a spurious -EIO from intel_ring_begin

The issue with this check is that it results in userspace receiving an
-EIO while the gpu reset hasn't completed, resulting in fallback to sw
rendering or worse.

Now there's also a stern comment in intel_ring_wait_seqno saying that
intel_ring_begin should not return -EAGAIN, ever, because some callers
can't handle that. But after an audit of the callsites I don't see any
issues. I guess the last problematic spot disappeared with the removal
of the pipelined fencing code.

So do the right thing and call check_wedge, which should properly
decide whether an -EAGAIN or -EIO is appropriate if wedged is set.

Note that the early check for a wedged gpu before touching the ring is
rather important (and it took me quite some time of acting like the
densest doofus to figure that out): If we don't do that and the gpu
died for good, not having been resurrect by the reset code, userspace
can merrily fill up the entire ring until it notices that something is
amiss.

Allowing userspace to emit more render, despite that we know that it
will fail can't lead to anything good (and by experience can lead to
all sorts of havoc, including angering the OOM gods and hard-hanging
the hw for good).

v2: Fix EAGAIN mispell, noticed by Chris Wilson.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: properly SIGBUS on I/O errors

... instead of looping endless with no hope of ever serving that
page-fault. We only need to break out of this loop when the gpu died,
to run the reset work (and hopefully resurrect it).

To clarify questions Chris raised on irc: This is about handling I/O
errors not from our own code, but e.g. when the disk died when trying
to swap in a gem bo. So this patch remidies the issue that the current
handling only handles gpu-death-induced cases of -EIO. Admittedly,
dying disks are much rarer than hanging gpus ...To clarify questions
Chris raised on irc: This is about handling I/O errors not from our
own code, but e.g. when the disk died when trying to swap in a gem bo.
So this patch remidies the issue that the current handling only
handles gpu-death-induced cases of -EIO. Admittedly, dying disks are
much rarer than hanging gpus ...

This seems to have been lost in:

commit d9bc7e9f32716901c617e1f0fb6ce0f74f172686
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Feb 7 13:09:31 2011 +0000

drm/i915: Fix infinite loop regression from 21dd3734

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: don't hang userspace when the gpu reset is stuck

With the gpu reset no longer using a trylock we've increased the
chances of userspace getting stuck quite a bit. To make that
(hopefully) rare case more paletable time out when waiting for the gpu
reset code to complete and signal this little issue to the caller by
returning -EIO.

This should help userspace to somewhat gracefully fall back and
hopefully allow the user to grab some logs and reboot the machine
(instead of staring at a frozen X screen in agony).

Suggested by Chris Wilson because I've been stubborn about allowing
the gpu reset code no to fail, ever (by removing the trylock).

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: non-interruptible sleeps can't handle -EAGAIN

So don't return -EAGAIN, even in the case of a gpu hang. Remap it to
-EIO instead. Note that this isn't really an issue with
interruptability, but more that we have quite a few codepaths (mostly
around kms stuff) that simply can't handle any errors and hence not
even -EAGAIN. Instead of adding proper failure paths so that we could
restart these ioctls we've opted for the cheap way out of sleeping
non-interruptibly.  Which works everywhere but when the gpu dies,
which this patch fixes.

So essentially interruptible == false means 'wait for the gpu or die
trying'.'

This patch is a bit ugly because intel_ring_begin is all non-interruptible
and hence only returns -EIO. But as the comment in there says,
auditing all the callsites would be a pain.

To avoid duplicating code, reuse i915_gem_check_wedge in __wait_seqno
and intel_wait_ring_buffer. Also use the opportunity to clarify the
different cases in i915_gem_check_wedge a bit with comments.

v2: Don't access dev_priv->mm.interruptible from check_wedge - we
might not hold dev->struct_mutex, making this racy. Instead pass
interruptible in as a parameter. I've noticed this because I've hit a
BUG_ON(!mutex_is_locked) at the top of check_wedge. This has been
added in

commit b4aca0106c466b5a0329318203f65bac2d91b682
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Wed Apr 25 20:50:12 2012 -0700

    drm/i915: extract some common olr+wedge code

although that commit is missing any justification for this. I guess
it's just copy&paste, because the same commit add the same BUG_ON
check to check_olr, where it indeed makes sense.

But in check_wedge everything we access is protected by other means,
so this is superflous. And because it now gets in the way (we add a
new caller in __wait_seqno, which can be called without
dev->struct_mutext) let's just remove it.

v3: Group all the i915_gem_check_wedge refactoring into this patch, so
that this patch here is all about not returning -EAGAIN to callsites
that can't handle syscall restarting.

v4: Add clarification what interuptible == fales means in our code,
requested by Ben Widawsky.

v5: Fix EAGAIN mispell noticed by Chris Wilson.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>