Francisco Jerez [Sun, 24 Oct 2010 14:14:41 +0000 (16:14 +0200)]
drm/nouveau: Rework tile region handling.
The point is to share more code between the PFB/PGRAPH tile region
hooks, and give the hardware specific functions a chance to allocate
per-region resources.
Signed-off-by: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Francisco Jerez [Mon, 25 Oct 2010 21:38:59 +0000 (23:38 +0200)]
drm/nouveau: Add a separate class for the kernel channel mutex.
nouveau_bo_move_m2mf() needs to lock the kernel channel, and it may be
called from the pushbuf IOCTL with an user channel already locked. Use
a separate subclass for the kernel channel mutex because this is
legitimate mutex nesting.
Signed-off-by: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Francisco Jerez [Sun, 24 Oct 2010 12:15:58 +0000 (14:15 +0200)]
drm/nv50: Keep track of the head a channel is vsync'ing to.
In a multihead setup vblank interrupts may end up enabled in both
heads. In that case we want to ignore the vblank interrupts coming
from the wrong CRTC to avoid tearing and unbalanced calls to
drm_vblank_get/put (fdo bug 31074).
Reported-by: Felix Leimbach <felix.leimbach@gmx.net> Signed-off-by: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Marcin Slusarz [Wed, 20 Oct 2010 19:50:24 +0000 (21:50 +0200)]
drm/nouveau: fix annoying nouveau_fence type issue
nouveau_fence_* functions are not type safe, which could lead to bugs.
Additionally every use of nouveau_fence_unref had to cast struct
nouveau_fence to void **.
Fix it by renaming old functions and creating static inline functions with
new prototypes. We still need old functions, because we pass function
pointers to ttm.
As we are wrapping functions, drop unused "void *arg" parameter where possible.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Thu, 21 Oct 2010 04:07:03 +0000 (14:07 +1000)]
drm/nouveau: add support for MSI
Only supported on NV50+ so far, and disabled by default currently. The
module parameter "msi=1" will enable it.
There's a kernel bug which will cause this to fail if the module (or the
NVIDIA binary driver) has ever been loaded before loading nouveau with
MSI enabled. As such, this is only safe to enable if you have nouveau
load on boot, and don't wish to ever reload it.
The workaround is to "echo 0 > /sys/bus/pci/devices/<device>/enable"
until the enable count reads 0. Then you should be able to load nouveau
with MSI enabled.
Ben Skeggs [Wed, 20 Oct 2010 01:47:09 +0000 (11:47 +1000)]
drm/nv50: create graph and crypt contexts on demand
This really needs cleaning up somehow, and probably investigate what's
needed to do this on earlier generations. NVIDIA do something similar
there too.
Francisco Jerez [Mon, 18 Oct 2010 01:56:40 +0000 (03:56 +0200)]
drm/nouveau: Avoid race in the interchannel sync code.
It needs a "strong" channel reference because it actually writes to
the channel pushbuf, otherwise the corresponding FIFO context could
get kicked off in the middle of nouveau_fence_sync().
Signed-off-by: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Francisco Jerez [Mon, 18 Oct 2010 01:56:14 +0000 (03:56 +0200)]
drm/nouveau: Make fences take a weak channel reference.
Fences didn't increment the channel reference count, and the fenced
channel could go away at any time. Fixes a potential race in
nouveau_fence_update().
Signed-off-by: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Francisco Jerez [Mon, 18 Oct 2010 01:55:48 +0000 (03:55 +0200)]
drm/nouveau: Implement weak channel references.
nouveau_channel_ref() takes a "weak" channel reference that doesn't
prevent the hardware channel resources from being released, it just
keeps the channel data structure alive.
Signed-off-by: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Francisco Jerez [Mon, 18 Oct 2010 01:54:33 +0000 (03:54 +0200)]
drm/nouveau: Fix race condition in channel refcount handling.
nouveau_channel_put() can be executed after the 'refcount == 0' check
in nouveau_channel_get() and before the channel reference count is
incremented. In that case CPU0 will take the context down while CPU1
thinks it owns the channel and 'refcount == 1'.
Signed-off-by: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Francisco Jerez [Mon, 18 Oct 2010 01:53:39 +0000 (03:53 +0200)]
drm/nouveau: Refactor context destruction to avoid a lock ordering issue.
The destroy_context() engine hooks call gpuobj management functions to
release the channel resources, these functions use HARDIRQ-unsafe locks
whereas destroy_context() is called with the HARDIRQ-safe
context_switch_lock held, that's a lock ordering violation.
Push the engine-specific channel destruction logic into destroy_context()
and let the hardware-specific code lock and unlock when it's actually
needed. Change the engine destruction order to avoid a race in the small
gap between pgraph and pfifo context uninitialization.
Reported-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Tue, 5 Oct 2010 06:53:48 +0000 (16:53 +1000)]
drm/nouveau: add per-channel mutex, use to lock access to drm's channel
This fixes a race condition between fbcon acceleration and TTM buffer
moves. To reproduce:
- start X
- switch to vt and "while (true); do dmesg; done"
- switch to another vt and "sleep 2 && cat /path/to/debugfs/dri/0/evict_vram"
- switch back to vt running dmesg
We don't make use of this on any other channel yet, they're currently
protected by drm_global_mutex. This will change in the near future.
Ben Skeggs [Tue, 5 Oct 2010 06:41:29 +0000 (16:41 +1000)]
drm/nouveau: disallow fbcon accel if running in interrupt context
A future commit will add locking to the DRM's channel, and there's numerous
problems that come up if we allow printk from an interrupt context to be
accelerated. It seems saner to just disallow it completely.
As a nice side-effect, all the "to accel or not to accel" logic gets moved
out of the chipset-specific code.
Dave Airlie [Fri, 3 Dec 2010 04:01:08 +0000 (14:01 +1000)]
Merge branch 'drm-radeon-fusion' of ../drm-radeon-next into drm-core-next
* 'drm-radeon-fusion' of ../drm-radeon-next:
drm/radeon/kms: add Ontario APU ucode loading support
drm/radeon/kms: add Ontario Fusion APU pci ids
drm/radeon/kms: enable MSIs on fusion APUs
drm/radeon/kms: add power table parsing support for Ontario fusion APUs
drm/radeon/kms: refactor atombios power state fetching
drm/radeon/kms: add bo blit support for Ontario fusion APUs
drm/radeon/kms: add thermal sensor support for fusion APUs
drm/radeon/kms: fill in GPU init for AMD Ontario Fusion APUs
drm/radeon/kms: add radeon_asic struct for AMD Ontario fusion APUs
drm/radeon/kms: evergreen.c updates for fusion
drm/radeon/kms: MC setup changes for fusion APUs
drm/radeon/kms: move r7xx/evergreen to its own vram_gtt setup function
drm/radeon/kms: add support for ss overrides on Fusion APUs
drm/radeon/kms: Add support for external encoders on fusion APUs
drm/radeon/kms: atom changes for DCE4.1 devices
drm/radeon/kms: add new family id for AMD Ontario APUs
drm/radeon/kms: upstream power table updates
drm/radeon/kms: upstream atombios.h updates
drm/radeon/kms: upstream ObjectID.h updates
drm/radeon/kms: setup mc chremap properly on r7xx/evergreen
Dave Airlie [Fri, 3 Dec 2010 04:00:52 +0000 (14:00 +1000)]
Merge branch 'drm-radeon-next' of ../drm-radeon-next into drm-core-next
* 'drm-radeon-next' of ../drm-radeon-next:
drm/radeon/kms: improve pflip precision on r1xx-r4xx
drm/kms/radeon: Use high precision timestamps for pageflip completion events.
drm/kms/radeon: Reorder vblank and pageflip interrupt handling.
drm/radeon/kms: add pageflip ioctl support (v3)
drm/kms/radeon: Add support for precise vblank timestamping.
Dave Airlie [Fri, 3 Dec 2010 03:59:36 +0000 (13:59 +1000)]
Merge branch 'drm-ttm-next' into drm-core-next
* drm-ttm-next:
drm/radeon: Use the ttm execbuf utilities
drm/ttm: Fix up io_mem_reserve / io_mem_free calling
drm/ttm/vmwgfx: Have TTM manage the validation sequence.
drm/ttm: Improved fencing of buffer object lists
drm/ttm/radeon/nouveau: Kill the bo lock in favour of a bo device fence_lock
drm/ttm: Don't deadlock on recursive multi-bo reservations
drm/ttm: Optimize ttm_eu_backoff_reservation
drm/ttm: Use kref_sub instead of repeatedly calling kref_put
kref: Add a kref_sub function
drm/ttm: Add a bo list reserve fastpath (v2)
Alex Deucher [Mon, 22 Nov 2010 22:56:25 +0000 (17:56 -0500)]
drm/radeon/kms: add support for ss overrides on Fusion APUs
System specific spread spectrum overrides can be specified in
the integrated system info table for Fusion APUs. This adds
support for using those overrides.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Thomas Hellstrom [Wed, 17 Nov 2010 12:38:32 +0000 (12:38 +0000)]
drm/radeon: Use the ttm execbuf utilities
Rather than re-implementing in the Radeon driver,
Use the execbuf / cs / pushbuf utilities that comes with TTM.
This comes with an even greater benefit now that many spinlocks have been
optimized away...
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Thomas Hellstrom [Thu, 11 Nov 2010 08:41:57 +0000 (09:41 +0100)]
drm/ttm: Fix up io_mem_reserve / io_mem_free calling
This patch attempts to fix up shortcomings with the current calling
sequences.
1) There's a fastpath where no locking occurs and only io_mem_reserved is
called to obtain needed info for mapping. The fastpath is set per
memory type manager.
2) If the fastpath is disabled, io_mem_reserve and io_mem_free will be exactly
balanced and not called recursively for the same struct ttm_mem_reg.
3) Optionally the driver can choose to enable a per memory type manager LRU
eviction mechanism that, when io_mem_reserve returns -EAGAIN will attempt
to kill user-space mappings of memory in that manager to free up needed
resources
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Thomas Hellstrom [Wed, 17 Nov 2010 12:28:29 +0000 (12:28 +0000)]
drm/ttm/radeon/nouveau: Kill the bo lock in favour of a bo device fence_lock
The bo lock used only to protect the bo sync object members, and since it
is a per bo lock, fencing a buffer list will see a lot of locks and unlocks.
Replace it with a per-device lock that protects the sync object members on
*all* bos. Reading and setting these members will always be very quick, so
the risc of heavy lock contention is microscopic. Note that waiting for
sync objects will always take place outside of this lock.
The bo device fence lock will eventually be replaced with a seqlock /
rcu mechanism so we can determine that a bo is idle under a
rcu / read seqlock.
However this change will allow us to batch fencing and unreserving of
buffers with a minimal amount of locking.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Jerome Glisse <j.glisse@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Thomas Hellstrom [Tue, 16 Nov 2010 14:21:07 +0000 (15:21 +0100)]
kref: Add a kref_sub function
Makes it possible to optimize batched multiple unrefs.
Initial user will be drivers/gpu/ttm which accumulates unrefs to be
processed outside of atomic code.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 22 Nov 2010 03:24:40 +0000 (13:24 +1000)]
drm/ttm: Add a bo list reserve fastpath (v2)
Makes it possible to reserve a list of buffer objects with a single
spin lock / unlock if there is no contention.
Should improve cpu usage on SMP kernels.
v2: Initialize private list members on reserve and don't call
ttm_bo_list_ref_sub() with zero put_count.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Mario Kleiner [Sun, 21 Nov 2010 15:59:03 +0000 (10:59 -0500)]
drm/kms/radeon: Use high precision timestamps for pageflip completion events.
Signed-off-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de> Signed-off-by: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Mario Kleiner [Sun, 21 Nov 2010 15:59:02 +0000 (10:59 -0500)]
drm/kms/radeon: Reorder vblank and pageflip interrupt handling.
In the vblank irq handler, calls to actual vblank handling,
or at least drm_handle_vblank(), need to happen before
calls to radeon_crtc_handle_flip().
Reason: The high precision pageflip timestamping
and some other pageflip optimizations will need the updated
vblank count and timestamps for the current vblank interval.
These are calculated in drm_handle_vblank(), therefore it
must go first.
Signed-off-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de> Signed-off-by: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Deucher [Sun, 21 Nov 2010 15:59:01 +0000 (10:59 -0500)]
drm/radeon/kms: add pageflip ioctl support (v3)
This adds support for dri2 pageflipping.
v2: precision updates from Mario Kleiner.
v3: Multihead fixes from Mario Kleiner; missing crtc offset
add note about update pending bit on pre-avivo chips
Signed-off-by: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
Mario Kleiner [Sat, 23 Oct 2010 02:42:17 +0000 (04:42 +0200)]
drm/kms/radeon: Add support for precise vblank timestamping.
This patch adds new functions for use by the drm core:
.get_vblank_timestamp() provides a precise timestamp
for the end of the most recent (or current) vblank
interval of a given crtc, as needed for the DRI2
implementation of the OML_sync_control extension.
It is a thin wrapper around the drm function
drm_calc_vbltimestamp_from_scanoutpos() which does
almost all the work and is shared across drivers.
.get_scanout_position() provides the current horizontal
and vertical video scanout position and "in vblank"
status of a given crtc, as needed by the drm for use by
drm_calc_vbltimestamp_from_scanoutpos().
The function is also used by the dynamic gpu reclocking
code to determine when it is safe to reclock inside vblank.
For that purpose radeon_pm_in_vbl() is modified to
accomodate a small change in the function prototype of
the radeon_get_crtc_scanoutpos() which is hooked up to
.get_scanout_position().
This code has been tested on AVIVO hardware, a RV530
(ATI Mobility Radeon X1600) in a Intel Core-2 Duo MacBookPro
and some R600 variant (FireGL V7600) in a single cpu
AMD Athlon 64 PC.
Signed-off-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de> Reviewed-by: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Mario Kleiner [Sat, 23 Oct 2010 02:20:23 +0000 (04:20 +0200)]
drm/vblank: Add support for precise vblank timestamping.
The DRI2 swap & sync implementation needs precise
vblank counts and precise timestamps corresponding
to those vblank counts. For conformance to the OpenML
OML_sync_control extension specification the DRM
timestamp associated with a vblank count should
correspond to the start of video scanout of the first
scanline of the video frame following the vblank
interval for that vblank count.
Therefore we need to carry around precise timestamps
for vblanks. Currently the DRM and KMS drivers generate
timestamps ad-hoc via do_gettimeofday() in some
places. The resulting timestamps are sometimes not
very precise due to interrupt handling delays, they
don't conform to OML_sync_control and some are wrong,
as they aren't taken synchronized to the vblank.
This patch implements support inside the drm core
for precise and robust timestamping. It consists
of the following interrelated pieces.
1. Vblank timestamp caching:
A per-crtc ringbuffer stores the most recent vblank
timestamps corresponding to vblank counts.
The ringbuffer can be read out lock-free via the
accessor function:
The function returns the current vblank count and
the corresponding timestamp for start of video
scanout following the vblank interval. It can be
used anywhere between enclosing drm_vblank_get(dev, crtcid)
and drm_vblank_put(dev,crtcid) statements. It is used
inside the drmWaitVblank ioctl and in the vblank event
queueing and handling. It should be used by kms drivers for
timestamping of bufferswap completion.
The timestamp ringbuffer is reinitialized each time
vblank irq's get reenabled in drm_vblank_get()/
drm_update_vblank_count(). It is invalidated when
vblank irq's get disabled.
The ringbuffer is updated inside drm_handle_vblank()
at each vblank irq.
2. Calculation of precise vblank timestamps:
drm_get_last_vbltimestamp() is used to compute the
timestamp for the end of the most recent vblank (if
inside active scanout), or the expected end of the
current vblank interval (if called inside a vblank
interval). The function calls into a new optional kms
driver entry point dev->driver->get_vblank_timestamp()
which is supposed to provide the precise timestamp.
If a kms driver doesn't implement the entry point or
if the call fails, a simple do_gettimeofday() timestamp
is returned as crude approximation of the true vblank time.
A new drm module parameter drm.timestamp_precision_usec
allows to disable high precision timestamps (if set to
zero) or to specify the maximum acceptable error in
the timestamps in microseconds.
Kms drivers could implement their get_vblank_timestamp()
function in a gpu specific way, as long as returned
timestamps conform to OML_sync_control, e.g., by use
of gpu specific hardware timestamps.
Optionally, kms drivers can simply wrap and use the new
utility function drm_calc_vbltimestamp_from_scanoutpos().
This function calls a new optional kms driver function
dev->driver->get_scanout_position() which returns the
current horizontal and vertical video scanout position
of the crtc. The scanout position together with the
drm_display_timing of the current video mode is used
to calculate elapsed time relative to start of active scanout
for the current video frame. This elapsed time is subtracted
from the current do_gettimeofday() time to get the timestamp
corresponding to start of video scanout. Currently
non-interlaced, non-doublescan video modes, with or
without panel scaling are handled correctly. Interlaced/
doublescan modes are tbd in a future patch.
3. Filtering of redundant vblank irq's and removal of
some race-conditions in the vblank irq enable/disable path:
Some gpu's (e.g., Radeon R500/R600) send spurious vblank
irq's outside the vblank if vblank irq's get reenabled.
These get detected by use of the vblank timestamps and
filtered out to avoid miscounting of vblanks.
Some race-conditions between the vblank irq enable/disable
functions, the vblank irq handler and the gpu itself (updating
its hardware vblank counter in the "wrong" moment) are
fixed inside vblank_disable_and_save() and
drm_update_vblank_count() by use of the vblank timestamps and
a new spinlock dev->vblank_time_lock.
The time until vblank irq disable is now configurable via
a new drm module parameter drm.vblankoffdelay to allow
experimentation with timeouts that are much shorter than
the current 5 seconds and should allow longer vblank off
periods for better power savings.
Followup patches will use these new functions to
implement precise timestamping for the intel and radeon
kms drivers.
Signed-off-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
Linus Torvalds [Sat, 20 Nov 2010 03:46:45 +0000 (19:46 -0800)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: Add EXT4_IOC_TRIM ioctl to handle batched discard
fs: Do not dispatch FITRIM through separate super_operation
ext4: ext4_fill_super shouldn't return 0 on corruption
jbd2: fix /proc/fs/jbd2/<dev> when using an external journal
ext4: missing unlock in ext4_clear_request_list()
ext4: fix setting random pages PageUptodate
Lukas Czerner [Sat, 20 Nov 2010 02:47:07 +0000 (21:47 -0500)]
ext4: Add EXT4_IOC_TRIM ioctl to handle batched discard
Filesystem independent ioctl was rejected as not common enough to be in
core vfs ioctl. Since we still need to access to this functionality this
commit adds ext4 specific ioctl EXT4_IOC_TRIM to dispatch
ext4_trim_fs().
It takes fstrim_range structure as an argument. fstrim_range is definec in
the include/linux/fs.h and its definition is as follows.
start - first Byte to trim
len - number of Bytes to trim from start
minlen - minimum extent length to trim, free extents shorter than this
number of Bytes will be ignored. This will be rounded up to fs
block size.
After the FITRIM is done, the number of actually discarded Bytes is stored
in fstrim_range.len to give the user better insight on how much storage
space has been really released for wear-leveling.