if we unplug the child's cpu after the sanity check when the child
gets attached to the task_list but before wake_up_new_task() shit
will meet with fan.
Solve all these issues by moving fork cpu selection into
wake_up_new_task().
Reported-by: Serge E. Hallyn <serue@us.ibm.com> Tested-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1264106190.4283.1314.camel@laptop> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The hot-unplug kstopmachine usage does a wakeup after
deactivating the cpu, hence we cannot use cpu_active()
here but must rely on the good olde online.
Reported-by: Sachin Sant <sachinp@in.ibm.com> Reported-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Jens Axboe <jens.axboe@oracle.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
LKML-Reference: <1261326987.4314.24.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In order to remove the cfs_rq dependency from set_task_cpu() we
need to ensure the task is cfs_rq invariant for all callsites.
The simple approach is to substract cfs_rq->min_vruntime from
se->vruntime on dequeue, and add cfs_rq->min_vruntime on
enqueue.
However, this has the downside of breaking FAIR_SLEEPERS since
we loose the old vruntime as we only maintain the relative
position.
To solve this, we observe that we only migrate runnable tasks,
we do this using deactivate_task(.sleep=0) and
activate_task(.wakeup=0), therefore we can restrain the
min_vruntime invariance to that state.
The only other case is wakeup balancing, since we want to
maintain the old vruntime we cannot make it relative on dequeue,
but since we don't migrate inactive tasks, we can do so right
before we activate it again.
This is where we need the new pre-wakeup hook, we need to call
this while still holding the old rq->lock. We could fold it into
->select_task_rq(), but since that has multiple callsites and
would obfuscate the locking requirements, that seems like a
fudge.
This leaves the fork() case, simply make sure that ->task_fork()
leaves the ->vruntime in a relative state.
This covers all cases where set_task_cpu() gets called, and
ensures it sees a relative vruntime.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170518.191697025@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Since select_task_rq() is now responsible for guaranteeing
->cpus_allowed and cpu_active_mask, we need to verify this.
select_task_rq_rt() can blindly return
smp_processor_id()/task_cpu() without checking the valid masks,
select_task_rq_fair() can do the same in the rare case that all
SD_flags are disabled.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.961475466@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Since we access ->cpus_allowed without holding rq->lock we need
a retry loop to validate the result, this comes for near free
when we merge sched_migrate_task() into sched_exec() since that
already does the needed check.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.884743662@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There's a preemption race in the set_task_cpu() debug check in
that when we get preempted after setting task->state we'd still
be on the rq proper, but fail the test.
Check for preempted tasks, since those are always on the RQ.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20091217121830.137155561@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In order to clean up the set_task_cpu() rq dependencies we need
to ensure it is never called on blocked tasks because such usage
does not pair with consistent rq->lock usage.
This puts the migration burden on ttwu().
Furthermore we need to close a race against changing
->cpus_allowed, since select_task_rq() runs with only preemption
disabled.
For sched_fork() this is safe because the child isn't in the
tasklist yet, for wakeup we fix this by synchronizing
set_cpus_allowed_ptr() against TASK_WAKING, which leaves
sched_exec to be a problem
This also closes a hole in (6ad4c1888 sched: Fix balance vs
hotplug race) where ->select_task_rq() doesn't validate the
result against the sched_domain/root_domain.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.807938893@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Sachin found cpu hotplug test failures on powerpc, which made
the kernel hang on his POWER box.
The problem is that we fail to re-activate a cpu when a
hot-unplug fails. Fix this by moving the de-activation into
_cpu_down after doing the initial checks.
Remove the synchronize_sched() calls and rely on those implied
by rebuilding the sched domains using the new mask.
Reported-by: Sachin Sant <sachinp@in.ibm.com> Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Tested-by: Sachin Sant <sachinp@in.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.500272612@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
read_lock(&tasklist_lock) does not protect
sys_sched_get_rr_param() against a concurrent update of the
policy or scheduler parameters as do_sched_scheduler() does not
take the tasklist_lock.
The access to task->sched_class->get_rr_interval is protected by
task_rq_lock(task).
Use rcu_read_lock() to protect find_task_by_vpid() and prevent
the task struct from going away.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20091209100706.862897167@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
tasklist_lock is held read locked to protect the
find_task_by_vpid() call and to prevent the task going away.
sched_setaffinity acquires a task struct ref and drops tasklist
lock right away. The access to the cpus_allowed mask is
protected by rq->lock.
rcu_read_lock() provides the same protection here.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20091209100706.789059966@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
read_lock(&tasklist_lock) does not protect
sys_sched_getscheduler and sys_sched_getparam() against a
concurrent update of the policy or scheduler parameters as
do_sched_setscheduler() does not take the tasklist_lock. The
accessed integers can be retrieved w/o locking and are snapshots
anyway.
Using rcu_read_lock() to protect find_task_by_vpid() and prevent
the task struct from going away is not changing the above
situation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20091209100706.753790977@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alan Stern noticed that all the wakeup side (and atomic) variants of the
completion APIs should be irq safe, but the newly introduced
completion_done() and try_wait_for_completion() aren't. The use of the
irq unsafe variants in IRQ contexts can cause crashes/hangs.
Fix the problem by making them use spin_lock_irqsave() and
spin_lock_irqrestore().
Reported-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Zhang Rui <rui.zhang@intel.com> Cc: pm list <linux-pm@lists.linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: David Chinner <david@fromorbit.com> Cc: Lachlan McIlroy <lachlan@sgi.com>
LKML-Reference: <200912130007.30541.rjw@sisk.pl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently we try to do task placement in wake_up_new_task() after we do
the load-balance pass in sched_fork(). This yields complicated semantics
in that we have to deal with tasks on different RQs and the
set_task_cpu() calls in copy_process() and sched_fork()
Rename ->task_new() to ->task_fork() and call it from sched_fork()
before the balancing, this gives the policy a clear point to place the
task.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
set_task_cpu() should be rq invariant and only touch task state, it
currently fails to do so, which opens up a few races, since not all
callers hold both rq->locks.
Remove the relyance on rq->clock, as any site calling set_task_cpu()
should also do a remote clock update, which should ensure the observed
time between these two cpus is monotonic, as per
kernel/sched_clock.c:sched_clock_remote().
Therefore we can simply remove the clock_offset bits and be happy.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sched_rr_get_param calls
task->sched_class->get_rr_interval(task) without protection
against a concurrent sched_setscheduler() call which modifies
task->sched_class.
Serialize the access with task_rq_lock(task) and hand the rq
pointer into get_rr_interval() as it's needed at least in the
sched_fair implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <alpine.LFD.2.00.0912090930120.3089@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In commit d4d6715, we reopened an old hole for a 64-bit ptracer touching a
32-bit tracee in system call entry. A %rax value set via ptrace at the
entry tracing stop gets used whole as a 32-bit syscall number, while we
only check the low 32 bits for validity.
Fix it by truncating %rax back to 32 bits after syscall_trace_enter,
in addition to testing the full 64 bits as has already been added.
Reported-by: Ben Hawkes <hawkes@sota.gen.nz> Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
compat_alloc_user_space() expects the caller to independently call
access_ok() to verify the returned area. A missing call could
introduce problems on some architectures.
This patch incorporates the access_ok() check into
compat_alloc_user_space() and also adds a sanity check on the length.
The existing compat_alloc_user_space() implementations are renamed
arch_compat_alloc_user_space() and are used as part of the
implementation of the new global function.
This patch assumes NULL will cause __get_user()/__put_user() to either
fail or access userspace on all architectures. This should be
followed by checking the return value of compat_access_user_space()
for NULL in the callers, at which time the access_ok() in the callers
can also be removed.
Reported-by: Ben Hawkes <hawkes@sota.gen.nz> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: James Bottomley <jejb@parisc-linux.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On 64 bits, we always, by necessity, jump through the system call
table via %rax. For 32-bit system calls, in theory the system call
number is stored in %eax, and the code was testing %eax for a valid
system call number. At one point we loaded the stored value back from
the stack to enforce zero-extension, but that was removed in checkin d4d67150165df8bf1cc05e532f6efca96f907cab. An actual 32-bit process
will not be able to introduce a non-zero-extended number, but it can
happen via ptrace.
Instead of re-introducing the zero-extension, test what we are
actually going to use, i.e. %rax. This only adds a handful of REX
prefixes to the code.
Reported-by: Ben Hawkes <hawkes@sota.gen.nz> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Roland McGrath <roland@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Wireless extensions have an unfortunate, undocumented
requirement which requires drivers to always fill
iwp->length when returning a successful status. When
a driver doesn't do this, it leads to a kernel heap
content leak when userspace offers a larger buffer
than would have been necessary.
Arguably, this is a driver bug, as it should, if it
returns 0, fill iwp->length, even if it separately
indicated that the buffer contents was not valid.
However, we can also at least avoid the memory content
leak if the driver doesn't do this by setting the iwp
length to max_tokens, which then reflects how big the
buffer is that the driver may fill, regardless of how
big the userspace buffer is.
To illustrate the point, this patch also fixes a
corresponding cfg80211 bug (since this requirement
isn't documented nor was ever pointed out by anyone
during code review, I don't trust all drivers nor
all cfg80211 handlers to implement it correctly).
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When the WARN condition is hit in ieee80211_get_tx_rate, it will return
NULL. So, we need to check the return value and avoid dereferencing it
in that case.
Signed-off-by: John W. Linville <linville@tuxdriver.com> Acked-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Michael reported that p54* never really entered power
save mode, even tough it was enabled.
It turned out that upon a power save mode change the
firmware will set a special flag onto the last outgoing
frame tx status (which in this case is almost always the
designated PSM nullfunc frame). This flag confused the
driver; It erroneously reported transmission failures
to the stack, which then generated the next nullfunc.
and so on...
Reported-by: Michael Buesch <mb@bu3sch.de> Tested-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Each histogram entry has a callchain root that stores the
callchain samples. However we forgot to initialize the
tracking of children hits of these roots, which then got
random values on their creation.
The root children hits is multiplied by the minimum percentage
of hits provided by the user, and the result becomes the minimum
hits expected from children branches. If the random value due
to the uninitialization is big enough, then this minimum number
of hits can be huge and eventually filter every children branches.
The end result was invisible callchains. All we need to
fix this is to initialize the children hits of the root.
Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
next_active_pageblock() is for finding next _used_ freeblock. It skips
several blocks when it finds there are a chunk of free pages lager than
pageblock. But it has 2 bugs.
1. We have no lock. page_order(page) - pageblock_order can be minus.
2. pageblocks_stride += is wrong. it should skip page_order(p) of pages.
We need to call platform_device_unregister(i8042_platform_device)
before calling platform_driver_unregister() because i8042_remove()
resets i8042_platform_device to NULL. This leaves the platform device
instance behind and prevents driver reload.
Commit 74641f584da ("alpha: binfmt_aout fix") (May 2009) introduced a
regression - binfmt_misc is now consulted after binfmt_elf, which will
unfortunately break ia32el. ia32 ELF binaries on ia64 used to be matched
using binfmt_misc and executed using wrapper. As 32bit binaries are now
matched by binfmt_elf before bindmt_misc kicks in, the wrapper is ignored.
The fix increases precedence of binfmt_misc to the original state.
Signed-off-by: Jan Sembera <jsembera@suse.cz> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I have been seeing problems on Tegra 2 (ARMv7 SMP) systems with HIGHMEM
enabled on 2.6.35 (plus some patches targetted at 2.6.36 to perform cache
maintenance lazily), and the root cause appears to be that the mm bouncing
code is calling flush_dcache_page before it copies the bounce buffer into
the bio.
The bounced page needs to be flushed after data is copied into it, to
ensure that architecture implementations can synchronize instruction and
data caches if necessary.
Signed-off-by: Gary King <gking@nvidia.com> Cc: Tejun Heo <tj@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
kunmap_atomic() takes the cookie, returned by the kmap_atomic() as its
argument and not the page address, used as an argument to kmap_atomic().
This patch fixes the compile error:
In file included from drivers/mmc/host/tmio_mmc.c:37:
drivers/mmc/host/tmio_mmc.h: In function 'tmio_mmc_kunmap_atomic':
drivers/mmc/host/tmio_mmc.h:192: error: negative width in bit-field '<anonymous>'
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Acked-by: Eric Miao <eric.y.miao@gmail.com> Tested-by: Magnus Damm <damm@opensource.se> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Previously, it was possible for ack_mmc_irqs() to clear pending interrupt
bits in the CTL_STATUS register, even though the interrupt handler had not
been called. This was because of a race that existed when doing a
read-modify-write sequence on CTL_STATUS. After the read step in this
sequence, if an interrupt occurred (causing one of the bits in CTL_STATUS
to be set) the write step would inadvertently clear it.
Observed with the TMIO_STAT_RXRDY bit together with CMD53 on AR6002 and
BCM4318 SDIO cards in polled mode.
This patch eliminates this race by only writing to CTL_STATUS and clearing
the interrupts that were passed as an argument to ack_mmc_irqs()."
[matt@console-pimps.org: rewrote changelog] Signed-off-by: Yusuke Goda <yusuke.goda.sx@renesas.com> Acked-by: Magnus Damm <damm@opensource.se> Tested-by: Arnd Hannemann <arnd@arndnet.de> Acked-by: Ian Molton <ian@mnementh.co.uk> Cc: Matt Fleming <matt@console-pimps.org> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The gcov-kernel infrastructure expects that each object file is loaded
only once. This may not be true, e.g. when loading multiple kernel
modules which are linked to the same object file. As a result, loading
such kernel modules will result in incorrect gcov results while unloading
will cause a null-pointer dereference.
This patch fixes these problems by changing the gcov-kernel infrastructure
so that multiple profiling data sets can be associated with one debugfs
entry. It applies to 2.6.36-rc1.
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Reported-by: Werner Spies <werner.spies@thalesgroup.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is an off by one. We would go past the end when we NUL terminate
the "value" string at end of the function. The "value" buffer is
allocated in irlan_client_parse_response() or
irlan_provider_parse_command().
CC: stable@kernel.org Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Be sure to avoid entering t_show() with FTRACE_ITER_HASH set without
having properly started the iterator to iterate the hash. This case is
degenerate and, as discovered by Robert Swiecki, can cause t_hash_show()
to misuse a pointer. This causes a NULL ptr deref with possible security
implications. Tracked as CVE-2010-3079.
Cc: Robert Swiecki <swiecki@google.com> Cc: Eugene Teo <eugene@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Reading the file set_ftrace_filter does three things.
1) shows whether or not filters are set for the function tracer
2) shows what functions are set for the function tracer
3) shows what triggers are set on any functions
3 is independent from 1 and 2.
The way this file currently works is that it is a state machine,
and as you read it, it may change state. But this assumption breaks
when you use lseek() on the file. The state machine gets out of sync
and the t_show() may use the wrong pointer and cause a kernel oops.
Luckily, this will only kill the app that does the lseek, but the app
dies while holding a mutex. This prevents anyone else from using the
set_ftrace_filter file (or any other function tracing file for that matter).
A real fix for this is to rewrite the code, but that is too much for
a -rc release or stable. This patch simply disables llseek on the
set_ftrace_filter() file for now, and we can do the proper fix for the
next major release.
Reported-by: Robert Swiecki <swiecki@google.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Tavis Ormandy <taviso@google.com> Cc: Eugene Teo <eugene@redhat.com> Cc: vendor-sec@lst.de Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
For some mysterious reason, certain hardware reacts badly to usual EH
actions while the system is going for suspend. As the devices won't
be needed until the system is resumed, ask EH to skip usual autopsy
and recovery and proceed directly to suspend.
Signed-off-by: Tejun Heo <tj@kernel.org> Tested-by: Stephan Diestelhorst <stephan.diestelhorst@amd.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Although the usbhid driver allocates its usbhid structure in the probe
routine, several critical fields in that structure don't get
initialized until usbhid_start(). However if report descriptor
parsing fails then usbhid_start() is never called. This leads to
problems during system suspend -- the system will freeze.
This patch (as1378) fixes the bug by moving the initialization
statements up into usbhid_probe().
On failure init_sysfs() might not properly free resources. The error
code of the function is not checked. And, when reinitializing the exit
function might be called twice. This patch fixes all this.
Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes a crash during shutdown reported below. The crash is
caused by accessing already freed task structs. The fix changes the
order for registering and unregistering notifier callbacks.
All notifiers must be initialized before buffers start working. To
stop buffer synchronization we cancel all workqueues, unregister the
notifier callback and then flush all buffers. After all of this we
finally can free all tasks listed.
This should avoid accessing freed tasks.
On 22.07.10 01:14:40, Benjamin Herrenschmidt wrote:
> So the initial observation is a spinlock bad magic followed by a crash
> in the spinlock debug code:
>
> [ 1541.586531] BUG: spinlock bad magic on CPU#5, events/5/136
> [ 1541.597564] Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6d03
>
> Backtrace looks like:
>
> spin_bug+0x74/0xd4
> ._raw_spin_lock+0x48/0x184
> ._spin_lock+0x10/0x24
> .get_task_mm+0x28/0x8c
> .sync_buffer+0x1b4/0x598
> .wq_sync_buffer+0xa0/0xdc
> .worker_thread+0x1d8/0x2a8
> .kthread+0xa8/0xb4
> .kernel_thread+0x54/0x70
>
> So we are accessing a freed task struct in the work queue when
> processing the samples.
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Ben Hutchings [Thu, 9 Sep 2010 04:21:16 +0000 (05:21 +0100)]
tun: Don't add sysfs attributes to devices without sysfs directories
This applies to 2.6.32 *only*. It has not been applied upstream since
the limitation no longer exists.
Prior to Linux 2.6.35, net devices outside the initial net namespace
did not have sysfs directories. Attempting to add attributes to
them will trigger a BUG().
Reported-and-tested-by: Russell Stuart <russell-debian@stuart.id.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The error handling in snd_seq_oss_open() has several bad codes that
do dereferecing released pointers and double-free of kmalloc'ed data.
The object dp is release in free_devinfo() that is called via
private_free callback. The rest shouldn't touch this object any more.
The patch changes delete_port() to call kfree() in any case, and gets
rid of unnecessary calls of destructors in snd_seq_oss_open().
Certain USB devices, such as the Nokia X6 mobile phone, don't expose any
endpoint descriptors on some of their interfaces. If the ACM driver is forced
to probe all interfaces on a device the a NULL pointer dereference will occur
when the ACM driver attempts to use the endpoint of the alternative settings.
One way to get the ACM driver to probe all the interfaces is by using the
/sys/bus/usb/drivers/cdc_acm/new_id interface.
This patch checks that the endpoint pointer for the current alternate settings
is non-NULL before using it.
cdc-acm.c : Manage pseudo-modem without AT commands capabilities
Enable to drive electronic simple gadgets based on microcontrolers.
The Interface descriptor is like this:
bInterfaceClass 2 Communications
bInterfaceSubClass 2 Abstract (modem)
bInterfaceProtocol 0 None
Signed-off-by: Philippe Corbes <philippe.corbes@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
S60 phones from Nokia and Samsung expose two ACM channels. The first is a modem
with a standard AT-command interface, which is picked up correctly by CDC-ACM.
The second ACM port is marked as having a vendor-specific protocol. This means
that the ACM driver will not claim the second channel by default.
This adds support for the second ACM channel for the following devices:
Nokia E63
Nokia E75
Nokia 6760 Slide
Nokia E52
Nokia E55
Nokia E72
Nokia X6
Nokia N97 Mini
Nokia 5800 Xpressmusic
Nokia E90
Samsung GTi8510 (INNOV8)
Nokia S60 phones expose two ACM channels. The first is
a modem, the second is 'vendor-specific' but is treated
as a serial device at the S60 end, so we want to expose
it on Linux too.
Nokia S60 phones expose two ACM channels. The first is a modem and is picked
up by the standard AT-command interface information in the CDC-ACM driver. The
second is marked as having a vendor-specific protocol. Normally, we don't
expose those as ttys. (On some other devices, they may be claimed by the
rndis_host driver and used as a network interface).
But on S60 this second ACM channel is the way that third-party S60 application
developers are expected to communicate over USB. It acts as a serial device
at the S60 end, and so it should on Linux too.
The list of devices is largely derived from:
http://wiki.forum.nokia.com/index.php/S60_Platform_and_device_identification_codes
http://wiki.forum.nokia.com/index.php/Nokia_USB_Product_IDs
and includes only the S60 3rd Edition+ devices documented there.
There are many devices for which the USB device ID is not documented,
including:
Nokia 6290
Nokia E63
Nokia 5630 XpressMusic
Nokia 5730 XpressMusic
Nokia 6710 Navigator
Nokia 6720 classic
Nokia 6730 Classic
Nokia 6760 slide
Nokia 6790 slide
Nokia 6790 Surge
Nokia E52
Nokia E55
Nokia E71x (AT&T)
Nokia E72
Nokia E75
Nokia E75 US+LTA variant
Nokia N79
Nokia N86 8MP
Nokia 5230 (RM-588)
Nokia 5230 (RM-594)
Nokia 5530 XpressMusic
Nokia 5530 XpressMusic (china)
Nokia 5800 XM
Nokia N97 (RM-506)
Nokia N97 mini
Nokia X6
It would be good to add those subsequently.
Signed-off-by: Adrian Taylor <aat@realvnc.com> Acked-by: Oliver Neukum <oliver@neukum.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add the USB IDs needed to support the B&B USOPTL4-4P, USO9ML2-2P, and
USO9ML2-4P. This patch expands and corrects a typo in the patch sent
on 08-31-2010.
Signed-off-by: Dave Ludlow <dave.ludlow@bay.ws> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is the cable between an H3000 navigation unit and a multi-function display.
http://www.bandg.com/en/Products/H3000/Spares-and-Accessories/Cables/H3000-CPU-USB-Cable-Pack/
Signed-off-by: Jason Detring <jason.detring@navico.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The iounmap(ehci->ohci_hcctrl_reg); should be the first thing we do
because the ioremap() was the last thing we did. Also if we hit any of
the goto statements in the original code then it would have led to a
NULL dereference of "ehci". This bug was introduced in: 796bcae7361c
"USB: powerpc: Workaround for the PPC440EPX USBH_23 errata [take 3]"
I modified the few lines in front a little so that my code didn't
obscure the return success code path.
Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
For local mounts, ocfs2_read_locked_inode() calls ocfs2_read_blocks_sync() to
read the inode off the disk. The latter first checks to see if that block is
cached in the journal, and, if so, returns that block. That is ok.
But ocfs2_read_locked_inode() goes wrong when it tries to validate the checksum
of such blocks. Blocks that are cached in the journal may not have had their
checksum computed as yet. We should not validate the checksums of such blocks.
The 5 GHz CTL indexes were not being read for all hardware
devices due to the masking out through the CTL_MODE_M mask
being one bit too short. Without this the calibrated regulatory
maximum values were not being picked up when devices operate
on 5 GHz in HT40 mode. The final output power used for Atheros
devices is the minimum between the calibrated CTL values and
what CRDA provides.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Bartly reported that fuse can hang in fuse_get_req_nofail() when
the connection to the filesystem server is no longer active.
If bg_queue is not empty then flush_bg_queue() called from
request_end() can put more requests on to the pending queue. If this
happens while ending requests on the processing queue then those
background requests will be queued to the pending list and never
ended.
Another problem is that fuse_dev_release() didn't wake up processes
sleeping on blocked_waitq.
Solve this by:
a) flushing the background queue before calling end_requests() on the
pending and processing queues
b) setting blocked = 0 and waking up processes waiting on
blocked_waitq()
Increased storvsc ringbuffer and max_io_requests. This now more
closely mimics the numbers on Hyper-V. And will allow more IO requests
to take place for the SCSI driver.
Max_IO is set to double from what it was before, Hyper-V allows it and
we have had appliance builder requests to see if it was a problem to
increase the number.
Ringbuffer size for storvsc is now increased because I have seen A few buffer
problems on extremely busy systems. They were Set pretty low before.
And since max_io_requests is increased I Really needed to increase the buffer
as well.
Fixed the value of the 64bit-hole inside ring buffer, this
caused a problem on Hyper-V when running checked Windows builds.
Checked builds of Windows are used internally and given to external
system integrators at times. They are builds that for example that all
elements in a structure follow the definition of that Structure. The bug
this fixed was for a field that we did not fill in at all (Because we do
Not use it on the Linux side), and the checked build of windows gives
errors on it internally to the Windows logs.
Fixed bounce offset kmap problem by using correct index.
The symptom of the problem is that in some NAS appliances this problem
represents Itself by a unresponsive VM under a load with many clients writing
small files.
Fix missing functions for net_device_ops.
It's a bug when porting the drivers from 2.6.27 to 2.6.32. In 2.6.27,
the default functions for Ethernet, like eth_change_mtu(), were assigned
by ether_setup(). But in 2.6.32, these function pointers moved to
net_device_ops structure and no longer be assigned in ether_setup(). So
we need to set these functions in our driver code. It will ensure the
MTU won't be set beyond 1500. Otherwise, this can cause an error on the
server side, because the HyperV linux driver doesn't support jumbo frame
yet.
Mike Galbraith [Thu, 26 Aug 2010 03:29:16 +0000 (05:29 +0200)]
sched: revert stable c6fc81a sched: Fix a race between ttwu() and migrate_task()
This commit does not appear to have been meant for 32-stable, and causes ltp's
cpusets testcases to fail, revert it.
Original commit text:
sched: Fix a race between ttwu() and migrate_task()
Based on commit e2912009fb7b715728311b0d8fe327a1432b3f79 upstream, but
done differently as this issue is not present in .33 or .34 kernels due
to rework in this area.
If a task is in the TASK_WAITING state, then try_to_wake_up() is working
on it, and it will place it on the correct cpu.
This commit ensures that neither migrate_task() nor __migrate_task()
calls set_task_cpu(p) while p is in the TASK_WAKING state. Otherwise,
there could be two concurrent calls to set_task_cpu(p), resulting in
the task's cfs_rq being inconsistent with its cpu.
Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Amit Arora [Wed, 19 May 2010 09:05:57 +0000 (14:35 +0530)]
sched: kill migration thread in CPU_POST_DEAD instead of CPU_DEAD
[Fixed in a different manner upstream, due to rewrites in this area]
Problem : In a stress test where some heavy tests were running along with
regular CPU offlining and onlining, a hang was observed. The system seems to
be hung at a point where migration_call() tries to kill the migration_thread
of the dying CPU, which just got moved to the current CPU. This migration
thread does not get a chance to run (and die) since rt_throttled is set to 1
on current, and it doesn't get cleared as the hrtimer which is supposed to
reset the rt bandwidth (sched_rt_period_timer) is tied to the CPU which we just
marked dead!
Solution : This patch pushes the killing of migration thread to "CPU_POST_DEAD"
event. By then all the timers (including sched_rt_period_timer) should have got
migrated (along with other callbacks).
Signed-off-by: Amit Arora <amitarora@in.ibm.com> Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 2ca1af9aa3285c6a5f103ed31ad09f7399fc65d7 "PCI: MSI: Remove
unsafe and unnecessary hardware access" changed read_msi_msg_desc() to
return the last MSI message written instead of reading it from the
device, since it may be called while the device is in a reduced
power state.
However, the pSeries platform code really does need to read messages
from the device, since they are initially written by firmware.
Therefore:
- Restore the previous behaviour of read_msi_msg_desc()
- Add new functions get_cached_msi_msg{,_desc}() which return the
last MSI message written
- Use the new functions where appropriate
Acked-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
During suspend on an SMP system, {read,write}_msi_msg_desc() may be
called to mask and unmask interrupts on a device that is already in a
reduced power state. At this point memory-mapped registers including
MSI-X tables are not accessible, and config space may not be fully
functional either.
While a device is in a reduced power state its interrupts are
effectively masked and its MSI(-X) state will be restored when it is
brought back to D0. Therefore these functions can simply read and
write msi_desc::msg for devices not in D0.
Further, read_msi_msg_desc() should only ever be used to update a
previously written message, so it can always read msi_desc::msg
and never needs to touch the hardware.
TSC's get reset after suspend/resume (even on cpu's with invariant TSC
which runs at a constant rate across ACPI P-, C- and T-states). And in
some systems BIOS seem to reinit TSC to arbitrary large value (still
sync'd across cpu's) during resume.
This leads to a scenario of scheduler rq->clock (sched_clock_cpu()) less
than rq->age_stamp (introduced in 2.6.32). This leads to a big value
returned by scale_rt_power() and the resulting big group power set by the
update_group_power() is causing improper load balancing between busy and
idle cpu's after suspend/resume.
This resulted in multi-threaded workloads (like kernel-compilation) go
slower after suspend/resume cycle on core i5 laptops.
Fix this by recomputing cyc2ns_offset's during resume, so that
sched_clock() continues from the point where it was left off during
suspend.
Fix DSM/TRIM commands in sata_mv (v2).
These need to be issued using old-school "BM DMA",
rather than via the EDMA host queue.
Since the chips don't have proper BM DMA status,
we need to be more careful with setting the ATA_DMA_INTR bit,
since DSM/TRIM often has a long delay between "DMA complete"
and "command complete".
GEN_I chips don't have BM DMA, so no TRIM for them.
Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The non-standard name "iMic" makes PulseAudio ignore the microphone. BugLink: https://launchpad.net/bugs/605101 Signed-off-by: David Henningsson <david.henningsson@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
IPIs and VIRQs are inherently per-cpu event types, so treat them as such:
- use a specific percpu irq_chip implementation, and
- handle them with handle_percpu_irq
This makes the path for delivering these interrupts more efficient
(no masking/unmasking, no locks), and it avoid problems with attempts
to migrate them.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Xen events are logically edge triggered, as Xen only calls the event
upcall when an event is newly set, but not continuously as it remains set.
As a result, use handle_edge_irq rather than handle_level_irq.
This has the important side-effect of fixing a long-standing bug of
events getting lost if:
- an event's interrupt handler is running
- the event is migrated to a different vcpu
- the event is re-triggered
The most noticable symptom of these lost events is occasional lockups
of blkfront.
Many thanks to Tom Kopec and Daniel Stodden in tracking this down.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Tom Kopec <tek@acm.org> Cc: Daniel Stodden <daniel.stodden@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 8bf0223ed515be24de0c671eedaff49e78bebc9c (hwmon, k8temp: Fix
temperature reporting for ASB1 processor revisions) fixed temperature
reporting for ASB1 CPUs. But those CPU models (model 0x6b, 0x6f, 0x7f)
were packaged both as AM2 (desktop) and ASB1 (mobile). Thus the commit
leads to wrong temperature reporting for AM2 CPU parts.
The solution is to determine the package type for models 0x6b, 0x6f,
0x7f.
This is done using BrandId from CPUID Fn8000_0001_EBX[15:0]. See
"Constructing the processor Name String" in "Revision Guide for AMD
NPT Family 0Fh Processors" (Rev. 3.46).
Cc: Rudolf Marek <r.marek@assembler.cz> Reported-by: Vladislav Guberinic <neosisani@gmail.com> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When the SMP kernel decides to crash_kexec() the local APICs may have
pending interrupts in their vector tables.
The setup routine for the local APIC has a deficient mechanism for
clearing these interrupts, it only handles interrupts that has already
been dispatched to the local core for servicing (the ISR register) safely,
it doesn't consider lower prioritized queued interrupts stored in the IRR
register.
If you have more than one pending interrupt within the same 32 bit word in
the LAPIC vector table registers you may find yourself entering the IO
APIC setup with pending interrupts left in the LAPIC. This is a situation
for wich the IO APIC setup is not prepared. Depending of what/which
interrupt vector/vectors are stuck in the APIC tables your system may show
various degrees of malfunctioning. That was the reason why the
check_timer() failed in our system, the timer interrupts was blocked by
pending interrupts from the old kernel when routed trough the IO APIC.
Additional comment from Jiri Bohac:
==============
If this should go into stable release,
I'd add some kind of limit on the number of iterations, just to be safe from
hard to debug lock-ups:
with MAX_LOOPS something like 1E9 this would leave plenty of time for the
pending IRQs to be cleared and would and still cause at most a second of delay
if the loop were to lock-up for whatever reason.
[trenn@suse.de:
V2: Use tsc if avail to bail out after 1 sec due to possible virtual
apic_read calls which may take rather long (suggested by: Avi Kivity
<avi@redhat.com>) If no tsc is available bail out quickly after
cpu_khz, if we broke out too early and still have irqs pending (which
should never happen?) we still get a WARN_ON...
V3: - Fixed indentation -> checkpatch clean
- max_loops must be signed
V4: - Fix typo, mixed up tsc and ntsc in first rdtscll() call
V5: Adjust WARN_ON() condition to also catch error in cpu_has_tsc case]
Cc: <jbohac@novell.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Kerstin Jonsson <kerstin.jonsson@ericsson.com> Cc: Avi Kivity <avi@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Tested-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Thomas Renninger <trenn@suse.de>
LKML-Reference: <201005241913.o4OJDGWM010865@imap1.linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Thomas Renninger <trenn@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add ftdi product ID for Lenz LI-USB, a model train interface. This
was NOT tested against 2.6.35, but a similar patch was tested with the
CentOS 2.6.18-194.11.1.el5 kernel. It wasn't clear to me what
ordering is being used in ftdi_sio.c, so I inserted the ID after another
model train entry(SPROG_II).
The code to increment the TRB pointer has a slight ambiguity that could
lead to a bug on different compilers. The ANSI C specification does not
specify the precedence of the assignment operator over the postfix
operator. gcc 4.4 produced the correct code (increment the pointer and
assign the value), but a MIPS compiler that one of John's clients used
assigned the old (unincremented) value.
Remove the unnecessary assignment to make all compilers produce the
correct assembly.
Signed-off-by: John Youn <johnyoun@synopsys.com> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If we can't read the firmware for a device from the disk, and yet the
device already has a valid firmware image in it, we don't want to
replace the firmware with something invalid. So check the version
number to be less than the current one to verify this is the correct
thing to do.
Reported-by: Chris Beauchamp <chris@chillibean.tv> Tested-by: Chris Beauchamp <chris@chillibean.tv> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add support for the Zeagle N2iTiON3 dive computer interface. Since
Zeagle devices are actually manufactured by Seiko, this patch will
support other Seiko based models as well.
>Try the navman driver instead. You can either add the device id to the
> driver and rebuild it, or do this before you plug the device in:
> modprobe navman
> echo -n "0x0df7 0x0900" > /sys/bus/usb-serial/drivers/navman/new_id
>
> and then plug your device in and see if that works.
I can confirm that the navman driver works with the right device IDs on
my i-gotU GT-600, which has the same device IDs. Attached is a patch
adding the IDs.
From: Ross Burton <ross@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Userspace controls the amount of memory to be allocate, so it can
get the ioctl to allocate more memory than the kernel uses, and get
access to kernel stack. This can only be done for processes authenticated
to the X server for DRI access, and if the user has DRI access.
Fix is to just memset the data to 0 if the user doesn't copy into
it in the first place.