git.karo-electronics.de Git - linux-beck.git/log

sched: add CFS documentation

add Documentation/sched-design-CFS.txt

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: scheduler debugging, enable in Kconfig

enable CONFIG_SCHED_DEBUG in lib/Kconfig.debug.

the runtime overhead of this option is very small.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: scheduler debugging, core

scheduler debugging core: implement /proc/sched_debug and
/proc/<PID>/sched files for scheduler debugging.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: add CFS debug sysctls

add CFS debug sysctls: only tweakable if SCHED_DEBUG is enabled.
This allows for faster debugging of scheduler problems.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove old cpu accounting field

remove the old cpu-accounting field from signal_struct, now
that the code is using CFS's stats.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove unused rq types from sched.c

remove unused rq types from sched.c, now that we switched
over to CFS.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove batch_task()

batch_task() in sched.h is now unused - remove it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove interactivity types from sched.h

remove now-unused types/fields used by the old scheduler.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove interactivity types

remove now unused interactivity-heuristics related defined and
types of the old scheduler.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: clean up include files in sched.c

clean up include files in sched.c, they were still old-style <asm/>.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: clean up fastcall uses of sched_fork()/sched_exit()

sched_fork()/sched_exit() does not need to specify fastcall anymore,
as the x86 kernel defaults to regparm3, and no assembly code calls
these functions.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: update delay-accounting to use CFS's precise stats

update delay-accounting to use CFS's precise stats.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: make use of precise accounting for /proc task stats

make use of CFS's precise accounting to drive /proc/<pid>/stat statistics.

this code was co-authored by:

Balbir Singh <balbir@linux.vnet.ibm.com>
Dmitry Adamushko <dmitry.adamushko@gmail.com>
Ingo Molnar <mingo@elte.hu>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>

sched: turn on the use of unstable events

make use of sched-clock-unstable events.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: x86, track TSC-unstable events

track TSC-unstable events and propagate it to the scheduler code.
Also allow sched_clock() to be used when the TSC is unstable,
the rq_clock() wrapper creates a reliable clock out of it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: cfs core code

apply the CFS core code.

this change switches over the scheduler core to CFS's modular
design and makes use of kernel/sched_fair/rt/idletask.c to implement
Linux's scheduling policies.

thanks to Andrew Morton and Thomas Gleixner for lots of detailed review
feedback and for fixlets.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

sched: remove the sleep-bonus interactivity code

remove the sleep-bonus interactivity code from the core scheduler.

scheduling policy is implemented in the policy modules, and CFS does
not need such type of heuristics.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove expired_starving()

remove the expired_starving() heuristics from the core scheduler.

CFS does not need it, and this did not really work well in practice
anyway, due to the rq->nr_running multiplier to STARVATION_LIMIT.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove sleep_type

remove the sleep_type heuristics from the core scheduler - scheduling
policy is implemented in the scheduling-policy modules. (and CFS does
not use this type of sleep-type heuristics)

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: cfs, add load-calculation methods

add the new load-calculation methods of CFS.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: clean up __normal_prio() position

clean up: move __normal_prio() in head of normal_prio().

no code changed.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: cleanup: move dequeue/enqueue_task()

cleanup: move dequeue/enqueue_task() to a more logical place, to
not split up __normal_prio()/normal_prio().

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: move around resched_task()

move resched_task()/resched_cpu() into the 'public interfaces'
section of sched.c, for use by kernel/sched_fair/rt/idletask.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove the SleepAVG field

remove the SleepAVG field from /proc/<pid>/status, as
with the removal of the sleep-average code this value
no longer makes sense.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: clean up the rt priority macros

clean up the rt priority macros, pointed out by Andrew Morton.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: add cfs_rq ops

add the set_task_cfs_rq() abstraction needed by CONFIG_FAIR_GROUP_SCHED.

(not activated yet)

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: make posix-cpu-timers use CFS's accounting information

update the posix-cpu-timers code to use CFS's CPU accounting information.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: add rq_clock()/__rq_clock()

add rq_clock()/__rq_clock(), a robust wrapper around sched_clock(),
used by CFS. It protects against common type of sched_clock() problems
(caused by hardware): time warps forwards and backwards.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: cfs rq data types

add the CFS rq data types to sched.c.

(the old scheduler fields are still intact, they are removed
by a later patch)

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: cfs, core data types

add the CFS data types to sched.h.

(the old scheduler is still fully intact.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: cfs core, kernel/sched_idletask.c

add kernel/sched_idletask.c - which implements the idle thread
scheduling class. This further simplifies sched.c (under CFS),
for example a number of 'if (p == rq->idle)' type of special-cases
can be removed from sched.c, and schedule() gets simpler too.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: cfs core, kernel/sched_rt.c

add kernel/sched_rt.c: SCHED_FIFO/SCHED_RR support. The behavior
and semantics of SCHED_FIFO/SCHED_RR tasks is unchanged.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: cfs core, kernel/sched_fair.c

add kernel/sched_fair.c - which implements the bulk of CFS's
behavioral changes for SCHED_OTHER tasks.

see Documentation/sched-design-CFS.txt about details.

Authors:

Ingo Molnar <mingo@elte.hu>
Dmitry Adamushko <dmitry.adamushko@gmail.com>
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Mike Galbraith <efault@gmx.de>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

sched: increase the resolution of smpnice

increase SMP-nice's resolution. This is needed by CFS to
implement SCHED_IDLE and cleaned up nice level support.

no behavioral changes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: move code into kernel/sched_stats.h

create sched_stats.h and move sched.c schedstats code into it.
This cleans up sched.c a bit.

no code changes are caused by this patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: add init_idle_bootup_task()

add the init_idle_bootup_task() callback to the bootup thread,
unused at the moment. (CFS will use it to switch the scheduling
class of the boot thread to the idle class)

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: add in_atomic_preempt_off()

add in_atomic_preempt_off() - debugging helper that will
simplify schedule().

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove sched_exit()

remove sched_exit(): the elaborate dance of us trying to recover
timeslices given to child tasks never really worked.

CFS does not need it either.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: uninline set_task_cpu()

uninline set_task_cpu(): CFS will add more code to it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: zap the migration init / cache-hot balancing code

the SMP load-balancer uses the boot-time migration-cost estimation
code to attempt to improve the quality of balancing. The reason for
this code is that the discrete priority queues do not preserve
the order of scheduling accurately, so the load-balancer skips
tasks that were running on a CPU 'recently'.

this code is fundamental fragile: the boot-time migration cost detector
doesnt really work on systems that had large L3 caches, it caused boot
delays on large systems and the whole cache-hot concept made the
balancing code pretty undeterministic as well.

(and hey, i wrote most of it, so i can say it out loud that it sucks ;-)

under CFS the same purpose of cache affinity can be achieved without
any special cache-hot special-case: tasks are sorted in the 'timeline'
tree and the SMP balancer picks tasks from the left side of the
tree, thus the most cache-cold task is balanced automatically.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: add SCHED_IDLE policy

this patch adds the SCHED_IDLE policy to sched.h.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: rename idle_type/SCHED_IDLE

enum idle_type (used by the load-balancer) clashes with the
SCHED_IDLE name that we want to introduce. 'CPU_IDLE' instead
of 'SCHED_IDLE' is more descriptive as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

Linux 2.6.22

Woo-hoo. I'm sure somebody will report a "this doesn't compile, and
I have a new root exploit" five minutes after release, but it still
feels good ;)

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6:
qd65xx: fix PIO mode selection
sis5513: adding PCI-ID

Fix permission checking for the new utimensat() system call

Commit 1c710c896eb461895d3c399e15bb5f20b39c9073 added the utimensat()
system call, but didn't handle the case of checking for the writability
of the target right, when the target was a file descriptor, not a
filename.

We cannot use vfs_permission(MAY_WRITE) for that case, and need to
simply check whether the file descriptor is writable. The oops from
using the wrong function was noticed and narrowed down by Markus
Trippelsdorf.

Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: double mark_page_accessed() in read_cache_page_async()

Fix a post-2.6.21 regression.

read_cache_page_async() has two invocations of mark_page_accessed() which will
launch pages right onto the active list.

Remove the first one, keeping the latter one. This avoids marking unwanted
pages active (in the retry loop).

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

qd65xx: fix PIO mode selection

PIO4 is a maximum PIO mode supported by a driver. Using "255" as a max_mode
argument to ide_get_best_pio_mode() could result in wrong timings being used
by a driver (for "pio" equal to 5) or OOPS (for "pio" values > 5 && < 255).

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Reviewed-by: Alan Cox <alan@lxorguk.ukuu.org.uk>

sis5513: adding PCI-ID

The SiS966 has one additional PCI-ID 1180.

If the chipset is using this PCI-ID, the primary channel is connected to the
first PATA-port. The secondary channel is connected to SATA-ports in IDE
emulation mode. The legacy IO-ports are used.

The including of the PCI-ID into pata_sis is not sufficient, because the legacy
driver in drivers/ide is initialized before pata_sis.

Signed-off-by: Uwe Koziolek <uwe.koziolek@gmx.net>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

DLM must depend on SYSFS

The dependency of DLM on SYSFS got lost in
commit 6ed7257b46709e87d79ac2b6b819b7e0c9184998 resulting in the
following compile error with CONFIG_DLM=y, CONFIG_SYSFS=n:

<--  snip  -->

...
  LD      .tmp_vmlinux1
fs/built-in.o: In function `dlm_lockspace_init':
/home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/fs/dlm/lockspace.c:231: undefined reference to `kernel_subsys'
fs/built-in.o: In function `configfs_init':
/home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/fs/configfs/mount.c:143: undefined reference to `kernel_subsys'
make[1]: *** [.tmp_vmlinux1] Error 1

<--  snip  -->

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Clean up E7520/7320/7525 quirk printk.

The printk level in this printk is bogus, as the previous printk
didn't have a terminating \n resulting in ..

Intel E7520/7320/7525 detected.<6>Disabling irq balancing and affinity

It also never printed a \n at all in the case where we didn't do
the quirk.

Change it to only make noise if it actually does something useful.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

include/linux/kallsyms.h must #include <linux/errno.h>

This patch fixes the following 2.6.22 regression with CONFIG_KALLSYMS=n:

<--  snip  -->

...
  CC      arch/m32r/kernel/traps.o
In file included from /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/arch/m32r/kernel/traps.c:14:
/home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/include/linux/kallsyms.h: In function 'lookup_symbol_name':
/home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/include/linux/kallsyms.h:66: error: 'ERANGE' undeclared (first use in this function)
/home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/include/linux/kallsyms.h:66: error: (Each undeclared identifier is reported only once
/home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/include/linux/kallsyms.h:66: error: for each function it appears in.)
/home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/include/linux/kallsyms.h: In function 'lookup_symbol_attrs':
/home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/include/linux/kallsyms.h:71: error: 'ERANGE' undeclared (first use in this function)
make[2]: *** [arch/m32r/kernel/traps.o] Error 1

<--  snip  -->

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Fix use-after-free oops in Bluetooth HID.

When cleaning up HIDP sessions, we currently close the ACL connection
before deregistering the input device. Closing the ACL connection
schedules a workqueue to remove the associated objects from sysfs, but
the input device still refers to them -- and if the workqueue happens to
run before the input device removal, the kernel will oops when trying to
look up PHYSDEVPATH for the removed input device.

Fix this by deregistering the input device before closing the
connections.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

slub: remove useless EXPORT_SYMBOL

kmem_cache_open is static. EXPORT_SYMBOL was leftover from some earlier
time period where kmem_cache_open was usable outside of slub.

(Fixes powerpc build error)

Signed-off-by: Chrsitoph Lameter <clameter@sgi.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MAINTAINERS new kernel janitors ml

davem kindly moved the list from osdl to vger.

Signed-of-by: maximilian attems <max@stro.at>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

GEODE: reboot fixup for geode machines with CS5536 boards

Writing to MSR 0x51400017 forces a hard reset on CS5536-based machines,
this has the reboot fixup do just that if such a board is detected.

Acked-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [NETPOLL]: Fixups for 'fix soft lockup when removing module'
  [NET]: net/core/netevent.c should #include <net/netevent.h>
  [NETFILTER]: nf_conntrack_h323: add checking of out-of-range on choices' index values
  [NET] skbuff: remove export of static symbol
  SCTP: Add scope_id validation for link-local binds
  SCTP: Check to make sure file is valid before setting timeout
  SCTP: Fix thinko in sctp_copy_laddrs()

Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  [MIPS] Fix scheduling latency issue on 24K, 34K and 74K cores
  [MIPS] Add macros to encode processor revisions.
  [MIPS] RM7000: Enable ICACHE_REFILLS_WORKAROUND_WAR.
  [MIPS] SMTC: Fix cut'n'paste bug in Kconfig.debug
  [MIPS] Change libgcc-style functions from lib-y to obj-y
  [MIPS] Fix timer/performance interrupt detection
  [MIPS] AP/SP: Avoid triggering the 34K E125 performance issue
  [MIPS] 64-bit TO_PHYS_MASK macro for RM9000 processors

mm: fixup /proc/vmstat output

Line up the vmstat_text with zone_stat_item

enum zone_stat_item {
/* First 128 byte cacheline (assuming 64 bit words) */
NR_FREE_PAGES,
NR_INACTIVE,
NR_ACTIVE,

We current have nr_active and nr_inactive reversed.

[ "OK with patch, though using initializers canbe handy to prevent such
things in future:

static const char * const vmstat_text[] = {
[NR_FREE_PAGES] = "nr_free_pages",
..."
- Alexey ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

potential compiler error, irqfunc caller sites update

In 7d12e780e003f93433d49ce78cfedf4b4c52adc5 David Howells performed
this evolution:
"IRQ: Maintain regs pointer globally rather than passing to IRQ handlers"

He correctly updated many of the function definitions that were using this
extra regs pointer parameter but forgot to update some caller sites of
those functions.  The reason the modifications was not properly done on all
drivers is that some drivers were rarely compiled because they are for
AMIGA, or that some code sites were inside #ifdefs where the option is not
set or inside #if 0.

Here is the semantic patch that found the occurences
and fixed the problem.

@ rule1 @
identifier fn;
identifier irq, dev_id;
typedef irqreturn_t;
@@

static irqreturn_t fn(int irq, void *dev_id)
{
   ...
}

@@
identifier rule1.fn;
expression E1, E2, E3;
@@

fn(E1, E2
-   ,E3
   )

Signed-off-by: Yoann Padioleau <padator@wanadoo.fr>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

i386: es7000 build breakage fix

o Commit 1833d6bc72893265f22addd79cf52e6987496e0f broke the build if
compiled with CONFIG_ES7000=y and CONFIG_X86_GENERICARCH=n

arch/i386/kernel/built-in.o(.init.text+0x4fa9): In function `acpi_parse_madt':
: undefined reference to `acpi_madt_oem_check'
arch/i386/kernel/built-in.o(.init.text+0x7406): In function `smp_read_mpc':
: undefined reference to `mps_oem_check'
arch/i386/kernel/built-in.o(.init.text+0x8990): In function
`connect_bsp_APIC':
: undefined reference to `enable_apic_mode'
make: *** [.tmp_vmlinux1] Error 1

o Fix the build issue. Provided the definitions of missing functions.

o Don't have ES7000 machine. Only compile tested.

Cc: Len Brown <lenb@kernel.org>
Cc: Natalie Protasevich <protasnb@gmail.com>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

PNP SMCf010 quirk: work around Toshiba Portege 4000 ACPI issues

When we enable the SMCf010 IR device, the Toshiba Portege 4000 BIOS claims
the device is working, but it really isn't configured correctly.  The BIOS
*will* configure it, but only if we call _SRS after (1) reversing the order
of the SIR and FIR I/O port regions and (2) changing the IRQ from
active-high to active-low.

This patch addresses the 2.6.22 regression:
    "no irda0 interface (2.6.21 was OK), smsc does not find chip"

I tested this on a Portege 4000.  The smsc-ircc2 driver correctly detects
the device, and "irattach irda0 -s && irdadump" shows transmitted and
received packets.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Andrey Borzenkov <arvidjaar@mail.ru>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: "Linus Walleij (LD/EAB)" <linus.walleij@ericsson.com>
Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Cc: Adam Belay <ambx1@neo.rr.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fix logic error in ipc compat semctl()

When calling a semctl(IPC_STAT) without IPC_64 the check if the memory is
unevaluated. This patch fixes this.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86_64: fix headers_install

A bug in headers_install for ARCH=x86_64 yields an asm/ directory full of
files all of which are using the same #ifdef guard, "__ASM_STUB_" with no
postfix. So the second and later asm files #included in the same C file
(often through standard headers like ioctl.h) yields no symbols.

Strangeness with the Ubuntu 'tell me if I support something that's not
explcitly mentioned in POSIX, and I'll strip it out' shell, I believe.

We don't need the 'export' but we do need a semicolon at the end of the
FNAME line:

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MTRR: Fix race causing set_mtrr to go into infinite loop

Processors synchronization in set_mtrr requires the .gate field to be set
after .count field is properly initialized. Without an explicit barrier,
the compiler was reordering those memory stores. That was sometimes
causing a processor (in ipi_handler) to see the .gate change and decrement
.count before the latter is set by set_mtrr() (which then hangs in a
infinite loop with irqs disabled).

Signed-off-by: Loic Prylli <loic@myri.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

i386: fix regression, endless loop in ptrace singlestep over an int80

The commit 635cf99a80f4ebee59d70eb64bb85ce829e4591f introduced a
regression. Executing a ptrace single step after certain int80
accesses will infinitely loop and never advance the PC.

The TIF_SINGLESTEP check should be done on the return from the syscall
and not before it.

I loops on each single step on the pop right after the int80 which writes out
to the console. At that point you can issue as many single steps as you want
and it will not advance any further.

The test case is below:

/* Test whether singlestep through an int80 syscall works.
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/mman.h>
#include <asm/user.h>
#include <string.h>

static int child, status;
static struct user_regs_struct regs;

static void do_child()
{
char str[80] = "child: int80 test\n";

ptrace(PTRACE_TRACEME, 0, 0, 0);
kill(getpid(), SIGUSR1);
write(fileno(stdout),str,strlen(str));
asm ("int $0x80" : : "a" (20)); /* getpid */
}

static void do_parent()
{
unsigned long eip, expected = 0;
again:
waitpid(child, &status, 0);
if (WIFEXITED(status) || WIFSIGNALED(status))
return;

if (WIFSTOPPED(status)) {
ptrace(PTRACE_GETREGS, child, 0, &regs);
eip = regs.eip;
if (expected)
fprintf(stderr, "child stop @ %08lx, expected %08lx %s\n",
eip, expected,
eip == expected ? "" : " <== ERROR");

if (*(unsigned short *)eip == 0x80cd) {
fprintf(stderr, "int 0x80 at %08x\n", (unsigned int)eip);
expected = eip + 2;
} else
expected = 0;

ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
}
goto again;
}

int main(int argc, char * const argv[])
{
child = fork();
if (child)
do_parent();
else
do_child();
return 0;
}

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: <stable@kernel.org>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Fix elf_core_dump() when writing arch specific notes (spu coredumps)

elf_core_dump() supports dumping arch specific ELF notes, via the #define
ELF_CORE_WRITE_EXTRA_NOTES.  Currently the only user of this is the powerpc
spu coredump code.

There is a bug in the handling of foffset WRT the arch notes, which causes
us to erroneously increment foffset by the size of the arch notes, leaving
a block of zeroes in the file, and causing all subsequent data in the file
to be at <supposed position> + <arch note size>.  eg:

  LOAD  0x050000 0x00100000 0x00000000 0x20000 0x20000 R E 0x10000

Tells us we should have a chunk of data at 0x50000.  The truth is the data
is at 0x90dbc = 0x50000 + 0x40dbc (the size of the arch notes).

This bug prevents gdb from reading the core file correctly.

The simplest fix is to simply remember the size of the arch notes, and add
it to foffset after we've written the arch notes.  The only drawback is
that if the arch code doesn't write as many bytes as it said it would, we
end up with a broken core dump again.  For now I think that's a reasonable
requirement.

Tested on a Cell blade, gdb no longer complains about the core file being
bogus.

While I'm here I should point out that the spu coredump code does not work
if we're dumping to a pipe - we'll have to wait for 23 to fix that.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[MIPS] Fix scheduling latency issue on 24K, 34K and 74K cores

The idle loop goes to sleep using the WAIT instruction if !need_resched().
This has is suffering from from a race condition that if if just after
need_resched has returned 0 an interrupt might set TIF_NEED_RESCHED but
we've just completed the test so go to sleep anyway.  This would be
trivial to fix by just disabling interrupts during that sequence as in:

        local_irq_disable();
        if (!need_resched())
                __asm__("wait");
        local_irq_enable();

but the processor architecture leaves it undefined if a processor calling
WAIT with interrupts disabled will ever restart its pipeline and indeed
some processors have made use of the freedom provided by the architecture
definition.  This has been resolved and the Config7.WII bit indicates that
the use of WAIT is safe on 24K, 24KE and 34K cores.  It also is safe on
74K starting revision 2.1.0 so enable the use of WAIT with interrupts
disabled for 74K based on a c0_prid of at least that.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Add macros to encode processor revisions.

Older processors used to encode processor version and revision in two
4-bit bitfields, the 4K seems to simply count up and even newer MTI cores
have switched to use the 8-bits as 3:3:2 bitfield with the last field as
the patch number.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] RM7000: Enable ICACHE_REFILLS_WORKAROUND_WAR.

The RM7000 processors and the E9000 cores have a bug (though PMC-Sierra
opposes it being called that) where invalid instructions in the same
I-cache line worth of instructions being fetched may case spurious
exceptions.

The workaround for this was only enabled for E9000 cores; enable it also
for all RM7000-based platforms.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] SMTC: Fix cut'n'paste bug in Kconfig.debug

This effectivly turned the SMTC_IDLE_HOOK_DEBUG debug option into a no-op.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Change libgcc-style functions from lib-y to obj-y

Reported by Eugene Surovegin <ebs@ebshome.net>.

If only modules were users of these functions they did not get linked into
the kernel proper, so later module loads would fail as well.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Fix timer/performance interrupt detection

Signed-off-by: Chris Dearman <chris@mips.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] AP/SP: Avoid triggering the 34K E125 performance issue

C0_status doesn't need to be initialized at this point anyway; the register
will be initialized later.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] 64-bit TO_PHYS_MASK macro for RM9000 processors

Signed-off-by: Andrew Sharp <tigerand@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[NETPOLL]: Fixups for 'fix soft lockup when removing module'

>From my recent patch:

> >    #1
> >    Until kernel ver. 2.6.21 (including) cancel_rearming_delayed_work()
> >    required a work function should always (unconditionally) rearm with
> >    delay > 0 - otherwise it would endlessly loop. This patch replaces
> >    this function with cancel_delayed_work(). Later kernel versions don't
> >    require this, so here it's only for uniformity.

But Oleg Nesterov <oleg@tv-sign.ru> found:

> But 2.6.22 doesn't need this change, why it was merged?
>
> In fact, I suspect this change adds a race,
...

His description was right (thanks), so this patch reverts #1.

Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: net/core/netevent.c should #include <net/netevent.h>

Every file should include the headers containing the prototypes for
its global functions.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETFILTER]: nf_conntrack_h323: add checking of out-of-range on choices' index values

Choices' index values may be out of range while still encoded in the fixed
length bit-field. This bug may cause access to undefined types (NULL
pointers) and thus crashes (Reported by Zhongling Wen).

This patch also adds checking of decode flag when decoding SEQUENCEs.

Signed-off-by: Jing Min Zhao <zhaojingmin@vivecode.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET] skbuff: remove export of static symbol

skb_clone_fraglist is static so it shouldn't be exported.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

SCTP: Add scope_id validation for link-local binds

SCTP currently permits users to bind to link-local addresses,
but doesn't verify that the scope id specified at bind matches
the interface that the address is configured on. It was report
that this can hang a system.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

SCTP: Check to make sure file is valid before setting timeout

In-kernel sockets created with sock_create_kern don't usually
have a file and file descriptor allocated to them. As a result,
when SCTP tries to check the non-blocking flag, we Oops when
dereferencing a NULL file pointer.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

SCTP: Fix thinko in sctp_copy_laddrs()

Correctly dereference bytes_copied in sctp_copy_laddrs().
I totally must have spaced when doing this.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge master.kernel.org:/home/rmk/linux-2.6-arm

* master.kernel.org:/home/rmk/linux-2.6-arm:
  [ARM] always allow dump_stack() to produce a backtrace
  [ARM] Fix non-page aligned boot time mappings
  [ARM] 4458/1: pxa: Fix CKEN usage and hence fix pxa suspend/resume
  [ARM] 4454/1: Use word accesses in Versatile PCI config reads

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: document some of keycodes
  Input: add a new EV_SW SW_RADIO event, for radio switches on laptops
  Input: serio - take drv_mutex in serio_cleanup()
  Input: atkbd - use printk_ratelimit for spurious ACK messages
  Input: atkbd - throttle LED switching
  Input: i8042 - add HP Pavilion ZT1000 to the MUX blacklist

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  [POWERPC] Update defconfigs
  [POWERPC] Uninline and export virq_to_hw() for the pasemi_mac driver
  [POWERPC] Fix PMI breakage in cbe_cbufreq driver
  [POWERPC] Disable old EMAC driver in arch/powerpc

Fix slab redzone alignment

Commit b46b8f19c9cd435ecac4d9d12b39d78c137ecd66 fixed a couple of bugs
by switching the redzone to 64 bits. Unfortunately, it neglected to
ensure that the _second_ redzone, after the slab object, is aligned
correctly. This caused illegal instruction faults on sparc32, which for
some reason not entirely clear to me are not trapped and fixed up.

Two things need to be done to fix this:
  - increase the object size, rounding up to alignof(long long) so
    that the second redzone can be aligned correctly.
  - If SLAB_STORE_USER is set but alignof(long long)==8, allow a
    full 64 bits of space for the user word at the end of the buffer,
    even though we may not _use_ the whole 64 bits.

This patch should be a no-op on any 64-bit architecture or any 32-bit
architecture where alignof(long long) == 4. Of the others, it's tested
on ppc32 by myself and a very similar patch was tested on sparc32 by
Mark Fortescue, who reported the new problem.

Also, fix the conditions for FORCED_DEBUG, which hadn't been adjusted to
the new sizes. Again noticed by Mark.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[ARM] always allow dump_stack() to produce a backtrace

Don't make this dependent on CONFIG_DEBUG_KERNEL - if we hit a WARN_ON
we need the stack trace to work out how we got to that point.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Remove the blink driver

Yeah, we could have just disabled it, but there's work on a new one that
isn't as fundamentally broken, so there really doesn't seem to be any
point in keeping it around.

The recent timer cleanup broke the only valid use, and when I say
"valid", I obviously mean "totally broken". So it's not like it works,
or really even can work in the current format that uses the unsafe
"panic" LED blinking routines..

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[ARM] Fix non-page aligned boot time mappings

AT91SAM9260 stopped booting with the recent changes to MM
initialisation - it was asking for a non-aligned virtual address
which caused loops to be non-terminal. Fix this by rounding
virtual addresses down, but remember to include the offset in
the length, and round the length up to the following page.

This means that asking for a mapping of 4K starting at 2K into
a page maps two pages as one would expect.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  [MIPS] VSMP: Fix initialization ordering bug.
  [MIPS] Add whitelists for checksyscalls.sh
  [MIPS] die(): Properly declare as non-returning
  [MIPS] Fix include wrapper symbol definitions in IP32 code.

[MIPS] VSMP: Fix initialization ordering bug.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Add whitelists for checksyscalls.sh

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] die(): Properly declare as non-returning

This marks the declaration of die() correctly, removing "control reaches
end of non-void function" warnings from non-void functions that die() at
the end.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Fix include wrapper symbol definitions in IP32 code.

Some IP35 defines snuck into some IP32-specific code during the DMA re-write.

Signed-off-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[JFFS2] Fix readinode failure when read_dnode() detects CRC failure.

We should have stopped returning 1 from read_dnode() to indicate
failure. We can just mark the damn thing obsolete immediately. But I
missed a case where we don't.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>

Remove some unused variables

When Andi reverted the HPET resource reservation (in commit
0f8dc2f06560e2ca126d1670a24126ba08357d38), he didn't remove the now
unused variables, which just causes gcc to be noisy.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dio: remove bogus refcounting BUG_ON

Badari Pulavarty reported a case of this BUG_ON is triggering during
testing.  It's completely bogus and should be removed.

It's trying to notice if we left references to the dio hanging around in
the sync case.  They should have been dropped as IO completed while this
path was in dio_await_completion().  This condition will also be
checked, via some twisty logic, by the BUG_ON(ret != -EIOCBQUEUED) a few
lines lower.  So to start this BUG_ON() is redundant.

More fatally, it's dereferencing dio-> after having dropped its
reference.  It's only safe to dereference the dio after releasing the
lock if the final reference was just dropped.  Another CPU might free
the dio in bio completion and reuse the memory after this path drops the
dio lock but before the BUG_ON() is evaluated.

This patch passed aio+dio regression unit tests and aio-stress on ext3.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Revert perfctr reservation to 2.6.21 state

With this change it works again when the nmi watchdog is disabled.

Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Björn Steinbrink <B.Steinbrink@gmx.de>
Cc: Stephane Eranian <eranian@hpl.hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Revert HPET resource reservation

Matthias Lenk reports that the PCI subsystem would move the HPET on
SB400/SB600-based systems, where the HPET is in BAR1 of the SMbus
controller.

The reason? The ACPI layer registered the PCI MMIO range as being busy
too early, before PCI enumeration had happened, causing the PCI layer to
decide that it should relocate the resources somewhere else.

Firmware resources should be marked busy _after_ the PCI enumeration and
probing has happened, not before.

Remove the too-early reservation, we'll fix it up to do it properly
later. In the meantime, this solves the regression.

Tested-by: Matthias Lenk <matthias.lenk@amd.com>
Cc: Aaron Durbin <adurbin@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6:
  ide: ide_scan_pcibus(): check __pci_register_driver return value
  ide: pdc202xx_new PLL input clock fix
  it821x: fix incorrect SWDMA mask
  amd74xx: resume fix
  hpt366: use correct enablebits for HPT36x
  hpt366: blacklist MAXTOR STM3320620A for UltraDMA/66
  ide: Fix a theoretical Ooops case
  ide: never called printk statement in ide-taskfile.c::wait_drive_not_busy

Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb

* 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb:
  V4L/DVB (5822): Fix the return value in ttpci_budget_init()
  V4L/DVB (5818): CinergyT2: fix flush_workqueue() vs work->func() deadlock
  V4L/DVB (5816): Cx88-blackbird: fix vidioc_g_tuner never ending list of tuners
  V4L/DVB (5808): Bttv: fix v4l1 breaking the driver