This patch provides a debugfs knob to turn kprobes on/off
o A new file /debug/kprobes/enabled indicates if kprobes is enabled or
not (default enabled)
o Echoing 0 to this file will disarm all installed probes
o Any new probe registration when disabled will register the probe but
not arm it. A message will be printed out in such a case.
o When a value 1 is echoed to the file, all probes (including ones
registered in the intervening period) will be enabled
o Unregistration will happen irrespective of whether probes are globally
enabled or not.
o Update Documentation/kprobes.txt to reflect these changes. While there
also update the doc to make it current.
We are also looking at providing sysrq key support to tie to the disabling
feature provided by this patch.
[akpm@linux-foundation.org: Use bool like a bool!]
[akpm@linux-foundation.org: add printk facility levels]
[cornelia.huck@de.ibm.com: Add the missing arch_trampoline_kprobe() for s390] Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- consolidate duplicate code in all arch_prepare_kretprobe instances
into common code
- replace various odd helpers that use hlist_for_each_entry to get
the first elemenet of a list with either a hlist_for_each_entry_save
or an opencoded access to the first element in the caller
- inline add_rp_inst into it's only remaining caller
- use kretprobe_inst_table_head instead of opencoding it
Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marko Vrh [Tue, 8 May 2007 07:34:09 +0000 (00:34 -0700)]
rtc-cmos: make it load on PNPBIOS systems
Replace CONFIG_PNPACPI with CONFIG_PNP, so it loads on ACPI-less PNPBIOS
systems.
Signed-off-by: Marko Vrh <mvrh@freeshells.ch> Acked-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 8 May 2007 07:34:07 +0000 (00:34 -0700)]
rtc: remove "RTC_ALM_SET mode" bugs
This fixes a common glitch in how RTC drivers handle two "set alarm" modes,
by getting rid of the surprising/hidden one that was rarely implemented
correctly (and which could expose nonportable hardware-specific behavior).
The glitch comes from the /dev/rtcX logic implementing the legacy
RTC_ALM_SET (limited to 24 hours, needing RTC_AIE_ON) ioctl on top of the
RTC driver call providing access to the newer RTC_WKALM_SET (without those
limitations) by initializing the day/month/year fields to be invalid ...
that second mode.
Now, since few RTC drivers check those fields, and most hardware misbehaves
when faced with invalid date fields, many RTC drivers will set bogus alarm
times on those RTC_ALM_SET code paths. (Several in-tree drivers have that
issue, and I also noticed it with code reviews on several new RTC drivers.)
This patch ensures that RTC drivers never see such invalid alarm fields, by
moving some logic out of rtc-omap into the RTC_ALM_SET code and adding an
explicit check (which will prevent the issue on other code paths).
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Scott Wood <scottwood@freescale.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Tue, 8 May 2007 07:34:05 +0000 (00:34 -0700)]
revert "rtc: Add rtc_merge_alarm()"
David says "884b4aaaa242a2db8c8252796f0118164a680ab5 should be reverted. It
added an rtc_merge_alarm() call to the 2.6.20 kernel, which hasn't yet been
used by any in-tree driver; this patch obviates the need for that call, and
uses a more robust approach."
Cc: Scott Wood <scottwood@freescale.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 8 May 2007 07:34:03 +0000 (00:34 -0700)]
workaround rtc-related acpi table bugs
This works around a bug seen in some RTC-related ACPI table entries, and
tweaks related diagnostics to follow the ACPI convention.
The bug prevents misleading boot-time messages: platforms affected by this
bug wrongly report they can support alarms up to one year in the future,
when in fact the longest alarm is just 24 hours. That will surprise anyone
trying to use those extended alarms.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 8 May 2007 07:34:02 +0000 (00:34 -0700)]
ACPI wakeup hooks for rtc-cmos
Remove /proc/acpi/alarm file when the rtc-cmos "wakealarm" file is available.
Instead, provide hooks that rtc-cmos will use.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 8 May 2007 07:34:00 +0000 (00:34 -0700)]
rtc-cmos wakeup interface
I finally got around to testing the updated wakeup event hooks for rtc-cmos,
and they follow in two patches:
- Interface update ... when a simple enable_irq_wake() doesn't suffice,
the platform data can hold suspend/resume callback hooks.
- ACPI implementation ... provides callback hooks to do ACPI magic, and
eliminate the legacy /proc/acpi/alarm file.
The interface update could go into 2.6.21, but that's not essential; they
will be NOPs on most PCs, without the ACPI stuff.
I suspect the ACPI folk may have opinions about how to merge that second
patch, and how to obsolete that legacy procfs file. I'd like to see that
merge into 2.6.22 if possible...
As for how to kick it in ... two ways:
- The appended "rtcwake" program; updated since the last time it was
posted, it deals much better with timezones and DST.
- Write the /sys/class/rtc/.../wakealarm file, then go to sleep.
For some reason RTC wake from "swsusp" stopped working on a system where
it previously worked; the alarm setting appears to get clobbered. But
on the bright side, RTC wake from "standby" worked on a system that had
never been able to resume from that state before ... IDEACPI is my guess
as to why it finally started to work. It's the old "two steps forward,
one step back" dance, I guess.
/*
* rtcwake -- enter a system sleep state until specified wakeup time.
*
* This uses cross-platform Linux interfaces to enter a system sleep state,
* and leave it no later than a specified time. It uses any RTC framework
* driver that supports standard driver model wakeup flags.
*
* This is normally used like the old "apmsleep" utility, to wake from a
* suspend state like ACPI S1 (standby) or S3 (suspend-to-RAM). Most
* platforms can implement those without analogues of BIOS, APM, or ACPI.
*
* On some systems, this can also be used like "nvram-wakeup", waking
* from states like ACPI S4 (suspend to disk). Not all systems have
* persistent media that are appropriate for such suspend modes.
*
* The best way to set the system's RTC is so that it holds the current
* time in UTC. Use the "-l" flag to tell this program that the system
* RTC uses a local timezone instead (maybe you dual-boot MS-Windows).
*/
static char *progname;
#ifdef DEBUG
#define VERSION "1.0 dev (" __DATE__ " " __TIME__ ")"
#else
#define VERSION "0.9"
#endif
static unsigned verbose;
static int rtc_is_utc = -1;
/* this process works in RTC time, except when working
* with the system clock (which always uses UTC).
*/
if (rtc_is_utc)
setenv("TZ", "UTC", 1);
tzset();
/* read rtc and system clocks "at the same time", or as
* precisely (+/- a second) as we can read them.
*/
if (ioctl(fd, RTC_RD_TIME, &rtc) < 0) {
perror("read rtc time");
return 0;
}
sys_time = time(0);
if (sys_time == (time_t)-1) {
perror("read system time");
return 0;
}
/* convert rtc_time to normal arithmetic-friendly form,
* updating tm.tm_wday as used by asctime().
*/
memset(&tm, 0, sizeof tm);
tm.tm_sec = rtc.tm_sec;
tm.tm_min = rtc.tm_min;
tm.tm_hour = rtc.tm_hour;
tm.tm_mday = rtc.tm_mday;
tm.tm_mon = rtc.tm_mon;
tm.tm_year = rtc.tm_year;
tm.tm_isdst = rtc.tm_isdst; /* stays unspecified? */
rtc_time = mktime(&tm);
/* what system power mode to use? for now handle only
* standardized mode names; eventually when systems define
* their own state names, parse /sys/power/state.
*
* "on" is used just to test the RTC alarm mechanism,
* bypassing all the wakeup-from-sleep infrastructure.
*/
case 'm':
if (strcmp(optarg, "standby") == 0
|| strcmp(optarg, "mem") == 0
|| strcmp(optarg, "disk") == 0
|| strcmp(optarg, "on") == 0
) {
suspend = optarg;
break;
}
printf("%s: unrecognized suspend state '%s'\n",
progname, optarg);
goto usage;
/* alarm time, seconds-to-sleep (relative) */
case 's':
t = atoi(optarg);
if (t < 0) {
printf("%s: illegal interval %s seconds\n",
progname, optarg);
goto usage;
}
seconds = t;
break;
/* alarm time, time_t (absolute, seconds since 1/1 1970 UTC) */
case 't':
t = atoi(optarg);
if (t < 0) {
printf("%s: illegal time_t value %s\n",
progname, optarg);
goto usage;
}
alarm = t;
break;
case 'u':
rtc_is_utc = 1;
break;
case 'v':
verbose++;
break;
case 'V':
printf("%s: version %s\n", progname, VERSION);
break;
if (!alarm && !seconds) {
printf("%s: must provide wake time\n", progname);
goto usage;
}
/* REVISIT: if /etc/adjtime exists, read it to see what
* the util-linux version of hwclock assumes.
*/
if (rtc_is_utc == -1) {
printf("%s: assuming RTC uses UTC ...\n", progname);
rtc_is_utc = 1;
}
/* this RTC must exist and (if we'll sleep) be wakeup-enabled */
fd = open(devname, O_RDONLY);
if (fd < 0) {
perror(devname);
return 1;
}
if (strcmp(suspend, "on") != 0 && !may_wakeup(devname)) {
printf("%s: %s not enabled for wakeup events\n",
progname, devname);
return 1;
}
/* relative or absolute alarm time, normalized to time_t */
if (!get_basetimes(fd))
return 1;
if (verbose)
printf("alarm %ld, sys_time %ld, rtc_time %ld, seconds %u\n",
alarm, sys_time, rtc_time, seconds);
if (alarm) {
if (alarm < sys_time) {
printf("%s: time doesn't go backward to %s",
progname, ctime(&alarm));
return 1;
}
alarm += sys_time - rtc_time;
} else
alarm = rtc_time + seconds + 1;
if (setup_alarm(fd, &alarm) < 0)
return 1;
sync();
printf("%s: wakeup from \"%s\" using %s at %s",
progname, suspend, devname,
ctime(&alarm));
fflush(stdout);
usleep(10 * 1000);
if (strcmp(suspend, "on") != 0)
suspend_system(suspend);
else {
unsigned long data;
do {
t = read(fd, &data, sizeof data);
if (t < 0) {
perror("rtc read");
break;
}
if (verbose)
printf("... %s: %03lx\n", devname, data);
} while (!(data & RTC_AF));
}
if (ioctl(fd, RTC_AIE_OFF, 0) < 0)
perror("disable rtc alarm interrupt");
close(fd);
return 0;
}
This patch:
Make rtc-cmos do the relevant magic so this RTC can wake the system from a
sleep state. That magic comes in two basic flavors:
- Straightforward: enable_irq_wake(), the way it'd work on most SOC chips;
or generally with system sleep states which don't disable core IRQ logic.
- Roundabout, using non-IRQ platform hooks. This is needed with ACPI and
one almost-clone chip which uses a special wakeup-only alarm. (That's
the RTC used on Footbridge boards, FWIW, which don't do PM in Linux.)
A separate patch implements those hooks for ACPI platforms, so that rtc_cmos
can issue system wakeup events (and its sysfs "wakealarm" attribute works on
at least some systems).
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 8 May 2007 07:33:46 +0000 (00:33 -0700)]
rtc: update to class device removal patches
Fix a goof in the revised classdev support for RTCs: make sure the /dev
node info is ready before the device is registered, not after. Otherwise
the /sys/class/rtc/rtcN/dev attribute won't be created and then udev won't
have the information it needs to create the /dev/rtcN node.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 8 May 2007 07:33:42 +0000 (00:33 -0700)]
rtc: suspend()/resume() restores system clock
RTC class suspend/resume support, re-initializing the system clock on resume
from the clock used to initialize it at boot time.
- The reinit-on-resume is hooked to the existing RTC_HCTOSYS config
option, on the grounds that a clock good enough for init must also
be good enough for re-init.
- Inlining a version of the code used by ARM, to save and restore the
delta between a selected RTC and the current system wall-clock time.
- Removes calls to that ARM code from AT91, OMAP1, and S3C RTCs. This
means that systems using those RTCs across suspend/resume will likely
want to change their kernel configs to enable RTC_HCTOSYS.
If HCTOSYS isn't using a second RTC (with battery?), this changes the
system's initial date from Jan 1970 to the epoch this hardware uses:
1998 for AT91, 2000 for OMAP1 (assuming no split power mode), etc.
This goes on top of the patch series removing "struct class_device" usage
from the RTC framework. That's all needed for class suspend()/resume().
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-By: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 8 May 2007 07:33:38 +0000 (00:33 -0700)]
rtc: simplified /proc/driver/rtc handling
This simplifies the RTC procfs support by removing the class_interface that
hooks it into the rtc core. If it's configured, then sysfs support is now
part of the RTC core, and is never a separate module.
It also removes the class_interface hook, now that its last remaining user is
gone. (That API is usable only with a "struct class_device".)
It's another step towards being able to remove "struct class_device".
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-By: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 8 May 2007 07:33:33 +0000 (00:33 -0700)]
rtc: simplified rtc sysfs attribute handling
This simplifies the RTC sysfs support by removing the class_interface that
hooks it into the rtc core. If it's configured, then sysfs support is now
part of the RTC core, and is never a separate module.
It's another step towards being able to remove "struct class_device".
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-By: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 8 May 2007 07:33:30 +0000 (00:33 -0700)]
rtc: rtc interfaces don't use class_device
This patch removes class_device from the programming interface that the RTC
framework exposes to the rest of the kernel. Now an rtc_device is passed,
which is more type-safe and streamlines all the relevant code.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-By: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 8 May 2007 07:33:27 +0000 (00:33 -0700)]
rtc: remove /sys/class/rtc-dev/*
This simplifies the /dev support by removing a superfluous class_device (the
/sys/class/rtc-dev stuff) and the class_interface that hooks it into the rtc
core. Accordingly, if it's configured then /dev support is now part of the
RTC core, and is never a separate module.
It's another step towards being able to remove "struct class_device".
[bunk@stusta.de: drivers/rtc/rtc-dev.c should #include "rtc-core.h"] Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-By: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ulrich Drepper [Tue, 8 May 2007 07:33:25 +0000 (00:33 -0700)]
utimensat implementation
Implement utimensat(2) which is an extension to futimesat(2) in that it
a) supports nano-second resolution for the timestamps
b) allows to selectively ignore the atime/mtime value
c) allows to selectively use the current time for either atime or mtime
d) supports changing the atime/mtime of a symlink itself along the lines
of the BSD lutimes(3) functions
For this change the internally used do_utimes() functions was changed to
accept a timespec time value and an additional flags parameter.
Additionally the sys_utime function was changed to match compat_sys_utime
which already use do_utimes instead of duplicating the work.
Also, the completely missing futimensat() functionality is added. We have
such a function in glibc but we have to resort to using /proc/self/fd/* which
not everybody likes (chroot etc).
Test application (the syscall number will need per-arch editing):
if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
{
puts ("atim not reset to zero");
status = 1;
}
if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
{
puts ("mtim not reset to zero");
status = 1;
}
if (status != 0)
goto out;
if (st2.st_atim.tv_sec <= st1.st_atim.tv_sec
|| st2.st_atim.tv_sec > tv.tv_sec)
{
puts ("atim not set to NOW");
status = 1;
}
if (st2.st_mtim.tv_sec <= st1.st_mtim.tv_sec
|| st2.st_mtim.tv_sec > tv.tv_sec)
{
puts ("mtim not set to NOW");
status = 1;
}
if (st2.st_atim.tv_sec != 1 || st2.st_atim.tv_nsec != 0)
{
puts ("atim not reset to one");
status = 1;
}
if (st2.st_mtim.tv_sec != 1 || st2.st_mtim.tv_nsec != 0)
{
puts ("mtim not reset to one");
status = 1;
}
if (status == 0)
puts ("all OK");
out:
close (fd);
unlink ("ttt");
unlink ("tttsym");
Josh Triplett [Tue, 8 May 2007 07:33:22 +0000 (00:33 -0700)]
rcutorture: Remove redundant assignment to cur_ops in for loop
The for loop in rcutorture_init uses the condition
cur_ops = torture_ops[i], cur_ops
but then makes the same assignment to cur_ops inside the loop. Remove the
redundant assignment inside the loop, and remove now-unnecessary braces.
Signed-off-by: Josh Triplett <josh@kernel.org> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sched: redundant reschedule when set_user_nice() boosts a prio of a task from the "expired" array
- Make TASK_PREEMPTS_CURR(task, rq) return "true" only if the task's prio
is higher than the current's one and the task is in the "active" array.
This ensures we don't make redundant resched_task() calls when the task
is in the "expired" array (as may happen now in set_user_prio(),
rt_mutex_setprio() and pull_task() ) ;
- generalise conditions for a call to resched_task() in set_user_nice(),
rt_mutex_setprio() and sched_setscheduler()
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Cc: Con Kolivas <kernel@kolivas.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sched: optimize siblings status check logic in wake_idle()
When a logical cpu 'x' already has more than one process running, then most
likely the siblings of that cpu 'x' must be busy. Otherwise the idle
siblings would have likely(in most of the scenarios) picked up the extra
load making the load on 'x' atmost one.
Use this logic to eliminate the siblings status check and minimize the cache
misses encountered on a heavily loaded system.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Dumazet [Tue, 8 May 2007 07:32:57 +0000 (00:32 -0700)]
Speed up divides by cpu_power in scheduler
I noticed expensive divides done in try_to_wakeup() and
find_busiest_group() on a bi dual core Opteron machine (total of 4 cores),
moderatly loaded (15.000 context switch per second)
oprofile numbers :
CPU: AMD64 processors, speed 2600.05 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 50000
samples % symbol name
...
613914 1.0498 try_to_wake_up
834 0.0013 :ffffffff80227ae1: div %rcx
77513 0.1191 :ffffffff80227ae4: mov %rax,%r11
Some of these divides can use the reciprocal divides we introduced some
time ago (currently used in slab AFAIK)
We can assume a load will fit in a 32bits number, because with a
SCHED_LOAD_SCALE=128 value, its still a theorical limit of 33554432
When/if we reach this limit one day, probably cpus will have a fast
hardware divide and we can zap the reciprocal divide trick.
Ingo suggested to rename cpu_power to __cpu_power to make clear it should
not be modified without changing its reciprocal value too.
I did not convert the divide in cpu_avg_load_per_task(), because tracking
nr_running changes may be not worth it ? We could use a static table of 32
reciprocal values but it would add a conditional branch and table lookup.
[akpm@linux-foundation.org: !SMP build fix] Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the process idle load balancing in the presence of dynticks. cpus for
which ticks are stopped will sleep till the next event wakes it up.
Potentially these sleeps can be for large durations and during which today,
there is no periodic idle load balancing being done.
This patch nominates an owner among the idle cpus, which does the idle load
balancing on behalf of the other idle cpus. And once all the cpus are
completely idle, then we can stop this idle load balancing too. Checks added
in fast path are minimized. Whenever there are busy cpus in the system, there
will be an owner(idle cpu) doing the system wide idle load balancing.
Open items:
1. Intelligent owner selection (like an idle core in a busy package).
2. Merge with rcu's nohz_cpu_mask?
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sched: fix idle load balancing in softirqd context
Periodic load balancing in recent kernels happen in the softirq. In
certain -rt configurations, these softirqs are handled in softirqd context.
And hence the check for idle processor was always returning busy (as
nr_running > 1).
This patch captures the idle information at the tick and passes this info
to softirq context through an element 'idle_at_tick' in rq.
[kernel@kolivas.org: Fix reverse idle at tick logic] Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Layton [Tue, 8 May 2007 07:32:31 +0000 (00:32 -0700)]
inode numbering: change libfs sb creation routines to avoid collisions with their root inodes
This patch makes it so that simple_fill_super and get_sb_pseudo assign their
root inodes to be number 1. It also fixes up a couple of callers of
simple_fill_super that were passing in files arrays that had an index at
number 1, and adds a warning for any caller that sends in such an array.
It would have been nice to have made it so that it wasn't possible to make
such a collision, but some callers need to be able to control what inode
number their entries get, so I think this is the best that can be done.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Layton [Tue, 8 May 2007 07:32:29 +0000 (00:32 -0700)]
inode numbering: make static counters in new_inode and iunique be 32 bits
The problems are:
- on filesystems w/o permanent inode numbers, i_ino values can be larger
than 32 bits, which can cause problems for some 32 bit userspace programs on
a 64 bit kernel. We can't do anything for filesystems that have actual
>32-bit inode numbers, but on filesystems that generate i_ino values on the
fly, we should try to have them fit in 32 bits. We could trivially fix this
by making the static counters in new_inode and iunique 32 bits, but...
- many filesystems call new_inode and assume that the i_ino values they are
given are unique. They are not guaranteed to be so, since the static
counter can wrap. This problem is exacerbated by the fix for #1.
- after allocating a new inode, some filesystems call iunique to try to get
a unique i_ino value, but they don't actually add their inodes to the
hashtable, and so they're still not guaranteed to be unique if that counter
wraps.
This patch set takes the simpler approach of simply using iunique and hashing
the inodes afterward. Christoph H. previously mentioned that he thought that
this approach may slow down lookups for filesystems that currently hash their
inodes.
The questions are:
1) how much would this slow down lookups for these filesystems?
2) is it enough to justify adding more infrastructure to avoid it?
What might be best is to start with this approach and then only move to using
IDR or some other scheme if these extra inodes in the hashtable prove to be
problematic.
I've done some cursory testing with this patch and the overhead of hashing and
unhashing the inodes with pipefs is pretty low -- just a few seconds of system
time added on to the creation and destruction of 10 million pipes (very
similar to the overhead that the IDR approach would add).
The hard thing to measure is what effect this has on other filesystems. I'm
open to ways to try and gauge this.
Again, I've only converted pipefs as an example. If this approach is
acceptable then I'll start work on patches to convert other filesystems.
With a pretty-much-worst-case microbenchmark provided by Eric Dumazet
<dada1@cosmosbay.com>:
When a 32-bit program that was not compiled with large file offsets does a
stat and gets a st_ino value back that won't fit in the 32 bit field, glibc
(correctly) generates an EOVERFLOW error. We can't do anything about fs's
with larger permanent inode numbers, but when we generate them on the fly, we
ought to try and have them fit within a 32 bit field.
This patch takes the first step toward this by making the static counters in
these two functions be 32 bits.
[jlayton@redhat.com: mention that it's only the case for 32bit, non-LFS stat] Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Nikitenko [Tue, 8 May 2007 07:32:25 +0000 (00:32 -0700)]
au1550 SPI controller driver
Here is a driver for the Alchemy au1550 PSC (Programmable Serial
Controller) in SPI master mode.
It supports dma transfers using the Alchemy descriptor based dma controller
for 4-8 bits per word SPI transfers. For 9-24 bits per word transfers, pio
irq based mode is used to avoid setup of dma channels from scratch on each
number of bits per word change.
Tested with au1550; this may also work on other MIPS Alchemy cpus, like
au1200/au1210/au1250. Used extensively with SD card connected via SPI;
this handles 8.1MHz SPI clock transfers using dma without any problem (the
highest SPI clock freq possible with au1550 running on 324MHz).
The driver supports sharing of SPI bus by multiple devices. All features
of Alchemy SPI mode are supported (all SPI modes, msb/lsb first, bits per
word in 4-24 range).
As the SPI clock of the controller depends on main input clock that shall
be configured externally, platform data structure for au1550 SPI controller
driver contains mainclk_hz attribute to define the input clock rate. From
this value, dividers of the controller for SPI clock are set up for
required frequency.
Signed-off-by: Jan Nikitenko <jan.nikitenko@gmail.com>
Whitespace and section fixups. Remove partial workaround for platform
setup bug in dma_mask setup; it couldn't work with multiple controllers.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 8 May 2007 07:32:21 +0000 (00:32 -0700)]
SPI kerneldoc
Various documentation updates for the SPI infrastructure, to clarify things
that may not have been clear, to cope with lack of editing, and fix
omissions.
Also, plug SPI into the kernel-api DocBook template, and fix all the
resulting glitches in document generation.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 8 May 2007 07:32:13 +0000 (00:32 -0700)]
minor spi_butterfly cleanup
Simplify the spi_butterfly driver by removing incomplete/unused support for
the second SPI bus, implemented by the USI controller. This should make
this a clearer example of how to write a parport bitbang driver.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Invalid return value of execve() resulting in oopses
When elf loader fails to map executable (due to memory shortage or because
binary is malformed), it can return 0. Normally, this is invisible because
process is killed with SIGKILL and it never returns to user space.
But if exec() is called from kernel thread (hotplug, whatever)
consequences are more interesting and vary depending on architecture.
i386. Nothing especially interesting, execve() just returns
with "success" :-)
x86_64. Fake zero frame is used on way to caller, RSP/RIP are loaded
with zeros, ergo... double fault.
ia64. Similar to i386, but r32...r95 are corrupted. Sometimes it
oopses due to return to zero PC, sometimes it sees NaT in
rXX and oopses due to NaT consumption.
Signed-off-by: Alexey Kuznetsov <alexey@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Frysinger [Tue, 8 May 2007 07:31:51 +0000 (00:31 -0700)]
hide spinlock in linux/quota.h behind __KERNEL__
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Acked-by: Jan Kara <jack@ucw.cz> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Woodhouse [Tue, 8 May 2007 07:31:49 +0000 (00:31 -0700)]
Add taskstats.h to kbuild
Add taskstats.h to include/linux/Kbuild, make headers_install would then
pickup taskstats.h. This needs to be done as taskstats.h is a user
interface header.
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org> Cc: Don Zickus <dzickus@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Tue, 8 May 2007 07:31:43 +0000 (00:31 -0700)]
cpusets: allow empty {cpus,mems}_allowed to be set for unpopulated cpuset
You currently cannot remove all cpus or mems from cpus_allowed or
mems_allowed of a cpuset. We now allow both if there are no attached
tasks.
Acked-by: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Paul Menage <menage@google.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simon Horman [Tue, 8 May 2007 07:31:40 +0000 (00:31 -0700)]
Update the list information for kexec and kdump
There is a new list for kexec/kdump discussion.
Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Mollett [Tue, 8 May 2007 07:31:31 +0000 (00:31 -0700)]
udf: decrement correct link count in udf_rmdir
It appears that a minor thinko occurred in udf_rmdir and the
(already-cleared) link count on the directory that is being removed was
being decremented instead of the link count on its parent directory. This
gives rise to lots of kernel messages similar to:
when removing directory trees. No other ill effects have been observed but
I guess it could theoretically result in the link count overflowing on a
very long-lived, much modified directory.
Signed-off-by: Stephen Mollett <molletts@yahoo.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Jan Kara <jack@ucw.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
OGAWA Hirofumi [Tue, 8 May 2007 07:31:28 +0000 (00:31 -0700)]
fat: fix VFAT compat ioctls on 64-bit systems
If you compile and run the below test case in an msdos or vfat directory on
an x86-64 system with -m32 you'll get garbage in the kernel_dirent struct
followed by a SIGSEGV.
The patch fixes this.
Reported and initial fix by Bart Oldeman
#include <sys/types.h>
#include <sys/ioctl.h>
#include <dirent.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
struct kernel_dirent {
long d_ino;
long d_off;
unsigned short d_reclen;
char d_name[256]; /* We must not include limits.h! */
};
#define VFAT_IOCTL_READDIR_BOTH _IOR('r', 1, struct kernel_dirent [2])
#define VFAT_IOCTL_READDIR_SHORT _IOR('r', 2, struct kernel_dirent [2])
int main(void)
{
int fd = open(".", O_RDONLY);
struct kernel_dirent de[2];
while (1) {
int i = ioctl(fd, VFAT_IOCTL_READDIR_BOTH, (long)de);
if (i == -1) break;
if (de[0].d_reclen == 0) break;
printf("SFN: reclen=%2d off=%d ino=%d, %-12s",
de[0].d_reclen, de[0].d_off, de[0].d_ino, de[0].d_name);
if (de[1].d_reclen)
printf("\tLFN: reclen=%2d off=%d ino=%d, %s",
de[1].d_reclen, de[1].d_off, de[1].d_ino, de[1].d_name);
printf("\n");
}
return 0;
}
dma_declare_coherent_memory() allocates a bitmap 1 bit per page, it
calculates the bitmap size based on size of long, but allocates bytes...
Thanks to James Bottomley for clarifications and corrections.
Signed-off-by: G. Liakhovetski <g.liakhovetski@gmx.de> Acked-by: James Bottomley <James.Bottomley@SteelEye.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Several people have observed that perhaps LOG_BUF_SHIFT should be in a more
obvious place than under DEBUG_KERNEL. Under some circumstances (such as the
PARISC architecture), DEBUG_KERNEL can increase kernel size, which is an
undesirable trade off for something as trivial as increasing the kernel log
buffer size.
Instead, move LOG_BUF_SHIFT into "General Setup", so that people are more
likely to be able to change it such a circumstance that the default buffer
size is insufficient.
Signed-off-by: Alistair John Strachan <s0348365@sms.ed.ac.uk> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Tue, 8 May 2007 07:31:11 +0000 (00:31 -0700)]
consolidate asm/const.h to linux/const.h
Make a global linux/const.h header file instead of having multiple,
per-arch files, and convert current users of asm/const.h to use
linux/const.h.
Built on x86_64 and sparc64.
[akpm@linux-foundation.org: fix include/asm-x86_64/Kbuild] Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Parag Warudkar [Tue, 8 May 2007 07:31:09 +0000 (00:31 -0700)]
tpm: fix sleep-in-spinlock
flush_scheduled_work() can sleep, and we're calling it under spinlock.
AFAICS, moving flush_scheduled_work before spin_lock() should not cause any
problems.
Reason being - The only thing that can race against tpm_release is tpm_open
(tpm_release is called when last reference to the file is closed and only
thing that can happen after that is tpm_open??) and tpm_open acquires
driver_lock and more over it bails out with EBUSY if chip->num_opens is
greater than 0.
I also moved chip->num_pending-- to after deleting timer and setting data
pending as it looks more correct for the paranoid although it probably doesn't
matter as it is guarded by driver_lock. None the less this change should not
cause problems.
While I was at it I noticed a missing NULL check in tpm_register_hardware
which is fixed with this patch as well.
Jesper Juhl [Tue, 8 May 2007 07:31:06 +0000 (00:31 -0700)]
Fix chapter reference in CodingStyle
commit 226a6b84aaaf1fac7a5d41cf4e7387fd9ba895d5 renumbered Chapter 11 in
Documentation/CodingStyle to Chapter 12, but it didn't update the reference
to that chapter further down in the file. This patch corrects the chapter
reference.
Jan Kara [Tue, 8 May 2007 07:31:04 +0000 (00:31 -0700)]
ext3: copy i_flags to inode flags on write
Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into
ext2-specific i_flags. Hence, when someone sets these flags via a different
interface than ioctl, they are stored correctly.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Gibson [Tue, 8 May 2007 07:30:57 +0000 (00:30 -0700)]
Clean up mostly unused IOSPACE macros
Most architectures defined three macros, MK_IOSPACE_PFN(), GET_IOSPACE()
and GET_PFN() in pgtable.h. However, the only callers of any of these
macros are in Sparc specific code, either in arch/sparc, arch/sparc64 or
drivers/sbus.
This patch removes the redundant macros from all architectures except
sparc and sparc64.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Borislav Petkov [Tue, 8 May 2007 07:30:54 +0000 (00:30 -0700)]
kill warnings when building mandocs
This patch shuts warnings of the sort:
make -C /mnt/samsung_200/sam/kernel/trees/21-rc6/build \
KBUILD_SRC=/mnt/samsung_200/sam/kernel/trees/21-rc6 \
KBUILD_EXTMOD="" -f /mnt/samsung_200/sam/kernel/trees/21-rc6/Makefile mandocs
make -f /mnt/samsung_200/sam/kernel/trees/21-rc6/scripts/Makefile.build obj=scripts/basic
make -f /mnt/samsung_200/sam/kernel/trees/21-rc6/scripts/Makefile.build obj=Documentation/DocBook mandocs
SRCTREE=/mnt/samsung_200/sam/kernel/trees/21-rc6/ /mnt/samsung_200/sam/kernel/trees/21-rc6/build/scripts/basic/docproc doc /mnt/samsung_200/sam/kernel/trees/21-rc6/Documentation/DocBook/wanbook.tmpl >Documentation/DocBook/wanbook.xml
if grep -q refentry Documentation/DocBook/wanbook.xml; then xmlto man -m /mnt/samsung_200/sam/kernel/trees/21-rc6/Documentation/DocBook/stylesheet.xsl -o Documentation/DocBook/man Documentation/DocBook/wanbook.xml ; gzip -f Documentation/DocBook/man/*.9; fi
Note: meta version: No productnumber or alternative sppp_close
Note: meta version: No refmiscinfo@class=version sppp_close
Note: Writing sppp_close.9
Note: meta version: No productnumber or alternative sppp_open
Note: meta version: No refmiscinfo@class=version sppp_open
by adding a RefMiscInfo xml tag in the form of the current kernel version
to the function, struct and enum definitions in files included by
kernel-doc when building 'mandocs'. However, the version string appears
truncated on the manpage due to some constraints in the xml DTD for the man
header, I believe, for the troff output is truncated too.
The UTF-8 part of the vt driver suffers from the following issues which are
addressed in my patch:
1) If there's no glyph found for a particular valid UTF-8 character, we try
to display U+FFFD. However if this one is not found either, here's what
the current kernel does:
- First, if the Unicode value is less than the number of glyphs, use the
glyph directly from that position of the glyph table. While it may be a
good idea in the 8-bit world, it has absolutely no sense with Unicode
in mind. For example, if a Latin-2 font is loaded and an application
prints U+00FB ("u with circumflex", not present in Latin-2) then as a
fallback solution the glyph from the 0xFB position of the Latin-2
fontset (which is an "u with double accent" - a different character) is
displayed.
- Second, if this fallback fails too, a simple ASCII question mark is
printed, which is visually undistinguishable from a real question mark.
I changed the code to skip the first step (except if in non-UTF-8 mode),
and changed the second step to print the question mark with inverse color
attributes, so it is visually clear that it's not a real question mark,
and resembles more to the common glyph of U+FFFD.
2) The UTF-8 decoder is buggy in many ways:
- Lone continuation bytes (section 3.1 of Markus Kuhn's UTF-8 stress
test) are not caught, they are displayed as some "random" (taken
directly form the font table, see above) glyphs instead the replacement
character.
- Incomplete sequences (sections 3.2 and 3.3 of the stress test) emit no
replacement character, but rather cause the subsequent valid character
to be displayed more times(!).
- The decoder is not safe: overlong sequences are not caught currently,
they are displayed as if these were valid representations. This may
even have security impacts.
- The decoder does not handle D800..DFFF and FFFE..FFFF specially, it
just emits these code points and lets it be looked up in the glyph
table. Since these are invalid code points, I replace them by U+FFFD
and hence give no chance for them to be looked up in the glyph table.
(Assuming no font ships glyphs for these code points, this change is
not visible to the users since the glyph shown will be the same.)
With my fixes to the decoder it now behaves exactly as Markus Kuhn's
stress test recommends.
3) It has no concept of double-width (CJK) characters. It's way beyond the
scope of my patch to try to display them, but at least I think it's
important for the cursor to jump two positions when printing such
characters, since this is what applications (such as text editors)
expect. Currently the cursor only jumps one position, and hence
applications suffer from displaying and refreshing problems, and editing
some English letters that are preceded by some CJK characters in the same
line is a nightmare. With my patch an additional space is inserted after
the CJK character has been printed (which usually means a replacement
symbol of course). (If U+FFFD isn't availble and hence an inverse
question mark is displayed in the first cell, I keep the inverted state
for the space in the 2nd column so it's quite easy to see that they are
tied together.)
4) There is a small built-in table of zero-width spaces that are not to be
printed but silently skipped. U+200A is included there, but it's not a
zero-width character, so I remove it from there.
Signed-off-by: Egmont Koblinger <egmont@uhulinux.hu> Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Tue, 8 May 2007 07:30:33 +0000 (00:30 -0700)]
ext3: copy i_flags to inode flags on write
A patch that stores inode flags such as S_IMMUTABLE, S_APPEND, etc. from
i_flags to EXT3_I(inode)->i_flags when inode is written to disk. The same
thing is done on GETFLAGS ioctl.
Quota code changes these flags on quota files (to make it harder for
sysadmin to screw himself) and these changes were not correctly propagated
into the filesystem (especially, lsattr did not show them and users were
wondering...).
Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into
ext3-specific i_flags. Hence, when someone sets these flags via a
different interface than ioctl, they are stored correctly.
Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tom Alsberg [Tue, 8 May 2007 07:30:31 +0000 (00:30 -0700)]
CPU time limit patch / setrlimit(RLIMIT_CPU, 0) cheat fix
As discovered here today, the change in Kernel 2.6.17 intended to inhibit
users from setting RLIMIT_CPU to 0 (as that is equivalent to unlimited) by
"cheating" and setting it to 1 in such a case, does not make a difference,
as the check is done in the wrong place (too late), and only applies to the
profiling code.
On all systems I checked running kernels above 2.6.17, no matter what the
hard and soft CPU time limits were before, a user could escape them by
issuing in the shell (sh/bash/zsh) "ulimit -t 0", and then the user's
process was not ever killed.
Attached is a trivial patch to fix that. Simply moving the check to a
slightly earlier location (specifically, before the line that actually
assigns the limit - *old_rlim = new_rlim), does the trick.
Do note that at least the zsh (but not ash, dash, or bash) shell has the
problem of "caching" the limits set by the ulimit command, so when running
zsh the fix will not immediately be evident - after entering "ulimit -t 0",
"ulimit -a" will show "-t: cpu time (seconds) 0", even though the actual
limit as returned by getrlimit(...) will be 1. It can be verified by
opening a subshell (which will not have the values of the parent shell in
cache) and checking in it, or just by running a CPU intensive command like
"echo '65536^1048576' | bc" and verifying that it dumps core after one
second.
Regardless of whether that is a misfeature in the shell, perhaps it would
be better to return -EINVAL from setrlimit in such a case instead of
cheating and setting to 1, as that does not really reflect the actual state
of the process anymore. I do not however know what the ground for that
decision was in the original 2.6.17 change, and whether there would be any
"backward" compatibility issues, so I preferred not to touch that right
now.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Atsushi Nemoto [Tue, 8 May 2007 07:30:26 +0000 (00:30 -0700)]
serial_txx9: Use assigned device numbers
The serial_txx9 driver have abused device numbers (major 4, minor 128) if
CONFIG_SERIAL_TXX9_STDSERIAL was not set. This patch makes the driver use
proper device numbers assigned for it (major 204, minor 196-203). I
suppose a typical user of this driver set CONFIG_SERIAL_TXX9_STDSERIAL to Y
(i.e. use "ttyS0"), so this patch would not cause big compatibility issue.
Apparently it's not cool anymore to use SPIN/RW_LOCK_UNLOCKED. There's
some mention of this in Documentation/spinlocks.txt, but that only talks
about dynamic initialisation.
A comment in the code mentioning the preferred usage would be good IMHO.
[akpm@linux-foundation.org: add reason for deprecation] Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here is the macro itself and the examples of its usage in the generic code.
If it will turn out to be useful, I can prepare the set of patches to
inject in into arch-specific code, drivers, networking, etc.
Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patches modifies the pnpbios kernel thread to start with ktrhead_run
not kernel_thread and deamonize. Doing this makes the code a little
simpler and easier to maintain.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Adam Belay <ambx1@neo.rr.com> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>