Eric Dumazet [Tue, 8 May 2007 07:32:57 +0000 (00:32 -0700)]
Speed up divides by cpu_power in scheduler
I noticed expensive divides done in try_to_wake_up() and
find_busiest_group() on a machine with two dual-core Opterons (4 cores in
total), moderately loaded (15,000 context switches per second).
oprofile numbers :
CPU: AMD64 processors, speed 2600.05 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 50000
samples % symbol name
...
613914 1.0498 try_to_wake_up
834 0.0013 :ffffffff80227ae1: div %rcx
77513 0.1191 :ffffffff80227ae4: mov %rax,%r11
Some of these divides can use the reciprocal divides we introduced some
time ago (currently used in slab AFAIK)
We can assume a load will fit in a 32-bit number, because with a
SCHED_LOAD_SCALE=128 value, it's still a theoretical limit of 33554432.
When/if we reach this limit one day, probably cpus will have a fast
hardware divide and we can zap the reciprocal divide trick.
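A minimal user-space sketch of the reciprocal divide trick, to show why the power and its reciprocal must change together. The two helpers are modeled on the kernel's reciprocal_value() and reciprocal_divide(); struct group_power and its setter are hypothetical names used only for illustration, and the result is an approximation that relies on the load fitting in 32 bits.

```c
#include <stdint.h>

/*
 * Precompute R = ceil(2^32 / k) once, whenever the divisor changes.
 * k must be at least 2 here: for k == 1, R would not fit in 32 bits.
 */
static uint32_t reciprocal_value(uint32_t k)
{
    uint64_t val = (1ULL << 32) + (k - 1);

    return (uint32_t)(val / k);
}

/* Approximate a / k as (a * R) >> 32: one multiply and one shift. */
static uint32_t reciprocal_divide(uint32_t a, uint32_t r)
{
    return (uint32_t)(((uint64_t)a * r) >> 32);
}

/* Hypothetical stand-in for a scheduler group's power fields. */
struct group_power {
    uint32_t power;            /* never written directly */
    uint32_t reciprocal_power; /* always updated together with power */
};

/* The only place where power changes, so the reciprocal stays in sync. */
static void group_set_power(struct group_power *g, uint32_t power)
{
    g->power = power;
    g->reciprocal_power = reciprocal_value(power);
}

/* Fast path: load / power without a hardware divide. */
static uint32_t group_divide(const struct group_power *g, uint32_t load)
{
    return reciprocal_divide(load, g->reciprocal_power);
}
```

With a power of 128 the reciprocal is exactly 2^25, so the fast path reduces to a shift by 7 and matches the true quotient for every 32-bit load.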
Ingo suggested to rename cpu_power to __cpu_power to make clear it should
not be modified without changing its reciprocal value too.
I did not convert the divide in cpu_avg_load_per_task(), because tracking
nr_running changes may not be worth it. We could use a static table of 32
reciprocal values but it would add a conditional branch and table lookup.
[akpm@linux-foundation.org: !SMP build fix] Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the process idle load balancing in the presence of dynticks. CPUs for
which ticks are stopped will sleep until the next event wakes them up.
These sleeps can potentially last a long time, and today no periodic idle
load balancing is done during them.
This patch nominates an owner among the idle cpus, which does the idle load
balancing on behalf of the other idle cpus. And once all the cpus are
completely idle, then we can stop this idle load balancing too. Checks added
in fast path are minimized. Whenever there are busy cpus in the system, there
will be an owner(idle cpu) doing the system wide idle load balancing.
Open items:
1. Intelligent owner selection (like an idle core in a busy package).
2. Merge with rcu's nohz_cpu_mask?
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sched: fix idle load balancing in softirqd context
Periodic load balancing in recent kernels happens in the softirq. In
certain -rt configurations, these softirqs are handled in softirqd context.
And hence the check for idle processor was always returning busy (as
nr_running > 1).
This patch captures the idle information at the tick and passes this info
to softirq context through an element 'idle_at_tick' in rq.
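A rough user-space model of the idea, not the scheduler code itself: the idle state is sampled at the tick and the later balancing pass reads the stored sample instead of looking at nr_running. struct rq_sketch and the function names are hypothetical; only idle_at_tick is taken from the description above.

```c
#include <stdbool.h>

/* Hypothetical, stripped-down runqueue. */
struct rq_sketch {
    unsigned int nr_running; /* includes the softirq thread when it runs */
    bool idle_at_tick;       /* sampled in tick context */
};

/* Tick context: remember whether the CPU was idle right now. */
static void tick_record_idle(struct rq_sketch *rq, bool cpu_is_idle)
{
    rq->idle_at_tick = cpu_is_idle;
}

/* The check that misfires once the balancer itself runs as a thread. */
static bool naive_is_idle(const struct rq_sketch *rq)
{
    return rq->nr_running == 0;
}

/* Balancing context: trust the value captured at the tick. */
static bool balance_as_idle(const struct rq_sketch *rq)
{
    return rq->idle_at_tick;
}
```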
[kernel@kolivas.org: Fix reverse idle at tick logic] Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Layton [Tue, 8 May 2007 07:32:31 +0000 (00:32 -0700)]
inode numbering: change libfs sb creation routines to avoid collisions with their root inodes
This patch makes it so that simple_fill_super and get_sb_pseudo assign their
root inodes to be number 1. It also fixes up a couple of callers of
simple_fill_super that were passing in files arrays that had an index at
number 1, and adds a warning for any caller that sends in such an array.
It would have been nice to have made it so that it wasn't possible to make
such a collision, but some callers need to be able to control what inode
number their entries get, so I think this is the best that can be done.
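The collision check can be sketched as follows, assuming (as in libfs) that an entry's position in the files array is its inode number, that unused slots have a NULL name, and that an empty name terminates the array. struct tree_descr_sketch and count_root_collisions() are illustrative names, not the libfs API.

```c
#include <stddef.h>

#define ROOT_INO_SKETCH 1 /* inode number now reserved for the root */

/* Hypothetical cut-down version of a files array entry. */
struct tree_descr_sketch {
    const char *name; /* NULL: slot unused, "": end of array */
};

/*
 * Walk the array and count entries that would get the root's inode number.
 * A real caller would print a warning and skip the entry instead.
 */
static int count_root_collisions(const struct tree_descr_sketch *files)
{
    int collisions = 0;

    for (int i = 0; !files->name || files->name[0]; i++, files++) {
        if (!files->name)
            continue; /* hole: this index is simply not used */
        if (i == ROOT_INO_SKETCH)
            collisions++;
    }
    return collisions;
}
```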
Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Layton [Tue, 8 May 2007 07:32:29 +0000 (00:32 -0700)]
inode numbering: make static counters in new_inode and iunique be 32 bits
The problems are:
- on filesystems w/o permanent inode numbers, i_ino values can be larger
than 32 bits, which can cause problems for some 32 bit userspace programs on
a 64 bit kernel. We can't do anything for filesystems that have actual
>32-bit inode numbers, but on filesystems that generate i_ino values on the
fly, we should try to have them fit in 32 bits. We could trivially fix this
by making the static counters in new_inode and iunique 32 bits, but...
- many filesystems call new_inode and assume that the i_ino values they are
given are unique. They are not guaranteed to be so, since the static
counter can wrap. This problem is exacerbated by the fix for #1.
- after allocating a new inode, some filesystems call iunique to try to get
a unique i_ino value, but they don't actually add their inodes to the
hashtable, and so they're still not guaranteed to be unique if that counter
wraps.
This patch set takes the simpler approach of simply using iunique and hashing
the inodes afterward. Christoph H. previously mentioned that he thought that
this approach may slow down lookups for filesystems that currently hash their
inodes.
The questions are:
1) how much would this slow down lookups for these filesystems?
2) is it enough to justify adding more infrastructure to avoid it?
What might be best is to start with this approach and then only move to using
IDR or some other scheme if these extra inodes in the hashtable prove to be
problematic.
I've done some cursory testing with this patch and the overhead of hashing and
unhashing the inodes with pipefs is pretty low -- just a few seconds of system
time added on to the creation and destruction of 10 million pipes (very
similar to the overhead that the IDR approach would add).
The hard thing to measure is what effect this has on other filesystems. I'm
open to ways to try and gauge this.
Again, I've only converted pipefs as an example. If this approach is
acceptable then I'll start work on patches to convert other filesystems.
With a pretty-much-worst-case microbenchmark provided by Eric Dumazet
<dada1@cosmosbay.com>:
When a 32-bit program that was not compiled with large file offsets does a
stat and gets a st_ino value back that won't fit in the 32 bit field, glibc
(correctly) generates an EOVERFLOW error. We can't do anything about fs's
with larger permanent inode numbers, but when we generate them on the fly, we
ought to try and have them fit within a 32 bit field.
This patch takes the first step toward this by making the static counters in
these two functions be 32 bits.
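A small sketch of the two facts this relies on: a value only survives a 32-bit st_ino field if it is at most UINT32_MAX, and a 32-bit counter achieves that at the price of wrapping. The function names are hypothetical and the counter is passed in explicitly so the wrap is easy to see.

```c
#include <stdbool.h>
#include <stdint.h>

/* Would this inode number fit in a 32-bit st_ino field? */
static bool fits_in_32bit_st_ino(uint64_t ino)
{
    return ino <= UINT32_MAX;
}

/*
 * Hand out the next number from a 32-bit counter. Every result fits in
 * 32 bits, but after 2^32 calls the sequence repeats, so uniqueness is
 * not guaranteed by the counter alone.
 */
static uint32_t next_ino_sketch(uint32_t *counter)
{
    return ++(*counter);
}
```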
[jlayton@redhat.com: mention that it's only the case for 32bit, non-LFS stat] Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Nikitenko [Tue, 8 May 2007 07:32:25 +0000 (00:32 -0700)]
au1550 SPI controller driver
Here is a driver for the Alchemy au1550 PSC (Programmable Serial
Controller) in SPI master mode.
It supports dma transfers using the Alchemy descriptor based dma controller
for 4-8 bits per word SPI transfers. For 9-24 bits per word transfers, pio
irq based mode is used to avoid setup of dma channels from scratch on each
number of bits per word change.
Tested with au1550; this may also work on other MIPS Alchemy cpus, like
au1200/au1210/au1250. Used extensively with SD card connected via SPI;
this handles 8.1MHz SPI clock transfers using dma without any problem (the
highest SPI clock freq possible with au1550 running on 324MHz).
The driver supports sharing of SPI bus by multiple devices. All features
of Alchemy SPI mode are supported (all SPI modes, msb/lsb first, bits per
word in 4-24 range).
As the SPI clock of the controller depends on main input clock that shall
be configured externally, platform data structure for au1550 SPI controller
driver contains mainclk_hz attribute to define the input clock rate. From
this value, dividers of the controller for SPI clock are set up for
required frequency.
Signed-off-by: Jan Nikitenko <jan.nikitenko@gmail.com>
Whitespace and section fixups. Remove partial workaround for platform
setup bug in dma_mask setup; it couldn't work with multiple controllers.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 8 May 2007 07:32:21 +0000 (00:32 -0700)]
SPI kerneldoc
Various documentation updates for the SPI infrastructure, to clarify things
that may not have been clear, to cope with lack of editing, and fix
omissions.
Also, plug SPI into the kernel-api DocBook template, and fix all the
resulting glitches in document generation.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 8 May 2007 07:32:13 +0000 (00:32 -0700)]
minor spi_butterfly cleanup
Simplify the spi_butterfly driver by removing incomplete/unused support for
the second SPI bus, implemented by the USI controller. This should make
this a clearer example of how to write a parport bitbang driver.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Invalid return value of execve() resulting in oopses
When elf loader fails to map executable (due to memory shortage or because
binary is malformed), it can return 0. Normally, this is invisible because
process is killed with SIGKILL and it never returns to user space.
But if exec() is called from kernel thread (hotplug, whatever)
consequences are more interesting and vary depending on architecture.
i386. Nothing especially interesting, execve() just returns
with "success" :-)
x86_64. Fake zero frame is used on way to caller, RSP/RIP are loaded
with zeros, ergo... double fault.
ia64. Similar to i386, but r32...r95 are corrupted. Sometimes it
oopses due to return to zero PC, sometimes it sees NaT in
rXX and oopses due to NaT consumption.
Signed-off-by: Alexey Kuznetsov <alexey@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Frysinger [Tue, 8 May 2007 07:31:51 +0000 (00:31 -0700)]
hide spinlock in linux/quota.h behind __KERNEL__
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Acked-by: Jan Kara <jack@ucw.cz> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Woodhouse [Tue, 8 May 2007 07:31:49 +0000 (00:31 -0700)]
Add taskstats.h to kbuild
Add taskstats.h to include/linux/Kbuild so that make headers_install picks
up taskstats.h. This needs to be done as taskstats.h is a user interface
header.
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org> Cc: Don Zickus <dzickus@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Tue, 8 May 2007 07:31:43 +0000 (00:31 -0700)]
cpusets: allow empty {cpus,mems}_allowed to be set for unpopulated cpuset
You currently cannot remove all cpus or mems from cpus_allowed or
mems_allowed of a cpuset. We now allow both if there are no attached
tasks.
Acked-by: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Paul Menage <menage@google.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simon Horman [Tue, 8 May 2007 07:31:40 +0000 (00:31 -0700)]
Update the list information for kexec and kdump
There is a new list for kexec/kdump discussion.
Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Mollett [Tue, 8 May 2007 07:31:31 +0000 (00:31 -0700)]
udf: decrement correct link count in udf_rmdir
It appears that a minor thinko occurred in udf_rmdir and the
(already-cleared) link count on the directory that is being removed was
being decremented instead of the link count on its parent directory. This
gives rise to lots of kernel messages similar to:
when removing directory trees. No other ill effects have been observed but
I guess it could theoretically result in the link count overflowing on a
very long-lived, much modified directory.
Signed-off-by: Stephen Mollett <molletts@yahoo.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Jan Kara <jack@ucw.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
OGAWA Hirofumi [Tue, 8 May 2007 07:31:28 +0000 (00:31 -0700)]
fat: fix VFAT compat ioctls on 64-bit systems
If you compile and run the below test case in an msdos or vfat directory on
an x86-64 system with -m32 you'll get garbage in the kernel_dirent struct
followed by a SIGSEGV.
The patch fixes this.
Reported and initial fix by Bart Oldeman
#include <sys/types.h>
#include <sys/ioctl.h>
#include <dirent.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

struct kernel_dirent {
        long            d_ino;
        long            d_off;
        unsigned short  d_reclen;
        char            d_name[256]; /* We must not include limits.h! */
};
#define VFAT_IOCTL_READDIR_BOTH  _IOR('r', 1, struct kernel_dirent [2])
#define VFAT_IOCTL_READDIR_SHORT _IOR('r', 2, struct kernel_dirent [2])

int main(void)
{
        int fd = open(".", O_RDONLY);
        struct kernel_dirent de[2];

        if (fd < 0) {
                perror("open");
                return 1;
        }
        while (1) {
                /* de[0] receives the short name, de[1] the long name */
                int i = ioctl(fd, VFAT_IOCTL_READDIR_BOTH, de);
                if (i == -1) break;
                if (de[0].d_reclen == 0) break;
                /* d_off and d_ino are long, so print them with %ld */
                printf("SFN: reclen=%2d off=%ld ino=%ld, %-12s",
                       de[0].d_reclen, de[0].d_off, de[0].d_ino, de[0].d_name);
                if (de[1].d_reclen)
                        printf("\tLFN: reclen=%2d off=%ld ino=%ld, %s",
                               de[1].d_reclen, de[1].d_off, de[1].d_ino, de[1].d_name);
                printf("\n");
        }
        close(fd);
        return 0;
}
dma_declare_coherent_memory() allocates a bitmap with 1 bit per page. It
calculates the bitmap size in units of long, but then allocates that many
bytes.
Thanks to James Bottomley for clarifications and corrections.
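The size mismatch described above can be illustrated like this. It is a sketch under the assumption that the bug has the shape "count of longs used as a byte count"; the helper names are made up for the example and are not the functions in the DMA code.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG_SKETCH (sizeof(long) * CHAR_BIT)

/* Number of longs needed to hold `bits` bits, rounded up. */
static size_t bits_to_longs(size_t bits)
{
    return (bits + BITS_PER_LONG_SKETCH - 1) / BITS_PER_LONG_SKETCH;
}

/* Buggy sizing: the number of longs is handed to the allocator as bytes. */
static size_t bitmap_bytes_buggy(size_t pages)
{
    return bits_to_longs(pages);
}

/* Fixed sizing: convert the number of longs into bytes first. */
static size_t bitmap_bytes_fixed(size_t pages)
{
    return bits_to_longs(pages) * sizeof(long);
}
```

For 1000 pages the buggy size covers far fewer than 1000 bits, so bit operations on the bitmap would run past the end of the allocation.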
Signed-off-by: G. Liakhovetski <g.liakhovetski@gmx.de> Acked-by: James Bottomley <James.Bottomley@SteelEye.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Several people have observed that perhaps LOG_BUF_SHIFT should be in a more
obvious place than under DEBUG_KERNEL. Under some circumstances (such as the
PARISC architecture), DEBUG_KERNEL can increase kernel size, which is an
undesirable trade off for something as trivial as increasing the kernel log
buffer size.
Instead, move LOG_BUF_SHIFT into "General Setup", so that people are more
likely to be able to change it when the default buffer size is
insufficient.
Signed-off-by: Alistair John Strachan <s0348365@sms.ed.ac.uk> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Tue, 8 May 2007 07:31:11 +0000 (00:31 -0700)]
consolidate asm/const.h to linux/const.h
Make a global linux/const.h header file instead of having multiple,
per-arch files, and convert current users of asm/const.h to use
linux/const.h.
Built on x86_64 and sparc64.
[akpm@linux-foundation.org: fix include/asm-x86_64/Kbuild] Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Parag Warudkar [Tue, 8 May 2007 07:31:09 +0000 (00:31 -0700)]
tpm: fix sleep-in-spinlock
flush_scheduled_work() can sleep, and we're calling it under spinlock.
AFAICS, moving flush_scheduled_work before spin_lock() should not cause any
problems.
The reason: the only thing that can race against tpm_release is tpm_open
(tpm_release is called when the last reference to the file is closed, and the
only thing that can happen after that is tpm_open??), and tpm_open acquires
driver_lock and moreover bails out with EBUSY if chip->num_opens is
greater than 0.
I also moved chip->num_pending-- to after deleting the timer and setting data
pending, as it looks more correct for the paranoid, although it probably
doesn't matter as it is guarded by driver_lock. Nonetheless, this change
should not cause problems.
While I was at it I noticed a missing NULL check in tpm_register_hardware
which is fixed with this patch as well.
Jesper Juhl [Tue, 8 May 2007 07:31:06 +0000 (00:31 -0700)]
Fix chapter reference in CodingStyle
commit 226a6b84aaaf1fac7a5d41cf4e7387fd9ba895d5 renumbered Chapter 11 in
Documentation/CodingStyle to Chapter 12, but it didn't update the reference
to that chapter further down in the file. This patch corrects the chapter
reference.
Jan Kara [Tue, 8 May 2007 07:31:04 +0000 (00:31 -0700)]
ext2: copy i_flags to inode flags on write
Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into
ext2-specific i_flags. Hence, when someone sets these flags via a different
interface than ioctl, they are stored correctly.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Gibson [Tue, 8 May 2007 07:30:57 +0000 (00:30 -0700)]
Clean up mostly unused IOSPACE macros
Most architectures defined three macros, MK_IOSPACE_PFN(), GET_IOSPACE()
and GET_PFN() in pgtable.h. However, the only callers of any of these
macros are in Sparc specific code, either in arch/sparc, arch/sparc64 or
drivers/sbus.
This patch removes the redundant macros from all architectures except
sparc and sparc64.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Borislav Petkov [Tue, 8 May 2007 07:30:54 +0000 (00:30 -0700)]
kill warnings when building mandocs
This patch silences warnings of the sort:
make -C /mnt/samsung_200/sam/kernel/trees/21-rc6/build \
KBUILD_SRC=/mnt/samsung_200/sam/kernel/trees/21-rc6 \
KBUILD_EXTMOD="" -f /mnt/samsung_200/sam/kernel/trees/21-rc6/Makefile mandocs
make -f /mnt/samsung_200/sam/kernel/trees/21-rc6/scripts/Makefile.build obj=scripts/basic
make -f /mnt/samsung_200/sam/kernel/trees/21-rc6/scripts/Makefile.build obj=Documentation/DocBook mandocs
SRCTREE=/mnt/samsung_200/sam/kernel/trees/21-rc6/ /mnt/samsung_200/sam/kernel/trees/21-rc6/build/scripts/basic/docproc doc /mnt/samsung_200/sam/kernel/trees/21-rc6/Documentation/DocBook/wanbook.tmpl >Documentation/DocBook/wanbook.xml
if grep -q refentry Documentation/DocBook/wanbook.xml; then xmlto man -m /mnt/samsung_200/sam/kernel/trees/21-rc6/Documentation/DocBook/stylesheet.xsl -o Documentation/DocBook/man Documentation/DocBook/wanbook.xml ; gzip -f Documentation/DocBook/man/*.9; fi
Note: meta version: No productnumber or alternative sppp_close
Note: meta version: No refmiscinfo@class=version sppp_close
Note: Writing sppp_close.9
Note: meta version: No productnumber or alternative sppp_open
Note: meta version: No refmiscinfo@class=version sppp_open
by adding a RefMiscInfo xml tag in the form of the current kernel version
to the function, struct and enum definitions in files included by
kernel-doc when building 'mandocs'. However, the version string appears
truncated on the manpage due to some constraints in the xml DTD for the man
header, I believe, for the troff output is truncated too.
The UTF-8 part of the vt driver suffers from the following issues which are
addressed in my patch:
1) If there's no glyph found for a particular valid UTF-8 character, we try
to display U+FFFD. However if this one is not found either, here's what
the current kernel does:
- First, if the Unicode value is less than the number of glyphs, use the
glyph directly from that position of the glyph table. While it may be a
good idea in the 8-bit world, it makes absolutely no sense with Unicode
in mind. For example, if a Latin-2 font is loaded and an application
prints U+00FB ("u with circumflex", not present in Latin-2) then as a
fallback solution the glyph from the 0xFB position of the Latin-2
fontset (which is a "u with double accent" - a different character) is
displayed.
- Second, if this fallback fails too, a simple ASCII question mark is
printed, which is visually indistinguishable from a real question mark.
I changed the code to skip the first step (except if in non-UTF-8 mode),
and changed the second step to print the question mark with inverse color
attributes, so it is visually clear that it's not a real question mark,
and more closely resembles the common glyph of U+FFFD.
2) The UTF-8 decoder is buggy in many ways:
- Lone continuation bytes (section 3.1 of Markus Kuhn's UTF-8 stress
test) are not caught, they are displayed as some "random" (taken
directly from the font table, see above) glyphs instead of the
replacement character.
- Incomplete sequences (sections 3.2 and 3.3 of the stress test) emit no
replacement character, but rather cause the subsequent valid character
to be displayed multiple times(!).
- The decoder is not safe: overlong sequences are not caught currently,
they are displayed as if these were valid representations. This may
even have security impacts.
- The decoder does not handle D800..DFFF and FFFE..FFFF specially, it
just emits these code points and lets it be looked up in the glyph
table. Since these are invalid code points, I replace them by U+FFFD
and hence give no chance for them to be looked up in the glyph table.
(Assuming no font ships glyphs for these code points, this change is
not visible to the users since the glyph shown will be the same.)
With my fixes to the decoder it now behaves exactly as Markus Kuhn's
stress test recommends.
3) It has no concept of double-width (CJK) characters. It's way beyond the
scope of my patch to try to display them, but at least I think it's
important for the cursor to jump two positions when printing such
characters, since this is what applications (such as text editors)
expect. Currently the cursor only jumps one position, and hence
applications suffer from displaying and refreshing problems, and editing
some English letters that are preceded by some CJK characters in the same
line is a nightmare. With my patch an additional space is inserted after
the CJK character has been printed (which usually means a replacement
symbol of course). (If U+FFFD isn't available and hence an inverse
question mark is displayed in the first cell, I keep the inverted state
for the space in the 2nd column so it's quite easy to see that they are
tied together.)
4) There is a small built-in table of zero-width spaces that are not to be
printed but silently skipped. U+200A is included there, but it's not a
zero-width character, so I remove it from there.
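The decoder rules listed under 2) can be sketched in user space as below. This is an illustration of the checks, not the vt driver's code: utf8_decode_strict() is a hypothetical name, and limiting sequences to 4 bytes and to U+10FFFF is an assumption taken from current Unicode rather than from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu

/*
 * Decode one code point from s[0..len). *used receives the number of bytes
 * consumed. Any malformed input yields U+FFFD. On an incomplete sequence
 * the offending byte is NOT consumed, so the caller decodes it next and a
 * following valid character is shown exactly once.
 */
static uint32_t utf8_decode_strict(const unsigned char *s, size_t len,
                                   size_t *used)
{
    /* Smallest code point that legitimately needs 2, 3 or 4 bytes. */
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint32_t cp;
    size_t need, i;

    *used = 0;
    if (len == 0)
        return REPLACEMENT_CHAR;
    *used = 1;

    if (s[0] < 0x80)
        return s[0];                 /* plain ASCII */
    if (s[0] < 0xC0)
        return REPLACEMENT_CHAR;     /* lone continuation byte */
    if (s[0] < 0xE0) {
        need = 2; cp = s[0] & 0x1F;
    } else if (s[0] < 0xF0) {
        need = 3; cp = s[0] & 0x0F;
    } else if (s[0] < 0xF8) {
        need = 4; cp = s[0] & 0x07;
    } else {
        return REPLACEMENT_CHAR;     /* 0xF8..0xFF: not accepted here */
    }

    for (i = 1; i < need; i++) {
        if (i >= len || (s[i] & 0xC0) != 0x80)
            return REPLACEMENT_CHAR; /* incomplete sequence */
        cp = (cp << 6) | (s[i] & 0x3F);
        *used = i + 1;
    }

    if (cp < min_for_len[need])
        return REPLACEMENT_CHAR;     /* overlong encoding */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return REPLACEMENT_CHAR;     /* surrogates */
    if (cp == 0xFFFE || cp == 0xFFFF)
        return REPLACEMENT_CHAR;     /* invalid code points */
    if (cp > 0x10FFFF)
        return REPLACEMENT_CHAR;     /* beyond the Unicode range */
    return cp;
}
```

Returning U+FFFD from the decoder itself, rather than passing the raw value on, means invalid input never reaches the glyph table lookup.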
Signed-off-by: Egmont Koblinger <egmont@uhulinux.hu> Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Tue, 8 May 2007 07:30:33 +0000 (00:30 -0700)]
ext3: copy i_flags to inode flags on write
A patch that stores inode flags such as S_IMMUTABLE, S_APPEND, etc. from
i_flags to EXT3_I(inode)->i_flags when inode is written to disk. The same
thing is done on GETFLAGS ioctl.
Quota code changes these flags on quota files (to make it harder for
sysadmin to screw himself) and these changes were not correctly propagated
into the filesystem (especially, lsattr did not show them and users were
wondering...).
Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into
ext3-specific i_flags. Hence, when someone sets these flags via a
different interface than ioctl, they are stored correctly.
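The propagation amounts to clearing the filesystem-specific copies of these bits and setting them again from the generic flags. A sketch, with flag values and names (the _SK suffix) chosen only for illustration and the helper name hypothetical; the real constants live in the kernel headers:

```c
#include <stdint.h>

/* Illustrative generic (VFS-side) flag bits. */
#define S_SYNC_SK       0x01u
#define S_NOATIME_SK    0x02u
#define S_APPEND_SK     0x04u
#define S_IMMUTABLE_SK  0x08u
#define S_DIRSYNC_SK    0x40u

/* Illustrative filesystem-specific (on-disk) flag bits. */
#define FS_SYNC_SK      0x00008u
#define FS_IMMUTABLE_SK 0x00010u
#define FS_APPEND_SK    0x00020u
#define FS_NOATIME_SK   0x00080u
#define FS_DIRSYNC_SK   0x10000u

/*
 * Return fs_flags with the five mirrored bits replaced by the state of the
 * generic flags. All other filesystem-specific bits are left untouched.
 */
static uint32_t get_inode_flags_sketch(uint32_t vfs_flags, uint32_t fs_flags)
{
    fs_flags &= ~(FS_SYNC_SK | FS_APPEND_SK | FS_IMMUTABLE_SK |
                  FS_NOATIME_SK | FS_DIRSYNC_SK);
    if (vfs_flags & S_SYNC_SK)
        fs_flags |= FS_SYNC_SK;
    if (vfs_flags & S_APPEND_SK)
        fs_flags |= FS_APPEND_SK;
    if (vfs_flags & S_IMMUTABLE_SK)
        fs_flags |= FS_IMMUTABLE_SK;
    if (vfs_flags & S_NOATIME_SK)
        fs_flags |= FS_NOATIME_SK;
    if (vfs_flags & S_DIRSYNC_SK)
        fs_flags |= FS_DIRSYNC_SK;
    return fs_flags;
}
```

Calling such a helper both when the inode is written and in the GETFLAGS path is what keeps lsattr and the on-disk state in agreement with flags set by the quota code.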
Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tom Alsberg [Tue, 8 May 2007 07:30:31 +0000 (00:30 -0700)]
CPU time limit patch / setrlimit(RLIMIT_CPU, 0) cheat fix
As discovered here today, the change in Kernel 2.6.17 intended to inhibit
users from setting RLIMIT_CPU to 0 (as that is equivalent to unlimited) by
"cheating" and setting it to 1 in such a case, does not make a difference,
as the check is done in the wrong place (too late), and only applies to the
profiling code.
On all systems I checked running kernels above 2.6.17, no matter what the
hard and soft CPU time limits were before, a user could escape them by
issuing in the shell (sh/bash/zsh) "ulimit -t 0", and then the user's
process was not ever killed.
Attached is a trivial patch to fix that. Simply moving the check to a
slightly earlier location (specifically, before the line that actually
assigns the limit - *old_rlim = new_rlim), does the trick.
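In other words, the ordering matters: the adjustment has to be applied to the new value before it is stored. A user-space sketch of that ordering, with struct rlimit_sketch and the function name being illustrative only:

```c
/* Hypothetical stand-in for the soft/hard limit pair. */
struct rlimit_sketch {
    unsigned long cur; /* soft limit, in seconds */
    unsigned long max; /* hard limit, in seconds */
};

/*
 * Store a new CPU time limit. Zero would be read as "never set", so it is
 * bumped to one second BEFORE the assignment. Doing the bump after the
 * assignment would leave the stored limit at 0, which is the bug described
 * above.
 */
static void set_cpu_limit_sketch(struct rlimit_sketch *old_rlim,
                                 struct rlimit_sketch new_rlim)
{
    if (new_rlim.cur == 0)
        new_rlim.cur = 1;
    *old_rlim = new_rlim;
}
```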
Do note that at least the zsh (but not ash, dash, or bash) shell has the
problem of "caching" the limits set by the ulimit command, so when running
zsh the fix will not immediately be evident - after entering "ulimit -t 0",
"ulimit -a" will show "-t: cpu time (seconds) 0", even though the actual
limit as returned by getrlimit(...) will be 1. It can be verified by
opening a subshell (which will not have the values of the parent shell in
cache) and checking in it, or just by running a CPU intensive command like
"echo '65536^1048576' | bc" and verifying that it dumps core after one
second.
Regardless of whether that is a misfeature in the shell, perhaps it would
be better to return -EINVAL from setrlimit in such a case instead of
cheating and setting to 1, as that does not really reflect the actual state
of the process anymore. I do not however know what the ground for that
decision was in the original 2.6.17 change, and whether there would be any
"backward" compatibility issues, so I preferred not to touch that right
now.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Atsushi Nemoto [Tue, 8 May 2007 07:30:26 +0000 (00:30 -0700)]
serial_txx9: Use assigned device numbers
The serial_txx9 driver has abused device numbers (major 4, minor 128) if
CONFIG_SERIAL_TXX9_STDSERIAL was not set. This patch makes the driver use
the proper device numbers assigned to it (major 204, minor 196-203). I
suppose a typical user of this driver sets CONFIG_SERIAL_TXX9_STDSERIAL to Y
(i.e. uses "ttyS0"), so this patch should not cause a big compatibility issue.
Apparently it's not cool anymore to use SPIN/RW_LOCK_UNLOCKED. There's
some mention of this in Documentation/spinlocks.txt, but that only talks
about dynamic initialisation.
A comment in the code mentioning the preferred usage would be good IMHO.
[akpm@linux-foundation.org: add reason for deprecation] Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here is the macro itself and examples of its usage in the generic code.
If it turns out to be useful, I can prepare a set of patches to
inject it into arch-specific code, drivers, networking, etc.
Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch modifies the pnpbios kernel thread to start with kthread_run
rather than kernel_thread and daemonize. Doing this makes the code a little
simpler and easier to maintain.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Adam Belay <ambx1@neo.rr.com> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stephen M. Cameron <steve.cameron@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Tue, 8 May 2007 07:30:03 +0000 (00:30 -0700)]
highres/dyntick: prevent xtime lock contention
While the !highres/!dyntick code assigns the duty of the do_timer() call to
one specific CPU, this was dropped in the highres/dyntick part during
development.
Steven Rostedt discovered the xtime lock contention on highres/dyntick due
to several CPUs trying to update jiffies.
Add the single CPU assignment back. In the dyntick case this needs to be
handled carefully, as the CPU which has the do_timer() duty must drop the
assignment and let it be grabbed by another CPU, which is active.
Otherwise the do_timer() calls would not happen during the long sleep.
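The hand-over can be modeled roughly as below. This is a single-threaded sketch with hypothetical names (struct timer_duty, DO_TIMER_NONE) and no locking or atomics, so it only shows the ownership rule, not how the kernel serializes it.

```c
#include <stdbool.h>

#define DO_TIMER_NONE (-1)

/* Hypothetical holder of the do_timer() duty; cpu is DO_TIMER_NONE if free. */
struct timer_duty {
    int cpu;
};

/*
 * Called from a CPU's tick. An active CPU grabs the duty if nobody holds
 * it. Returns true when this CPU is the one that should call do_timer().
 */
static bool tick_claims_duty(struct timer_duty *d, int cpu)
{
    if (d->cpu == DO_TIMER_NONE)
        d->cpu = cpu;
    return d->cpu == cpu;
}

/* Called before a CPU stops its tick for a long sleep. */
static void drop_duty_before_sleep(struct timer_duty *d, int cpu)
{
    if (d->cpu == cpu)
        d->cpu = DO_TIMER_NONE;
}
```

Only one CPU updates jiffies at any time, which avoids the xtime lock contention, and the duty never stays with a sleeping CPU.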
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Acked-by: Mark Lord <mlord@pobox.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Cameron [Tue, 8 May 2007 07:30:02 +0000 (00:30 -0700)]
cciss: include scsi/scsi.h unconditionally
Make cciss unconditionally include scsi/scsi.h, because of the use of
SCSI_IOCTL_GET_IDLUN and SCSI_IOCTL_GET_BUS_NUMBER.
Signed-off-by: Stephen M. Cameron <steve.cameron@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bjorn Helgaas [Tue, 8 May 2007 07:29:57 +0000 (00:29 -0700)]
EFI: warn only for pre-1.00 system tables
We used to warn unless the EFI system table major revision was exactly 1.
But EFI 2.00 firmware is starting to appear, and the 2.00 changes don't
affect anything in Linux.
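A sketch of the old and new checks, relying on the EFI convention that the major revision is in the upper 16 bits of the table header's revision field. The function names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old behaviour: complain unless the major revision is exactly 1. */
static bool efi_warns_old(uint32_t revision)
{
    return (revision >> 16) != 1;
}

/* New behaviour: complain only for pre-1.00 tables (major revision 0). */
static bool efi_warns_new(uint32_t revision)
{
    return (revision >> 16) == 0;
}
```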
Randy Dunlap [Tue, 8 May 2007 07:29:51 +0000 (00:29 -0700)]
kernel-doc: html mode struct highlights
Johannes Berg reported that struct names are not highlighted
(bold, italic, etc.) in html kernel-doc output. (Also not in
text-mode output, but I don't see that changing.)
This patch adds the following:
- highlight struct names in html output mode
- highlight environment var. names in html output mode
- indent struct fields in the original struct layout
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeffrey Layton [Tue, 8 May 2007 07:29:48 +0000 (00:29 -0700)]
make iunique use a do/while loop rather than its obscure goto loop
A while back, Christoph mentioned that he thought that iunique ought to be
cleaned up to use a more conventional loop construct. This patch does that,
turning the strange goto loop into a do/while.
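The kind of transformation involved can be sketched as below. This is not
the iunique() code itself, only a simplified illustration: in_use() is a
hypothetical stand-in for the "is this number already taken?" lookup, and
both function names are made up for the example.

```c
#include <stdbool.h>

/* Hypothetical lookup: pretend the numbers 0..4 are already taken. */
static bool in_use(unsigned int n)
{
	return n < 5;
}

/* Before: the retry is expressed with a label and a backward goto. */
static unsigned int next_free_goto(unsigned int counter)
{
retry:
	counter++;
	if (in_use(counter))
		goto retry;
	return counter;
}

/* After: the same control flow written as a do/while loop. */
static unsigned int next_free_loop(unsigned int counter)
{
	do {
		counter++;
	} while (in_use(counter));
	return counter;
}
```

Both variants return the same value for the same input; only the loop
construct differs.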
Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
John Johansen [Tue, 8 May 2007 07:29:44 +0000 (00:29 -0700)]
Remove redundant check from proc_sys_setattr()
notify_change() already calls security_inode_setattr() before
calling iop->setattr.
Alan sayeth
This is a behaviour change on all of these and limits some behaviour of
existing established security modules
When inode_change_ok is called it has side effects. This includes
clearing the SGID bit on attribute changes caused by chmod. If you make
this change the results of some rulesets may be different before or after
the change is made.
I'm not saying the change is wrong but it does change behaviour so that
needs looking at closely (ditto all other attribute twiddles)
Signed-off-by: Steve Beattie <sbeattie@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: John Johansen <jjohansen@suse.de> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
John Johansen [Tue, 8 May 2007 07:29:41 +0000 (00:29 -0700)]
Remove redundant check from proc_setattr()
notify_change() already calls security_inode_setattr() before
calling iop->setattr.
Signed-off-by: Tony Jones <tonyj@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: John Johansen <jjohansen@suse.de> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 8 May 2007 07:29:39 +0000 (00:29 -0700)]
fix hotplug for legacy platform drivers
We've had various reports of some legacy "probe the hardware" style
platform drivers having nasty problems with hotplug support.
The core issue is that those legacy drivers don't fully conform to the
driver model. They assume a role that should be the responsibility of
infrastructure code: creating device nodes.
The "modprobe" step in hotplugging relies on drivers to have split those
roles into different modules. The lack of this split causes the problems.
When a driver creates nodes for devices that don't exist (sending a hotplug
event), then exits (aborting one modprobe) before the "modprobe $MODALIAS"
step completes (by failing, since it's in the middle of a modprobe), the
result can be an endless loop of modprobe invocations ... badness.
This fix uses the newish per-device flag controlling issuance of "add"
events. (A previous version of this patch used a per-device "driver can
hotplug" flag, which only scrubbed $MODALIAS from the environment rather
than suppressing the entire hotplug event.) It also shrinks that flag to
one bit, saving a word in "struct device".
So the net of this patch is removing some nasty failures with legacy
drivers, while retaining hotplug capability for the majority of platform
drivers.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Greg KH <gregkh@suse.de> Cc: Andres Salomon <dilinger@debian.org> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Set rq->errors more correctly in cciss driver. Previously we had set it
synonymously with the meaning of the last parameter of end_that_last_request
and complete_buffers (the "uptodate" parameter) and had gotten away with it
for all this time because nobody ever looked at rq->errors.
SCSI_IOCTL_SEND_COMMAND looks at rq->errors, so now it matters that it be
right.
Signed-off-by: Stephen M. Cameron <steve.cameron@hp.com> Signed-off-by: Mike Miller <mike.miller@hp.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For all of you that think cciss should be a scsi driver here is the patch that
you have been waiting for all these years. This patch actually adds the SG_IO
ioctl to cciss. The primary purpose is for clustering and high-availability.
But now anyone can exploit this ioctl in any manner they wish.
Note, SCSI_IOCTL_SEND_COMMAND doesn't work with this patch due to rq->errors
being set incorrectly. Subsequent patch fixes that.
Signed-off-by: Stephen M. Cameron <steve.cameron@hp.com> Signed-off-by: Mike Miller <mike.miller@hp.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reformat some error handling code to reduce line lengths a bit.
Signed-off-by: Stephen M. Cameron <steve.cameron@hp.com> Signed-off-by: Mike Miller <mike.miller@hp.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"Be careful: Write spaces around the ..., for otherwise it may be
parsed wrong when you use it with integer values."
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Tue, 8 May 2007 07:29:18 +0000 (00:29 -0700)]
dtlk: fix error checks in module_init()
This patch fixes two things in module_init.
- fix register_chrdev() error check
Currently dtlk doesn't check register_chrdev() failure correctly.
register_chrdev() returns an errno on failure.
- check probe failure
dtlk ignores probe failure and allows the module loading without
such device. I got "Trying to free nonexistent resource" message
by release_region() when unloading module without device.
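The two checks can be sketched as follows. This is a userspace illustration
under stated assumptions, not the dtlk source: fake_register_chrdev(),
fake_probe() and fake_unregister_chrdev() are hypothetical stand-ins, and the
only behaviour borrowed from the kernel is that register_chrdev() reports
failure as a negative errno.

```c
#include <errno.h>

/* Hypothetical stand-in: negative errno on failure, a major number otherwise. */
static int fake_register_chrdev(int fail)
{
	return fail ? -EBUSY : 254;
}

/* Hypothetical stand-in for the hardware probe. */
static int fake_probe(int device_present)
{
	return device_present ? 0 : -ENODEV;
}

static int unregister_calls;

static void fake_unregister_chrdev(void)
{
	unregister_calls++;
}

/* Sketch of an init routine that propagates both kinds of failure. */
static int sketch_init(int register_fails, int device_present)
{
	int major = fake_register_chrdev(register_fails);
	int err;

	if (major < 0)		/* failure is a negative errno, not zero */
		return major;

	err = fake_probe(device_present);
	if (err) {		/* no device: undo the registration and fail the load */
		fake_unregister_chrdev();
		return err;
	}
	return 0;
}
```

Failing the load when the probe fails means the exit path never has to
release resources that were not acquired.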
[akpm@linux-foundation.org: fix error code return] Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Chris Pallotta <chris@allmedia.com> Cc: Jim Van Zandt <jrv@vanzandt.mv.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add support for the Motorola sysv68 disk partition (called "slices" in the
Motorola documentation).
Signed-off-by: Philippe De Muyter <phdm@macqel.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We noticed a drop in network performance due to the irq_desc being cacheline
aligned rather than internode aligned. We see 50% of expected performance
when two e1000 NICs local to two different nodes have consecutive irq
descriptors allocated, due to false sharing.
Note that this patch does away with cacheline padding for the UP case, as
it does not seem useful for UP configurations.
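The effect of the larger alignment can be illustrated in plain C11. This is
only a sketch: the structure, its fields and the 4096-byte internode size are
assumptions chosen for the example, not the kernel's irq_desc definition or
its alignment macros.

```c
#include <stddef.h>

/* Assumed internode line size for this sketch; real values are per-arch. */
#define SKETCH_INTERNODE_BYTES 4096

/* Hypothetical descriptor, aligned (and therefore padded) to that size. */
struct sketch_irq_desc {
	_Alignas(SKETCH_INTERNODE_BYTES) unsigned long irq_count;
	unsigned long status;
};

static struct sketch_irq_desc sketch_descs[2];

/* Distance in bytes between two consecutive descriptors in the array. */
static size_t sketch_desc_stride(void)
{
	return (size_t)((char *)&sketch_descs[1] - (char *)&sketch_descs[0]);
}
```

Because each element occupies a full internode-sized line, consecutive
descriptors written by CPUs on different nodes no longer share one.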
Pavel Emelianov [Tue, 8 May 2007 07:29:10 +0000 (00:29 -0700)]
Lockdep treats down_write_trylock like regular down_write
This causes constructions like

	down_write(&mm1->mmap_sem);
	if (down_write_trylock(&mm2->mmap_sem)) {
		...
		up_write(&mm2->mmap_sem);
	}
	up_write(&mm1->mmap_sem);

to generate a lockdep warning about a circular locking dependency.
Call rwsem_acquire() with trylock set to 1.
Cc: Ingo Molnar <mingo@elte.hu> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that there is no arch-specific compat ioctl handling left there is no
point in having a separate compat_ioctl.h, so merge it into compat_ioctl.c
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Borislav Petkov [Tue, 8 May 2007 07:29:05 +0000 (00:29 -0700)]
kernel-doc: handle arrays with arithmetic expressions as initializers
In a different approach here's a patch that handles the special case of
composite arithmetic expressions in array size initializers. With it,
prior to pushing the split strings on the @first_arg array, I split the
keywords before the array name as before and then keep the array name along
with the subscript expression as a single whole element which gets pushed
last. In this manner, kernel-doc produces correct output without stripping
whitespace, which would make the array subscripts unreadable in the docs.
Signed-off-by: Borislav Petkov <bbpetkov@yahoo.de> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>