Jan Kara [Wed, 6 Feb 2008 09:37:36 +0000 (01:37 -0800)]
quota: improve inode list scanning in add_dquot_ref()
We restarted scan of sb->s_inodes list whenever we had to drop inode_lock
in add_dquot_ref(). This leads to overall quadratic running time and thus
add_dquot_ref() can take several minutes when called on a life filesystem.
We fix the problem by using the fact that inode cannot be removed from
s_inodes list while we hold a reference to it and thus we can safely
restart the scan if we don't drop the reference. Here we use the fact that
inodes freshly added to s_inodes list are already guaranteed to have quotas
properly initialized and the ordering of inodes on s_inodes list does not
change so we cannot skip any inode.
Thanks goes to Nick <gentuu@gmail.com> for analyzing the problem and
testing the fix.
[akpm@linux-foundation.org: iput(NULL) is legal] Signed-off-by: Jan Kara <jack@suse.cz> Cc: Nick <gentuu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Denis Cheng [Wed, 6 Feb 2008 09:37:35 +0000 (01:37 -0800)]
drivers/char: use LIST_HEAD instead of LIST_HEAD_INIT
single list_head variable initialized with LIST_HEAD_INIT could almost
always can be replaced with LIST_HEAD declaration, this shrinks the code
and looks better.
Signed-off-by: Denis Cheng <crquan@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Corey Minyard <minyard@acm.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paulo Marques [Wed, 6 Feb 2008 09:37:33 +0000 (01:37 -0800)]
kallsyms should prefer non weak symbols
When resolving symbol names from addresses with aliased symbol names,
kallsyms_lookup always returns the first symbol, even if it is a weak
symbol.
This patch changes this by sorting the symbols with the weak symbols last
before feeding them to the kernel. This way the kernel runtime isn't
changed at all, only the kallsyms build system is changed.
Another side effect is that the symbols get sorted by address, too. So,
even if future binutils version have some bug in "nm" that makes it fail to
correctly sort symbols by address, the kernel won't be affected by this.
Mathieu says:
I created a module in LTTng that uses kallsyms to get the symbol
corresponding to a specific system call address. Unfortunately, all the
unimplemented syscalls were all referring to the (same) weak symbol
identifying an unrelated system call rather that sys_ni (or whatever
non-weak symbol would be expected). Kallsyms was dumbly returning the first
symbol that matched.
This patch makes sure kallsyms returns the non-weak symbol when there is
one, which seems to be the expected result.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Looks-great-to: Rusty Russell <rusty@rustcorp.com.au> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Drake [Wed, 6 Feb 2008 09:37:30 +0000 (01:37 -0800)]
Documentation about unaligned memory access
Here's a document I wrote after figuring out what unaligned memory access
is all about. I've tried to cover the information I was looking for when
trying to learn about this, without producing a hopelessly detailed/complex
spew. I hope it is useful to others.
Signed-off-by: Daniel Drake <dsd@gentoo.org> Cc: Rob Landley <rob@landley.net> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Jan Engelhardt <jengelh@computergmbh.de> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Kyle Moffett <mrmacman_g4@mac.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Wed, 6 Feb 2008 09:37:29 +0000 (01:37 -0800)]
inotify: remove debug code
The inotify debugging code is supposed to verify that the
DCACHE_INOTIFY_PARENT_WATCHED scalability optimisation does not result in
notifications getting lost nor extra needless locking generated.
Unfortunately there are also some races in the debugging code. And it isn't
very good at finding problems anyway. So remove it for now.
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Robert Love <rlove@google.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Jan Kara <jack@ucw.cz> Cc: Yan Zheng <yanzheng@21cn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Wed, 6 Feb 2008 09:37:28 +0000 (01:37 -0800)]
inotify: fix race
There is a race between setting an inode's children's "parent watched" flag
when placing the first watch on a parent, and instantiating new children of
that parent: a child could miss having its flags set by
set_dentry_child_flags, but then inotify_d_instantiate might still see
!inotify_inode_watched.
The solution is to set_dentry_child_flags after adding the watch. Locking is
taken care of, because both set_dentry_child_flags and inotify_d_instantiate
hold dcache_lock and child->d_locks.
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Robert Love <rlove@google.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Jan Kara <jack@ucw.cz> Cc: Yan Zheng <yanzheng@21cn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove superfluous checks for CONFIG_BLK_DEV_INITRD from initramfs.c
Given that init/Makefile includes initramfs.c in the build only if
CONFIG_BLK_DEV_INITRD is defined, there seems to be no point checking for
it yet again.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove rcu_assign_pointer() penalty for NULL pointers
The rcu_assign_pointer() primitive currently unconditionally executes a
memory barrier, even when a NULL pointer is being assigned. This has lead
some to avoid using rcu_assign_pointer() for NULL pointers, which loses the
self-documenting advantages of rcu_assign_pointer() This patch uses
__builtin_const_p() to omit needless memory barriers for NULL-pointer
assignments at compile time with no runtime penalty, as discussed in the
following thread:
Tested on x86_64 and ppc64, also compiled the four cases (NULL/non-NULL
and const/non-const) with gcc version 4.1.2, and hand-checked the
assembly output.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Dike [Wed, 6 Feb 2008 09:37:19 +0000 (01:37 -0800)]
Fix __const_udelay declaration and definition mismatches
The declaration and implementation of __const_udelay use different
names for the parameter on a number of architectures:
include/asm-avr32/delay.h:15:extern void __const_udelay(unsigned long usecs);
arch/avr32/lib/delay.c:39:inline void __const_udelay(unsigned long xloops)
include/asm-sh/delay.h:15:extern void __const_udelay(unsigned long usecs);
arch/sh/lib/delay.c:22:inline void __const_udelay(unsigned long xloops)
include/asm-m32r/delay.h:15:extern void __const_udelay(unsigned long usecs);
arch/m32r/lib/delay.c:58:void __const_udelay(unsigned long xloops)
include/asm-x86/delay.h:16:extern void __const_udelay(unsigned long usecs);
arch/x86/lib/delay_32.c:82:inline void __const_udelay(unsigned long xloops)
arch/x86/lib/delay_64.c:46:inline void __const_udelay(unsigned long xloops)
The units of the parameter isn't usecs, so that name is definitely
wrong. It's also not exactly loops, so I suppose xloops is an OK
name.
This patch changes these names from usecs to xloops.
Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Fulghum [Wed, 6 Feb 2008 09:37:18 +0000 (01:37 -0800)]
synclink_gt fix missed serial input signal changes
Fix missed serial input signal changes caused by rereading the serial
status register during interrupt processing. Now processing is performed
on original status register value.
Signed-off-by: Paul Fulghum <paulkf@microgate.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
synclink: standardize format of linux header file include's with "<>"
Use the recommended form of "<>" to include linux header files, and
move those includes up to join the rest of the linux includes.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Acked-by: Paul Fulghum <paulkf@microgate.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Dumazet [Wed, 6 Feb 2008 09:37:16 +0000 (01:37 -0800)]
get rid of NR_OPEN and introduce a sysctl_nr_open
NR_OPEN (historically set to 1024*1024) actually forbids processes to open
more than 1024*1024 handles.
Unfortunatly some production servers hit the not so 'ridiculously high
value' of 1024*1024 file descriptors per process.
Changing NR_OPEN is not considered safe because of vmalloc space potential
exhaust.
This patch introduces a new sysctl (/proc/sys/fs/nr_open) wich defaults to
1024*1024, so that admins can decide to change this limit if their workload
needs it.
[akpm@linux-foundation.org: export it for sparc64] Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Jones [Wed, 6 Feb 2008 09:37:13 +0000 (01:37 -0800)]
via-rng: enable secondary noise source on CPUs where it is present
In the padlock spec:
"SRC Bits[9:8] Noise source select (I): These bits control the two noise
sources on the processor that input bits to the accumulation buffers.
On Nehemiah processors prior to stepping 8, these bits are reserved
and undefined. The default RESET state is both bits = 0."
Signed-off-by: Dave Jones <davej@redhat.com> Tested-by: Udo van den Heuvel <udovdh@xs4all.nl> Cc: Michael Buesch <mb@bu3sch.de> Cc: folkert van Heusden <folkert@vanheusden.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Wed, 6 Feb 2008 09:37:13 +0000 (01:37 -0800)]
inotify: send IN_ATTRIB events when link count changes
Currently, no notification event has been sent when inode's link count
changed. This is inconvenient for the application in some cases:
Suppose you have the following directory structure
foo/test
bar/
and you watch test. If someone does "mv foo/test bar/", you get event
IN_MOVE_SELF and you know something has happened with the file "test".
However if someone does "ln foo/test bar/test" and "rm foo/test" you get no
inotify event for the file "test" (only directories "foo" and "bar" receive
events).
Furthermore it could be argued that link count belongs to file's metadata and
thus IN_ATTRIB should be sent when it changes.
The following patch implements sending of IN_ATTRIB inotify events when link
count of the inode changes, i.e., when a hardlink to the inode is created or
when it is removed. This event is sent in addition to all the events sent so
far. In particular, when a last link to a file is removed, IN_ATTRIB event is
sent in addition to IN_DELETE_SELF event.
Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Morten Welinder <mwelinder@gmail.com> Cc: Robert Love <rlove@google.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Steven French <sfrench@us.ibm.com> Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
hfs: update comment to reflect actual init and exit routines
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NCPFS: update diagnostic strings to match routine names.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: Petr Vandrovec <VANDROVE@vc.cvut.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Schmidt [Wed, 6 Feb 2008 09:37:07 +0000 (01:37 -0800)]
proc: loadavg reading race
The avenrun[] values are supposed to be protected by xtime_lock.
loadavg_read_proc does not use it. Theoretically this may result in an
occasional glitch when the value read from /proc/loadavg would be as much
as 1<<11 times higher than it should be.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Herbert Xu [Wed, 6 Feb 2008 09:37:05 +0000 (01:37 -0800)]
Avoid divide in IS_ALIGN
I was happy to discover the brand new IS_ALIGN macro and quickly used it in
my code. To my dismay I found that the generated code used division to
perform the test.
This patch fixes it by changing the % test to an &. This avoids the
division.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard MUSIL [Wed, 6 Feb 2008 09:37:02 +0000 (01:37 -0800)]
tpm.c: fix crash during device removal
The clean up procedure now uses platform device "release" callback to
handle memory clean up. For this purpose "release" function callback was
added to struct tpm_vendor_specific, so hw device driver provider can get
called when it is safe to remove all allocated resources.
This is supposed to fix a bug in device removal, where device while in
receive function (waiting on timeout) was prone to segfault, if the
tpm_chip struct was unallocated before the timeout expired (in
tpm_remove_hardware).
Denys Vlasenko [Wed, 6 Feb 2008 09:37:02 +0000 (01:37 -0800)]
printk.c: use unsigned ints instead of longs for logbuf index
Stop using unsigned _longs_ for printk buffer indexes. Log buffer is way
smaller than 2 gigabytes and unsigned ints will work too . Indeed, they do
work nicely on all 32-bit platforms where longs and ints are the same.
With this patch, we have following size savings on amd64:
text data bss dec hex filename
5997 313 17736 24046 5dee 2.6.23.1.t64/kernel/printk.o
5858 313 17700 23871 5d3f 2.6.23.1.printk.t64/kernel/printk.o
Joern Engel [Wed, 6 Feb 2008 09:36:59 +0000 (01:36 -0800)]
Document I_SYNC and I_DATASYNC
After some archeology (see http://logfs.org/logfs/inode_state_bits) I
finally figured out what the three I_DIRTY bits do. Maybe others would
prefer less effort to reach this insight.
Signed-off-by: Joern Engel <joern@logfs.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
log2.h: Define order_base_2() macro for convenience.
Given a number of places in the tree that need to calculate this value
explicitly, might as well just create a macro for it.
(akpm: must be implemented as a macro for callee typeof() usage)
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Wed, 6 Feb 2008 09:36:54 +0000 (01:36 -0800)]
cciss: use upper_32_bits() macro to eliminate warnings
Use upper_32_bits(x) macro to handle shifts that may be >= the width of
the data type.
drivers/block/cciss.c: In function 'do_cciss_request':
drivers/block/cciss.c:2655: warning: right shift count >= width of type
drivers/block/cciss.c:2656: warning: right shift count >= width of type
drivers/block/cciss.c:2657: warning: right shift count >= width of type
drivers/block/cciss.c:2658: warning: right shift count >= width of type
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: <mike.miller@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Layton [Wed, 6 Feb 2008 09:36:43 +0000 (01:36 -0800)]
smbfs: fix calculation of kernel_recvmsg size parameter in smb_receive()
smb_receive calls kernel_recvmsg with a size that's the minimum of the
amount of buffer space in the kvec passed in or req->rq_rlen (which
represents the length of the response). This does not take into account
any data that was read in a request earlier pass through smb_receive.
If the first pass through smb_receive receives some but not all of the
response, then the next pass can call kernel_recvmsg with a size field
that's too big. kernel_recvmsg can overrun into the next response,
throwing off the alignment and making it unrecognizable.
This causes messages like this to pop up in the ring buffer:
smb_get_length: Invalid NBT packet, code=69
as well as other errors indicating that the response is unrecognizable.
Typically this is seen on a smbfs mount under heavy I/O.
This patch changes the code to use (req->rq_rlen - req->rq_bytes_recvd)
instead instead of just req->rq_rlen, since that should represent the
amount of unread data in the response.
I think this is correct, but an ACK or NACK from someone more familiar
with this code would be appreciated...
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some time ago ( http://lkml.org/lkml/2007/6/19/128 ) I wrote about
MNT_UNBINDABLE that it felt like a bug that it is not reset by "mount
--make-private".
Today I happened to see mount(8) and Documentation/sharedsubtree.txt and
both document the version obtained by applying the little patch given in
the above (and again below).
So, the present kernel code is not according to specs and must be regarded
as buggy.
Specification in Documentation/sharedsubtree.txt:
See state diagram: unbindable should become private upon make-private.
Specification in mount(8):
... It's
also possible to set up uni-directional propagation (with --make-
slave), to make a mount point unavailable for --bind/--rbind (with
--make-unbindable), and to undo any of these (with --make-private).
David Woodhouse [Wed, 6 Feb 2008 09:36:27 +0000 (01:36 -0800)]
Allow auto-destruction of loop devices
This allows a flag to be set on loop devices so that when they are
closed for the last time, they'll self-destruct.
In general, so that we can automatically allocate loop devices (as with
losetup -f) and have them disappear when we're done with them.
In particular, right now, so that we can stop relying on the hackish
special-case in umount(8) which kills off loop devices which were set up by
'mount -oloop'. That means we can stop putting crap in /etc/mtab which
doesn't belong there, which means it can be a symlink to /proc/mounts, which
means yet another writable file on the root filesystem is eliminated and the
'stateless' folks get happier... and OLPC trac #356 can be closed.
The mount(8) side of that is at
http://marc.info/?l=util-linux-ng&m=119362955431694&w=2
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: David Woodhouse <dwmw2@infradead.org> Cc: Bernardo Innocenti <bernie@codewiz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robin Getz [Wed, 6 Feb 2008 09:36:26 +0000 (01:36 -0800)]
remove support for un-needed _extratext section
When passing a zero address to kallsyms_lookup(), the kernel thought it was
a valid kernel address, even if it is not. This is because is_ksym_addr()
called is_kernel_extratext() and checked against labels that don't exist on
many archs (which default as zero). Since PPC was the only kernel which
defines _extra_text, (in 2005), and no longer needs it, this patch removes
_extra_text support.
For some history (provided by Jon):
http://ozlabs.org/pipermail/linuxppc-dev/2005-September/019734.html
http://ozlabs.org/pipermail/linuxppc-dev/2005-September/019736.html
http://ozlabs.org/pipermail/linuxppc-dev/2005-September/019751.html
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Jon Loeliger <jdl@freescale.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matt Domsch [Wed, 6 Feb 2008 09:36:24 +0000 (01:36 -0800)]
dcdbas: add DMI-based module autloading
DMI autoload dcdbas on all Dell systems.
This looks for BIOS Vendor or System Vendor == Dell, so this should
work for systems both Dell-branded and those Dell builds but brands
for others. It causes udev to load the dcdbas module at startup,
which is used by tools called by HAL for wireless control and
backlight control, among other uses.
Thanks to Kay Sievers for figuring out how to do this with a single alias.
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Miller [Wed, 6 Feb 2008 09:36:23 +0000 (01:36 -0800)]
Genericizing iova.[ch]
I would like to potentially move the sparc64 IOMMU code over to using
the nice new drivers/pci/iova.[ch] code for free area management..
In order to do that we have to detach the IOMMU page size assumptions
which only really need to exist in the intel-iommu.[ch] code.
This patch attempts to implement that.
[akpm@linux-foundation.org: build fix] Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pierre Peiffer [Wed, 6 Feb 2008 09:36:23 +0000 (01:36 -0800)]
IPC: fix error check in all new xxx_lock() and xxx_exit_ns() functions
In the new implementation of the [sem|shm|msg]_lock[_check]() routines, we
use the return value of ipc_lock() in container_of() without any check.
But ipc_lock may return a errcode. The use of this errcode in
container_of() may alter this errcode, and we don't want this.
And in xxx_exit_ns, the pointer return by idr_find is of type 'struct
kern_ipc_per'...
Today, the code will work as is because the member used in these
container_of() is the first member of its container (offset == 0), the
errcode isn't changed then. But in the general case, we can't count on
this assumption and this may lead later to a real bug if we don't correct
this.
Again, the proposed solution is simple and correct. But, as pointed by
Nadia, with this solution, the same check will be done several times (in
all sub-callers...), what is not very funny/optimal...
Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry Antipov [Wed, 6 Feb 2008 09:36:19 +0000 (01:36 -0800)]
SIGIO-driven I/O with inotify queues
Add SIGIO-driven I/O for descriptors returned by inotify_init(). The thing
may be enabled by convenient fcntl (fd, F_SETFL, O_ASYNC) call.
Signed-off-by: Dmitry Antipov <antipov@dev.rtsoft.ru> Cc: Robert Love <rlove@google.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ext2 file system was by default ignoring errors and continuing. This is
not a good default as continuing on error could lead to file system
corruption. Change the default to mark the file system readonly. Debian
and ubuntu already does this as the default in their fstab.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: Eric Sandeen <sandeen@redhat.com> Cc: Jan Kara <jack@ucw.cz> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes some instances where we were continuing after calling
ext2_error. ext2_error call panic only if errors=panic mount option is
set. So we need to make sure we return correctly after ext2_error call.
Matthew Wilcox [Wed, 6 Feb 2008 09:36:14 +0000 (01:36 -0800)]
hash: add explicit u32 and u64 versions of hash
The 32-bit version is more efficient (and apparently gives better hash
results than the 64-bit version), so users who are only hashing a 32-bit
quantity can now opt to use the 32-bit version explicitly, rather than
promoting to a long.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Wed, 6 Feb 2008 09:36:13 +0000 (01:36 -0800)]
use __set_task_state() for TRACED/STOPPED tasks
1. It is much easier to grep for ->state change if __set_task_state() is used
instead of the direct assignment.
2. ptrace_stop() and handle_group_stop() use set_task_state() which adds the
unneeded mb() (btw even if we use mb() it is still possible that do_wait()
sees the new ->state but not ->exit_code, but this is ok).
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Neuling [Wed, 6 Feb 2008 09:36:12 +0000 (01:36 -0800)]
taskstats scaled time cleanup
This moves the ability to scale cputime into generic code. This allows us
to fix the issue in kernel/timer.c (noticed by Balbir) where we could only
add an unscaled value to the scaled utime/stime.
This adds a cputime_to_scaled function. As before, the POWERPC version
does the scaling based on the last SPURR/PURR ratio calculated. The
generic and s390 (only other arch to implement asm/cputime.h) versions are
both NOPs.
Also moves the SPURR and PURR snapshots closer.
Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: Jay Lan <jlan@engr.sgi.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Garzik [Wed, 6 Feb 2008 09:36:11 +0000 (01:36 -0800)]
riscom8: fix SMP brokenness
After analyzing the elements that save_flags/cli/sti/restore_flags were
protecting, convert their usages to a global spinlock (the easiest and
most obvious next-step). There were some usages of flags being
intentionally cached, because the code already knew the state of
interrupts. These have been taken into account.
This allows us to remove CONFIG_BROKEN_ON_SMP. Completely untested.
[akpm@linux-foundation.org: use DEFINE_SPINLOCK] Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yan Zheng [Wed, 6 Feb 2008 09:36:09 +0000 (01:36 -0800)]
A potential bug in inotify_user.c
Following comment is at fs/inotify_user.c:287
/* coalescing: drop this event if it is a dupe of the previous */
I think the previous event in the comment should be the last event in the
link list. But inotify_dev_get_event return the first event in the list.
In addition, it doesn't check whether the list is empty
Signed-off-by: Yan Zheng<yanzheng@21cn.com> Acked-by: Robert Love <rlove@rlove.org> Cc: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Engelhardt [Wed, 6 Feb 2008 09:36:08 +0000 (01:36 -0800)]
fs/fat/: refine chmod checks
Prohibit mode changes in non-quiet mode that cannot be stored reliably with
the on-disk format.
Suppose a vfat filesystem is mounted with umask=0 and [not-quiet]. Then
all files will have mode 0777. Trying to change the owner will fail,
because fat does not know about owners or groups. chmod 0770, on the other
hand, will succeed, even though fat does not know about the permission
triplet [user/group/other].
So this patch changes fat's not-quiet behavior so that only UNIX modes are
accepted that can be mapped lossless between the fat disk format and the
local system. There is only one attribute, and that is the readonly
attribute, which is mapped to the UNIX write permission bit(s). chmod 0555
is therefore valid (taking away the +w bits <=> setting the readonly
attribute). Since chmod 0775 and chmod 0755 is an ambiguous case as to
whether to set or clear the readonly bit, these modes are also denied.
In quiet mode, chmod and chown will continue to "succeed" as they did
before, meaning that a subsequent stat() will temporarily return the new
mode as long as the inode is not reread from disk, and chown will silently
do nothing, not even return the new uid/gid in stat().
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Borislav Petkov [Wed, 6 Feb 2008 01:57:54 +0000 (02:57 +0100)]
ide-tape: cleanup the remaining codestyle issues
... thus decreasing checkpatch.pl errors to 0.
Bart:
- remove needless function prototypes while at it
- remove needless parentheses while at it
- add missing KERN_ level to ide_tape_probe()
- other minor fixups