Move the definition of 'struct pm_ops' and related functions from <linux/pm.h>
to <linux/suspend.h>.
There are, at least, the following reasons to do that:
* 'struct pm_ops' is specifically related to suspend and not to the power
management in general.
* As long as 'struct pm_ops' is defined in <linux/pm.h>, any modification of it
causes the entire kernel to be recompiled, which is unnecessary and annoying.
* Some suspend-related features are already defined in <linux/suspend.h>, so it
is logical to move the definition of 'struct pm_ops' into there.
* 'struct hibernation_ops', being the hibernation-related counterpart of
'struct pm_ops', is defined in <linux/suspend.h>.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ralf Baechle [Thu, 18 Oct 2007 10:04:37 +0000 (03:04 -0700)]
logo.c: get rid of mips_machgroup
There has not been any serious user of this ill-conceived thing since the
original invention in like '95, so I recently deleted it from everywhere
except the last instance in logo.c. This patch removes the last two
instances in logo.c. The conditions were not useful anyway, as when
compiled in they would always evaluate as true.
Last but not least, this is necessary to get the SGI IP22 and DECstation
kernels to compile again.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
tty_ioctl: fix the baud_table check in encode_baud_rate
The tty_termios_encode_baud_rate() function as defined by tty_ioctl.c has a
problem with the baud_table within. The comparison operators are reversed
and as a result this table's entries never match and BOTHER is always used.
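The effect of a reversed range check can be seen in a small model. This is an illustrative Python sketch, not the kernel code: the table contents, the roughly 2% tolerance, and the names `encode_baud` and `reversed_check` are assumptions made for the example.

```python
# Illustrative model only; table, tolerance and names are assumptions.
BAUD_TABLE = [0, 50, 75, 110, 134, 150, 200, 300, 600, 1200, 1800, 2400,
              4800, 9600, 19200, 38400, 57600, 115200]
BOTHER = "BOTHER"  # stand-in for the "arbitrary rate" fallback


def encode_baud(rate, reversed_check=False):
    """Return the matching table rate, or BOTHER if none is close enough."""
    close = rate // 50  # accept table entries within roughly 2%
    for entry in BAUD_TABLE:
        if reversed_check:
            # Buggy form: the operators point the wrong way, so the
            # acceptance window is empty whenever close > 0.
            hit = rate - close >= entry and rate + close <= entry
        else:
            hit = rate - close <= entry <= rate + close
        if hit:
            return entry
    return BOTHER
```

With the reversed operators a standard rate such as 9600 falls through to the fallback, which matches the symptom described above.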
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Engelhardt [Thu, 18 Oct 2007 10:04:34 +0000 (03:04 -0700)]
Remove CONFIG_VT_UNICODE
Since default_utf8 is already a sysfs attribute, having an extra
CONFIG_VT_UNICODE compile-time option is redundant, since sysfs attributes can
be set at boot and run time.
Also let Linux VCs default to UTF-8 (as per the discussion at
http://lkml.org/lkml/2007/9/6/99).
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Cc: Bill Nottingham <notting@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I'm not sure that the new URL satisfies the requirement of status/info, but
it does at least as good a job as the old URL, and contains current
releases of kexec-tools, rather than somewhat ancient versions.
Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Karsten Keil [Thu, 18 Oct 2007 10:04:32 +0000 (03:04 -0700)]
i4l: Fix random hard freeze with AVM c4 card
The patch
- Includes the call to capilib_data_b3_req in the spinlock. This routine
in turn calls the offending mq_enqueue routine that triggered the
freeze if not locked. This should also fix other indicators of
an inconsistent capilib_msgidqueue list, which trigger messages like:
Oct 5 03:05:57 BERL0 kernel: kcapi: msgid 3019 ncci 0x30301 not on queue
that we saw several times a day (usually several in a row).
- Fixes all occurrences of c4_dispatch_tx to be called with active
spinlock, there were some instances where no lock was active. Mostly
these are in very infrequently called routines, so the additional
performance penalty is minimal.
CC drivers/video/console/newport_con.o
drivers/video/console/newport_con.c: In function 'newport_show_logo':
drivers/video/console/newport_con.c:111: error: assignment of read-only location
drivers/video/console/newport_con.c:111: warning: assignment makes integer from pointer without a cast
drivers/video/console/newport_con.c:112: error: assignment of read-only location
drivers/video/console/newport_con.c:112: warning: assignment makes integer from pointer without a cast
Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Sandeen [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: lighten up resize transaction requirements
When resizing online, setup_new_group_blocks attempts to reserve a
potentially very large transaction, depending on the current filesystem
geometry. For some journal sizes, there may not be enough room for this
transaction, and the online resize will fail.
The patch below resizes & restarts the transaction as necessary while
setting up the new group, and should work with even the smallest journal.
Tested with something like:
[root@newbox ~]# dd if=/dev/zero of=fsfile bs=1024 count=32768
[root@newbox ~]# mkfs.ext3 -b 1024 fsfile 16384
[root@newbox ~]# mount -o loop fsfile mnt/
[root@newbox ~]# resize2fs /dev/loop0
resize2fs 1.40.2 (12-Jul-2007)
Filesystem at /dev/loop0 is mounted on /root/mnt; on-line resizing required
old desc_blocks = 1, new_desc_blocks = 1
Performing an on-line resize of /dev/loop0 to 32768 (1k) blocks.
resize2fs: No space left on device While trying to add group #2
[root@newbox ~]# dmesg | tail -n 1
JBD: resize2fs wants too many credits (258 > 256)
[root@newbox ~]#
With the below change, it works.
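The idea of topping up a transaction instead of reserving everything up front can be sketched as a toy model. This is Python, not the ext4 code; the `Handle` class, the batch size of 16, and the restart-only policy are assumptions (the description above says the patch resizes and restarts as necessary, and only the restart is modeled here).

```python
class Handle:
    """Toy journal handle: one transaction may hold at most `limit` credits."""

    def __init__(self, limit, credits):
        if credits > limit:
            raise ValueError("wants too many credits (%d > %d)" % (credits, limit))
        self.limit = limit
        self.credits = credits
        self.restarts = 0

    def use(self, n=1):
        # Consume credits for one modified metadata block.
        assert self.credits >= n
        self.credits -= n

    def restart(self, credits):
        # Model of committing the work so far and starting a fresh transaction.
        if credits > self.limit:
            raise ValueError("wants too many credits (%d > %d)" % (credits, self.limit))
        self.credits = credits
        self.restarts += 1


def setup_group(limit, blocks, batch=16):
    """Touch `blocks` metadata blocks, asking for credits in small batches."""
    handle = Handle(limit, batch)
    for _ in range(blocks):
        if handle.credits == 0:
            handle.restart(batch)
        handle.use()
    return handle.restarts
```

In this model a single 258-credit reservation fails against a 256-credit limit, while the batched version completes with a handful of restarts.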
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Acked-by: Andreas Dilger <adilger@clusterfs.com>
Eric Sandeen [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: fix setup_new_group_blocks locking
setup_new_group_blocks() manipulates the group descriptor block bh
under the block_bitmap bh's lock. It shouldn't matter since nobody
but resize should be touching these blocks, but it's worth fixing up.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Convert bg_inode_bitmap and bg_inode_table to bg_inode_bitmap_lo
and bg_inode_table_lo. This helps in finding bugs caused by
direct partial access to these split 64-bit values.
Jose R. Santos [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: FLEX_BG Kernel support v2.
This feature relaxes the check restrictions on where each block group's
metadata is located within the storage media. This allows for the allocation
of bitmaps or inode tables outside the block group boundaries in cases
where bad blocks force us to look for new blocks which the owning block
group cannot satisfy. This will also allow for new metadata allocation
schemes to improve performance and scalability.
Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andreas Dilger [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
Ext4: Uninitialized Block Groups
In pass1 of e2fsck, every inode table in the filesystem is scanned and checked,
regardless of whether it is in use. This is the most time-consuming part
of the filesystem check. The uninitialized block group feature can greatly
reduce e2fsck time by eliminating checking of uninitialized inodes.
With this feature, there is a high water mark of used inodes for each block
group. Block and inode bitmaps can be uninitialized on disk via a flag in the
group descriptor to avoid reading or scanning them at e2fsck time. A checksum
of each group descriptor is used to ensure that corruption in the group
descriptor's bit flags does not cause incorrect operation.
The feature is enabled through a mkfs option
mke2fs /dev/ -O uninit_groups
A patch adding support for uninitialized block groups to e2fsprogs tools has
been posted to the linux-ext4 mailing list.
The patches have been stress tested with fsstress and fsx. In performance
tests testing e2fsck time, we have seen that e2fsck time on ext3 grows
linearly with the total number of inodes in the filesystem. In ext4 with the
uninitialized block groups feature, the e2fsck time is constant, based
solely on the number of used inodes rather than the total inode count.
Since typical ext4 filesystems only use 1-10% of their inodes, this feature can
greatly reduce e2fsck time for users, with a performance improvement of 2-20
times, depending on how full the filesystem is.
The attached graph shows the major improvements in e2fsck times in filesystems
with a large total inode count, but few inodes in use.
In each group descriptor if we have
EXT4_BG_INODE_UNINIT set in bg_flags:
Inode table is not initialized/used in this group. So we can skip
the consistency check during fsck.
EXT4_BG_BLOCK_UNINIT set in bg_flags:
No block in the group is used. So we can skip the block bitmap
verification for this group.
We also add two new fields to the group descriptor as a part of
the uninitialized group patch.
bg_itable_unused:
If EXT4_BG_INODE_UNINIT is not set in bg_flags,
then bg_itable_unused gives the offset within
the inode table up to which inodes are in use. fsck can
use this to skip the inodes that are marked unused.
bg_checksum:
Now that we depend on bg_flags and bg_itable_unused to determine
the block and inode usage, we need to make sure group descriptor
is not corrupt. We add checksum to group descriptor to
detect corruption. If the descriptor is found to be corrupt, we
mark all the blocks and inodes in the group used.
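The decisions described above can be put together in a small model of what a checker might scan per group. This is a Python sketch under stated assumptions: the flag values, the `GroupDesc` layout, the CRC-32 stand-in checksum, and the reading of bg_itable_unused as a count of unused inodes at the end of the table are all chosen for illustration and are not taken from the patch.

```python
import zlib
from dataclasses import dataclass

EXT4_BG_INODE_UNINIT = 0x0001  # values assumed for illustration
EXT4_BG_BLOCK_UNINIT = 0x0002


@dataclass
class GroupDesc:
    flags: int
    itable_unused: int  # assumed: unused inodes at the end of the table
    checksum: int = 0

    def compute_checksum(self):
        # Stand-in checksum over two fields; the on-disk algorithm differs.
        return zlib.crc32(b"%d:%d" % (self.flags, self.itable_unused))


def scan_plan(desc, inodes_per_group):
    """Return (inodes_to_scan, check_block_bitmap) for one group."""
    if desc.checksum != desc.compute_checksum():
        # Corrupt descriptor: trust nothing, treat everything as in use.
        return inodes_per_group, True
    if desc.flags & EXT4_BG_INODE_UNINIT:
        inodes = 0
    else:
        inodes = inodes_per_group - desc.itable_unused
    return inodes, not (desc.flags & EXT4_BG_BLOCK_UNINIT)
```

The fallback branch is the important design point: a descriptor that fails its checksum never lets the checker skip work.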
Signed-off-by: Avantika Mathur <mathur@us.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Eric Sandeen [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: remove #ifdef CONFIG_EXT4_INDEX
CONFIG_EXT4_INDEX is not an exposed config option in the kernel, and it is
unconditionally defined in ext4_fs.h. tune2fs is already able to turn off
dir indexing, so at this point it's just cluttering up the code. Remove
it.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jan Kara [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
jbd2: fix commit code to properly abort journal
We should really call journal_abort() and not __journal_abort_hard() in
case of errors. The latter call does not record the error in the journal
superblock, and thus the filesystem won't be marked as having errors later
(and the user could happily mount it without any warning).
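The difference between the two abort paths can be modeled in a few lines. This Python sketch is a simplification: the `Journal` class, its `sb_errno` field, and the `mounts_clean` helper are hypothetical names for the example, not the jbd2 data structures.

```python
import errno


class Journal:
    def __init__(self):
        self.aborted = False
        self.sb_errno = 0  # models the error field kept in the journal superblock

    def abort_hard(self):
        # Stops the journal but leaves no trace for the next mount.
        self.aborted = True

    def abort(self, err):
        # Stops the journal and records the first error persistently.
        self.abort_hard()
        if self.sb_errno == 0:
            self.sb_errno = err


def mounts_clean(journal):
    """True if a later mount would see no recorded error."""
    return journal.sb_errno == 0
```

Only the second path leaves something for a later mount to notice.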
Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mingming Cao [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
JBD2: jbd2 slab allocation cleanups
JBD2: Replace slab allocations with page allocations
JBD2 allocates memory for committed_data and frozen_data from slab. However
JBD2 should not pass slab pages down to the block layer. Use page allocator
pages instead. This will also prepare JBD for the large blocksize patchset.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Mingming Cao [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
JBD: JBD slab allocation cleanups
JBD: Replace slab allocations with page allocations
JBD allocates memory for committed_data and frozen_data from slab. However
JBD should not pass slab pages down to the block layer. Use page allocator
pages instead. This will also prepare JBD for the large blocksize patchset.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Andrew Morton [Wed, 17 Oct 2007 21:28:38 +0000 (14:28 -0700)]
[IA64] fix non-numa build
arch/ia64/kernel/machine_kexec.c: In function `arch_crash_save_vmcoreinfo':
arch/ia64/kernel/machine_kexec.c:131: error: `pgdat_list' undeclared (first use in this function)
arch/ia64/kernel/machine_kexec.c:131: error: (Each undeclared identifier is reported only once
arch/ia64/kernel/machine_kexec.c:131: error: for each function it appears in.)
arch/ia64/kernel/machine_kexec.c:134: error: `node_memblk' undeclared (first use in this function)
arch/ia64/kernel/machine_kexec.c:135: error: `NR_NODE_MEMBLKS' undeclared (first use in this function)
arch/ia64/kernel/machine_kexec.c:136: error: invalid application of `sizeof' to incomplete type `node_memblk_s'
arch/ia64/kernel/machine_kexec.c:137: error: dereferencing pointer to incomplete type
arch/ia64/kernel/machine_kexec.c:138: error: dereferencing pointer to incomplete type
make[1]: *** [arch/ia64/kernel/machine_kexec.o] Error 1
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
Linus Torvalds [Wed, 17 Oct 2007 21:12:44 +0000 (14:12 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
net: libertas sdio driver
mmc: at91_mci: cleanup: use MCI_ERRORS
mmc: possible leak in mmc_read_ext_csd
A sysctl method was added to enable and disable debugging levels. After
further review, it was decided that there are better approaches to doing this
and the sysctl methodology isn't really desirable. This patch removes the
sysctl code from 9p.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Andrew Victor [Wed, 17 Oct 2007 09:53:40 +0000 (11:53 +0200)]
mmc: at91_mci: cleanup: use MCI_ERRORS
A small MMC driver cleanup.
Use the defined AT91_MCI_ERRORS in at91_mci_completed_command() instead
of specifying all the error bits individually.
Signed-off-by: Andrew Victor <andrew@sanpeople.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Loose mode in 9p utilizes the page cache without respecting coherency with
the server. Previously, any write invalidated the entire mapping for a file.
This patch softens the behavior to only invalidate the region of the actual
write.
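Restricting invalidation to the written region amounts to computing which cached pages the write overlaps. The following Python sketch illustrates that arithmetic only; the 4096-byte page size, the dict-based cache, and the function names are assumptions, not the v9fs implementation.

```python
PAGE_SIZE = 4096  # assumed page size


def pages_touched(offset, length):
    """Page indices overlapped by a write of `length` bytes at `offset`."""
    if length <= 0:
        return range(0)
    first = offset // PAGE_SIZE
    last = (offset + length - 1) // PAGE_SIZE
    return range(first, last + 1)


def invalidate_region(cache, offset, length):
    """Drop only the cached pages the write overlaps; return how many."""
    dropped = 0
    for index in pages_touched(offset, length):
        if cache.pop(index, None) is not None:
            dropped += 1
    return dropped
```

A 200-byte write that straddles a page boundary drops two pages and leaves the rest of the file's cache intact.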
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Latchesar Ionkov [Wed, 17 Oct 2007 19:31:07 +0000 (14:31 -0500)]
9p: attach-per-user
The 9P2000 protocol requires the authentication and permission checks to be
done in the file server. For that reason every user that accesses the file
server tree has to authenticate and attach to the server separately.
Multiple users can share the same connection to the server.
Currently v9fs does a single attach and executes all I/O operations as a
single user. This makes using v9fs in a multiuser environment unsafe as it
depends on the client doing the permission checking.
This patch improves the 9P2000 support by allowing every user to attach
separately. The patch defines three modes of access (new mount option
'access'):
- attach-per-user (access=user) (default mode for 9P2000.u)
If a user tries to access a file served by v9fs for the first time, v9fs
sends an attach command to the server (Tattach) specifying the user. If
the attach succeeds, the user can access the v9fs tree.
As there is no uname->uid (string->integer) mapping yet, this mode works
only with the 9P2000.u dialect.
- allow only one user to access the tree (access=<uid>)
Only the user with uid can access the v9fs tree. Other users that attempt
to access it will get an EPERM error.
- do all operations as a single user (access=any) (default for 9P2000)
V9fs does a single attach and all operations are done as a single user.
If this mode is selected, the v9fs behavior is identical to the current
one.
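The three access modes above can be summarized as a single decision function. This is a Python sketch for illustration; `check_access`, its parameters, and the `attached` set that stands in for sending Tattach are hypothetical and do not mirror the v9fs code.

```python
import errno


def check_access(mode, caller_uid, attached, allowed_uid=None, default_uid=0):
    """Return the uid whose attach should serve this request."""
    if mode == "any":
        # access=any: everyone shares the single attach.
        return default_uid
    if mode == "uid":
        # access=<uid>: only one user may touch the tree.
        if caller_uid != allowed_uid:
            raise PermissionError(errno.EPERM, "uid not allowed on this mount")
        return caller_uid
    if mode == "user":
        # access=user: attach on first use, once per user.
        attached.add(caller_uid)
        return caller_uid
    raise ValueError("unknown access mode %r" % mode)
```

In the per-user mode the server sees the real uid, so permission checks happen where the protocol expects them.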
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Latchesar Ionkov [Wed, 17 Oct 2007 19:31:07 +0000 (14:31 -0500)]
9p: rename uid and gid parameters
Change the names of 'uid' and 'gid' parameters to the more appropriate
'dfltuid' and 'dfltgid'. This also sets the default uid/gid to -2
(aka nfsnobody).
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This patch abstracts out the interfaces to underlying transports so that
new transports can be added as modules. This should also allow kernel
configuration of transports without ifdef-hell.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Sam Ravnborg [Wed, 17 Oct 2007 19:16:33 +0000 (21:16 +0200)]
x86: fix kernel rebuild due to vsyscall fallout
Fix rebuild of kernel when there are no changes.
This happened for i386.
Using make V=2 hinted that the output files were
not assigned to targets - fixed by this patch.
Reported by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Roland McGrath [Wed, 17 Oct 2007 16:04:41 +0000 (18:04 +0200)]
x86: vdso linker script cleanup
I can't see the reason ". = VDSO_PRELINK + 0x900;" was ever there in
the linker script for the x86_64 vDSO. I can't find anything that
depends on this magic offset, or that should care at all about the
particular location of the .data section (all from vvar.c) in the
vDSO image. If it is really desirable to place .data at 0x900, then it
should be after all the other sections so they fill in the space up to
0x900.
This removes the 0x900 magic and cleans up the output sections generally
in the vDSO linker script. This saves a few hundred bytes in the size
of the vDSO file, bringing it back well under 4kb total so that its vma
only needs one page.
Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
A threshold interrupt occurs when ECC memory correction is occurring at too
high a frequency. Thresholds are used by the ECC hardware as occasional
ECC failures are part of normal operation, but long sequences of ECC
failures usually indicate a memory chip that is about to fail.
Thermal event interrupts occur when a temperature threshold has been
exceeded for some CPU chip. IIRC, a thermal interrupt is also generated
when the temperature drops back to a normal level.
A spurious interrupt is an interrupt that was raised then lowered by the
device before it could be fully processed by the APIC. Hence the APIC sees
the interrupt but does not know what device it came from. For this case
the APIC hardware will assume a vector of 0xff.
Rescheduling, call, and TLB flush interrupts are sent from one CPU to
another per the needs of the OS. Typically, their statistics would be used
to discover if an interrupt flood of the given type has been occurring.
AK: merged v2 and v4 which had some more tweaks
AK: replace Local interrupts with Local timer interrupts
AK: Fixed description of interrupt types.
[ tglx: arch/x86 adaptation ]
[ mingo: small cleanup ]
Signed-off-by: Joe Korty <joe.korty@ccur.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Tim Hockin <thockin@hockin.org> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Such output is gained with some ugly if (!nl) printk("\n"); code and
besides being a waste of lines, this is also annoying to read. The
following output looks better (and it is how it looks on x86_64):
Satyam Sharma [Wed, 17 Oct 2007 16:04:40 +0000 (18:04 +0200)]
x86: call cache_add_dev() from cache_sysfs_init() explicitly
Call cache_add_dev() from cache_sysfs_init() explicitly, instead of
referencing the CPU notifier callback directly from generic startup
code. Looks cleaner (to me at least) this way, and also makes it
possible to use other tricks to replace __cpuinit{data} annotations, as
recently discussed on this list.
Roland McGrath [Wed, 17 Oct 2007 16:04:40 +0000 (18:04 +0200)]
x86: vdso put vars in rodata
This adds a const to the definitions vvar.c makes, so that the vdso_*
variables go into .rodata instead of .data. This is essentially a
cosmetic change, just giving the section headers in the vDSO file more
pleasing flags. These variables are read-only from the perspective of
the vDSO itself and user mode, even though the contents of the DSO image
were adjusted at boot.
Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Andrew Morton [Wed, 17 Oct 2007 16:04:39 +0000 (18:04 +0200)]
x86: asm-i386/io.h fix constness
- Fix this:
include/asm/io.h: In function `memcpy_fromio':
include/asm/io.h:208: warning: passing argument 2 of `__memcpy' discards qualifiers from pointer target type
- Clean up code a bit
Reported-by: Uwe Bugla <uwe.bugla@gmx.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Mike Travis [Wed, 17 Oct 2007 16:04:39 +0000 (18:04 +0200)]
x86: fix cpu_to_node references
In x86_64 and i386 architectures most arrays that are sized using
NR_CPUS lie in local memory on node 0. Not only will most (99%?) of the
systems not use all the slots in these arrays, particularly when NR_CPUS
is increased to accommodate future very high cpu count systems, but a
number of cache lines are passed unnecessarily on the system bus when
these arrays are referenced by cpus on other nodes.
Typically, the values in these arrays are referenced by the cpu
accessing its own values, though when passing IPI interrupts, the cpu
does access the data relevant to the targeted cpu/node. Of course, if
the referencing cpu is not on node 0, then the reference will still
require cross node exchanges of cache lines. A common use of this is
for an interrupt service routine to pass the interrupt to other cpus
local to that node.
Ideally, all the elements in these arrays should be moved to the per_cpu
data area. In some cases (such as x86_cpu_to_apicid) the array is
referenced before the per_cpu data areas are set up. In this case, a
static array is declared in the __initdata area and initialized by the
booting cpu (BSP). The values are then moved to the per_cpu area after
it is initialized and the original static array is freed with the rest
of the __initdata.
This patch:
Fix four instances where cpu_to_node is referenced by array instead of
via the cpu_to_node macro. This is preparation to moving it to the
per_cpu data area.
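Why going through an accessor matters can be shown with a toy model: callers that use the accessor keep working when the backing storage moves. This Python sketch is an analogy only; the `Topology` class and `move_to_per_cpu` method are hypothetical and stand in for the macro and the later per_cpu conversion.

```python
class Topology:
    def __init__(self, nodes):
        self._array = list(nodes)  # early, flat cpu -> node array
        self._per_cpu = None       # per-cpu records, set up later

    def cpu_to_node(self, cpu):
        """Accessor: the only way callers should look up a cpu's node."""
        if self._per_cpu is not None:
            return self._per_cpu[cpu]["node"]
        return self._array[cpu]

    def move_to_per_cpu(self):
        # Copy values into per-cpu records and release the flat array.
        self._per_cpu = [{"node": n} for n in self._array]
        self._array = None
```

Code that indexed the flat array directly would break after the move, which is the kind of reference this patch cleans up.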
Signed-off-by: Mike Travis <travis@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Wed, 17 Oct 2007 16:04:39 +0000 (18:04 +0200)]
x86: cleanup 64bit unistd.h
sys_iopl is long gone and there is no reason to declare
sys_rt_sigaction here.
Remove it altogether and fix the whitespace mess as well.
It's worth the trouble: 25897 -> 21337 bytes, the win is
larger than the memory of my first computer :)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>