Paul Menage [Fri, 19 Oct 2007 06:39:30 +0000 (23:39 -0700)]
Task Control Groups: basic task cgroup framework
Generic Process Control Groups
--------------------------
There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others. These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.
This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.
The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:
- the userspace APIs are (somewhat) normalised
- it's easier to test e.g. the ResGroups CPU controller in
conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.
- the additional kernel footprint of any of the competing resource
management systems is substantially reduced, since it doesn't need
to provide process grouping/containment, hence improving their
chances of getting into the kernel
This patch:
Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.
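The data model can be shown with a minimal userspace sketch: tasks are grouped into cgroups, and each group carries one state object per subsystem. Everything here is hypothetical: `struct cgroup_sketch`, `struct subsys_state`, `attach_task()` and the fixed two-subsystem layout. It is far simpler than the kernel's real `struct cgroup` and `struct cgroup_subsys_state`. The point it shows is how a subsystem reaches its per-group state through the task's group.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subsystem IDs; a real framework registers subsystems itself. */
enum { SUBSYS_CPU, SUBSYS_MEM, SUBSYS_COUNT };

/* Per-group state owned by one subsystem (e.g. a CPU share or memory limit). */
struct subsys_state {
	long value;	/* subsystem-specific setting */
	int nr_tasks;	/* tasks currently charged to this group */
};

/* A group of tasks: one state object per subsystem (NULL if unused). */
struct cgroup_sketch {
	const char *name;
	struct subsys_state *subsys[SUBSYS_COUNT];
};

struct task_sketch {
	int pid;
	struct cgroup_sketch *cgroup;	/* each task belongs to one group */
};

/* Move a task to a new group, updating every subsystem's membership count. */
static void attach_task(struct task_sketch *t, struct cgroup_sketch *cg)
{
	int i;

	for (i = 0; i < SUBSYS_COUNT; i++) {
		if (t->cgroup && t->cgroup->subsys[i])
			t->cgroup->subsys[i]->nr_tasks--;
		if (cg->subsys[i])
			cg->subsys[i]->nr_tasks++;
	}
	t->cgroup = cg;
}

/* A subsystem finds its state for a task by going through the task's group. */
static struct subsys_state *task_subsys_state(const struct task_sketch *t, int id)
{
	return t->cgroup ? t->cgroup->subsys[id] : NULL;
}
```

A CPU or memory controller only ever touches its own slot in `subsys[]`. That is what lets independent controllers share one grouping of processes.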
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Jackson [Fri, 19 Oct 2007 06:39:28 +0000 (23:39 -0700)]
cpuset: zero malloc - revert the old cpuset fix
The cpuset code to present a list of tasks using a cpuset to user space could
write to an array that it had kmalloc'd, after a kmalloc request of zero size.
The problem was that the code didn't check for writes past the allocated end
of the array until -after- the first write.
This is a race condition that is likely rare -- it would only show up if a
cpuset went from being empty to having a task in it, during the brief time
between the allocation and the first write.
Prior to roughly 2.6.22 kernels, this was also a benign problem, because a
zero kmalloc returned a few usable bytes anyway, and no harm was done with the
bogus write.
With the 2.6.22 kernel changes that make the kernel issue a warning if code
tries to write to the location returned from a zero-size allocation, this
problem is no longer benign. This cpuset code would occasionally trigger
that warning.
The fix is trivial -- check before storing into the array, not after, whether
the array is big enough to hold the store.
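The fix comes down to doing the bounds check before the store. Here is a minimal sketch that uses a hypothetical `fill_pid_array()` helper, not the actual cpuset function. `src` stands for the cpuset's current task list, which may have grown after `pids` was sized (possibly to zero entries).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy PIDs from 'src' into 'pids', which has room for 'npids' entries.
 * The capacity check happens BEFORE each store, so a zero-sized array is
 * never written to.  The buggy ordering stored first and checked after.
 */
static size_t fill_pid_array(int *pids, size_t npids, const int *src, size_t nsrc)
{
	size_t n = 0, i;

	for (i = 0; i < nsrc; i++) {
		if (n >= npids)		/* check first ... */
			break;
		pids[n++] = src[i];	/* ... then store */
	}
	return n;
}
```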
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Paul Menage <menage@google.com>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Fri, 19 Oct 2007 06:39:28 +0000 (23:39 -0700)]
kernel-api docbook: fix content problems
Fix kernel-api docbook contents problems.
docproc: linux-2.6.23-git13/include/asm-x86/unaligned_32.h: No such file or directory
Warning(linux-2.6.23-git13//include/linux/list.h:482): bad line: of list entry
Warning(linux-2.6.23-git13//mm/filemap.c:864): No description found for parameter 'ra'
Warning(linux-2.6.23-git13//block/ll_rw_blk.c:3760): No description found for parameter 'req'
Warning(linux-2.6.23-git13//include/linux/input.h:1077): No description found for parameter 'private'
Warning(linux-2.6.23-git13//include/linux/input.h:1077): No description found for parameter 'cdev'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Mahoney [Fri, 19 Oct 2007 06:39:27 +0000 (23:39 -0700)]
reiserfs: ignore on disk s_bmap_nr value
Implement support for file systems larger than 8 TiB.
The reiserfs superblock contains a 16 bit value for counting the number of
bitmap blocks. The rest of the disk format supports file systems up to 2^32
blocks, but the bitmap block limitation artificially limits this to 8 TiB with
a 4KiB block size.
Rather than trust the superblock's 16-bit bitmap block count, we calculate it
dynamically based on the number of blocks in the file system. When an
incorrect value is observed in the superblock, it is zeroed out, ensuring that
older kernels will not be able to mount the file system.
Userspace support has already been implemented and shipped in reiserfsprogs
3.6.20.
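Here is a sketch of the dynamic calculation. It assumes one bitmap bit per block, so each bitmap block covers `blocksize * 8` blocks. The helper names are hypothetical. The point is that with 4 KiB blocks, a full 2^32-block file system needs 131072 bitmap blocks, which cannot be stored in a 16-bit on-disk field.

```c
#include <assert.h>
#include <stdint.h>

/* Bitmap blocks needed to cover 'block_count' blocks (one bit per block). */
static uint32_t bmap_count(uint32_t block_count, uint32_t blocksize)
{
	uint32_t bits_per_bmap = blocksize * 8;

	assert(block_count > 0);	/* avoid underflow of block_count - 1 */
	return (block_count - 1) / bits_per_bmap + 1;
}

/* Largest file system, in bytes, whose bitmap count fits in 16 bits. */
static uint64_t max_bytes_with_16bit_bmap_nr(uint32_t blocksize)
{
	return (uint64_t)UINT16_MAX * blocksize * 8 * blocksize;
}
```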
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Mahoney [Fri, 19 Oct 2007 06:39:26 +0000 (23:39 -0700)]
reiserfs: remove first_zero_hint
The first_zero_hint metadata caching was never actually used, and it's of
dubious optimization quality. This patch removes it.
It doesn't actually shrink the size of the reiserfs_bitmap_info struct, since
that doesn't work with block sizes larger than 8K. There was a big fixme in
there, and with all the work lately in allowing block size > page size, I
might as well kill the fixme as well.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Mahoney [Fri, 19 Oct 2007 06:39:25 +0000 (23:39 -0700)]
reiserfs: fix usage of signed ints for block numbers
Do a quick signedness check for block numbers. There are a number of places
where signed integers are used for block numbers, which limits the usable file
system size to 8 TiB. The disk format, excepting a problem which will be
fixed in the following patch, supports file systems up to 16 TiB in size.
This patch cleans up those sites so that we can enable the full usable size.
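The arithmetic behind the two limits is shown below for 4 KiB blocks. The helper names are hypothetical. Signed 32-bit block numbers top out at 2^31 blocks and unsigned ones at 2^32 blocks. In either case, byte offsets have to be computed in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Capacity when block numbers are signed 32-bit: blocks 0 .. INT32_MAX. */
static uint64_t capacity_signed(uint32_t blocksize)
{
	return ((uint64_t)INT32_MAX + 1) * blocksize;
}

/* Capacity when block numbers are unsigned 32-bit: blocks 0 .. UINT32_MAX. */
static uint64_t capacity_unsigned(uint32_t blocksize)
{
	return ((uint64_t)UINT32_MAX + 1) * blocksize;
}

/* Byte offset of a block; widen before multiplying to avoid 32-bit overflow. */
static uint64_t block_to_offset(uint32_t blocknr, uint32_t blocksize)
{
	return (uint64_t)blocknr * blocksize;
}
```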
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Mahoney [Fri, 19 Oct 2007 06:39:25 +0000 (23:39 -0700)]
reiserfs: fix memset byte count during resize
Correct the memset in reiserfs_resize to clear the memory allocated for the
new bitmap info structs. Previously, it cleared only as much memory as the
old size required. Depending on the contents of memory, this could cause incorrect
caching behavior for bitmap blocks in the newly allocated area.
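Here is a hypothetical sketch of the corrected pattern. It is not the actual `reiserfs_resize()` code. When the bitmap info array grows, the clearing has to be sized by the new entry count so that the added tail entries start out zeroed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct reiserfs_bitmap_info. */
struct bitmap_info {
	unsigned int free_count;
};

/*
 * Grow a bitmap info array from old_nr to new_nr entries.  The memset covers
 * the NEW size; clearing only old_nr entries would leave the tail holding
 * whatever malloc() returned.
 */
static struct bitmap_info *resize_bitmap_info(const struct bitmap_info *old,
					      size_t old_nr, size_t new_nr)
{
	struct bitmap_info *info;

	assert(new_nr >= old_nr);
	info = malloc(new_nr * sizeof(*info));
	if (!info)
		return NULL;
	memset(info, 0, new_nr * sizeof(*info));	/* clear the whole new array */
	if (old_nr)
		memcpy(info, old, old_nr * sizeof(*info));	/* keep old entries */
	return info;
}
```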
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Mahoney [Fri, 19 Oct 2007 06:39:24 +0000 (23:39 -0700)]
reiserfs: don't use BUG when panicking
Change reiserfs_panic() to use panic() initially instead of BUG(). Using
BUG() ignores the configurable panic behavior, so systems that should be
failing and rebooting are left hanging. This causes problems in
active/standby HA scenarios.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jose R. Santos [Fri, 19 Oct 2007 06:39:23 +0000 (23:39 -0700)]
JBD: Fix JBD warnings when compiling with CONFIG_JBD_DEBUG
Note from Mingming's JBD2 fix:
Noticed that all the warnings occur when the debug level is 0. Then found that the
"jbd2: Move jbd2-debug file to debugfs" patch
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0f49d5d019afa4e94253bfc92f0daca3badb990b
changed jbd2_journal_enable_debug from an int to a u8, which makes the
jbd_debug comparison always true when the debugging level is 0. Hence the
compile warnings.
We thought about changing the jbd2_journal_enable_debug data type back to
int, but cannot, because jbd2-debug was moved to debugfs, where calling
debugfs_create_u8() to create the debugfs entry requires the value to be of
type u8.
Even if we changed the data type back to int, the code would still be buggy:
the kernel should not print jbd2 debug messages when
jbd2_journal_enable_debug is set to 0, but it does.
The fix is to change the level of those debug messages to 1. The same
should be fixed in ext3/JBD, but ext3's jbd-debug via /proc is currently
broken, so we should probably fix it all together.
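Here is a userspace approximation of the problem. It assumes a `jbd_debug()` macro that prints when the message level is at or below the configured level; the real macro also prints file, line and function. Because the tunable is a u8, a level-0 call compares `0 <= value`, which can never be false.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* u8, because debugfs_create_u8() is used to expose it. */
static uint8_t jbd2_journal_enable_debug;
static int messages_emitted;

/* Simplified jbd_debug(): print when level n <= the configured level. */
#define jbd_debug(n, ...)						\
	do {								\
		if ((n) <= jbd2_journal_enable_debug) {			\
			messages_emitted++;				\
			fprintf(stderr, __VA_ARGS__);			\
		}							\
	} while (0)
```

With debugging off (0), a level-0 message still prints, while a level-1 message is suppressed. That is why the fix moves those messages to level 1.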
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Fri, 19 Oct 2007 06:39:22 +0000 (23:39 -0700)]
jbd: fix commit code to properly abort journal
We should really call journal_abort() and not __journal_abort_hard() in
case of errors. The latter does not record the error in the journal
superblock, and thus the filesystem won't later be marked as having errors
(and the user could happily mount it without any warning).
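The difference between the two abort paths can be modeled roughly as shown below. The struct, flag and function names are hypothetical. The sketch assumes, as the text says, that the soft path records the error in the superblock while the hard path only stops the journal.

```c
#include <assert.h>

#define JOURNAL_ABORTED 0x1	/* hypothetical "journal aborted" flag */

struct journal_sketch {
	unsigned int flags;
	int j_errno;	/* in-memory error code */
	int sb_errno;	/* error as recorded in the on-disk superblock */
};

/* Hard abort: stop the journal, but record nothing on disk. */
static void abort_hard(struct journal_sketch *j)
{
	j->flags |= JOURNAL_ABORTED;
}

/* Full abort: keep the first error, stop the journal, persist the error. */
static void journal_abort_sketch(struct journal_sketch *j, int err)
{
	if (j->flags & JOURNAL_ABORTED)
		return;
	if (!j->j_errno)
		j->j_errno = err;
	abort_hard(j);
	if (err)
		j->sb_errno = j->j_errno;	/* stands in for a superblock write */
}
```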
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jose R. Santos [Fri, 19 Oct 2007 06:39:22 +0000 (23:39 -0700)]
jbd: config_jbd_debug cannot create /proc entry
The jbd-debug file used to be located in /proc/sys/fs/jbd-debug, but
create_proc_entry() does not do lookups on file names that are more than
one directory deep. This causes the entry creation to fail, and hence no
proc file is created.
Instead of fixing this in procfs, we might as well move the jbd-debug file
to debugfs, which is the preferred location for this kind of tunable.
The new location is now /sys/kernel/debug/jbd/jbd-debug.
[akpm@linux-foundation.org: zillions of cleanups]
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
isdn/sc: remove unused REQUEST_IRQ and unnecessary header file
REQUEST_IRQ is never used, so delete it. In the process get rid of the
macro FREE_IRQ which makes the code unnecessarily difficult to read.
Signed-off-by: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>
Acked-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Samuel Thibault [Fri, 19 Oct 2007 06:39:17 +0000 (23:39 -0700)]
Console events and accessibility
Some external modules like Speakup need to monitor console output.
This adds a VT notifier that such modules can use to get console output events:
allocation, deallocation, writes, other updates (cursor position, switch, etc.)
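The notifier mechanism is simple: listeners register a callback on a chain, and the console code walks the chain for each event. Below is a single-threaded userspace sketch. All of its names are hypothetical. The patch itself builds on the kernel's notifier-chain infrastructure, which also handles locking.

```c
#include <assert.h>
#include <stddef.h>

/* Event codes modeled on the console events listed above. */
enum vt_event { EV_ALLOCATE = 1, EV_DEALLOCATE, EV_WRITE, EV_UPDATE };

struct notifier {
	int (*call)(struct notifier *nb, unsigned long event, void *data);
	struct notifier *next;
};

static struct notifier *vt_chain;

static void register_notifier(struct notifier **chain, struct notifier *nb)
{
	nb->next = *chain;
	*chain = nb;
}

static void unregister_notifier(struct notifier **chain, struct notifier *nb)
{
	struct notifier **p;

	for (p = chain; *p; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;
			return;
		}
	}
}

/* The console calls this on allocation, deallocation, writes and updates. */
static void notify_all(struct notifier *chain, unsigned long event, void *data)
{
	for (; chain; chain = chain->next)
		chain->call(chain, event, data);
}

/* Example listener: a screen reader would queue written text for speech. */
static int writes_seen;

static int count_writes(struct notifier *nb, unsigned long event, void *data)
{
	(void)nb;
	(void)data;
	if (event == EV_WRITE)
		writes_seen++;
	return 0;
}
```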
[akpm@linux-foundation.org: fix headers_check]
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
copy_oldmem_page() should not return while a page frame from the previous
kernel is still mapped.
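The rule is that every exit path taken after mapping the old page frame must unmap it. Here is a hypothetical sketch. A counter stands in for the real mapping state, and a NULL buffer stands in for a failed copy to user space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int live_mappings;	/* hypothetical stand-in for mapping state */

static char *map_frame(char *frame) { live_mappings++; return frame; }
static void unmap_frame(char *vaddr) { (void)vaddr; live_mappings--; }

/* Copy 'len' bytes at 'offset' out of an old-kernel page frame. */
static long copy_oldmem_sketch(char *frame, char *buf, size_t len,
			       size_t offset, size_t page_size)
{
	char *vaddr;
	long ret;

	if (!len)
		return 0;
	if (offset + len > page_size)
		return -1;		/* rejected before anything is mapped */
	vaddr = map_frame(frame);
	if (!buf) {
		ret = -1;		/* stands in for a failed copy_to_user() */
		goto out;		/* must still unmap */
	}
	memcpy(buf, vaddr + offset, len);
	ret = (long)len;
out:
	unmap_frame(vaddr);		/* every path after map_frame() ends here */
	return ret;
}
```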
Signed-off-by: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>
Acked-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ps3av: remove unused fields in ps3av_monitor_quirks
Remove the `clear_50' and `clear_vesa' fields of struct
ps3av_monitor_quirk, as they're currently unused. We can always re-add
them when we really need them.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Samuel Thibault [Fri, 19 Oct 2007 06:39:12 +0000 (23:39 -0700)]
Console keyboard events and accessibility
Some blind people use a kernel engine called Speakup which uses hardware
synthesis to speak what gets displayed on the screen. They use the
PC keyboard to control this engine (start/stop, accelerate, ...) and
also need to get keyboard feedback (to be sure of what they are
typing, the caps lock status, etc.)
Up to now, the way this was done was very ugly. This patch adds a
notifier list that permits a far better implementation.
You may wonder why this can't be done at the input layer. The problem
is that what people want to monitor is the console keyboard, i.e. all
input keyboards that got attached to the console, and with the currently
active keymap (i.e. keysyms, not only keycodes).
This adds a keyboard notifier that such modules can use to get the keyboard
events and possibly eat them, at several stages:
- keycodes: even before translation into keysym.
- unbound keycodes: when no keysym is bound.
- unicode: when the keycode would get translated into a unicode character.
- keysym: when the keycode would get translated into a keysym.
- post_keysym: after the keysym got interpreted, so as to see the result
(caps lock, etc.)
This also provides access to k_handler so as to permit simulation of
keypresses.
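Event eating at the different stages can be sketched in userspace as follows. The stage names, return codes and `handle_key()` flow are simplified, hypothetical stand-ins for the notifier described above. If any listener returns the stop code at the keycode or keysym stage, the key never reaches the console. The post-keysym stage only observes.

```c
#include <assert.h>
#include <stddef.h>

/* Stages at which listeners see a key, modeled on the list above. */
enum kbd_stage { KBD_STAGE_KEYCODE, KBD_STAGE_UNBOUND_KEYCODE,
		 KBD_STAGE_UNICODE, KBD_STAGE_KEYSYM, KBD_STAGE_POST_KEYSYM };

enum { NOTIFY_OK_SKETCH = 0, NOTIFY_STOP_SKETCH = 1 };

typedef int (*kbd_listener)(enum kbd_stage stage, unsigned int value);

#define MAX_LISTENERS 4
static kbd_listener listeners[MAX_LISTENERS];
static int nr_listeners;
static int keys_delivered;	/* keys that reached the console */

static int add_listener(kbd_listener fn)
{
	if (nr_listeners >= MAX_LISTENERS)
		return -1;
	listeners[nr_listeners++] = fn;
	return 0;
}

/* Returns 1 if some listener asked to eat the event at this stage. */
static int notify_stage(enum kbd_stage stage, unsigned int value)
{
	int i;

	for (i = 0; i < nr_listeners; i++)
		if (listeners[i](stage, value) == NOTIFY_STOP_SKETCH)
			return 1;
	return 0;
}

/* Simplified key path: keycode stage, then keysym stage, then observers. */
static void handle_key(unsigned int keycode, unsigned int keysym)
{
	if (notify_stage(KBD_STAGE_KEYCODE, keycode))
		return;			/* eaten before translation */
	if (notify_stage(KBD_STAGE_KEYSYM, keysym))
		return;			/* eaten after translation */
	keys_delivered++;
	notify_stage(KBD_STAGE_POST_KEYSYM, keysym);	/* observe result */
}

/* Example: a screen reader reserving a hypothetical command keycode 0x7f. */
static int speakup_like(enum kbd_stage stage, unsigned int value)
{
	if (stage == KBD_STAGE_KEYCODE && value == 0x7f)
		return NOTIFY_STOP_SKETCH;
	return NOTIFY_OK_SKETCH;
}
```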
[akpm@linux-foundation.org: various fixes]
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Fri, 19 Oct 2007 06:39:10 +0000 (23:39 -0700)]
advansys: depends on VIRT_TO_BUS
Fix powerpc allmodconfig build: advansys requires virt_to_bus() but powerpc
doesn't implement it.
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Matthew Wilcox <willy@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The majority of host drivers using the IDE PCI layer set drive->autotune; the
only exceptions are:
generic.c
ns87415.c
rz1000.c
trm290.c
  * no ->set_pio_mode method
it821x.c:
  * if memory allocation fails, drive->autotune won't be set
    (but there also won't be a ->set_pio_mode method in that case)
piix.c:
  * MPIIX controller (no ->init_hwif method, so also no ->set_pio_mode method)
However, if there is no ->set_pio_mode method, setting drive->autotune causes
no change in behavior w.r.t. PIO tuning, so always set drive->autotune in
ide_pci_setup_ports().
Add IDE_HFLAG_LEGACY_IRQS host flag to tell ide_pci_setup_ports() to set
hwif->irq to legacy IRQ 14/15 (iff hwif->irq is not already set) and convert
atiixp, piix, serverworks, sis5513 and slc90e66 host drivers to use it.
While at it:
* In piix.c add IDE_HFLAGS_PIIX define and don't use ->init_hwif for MPIIX.
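The effect of IDE_HFLAG_LEGACY_IRQS in ide_pci_setup_ports() is sketched below. The sketch assumes channel 0 is the primary channel and that an IRQ of 0 means "not yet set". The structs and the flag value are hypothetical, not the IDE layer's real definitions. What matters is that the table flag replaces per-driver code doing the same thing.

```c
#include <assert.h>

#define HFLAG_LEGACY_IRQS (1u << 0)	/* hypothetical bit value */

struct hwif_sketch {
	int channel;	/* 0 = primary, 1 = secondary */
	int irq;	/* 0 = not set yet */
};

struct pci_device_info_sketch {
	const char *name;
	unsigned int host_flags;
};

/* Generic setup: assign legacy IRQ 14/15 only if the driver asked for it
 * and nothing has set the IRQ already. */
static void setup_ports_sketch(const struct pci_device_info_sketch *d,
			       struct hwif_sketch *hwif)
{
	if ((d->host_flags & HFLAG_LEGACY_IRQS) && hwif->irq == 0)
		hwif->irq = hwif->channel ? 15 : 14;
}
```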
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Add IDE_HFLAG_SERIALIZE host flag to tell ide_pci_setup_ports() to set
hwif/mate->serialized and convert aec62xx, cs5530 and sc1200 host drivers
to use it.
Add IDE_HFLAG_ERROR_STOPS_FIFO host flag and use it instead
of hwif->err_stops_fifo. As a side-effect this change fixes
hwif->err_stops_fifo not being restored by ide_hwif_restore().
* Split off hpt{374,371,366}_init() helper from init_setup_hpt{374,371,366}().
* Merge init_setup_{374,372n,371,372a,302,366}() into hpt366_init_one().
While at it:
* Use "HPT36x" name for HPT366/HPT368 chipsets.
* Add .chip_name to struct hpt_info and use it to set d->name.
* Convert .max_ultra in struct hpt_info to .udma_mask and use it to set
d->udma_mask.
* Fix hpt302 to use HPT302_ALLOW_ATA133_6 define.
* Change HPT366/HPT374 interrupt fixup message from KERN_WARNING to KERN_INFO.
* Use the second hpt366_chipsets[] entry for HPT37x chipsets using HPT36x PCI
device ID and fix .enablebits/.host_flags for HPT36x hpt366_chipsets[] entry.
* Bump driver version.
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
ide: add ->mwdma_mask and ->swdma_mask to ide_pci_device_t (take 2)
* Add ->mwdma_mask and ->swdma_mask to ide_pci_device_t.
* Set ide_hwif_t DMA masks using DMA masks from ide_pci_device_t in
setup-pci.c::ide_pci_setup_ports() (iff DMA base is valid and ->init_hwif
method may still override them).
* Convert IDE PCI host drivers to use ide_pci_device_t DMA masks.
While at it:
* Use ATA_{UDMA,MWDMA,SWDMA}* defines.
* hpt34x.c: add separate ide_pci_device_t instances for HPT343 and HPT345.
* serverworks.c: fix DMA masks being set before checking DMA base.
v2:
* Add missing masks to DECLARE_GENERIC_PCI_DEV() macro.
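The mask-copying rule is sketched below. The types and mask constants are hypothetical; the masks follow the style of the ATA_* defines, with one bit per supported mode. The sketch assumes ->init_hwif runs after the table masks are copied, which is what lets it override them.

```c
#include <assert.h>
#include <stddef.h>

#define UDMA0_TO_5  0x3f	/* hypothetical: UDMA modes 0..5 */
#define MWDMA0_TO_2 0x07	/* hypothetical: MWDMA modes 0..2 */

struct dma_masks {
	unsigned char udma, mwdma, swdma;
};

struct hwif_dma_sketch {
	unsigned long dma_base;		/* 0 = no usable DMA base */
	struct dma_masks masks;
};

struct pci_dev_table_sketch {
	struct dma_masks masks;
	void (*init_hwif)(struct hwif_dma_sketch *hwif);	/* optional */
};

static void setup_dma_masks(const struct pci_dev_table_sketch *d,
			    struct hwif_dma_sketch *hwif)
{
	if (hwif->dma_base)		/* only when the DMA base is valid */
		hwif->masks = d->masks;
	if (d->init_hwif)		/* the driver hook may still override */
		d->init_hwif(hwif);
}

/* Example override for a hypothetical chip revision without MWDMA. */
static void no_mwdma(struct hwif_dma_sketch *hwif)
{
	hwif->masks.mwdma = 0;
}
```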
ide: add IDE_HFLAG_NO_LBA48 and IDE_HFLAG_NO_LBA48_DMA host flags
Add IDE_HFLAG_NO_LBA48[_DMA] host flags, use them instead of hwif->no_lba48[_dma]
and then remove the no longer needed hwif->no_lba48[_dma]. As a side-effect
this change fixes hwif->no_lba48_dma not being restored by ide_hwif_restore().
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
ide: remove ->init_setup_dma from ide_pci_device_t (take 2)
* Make ide_pci_device_t.host_flags u32 and add IDE_HFLAG_CS5520 host flag.
* Pass ide_pci_device_t *d to setup-pci.c::ide_get_or_set_dma_base()
and use d->name instead of hwif->cds->name.
* Set IDE_HFLAG_CS5520 host flag in cs5520 host driver and use it in
ide_get_or_set_dma_base() to find out which PCI BAR to use, remove no longer
needed cs5520.c::cs5520_init_setup_dma() and ide_pci_device_t.init_setup_dma.
This fixes PCI bus-mastering not being checked for CS5510/CS5520 hosts.
v2:
* It is wrong to check simplex bits on CS5510/CS5520 as v1 did.
(Noticed by Alan).
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Add IDE_HFLAG_NO_{DMA,AUTODMA} host flags. Convert all host drivers using
ide_pci_device_t to use these flags instead of d->autodma and then remove no
longer needed d->autodma.
Add IDE_HFLAG_BOOTABLE host flag and IDE_HFLAG_OFF_BOARD define. Convert
all host drivers using ide_pci_device_t to use IDE_HFLAG_{BOOTABLE,OFF_BOARD}
instead of d->bootable and then remove no longer needed d->bootable.
* ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt:
hrtimer: hook compat_sys_nanosleep up to high res timer code
hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier
Linus Torvalds [Thu, 18 Oct 2007 22:08:35 +0000 (15:08 -0700)]
Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev:
[libata] kill ata_sg_is_last()
Update libata driver for bf548 atapi controller against the 2.6.24 tree.
libata-sff: Correct use of check_status()
drivers/ata: add support to Freescale 3.0Gbps SATA Controller
pata_acpi: fix build breakage if !CONFIG_PM
Linus Torvalds [Thu, 18 Oct 2007 21:51:02 +0000 (14:51 -0700)]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] time: Move R4000 clockevent device code to separate configurable file
[MIPS] time: Delete dead cycles_per_jiffy, mips_timer_ack and null_timer_ack
[MIPS] IP32: Retire use of plat_timer_setup.
[MIPS] Jazz: Retire use of plat_timer_setup.
[MIPS] IP27: Convert to clock_event_device.
[MIPS] JMR3927: Convert to clock_event_device.
[MIPS] Always do the ARC64_TWIDDLE_PC thing.
Linus Torvalds [Thu, 18 Oct 2007 21:40:30 +0000 (14:40 -0700)]
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (51 commits)
[IPV6]: Fix again the fl6_sock_lookup() fixed locking
[NETFILTER]: nf_conntrack_tcp: fix connection reopening fix
[IPV6]: Fix race in ipv6_flowlabel_opt() when inserting two labels
[IPV6]: Lost locking in fl6_sock_lookup
[IPV6]: Lost locking when inserting a flowlabel in ipv6_fl_list
[NETFILTER]: xt_sctp: fix mistake to pass a pointer where array is required
[NET]: Fix OOPS due to missing check in dev_parse_header().
[TCP]: Remove lost_retrans zero seqno special cases
[NET]: fix carrier-on bug?
[NET]: Fix uninitialised variable in ip_frag_reasm()
[IPSEC]: Rename mode to outer_mode and add inner_mode
[IPSEC]: Disallow combinations of RO and AH/ESP/IPCOMP
[IPSEC]: Use the top IPv4 route's peer instead of the bottom
[IPSEC]: Store afinfo pointer in xfrm_mode
[IPSEC]: Add missing BEET checks
[IPSEC]: Move type and mode map into xfrm_state.c
[IPSEC]: Fix length check in xfrm_parse_spi
[IPSEC]: Move ip_summed zapping out of xfrm6_rcv_spi
[IPSEC]: Get nexthdr from caller in xfrm6_rcv_spi
[IPSEC]: Move tunnel parsing for IPv4 out of xfrm4_input
...
Linus Torvalds [Thu, 18 Oct 2007 21:39:44 +0000 (14:39 -0700)]
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC/64]: Consolidate of_register_driver
[SPARC] Videopix Frame Grabber: Convert device_lock_sem to mutex
[SPARC]: Support for new termios.
[SPARC64]: Check of_get_property() return in pci_determine_mem_io_space().
[SPARC64]: Fix boot failures due to bootmem.
[SPARC64]: Implement atomic backoff.
Shannon Nelson [Thu, 18 Oct 2007 10:07:15 +0000 (03:07 -0700)]
I/OAT: Add completion callback for async_tx interface use
The async_tx interface includes a completion callback. This adds support
for using that callback, including using interrupts on completion.
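Here is a minimal sketch of the completion-callback idea. It assumes a descriptor that carries a function pointer and an opaque parameter. The field names echo async_tx's `callback`/`callback_param`, but this is not the driver's code. In the driver, completion would be handled from the cleanup path or an interrupt; here it is a plain function call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*tx_callback)(void *param);

struct tx_desc_sketch {
	const void *src;
	void *dst;
	size_t len;
	tx_callback callback;	/* optional: run once the copy completes */
	void *callback_param;
	int completed;
};

/* Stand-in for the hardware finishing a copy and the driver completing it. */
static void complete_descriptor(struct tx_desc_sketch *tx)
{
	memcpy(tx->dst, tx->src, tx->len);	/* the "DMA" itself */
	tx->completed = 1;
	if (tx->callback)
		tx->callback(tx->callback_param);
}

/* Example client callback: flag that the transfer is done. */
static void mark_done(void *param)
{
	*(int *)param = 1;
}
```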
[akpm@linux-foundation.org: various fixes]
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shannon Nelson [Thu, 18 Oct 2007 10:07:14 +0000 (03:07 -0700)]
I/OAT: Tighten descriptor setup performance
The change to the async_tx interface cost this driver some performance by
spreading the descriptor setup across several functions, including multiple
passes over the new descriptor chain. Here we bring the work back into one
primary function and only do one pass.
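A single-pass chain build can be illustrated with a hypothetical builder. Each iteration fills in one descriptor's addresses and size and links it to the previous descriptor, so the chain never has to be walked a second time to set fields. The 4096-byte per-descriptor limit and all names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_XFER 4096u	/* hypothetical per-descriptor transfer limit */

struct hw_desc_sketch {
	size_t src, dst, size;
	struct hw_desc_sketch *next;
};

/*
 * Build the descriptor chain for one copy in a single pass.  Returns the
 * number of descriptors used, or 0 if the pool is too small.
 */
static size_t build_chain(struct hw_desc_sketch *pool, size_t pool_len,
			  size_t src, size_t dst, size_t len)
{
	struct hw_desc_sketch *prev = NULL;
	size_t n = 0;

	while (len && n < pool_len) {
		size_t chunk = len < MAX_XFER ? len : MAX_XFER;
		struct hw_desc_sketch *d = &pool[n++];

		d->src = src;		/* fill and link in the same step */
		d->dst = dst;
		d->size = chunk;
		d->next = NULL;
		if (prev)
			prev->next = d;
		prev = d;

		src += chunk;
		dst += chunk;
		len -= chunk;
	}
	return len ? 0 : n;
}
```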
[akpm@linux-foundation.org: cleanups, uninline]
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shannon Nelson [Thu, 18 Oct 2007 10:07:13 +0000 (03:07 -0700)]
I/OAT: clean up error handling and some print messages
Make better use of dev_err(), and catch an error where the transaction
creation might fail.
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shannon Nelson [Thu, 18 Oct 2007 10:07:13 +0000 (03:07 -0700)]
I/OAT: clean up of dca provider start and stop
Don't start ioat_dca if ioat_dma didn't start, and then stop ioat_dca
before stopping ioat_dma. Since the ioat_dma side does the pci device
work, this keeps ioat_dca from trying to use a bad device reference.
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shannon Nelson [Thu, 18 Oct 2007 10:07:12 +0000 (03:07 -0700)]
I/OAT: cleanup pci issues
Reorder the pci release actions
Letting go of the resources in the right order helps get rid of
occasional kernel complaints.
Fix the pci_driver object name [Randy Dunlap]
Rename the struct pci_driver data so that false section mismatch
warnings won't be produced.
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Corey Minyard [Thu, 18 Oct 2007 10:07:11 +0000 (03:07 -0700)]
IPMI: fix hotmod remove lock
The removal of proc entries was done holding a lock, which is no longer
allowed. There is no need for the lock, only a mutex is required, so switch
over to a mutex.
Corey Minyard [Thu, 18 Oct 2007 10:07:10 +0000 (03:07 -0700)]
IPMI: new NMI handling
Convert over to the new NMI handling for getting IPMI watchdog timeouts via an
NMI. This adds config options to know if there is the ability to receive NMIs
and if it has an NMI post processing call. Then it modifies the IPMI watchdog
to take advantage of this so that it can know if an NMI comes in.
It also adds a test that the IPMI NMI watchdog works.
Corey Minyard [Thu, 18 Oct 2007 10:07:08 +0000 (03:07 -0700)]
IPMI: remove bogus semaphore from watchdog
Lockdep was giving an error when loading the IPMI watchdog module. It turns
out that if you try to claim a lock in a parameter handling routine, lockdep
won't see that lock as "static" yet because the module is not yet on the
module list, so it will complain.
However, the semaphore in question is completely unnecessary. So just remove
it.
Corey Minyard [Thu, 18 Oct 2007 10:07:08 +0000 (03:07 -0700)]
IPMI: don't init irq until ready
Patrick found a race at startup. Interrupts were being enabled for the IPMI
interface before the driver was really ready to handle them. This could
result in an oops if something was pending on the interface at startup and
interrupts were already enabled (this technically shouldn't happen, but we
need to cover for it in real life). So move the IRQ setup to the code that
starts the actual IPMI processing.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: Patrick Schoeller <Patrick.Schoeller@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ralf Baechle [Thu, 18 Oct 2007 10:07:07 +0000 (03:07 -0700)]
Replace __attribute_pure__ with __pure
To be consistent with the use of attributes in the rest of the kernel
replace all use of __attribute_pure__ with __pure and delete the definition
of __attribute_pure__.
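With GCC-compatible compilers, the shorthand is just the function attribute. The sketch below defines it the same way, guarded because this is standalone code, and applies it to a hypothetical helper whose result depends only on its argument. Because a pure function has no side effects, the compiler may merge repeated calls with the same arguments.

```c
#include <assert.h>

#ifndef __pure
#define __pure __attribute__((pure))
#endif

/* Result depends only on the argument, so marking it pure is valid. */
static int __pure count_set_bits(unsigned long x)
{
	int n = 0;

	while (x) {
		x &= x - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}
```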
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Thu, 18 Oct 2007 10:07:05 +0000 (03:07 -0700)]
fuse: add blksize field to fuse_attr
There are cases when the filesystem will be passed the buffer from a single
read or write call, namely:
1) in 'direct-io' mode (not O_DIRECT), read/write requests don't go
through the page cache, but go directly to the userspace fs
2) currently buffered writes are done with single page requests, but
if Nick's ->perform_write() patch goes in, it will be possible to
do larger write requests, but only if the original write() was
also bigger than a page.
In these cases the filesystem might want to give a hint to the app
about the optimal I/O size.
Allow the userspace filesystem to supply a blksize value to be returned by
stat() and friends. If the field is zero, it defaults to the old
PAGE_CACHE_SIZE value.
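The defaulting rule is sketched below with hypothetical types. It assumes 4 KiB pages for PAGE_CACHE_SIZE.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_CACHE_SIZE_SKETCH 4096u	/* assumption: 4 KiB pages */

struct fuse_attr_sketch {
	uint64_t size;
	uint32_t blksize;	/* 0: the filesystem expressed no preference */
};

/* Value to report as the preferred I/O size for this inode. */
static uint32_t reported_blksize(const struct fuse_attr_sketch *attr)
{
	return attr->blksize ? attr->blksize : PAGE_CACHE_SIZE_SKETCH;
}
```

Applications see the result as `st_blksize` in the `struct stat` filled in by stat(2), so a filesystem serving large direct-io requests can advertise a larger value.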
Miklos Szeredi [Thu, 18 Oct 2007 10:07:03 +0000 (03:07 -0700)]
fuse: add list of writable files to fuse_inode
Each WRITE request must carry a valid file descriptor. When a page is written
back from a memory mapping, the file through which the page was dirtied is not
available, so a new mechanism is needed to find a suitable file in
->writepage(s).
A list of fuse_files is added to fuse_inode. The file is removed from the
list in fuse_release().
This patch is in preparation for writable mmap support.
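Here is a hypothetical sketch of the per-inode list. Files opened for writing are linked onto the inode and removed on release. Writeback, which has no struct file of its own, borrows any file on the list. The real code also needs locking and reference counting, which are omitted here.

```c
#include <assert.h>
#include <stddef.h>

struct fuse_file_sketch {
	int fh;				/* handle returned by the server's open */
	struct fuse_file_sketch *next;	/* link in the inode's writable list */
};

struct fuse_inode_sketch {
	struct fuse_file_sketch *write_files;
};

/* Opening for writing: remember the file on the inode. */
static void add_writable(struct fuse_inode_sketch *fi, struct fuse_file_sketch *ff)
{
	ff->next = fi->write_files;
	fi->write_files = ff;
}

/* Release: the file must leave the list before it is freed. */
static void remove_writable(struct fuse_inode_sketch *fi, struct fuse_file_sketch *ff)
{
	struct fuse_file_sketch **p;

	for (p = &fi->write_files; *p; p = &(*p)->next) {
		if (*p == ff) {
			*p = ff->next;
			break;
		}
	}
}

/* Writeback: no file at hand, so pick any writable one (NULL if none). */
static struct fuse_file_sketch *pick_write_file(const struct fuse_inode_sketch *fi)
{
	return fi->write_files;
}
```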
Miklos Szeredi [Thu, 18 Oct 2007 10:07:02 +0000 (03:07 -0700)]
fuse: support BSD locking semantics
It is trivial to add support for flock(2) semantics to the existing protocol,
by setting the lock owner field to the file pointer, and passing a new
FUSE_LK_FLOCK flag with the locking request.
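The userspace side of these semantics can be checked directly. With flock(2), a lock belongs to the open file (description). A dup()ed descriptor therefore shares the lock, while a second open() of the same file conflicts with it. That ownership model is why the file pointer fits as the lock owner. The small check below assumes Linux or a similar system with a writable /tmp.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Returns 1 if flock() locks are owned per open file description: a dup()ed
 * fd shares the lock, a second open() of the same file conflicts with it.
 * Returns 0 if not, -1 if the temporary file could not be set up.
 */
static int flock_owner_is_open_file(void)
{
	char path[] = "/tmp/flock-demo-XXXXXX";
	int fd1, fd2, fd3, ok;

	fd1 = mkstemp(path);
	if (fd1 < 0)
		return -1;
	fd2 = dup(fd1);			/* same open file description */
	fd3 = open(path, O_RDWR);	/* new open file description */
	unlink(path);
	if (fd2 < 0 || fd3 < 0) {
		if (fd3 >= 0)
			close(fd3);
		if (fd2 >= 0)
			close(fd2);
		close(fd1);
		return -1;
	}

	ok = flock(fd1, LOCK_EX) == 0 &&
	     flock(fd2, LOCK_EX | LOCK_NB) == 0 &&	/* same owner: no conflict */
	     flock(fd3, LOCK_EX | LOCK_NB) == -1 &&	/* other owner: conflict */
	     errno == EWOULDBLOCK;

	close(fd3);
	close(fd2);
	close(fd1);
	return ok;
}
```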
Miklos Szeredi [Thu, 18 Oct 2007 10:07:00 +0000 (03:07 -0700)]
fuse: clean up open file passing in setattr
Clean up supplying open file to the setattr operation. In addition to being a
cleanup it prepares for the changes in the way the open file is passed to the
setattr method.