[XFS] Initial pass at going directly-to-bio on the buffered IO path. This
allows us to submit much larger I/Os instead of sending down lots of small
buffer_heads. To do this we need to have a rather complicated I/O
submission and completion tracking infrastructure. Part of the latter has
been merged already a long time ago for direct I/O support. Part of the
problem is that we need to track sub-pagesize regions and for that we
still need buffer_heads for the time being. Long-term I hope we can move
to better data structures and/or maybe move this to fs/mpage.c instead of
having it in XFS. Original patch from Nathan Scott with various updates
from David Chinner and Christoph Hellwig.
Yingping Lu [Wed, 11 Jan 2006 04:38:31 +0000 (15:38 +1100)]
[XFS] Fixed delayed_blks assert failure during umount. The delayed_blks
was caused by ENOSPC but not reclaimed by xfs_release or xfs_inactive.
The fix changed the condition in xfs_release and xfs_inactive to invoke
xfs_inactive_free_eofblocks for this special case, changed
xfs_inactive_free_eofblocks to clean the delayed blks after eof. It also
changed xfs_write to set correct eof when ENOSPC occurs.
David Chinner [Wed, 11 Jan 2006 04:37:58 +0000 (15:37 +1100)]
[XFS] Introduce per-filesystem delwri pagebuf flushing to reduce
contention between filesystems and prevent deadlocks between filesystems
when a flush dependency exists between them.
Yingping Lu [Wed, 11 Jan 2006 04:29:39 +0000 (15:29 +1100)]
[XFS] Fixed an assertion failure in xfs_reclaim caused by delayed block.
The assertion failure came from XFS QA41. The fix is done by enabling
truncate for delayed block in xfs_inactive.
Nathan Scott [Wed, 11 Jan 2006 04:28:28 +0000 (15:28 +1100)]
[XFS] Implement the di_extsize allocator hint for non-realtime files as
well. Also provides a mechanism for inheriting this property from the
parent directory for new files.
Ingo Molnar [Tue, 10 Jan 2006 21:07:44 +0000 (22:07 +0100)]
[PATCH] fix i386 mutex fastpath on FRAME_POINTER && !DEBUG_MUTEXES
Call the mutex slowpath more conservatively - e.g. FRAME_POINTERS can
change the calling convention, in which case a direct branch to the
slowpath becomes illegal. Bug found by Hugh Dickins.
Adrian Bunk [Tue, 10 Jan 2006 21:10:02 +0000 (13:10 -0800)]
[IRDA]: kill drivers/net/irda/sir_core.c
EXPORT_SYMBOL's do nowadays belong to the files where the actual
functions are.
Moving the module_init/module_exit to the file with the actual functions
has the advantage of saving a few bytes due to the removal of two
functions.
Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Jean Tourrilhes <jt@hpl.hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Small cleanups for drivers/atm/zatm.c
Get rid of unneeded cast of kmalloc() return value.
Small whitespace/CodingStyle/formatting cleanup (since I was in there anyway).
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
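As a minimal userspace sketch of the cast cleanup (using malloc() as a stand-in for kmalloc(); struct toy_dev and toy_dev_alloc() are hypothetical names, not taken from zatm.c): in C a void * converts implicitly to any object pointer type, so the cast adds nothing.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_dev { int id; };	/* hypothetical structure, for illustration only */

static struct toy_dev *toy_dev_alloc(int id)
{
	/* void * converts implicitly in C, so no cast is written here. */
	struct toy_dev *dev = malloc(sizeof(*dev));

	if (!dev)
		return NULL;
	dev->id = id;
	return dev;
}
```

Using sizeof(*dev) rather than naming the type keeps the allocation correct if the pointer's type changes later.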
Patrick McHardy [Tue, 10 Jan 2006 01:48:09 +0000 (17:48 -0800)]
[NETFILTER]: Fix timeout sysctls on big-endian 64bit architectures
The connection tracking timeout variables are unsigned long, but
proc_dointvec_jiffies is used with sizeof(unsigned int) in the sysctl
tables. Since there is no proc_doulongvec_jiffies function, change the
timeout variables to unsigned int.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
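A minimal userspace sketch of why this only bites on big-endian 64bit machines. It does not use any sysctl code; the helper names are hypothetical. It lays out a 64-bit value in an explicit byte order and then reads only the first four bytes, which is roughly what a handler told sizeof(unsigned int) sees at the address of an unsigned long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 64-bit value in big-endian (be != 0) or little-endian byte order. */
static void store_u64(uint8_t buf[8], uint64_t v, int be)
{
	assert(buf != NULL);
	for (size_t i = 0; i < 8; i++) {
		unsigned shift = be ? (unsigned)(56 - 8 * i) : (unsigned)(8 * i);
		buf[i] = (uint8_t)(v >> shift);
	}
}

/* Read only the first 4 bytes as a 32-bit value in the same byte order. */
static uint32_t load_first_u32(const uint8_t buf[8], int be)
{
	uint32_t v = 0;

	assert(buf != NULL);
	for (size_t i = 0; i < 4; i++) {
		unsigned shift = be ? (unsigned)(24 - 8 * i) : (unsigned)(8 * i);
		v |= (uint32_t)buf[i] << shift;
	}
	return v;
}

/* What a 4-byte access sees at the address of an 8-byte variable. */
static uint32_t truncated_view(uint64_t timeout, int be)
{
	uint8_t buf[8];

	store_u64(buf, timeout, be);
	return load_first_u32(buf, be);
}
```

In little-endian order the low half comes first, so small values survive the truncated access; in big-endian order the first four bytes are the high half, which is zero for any realistic timeout.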
Patrick McHardy [Tue, 10 Jan 2006 00:44:00 +0000 (16:44 -0800)]
[NETFILTER]: Fix another crash in ip_nat_pptp
The PPTP NAT helper calculates the offset at which the packet needs
to be mangled as difference between two pointers to the header. With
non-linear skbs however the pointers may point to two separate buffers
on the stack and the calculation results in a wrong offset being
used.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Tue, 10 Jan 2006 00:43:43 +0000 (16:43 -0800)]
[NETFILTER]: Fix crash in ip_nat_pptp
When an inbound PPTP_IN_CALL_REQUEST packet is received the
PPTP NAT helper uses a NULL pointer in pointer arithmetic to
calculate the offset in the packet which needs to be mangled
and corrupts random memory or crashes.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
The congestion ops and af_ops in the inet_connection_sock
can be const.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Ungerer [Tue, 10 Jan 2006 07:00:39 +0000 (17:00 +1000)]
[PATCH] m68knommu: fix find_next_zero_bit in bitops.h
We're starting a number of big applications (memory footprint approx.
1MByte) on our Arcturus uC5272. Therefore memory fragmentation is a
real pain for us. We've switched to uClinux-2.4.27-uc1 and found that
page_alloc2 fragments the memory heavily.
Digging into it we found a bug in the find_next_zero_bit function in the
m68knommu/bitops.h file. If the size isn't a multiple of 32 then the
upper bits of the last word to be searched should be masked. But the
function masks the lower bits of the last word because it uses a right
shift instead of a left shift operator.
Patch submitted by Sascha Smejkal <s.smejkal@centersystems.at>
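The shift direction can be illustrated with a small userspace sketch. This is not the m68knommu code: the real function walks a multi-word bitmap, while the two hypothetical helpers below only handle the final, partially used 32-bit word.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Find the first zero bit among the low nbits bits of word (0 < nbits < 32).
 * Returns nbits if there is none.
 */
static unsigned first_zero_in_partial_word(uint32_t word, unsigned nbits)
{
	assert(nbits > 0 && nbits < 32);
	/* Left shift: force bits nbits..31 to 1 so they can never match. */
	word |= ~UINT32_C(0) << nbits;
	for (unsigned i = 0; i < nbits; i++)
		if (!(word & (UINT32_C(1) << i)))
			return i;
	return nbits;
}

/* The same search with the shift direction reversed, for comparison. */
static unsigned first_zero_wrong_shift(uint32_t word, unsigned nbits)
{
	assert(nbits > 0 && nbits < 32);
	/* Right shift: sets the low (32 - nbits) bits instead of the upper ones. */
	word |= ~UINT32_C(0) >> nbits;
	for (unsigned i = 0; i < 32; i++)
		if (!(word & (UINT32_C(1) << i)))
			return i;
	return 32;
}
```

With word 0xE and nbits 4, bit 0 is the free bit. The left-shift version finds it, while the right-shift version hides it and reports a bit outside the valid range.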
Greg Ungerer [Tue, 10 Jan 2006 06:59:37 +0000 (16:59 +1000)]
[PATCH] uclinux: delay binfmt_flat trace
Modify the initial trace output (which is based on flags in the binary
header) so that it is not done until after the magic number check. This
may well not be a flat format binary, so the flags could be invalid.
(Prime example, running a script).
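A minimal sketch of the ordering, assuming a cut-down hypothetical header (struct toy_flat_hdr is not the kernel's flat header layout) and the four-byte "bFLT" magic of the flat format: the flags are not looked at until the magic has matched.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down header: only the two fields this sketch needs. */
struct toy_flat_hdr {
	char     magic[4];
	uint32_t flags;
};

/*
 * Returns 1 and stores the flags in *flags_out if the header carries the
 * flat magic, 0 otherwise. The flags field is left alone (and so would not
 * be traced) until the magic check has passed.
 */
static int toy_flat_flags(const struct toy_flat_hdr *hdr, uint32_t *flags_out)
{
	assert(hdr != NULL && flags_out != NULL);
	if (memcmp(hdr->magic, "bFLT", 4) != 0)
		return 0;		/* e.g. a script starting with "#!/b" */
	*flags_out = hdr->flags;	/* only now are the flags meaningful */
	return 1;
}
```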
Greg Ungerer [Tue, 10 Jan 2006 06:59:04 +0000 (16:59 +1000)]
[PATCH] m68knommu: set irq priority/level different for each ColdFire serial port
Set the hardware interrupt priority to a different value for each
attached ColdFire serial port. According to the CPU documentation you
should not use the same combination of level/priority on more than one
device. People have reported odd serial port behavior with them set the
same.
Greg Ungerer [Tue, 10 Jan 2006 06:42:59 +0000 (16:42 +1000)]
[PATCH] m68knommu: fix a5 reg corruption in signal handlers
This is a patch adapted from a posting by Andrea Tarani which was
pointed out to me by Bernardo Innocenti. Thanks to both of them for
their help and patience.
The original posting is here:
http://mailman.uclinux.org/pipermail/uclinux-dev/2005-July/033543.html
The problem first manifested itself as busybox ping terminating with an
"Illegal instruction". I reduced this to a test case and found that
variable size arrays allocated on the stack could lead to stacks not
aligned on 32 bit boundaries. For the Coldfire this proved fatal.
Having been pointed to this patch by Bernardo, I applied it and it
fixed the first test case. I then went back to busybox's ping. This
still failed with "Illegal instruction", but in a different way. Before
it depended on the size allocated for the ping buffer, now it happened
every time. I also found it depended on optimisation level (gcc-3.4.0):
-Os was okay but not -O2.
After a lot of looking, it turned out that register a5 was being
corrupted by the signal handler (after applying the patch). I re-worked
the patch a bit to save/restore a5 and now all seems well.
Patch submitted by Stuart Hughs <stuarth@freescale.com>
Linus Torvalds [Tue, 10 Jan 2006 16:56:39 +0000 (08:56 -0800)]
Fix rpc shutdown event condition bug
We want to wait for the cl_users to go down to zero, not for it to stay
positive. Quoth Trond (who wasn't even the author, but acked the wrong
version): "Argh! I need to increase my daily caffeine dosages."
Oleg Nesterov [Tue, 10 Jan 2006 13:48:02 +0000 (16:48 +0300)]
[PATCH] rcu: join rcu_ctrlblk and rcu_state
This patch moves rcu_state into the rcu_ctrlblk. I think there
are no reasons why we should have 2 different variables to control
rcu state. Every user of rcu_state has also "rcu_ctrlblk *rcp" in
the parameter list.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jens Axboe [Tue, 10 Jan 2006 09:48:02 +0000 (10:48 +0100)]
[PATCH] dm: don't enable bouncing by default
DM doesn't need to bounce bio's on its own, but the block layer defaults
to that in blk_queue_make_request(). The lower level drivers should
bounce ios themselves, that is what they need to do if not layered below
dm anyways.
Anton Blanchard [Tue, 10 Jan 2006 07:21:20 +0000 (18:21 +1100)]
[PATCH] Work around ppc64 compiler bug
In the process of optimising our per cpu data code, I found a ppc64
compiler bug that has been around forever. Basically the current
RELOC_HIDE can end up trashing r30. Details of the bug can be found at
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25572
This bug is present in all compilers before 4.1. It is masked by the
fact that our current per cpu data code is inefficient and causes
other loads that end up marking r30 as used.
A workaround identified by Alan Modra is to use the =r asm constraint
instead of =g.
Signed-off-by: Anton Blanchard <anton@samba.org>
[ Verified that this makes no real difference on x86[-64] ] Signed-off-by: Linus Torvalds <torvalds@osdl.org>
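A GCC-specific userspace sketch of the workaround's shape, modelled on RELOC_HIDE but not copied from this patch. RELOC_HIDE_SKETCH and second_element() are hypothetical names, and the sketch assumes unsigned long can hold a pointer. The empty asm passes the pointer through a register output ("=r"), so the compiler cannot reason about where ptr + off ends up.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Launder ptr through an empty asm with a register-only output constraint
 * ("=r"), then add the byte offset. Uses GCC statement expressions and
 * __typeof__, so this is not portable C.
 */
#define RELOC_HIDE_SKETCH(ptr, off)				\
	({ unsigned long __ptr;					\
	   __asm__ ("" : "=r" (__ptr) : "0" (ptr));		\
	   (__typeof__(ptr)) (__ptr + (off)); })

/* Read the element one int past base via the hidden pointer arithmetic. */
static int second_element(int *base)
{
	assert(base != NULL);
	return *RELOC_HIDE_SKETCH(base, sizeof(int));
}
```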
Jiri Slaby [Tue, 10 Jan 2006 04:54:24 +0000 (20:54 -0800)]
[PATCH] char/isicom: Pci probing added
PCI probing functions added, most functions rewritten because of it (some
for loops were redundant). Used PCI_DEVICE macro. dev_* used for printing
wherever possible. Renamed some functions to have isicom_ in the name.
Jiri Slaby [Tue, 10 Jan 2006 04:54:23 +0000 (20:54 -0800)]
[PATCH] char/isicom: Other little changes
Move some code from one place to another. Get rid of ugly ifdefs in code in
next patches, so here create functions and macros to enable it. Rename some
functions and align some code to 80 chars.
Alan Cox [Tue, 10 Jan 2006 04:54:13 +0000 (20:54 -0800)]
[PATCH] TTY layer buffering revamp
The API and code have been through various bits of initial review by
serial driver people but they definitely need to live somewhere for a
while so the unconverted drivers can get knocked into shape, existing
drivers that have been updated can be better tuned and bugs whacked out.
This replaces the tty flip buffers with kmalloc objects in rings. In the
normal situation for an IRQ driven serial port at typical speeds the
behaviour is pretty much the same, two buffers end up allocated and the
kernel cycles between them as before.
When there are delays or at high speed we now behave far better as the
buffer pool can grow a bit rather than lose characters. This also means
that we can operate at higher speeds reliably.
For drivers that receive characters in blocks (DMA based, USB and
especially virtualisation) the layer allows a lot of driver specific
code that works around the tty layer with private secondary queues to be
removed. The IBM folks need this sort of layer, the smart serial port
people do, the virtualisers do (because a virtualised tty typically
operates at infinite speed rather than emulating 9600 baud).
Finally many drivers had invalid and unsafe attempts to avoid buffer
overflows by directly invoking tty methods extracted out of the innards
of work queue structs. These are no longer needed and all go away. That
fixes various random hangs with serial ports on overflow.
The other change in here is to optimise the receive_room path that is
used by some callers. It turns out that only one ldisc uses receive room
except as a constant and it updates it far far less than the value is
read. We thus make it a variable not a function call.
I expect the code to contain bugs due to the size alone but I'll be
watching and squashing them and feeding out new patches as it goes.
Because the buffers now dynamically expand you should only run out of
buffering when the kernel runs out of memory for real. That means a lot of
the horrible hacks high performance drivers used to do just aren't needed any
more.
Description:
tty_insert_flip_char is an old API and continues to work as before, as does
tty_flip_buffer_push() [this is why many drivers don't need modification]. It
does now also return the number of chars inserted.
There are also
tty_buffer_request_room(tty, len)
which asks for a buffer block of the length requested and returns the space
found. This improves efficiency with hardware that knows how much to
transfer.
and tty_insert_flip_string_flags(tty, str, flags, len)
to insert a string of characters and flags
For a smart interface the usual code is
len = tty_buffer_request_room(tty, amount_hardware_says);
tty_insert_flip_string(tty, buffer_from_card, len);
More description!
At the moment tty buffers are attached directly to the tty. This is causing a
lot of the problems related to tty layer locking, also problems at high speed
and also with bursty data (such as occurs in virtualised environments).
I'm working on ripping out the flip buffers and replacing them with a pool of
dynamically allocated buffers. This allows both for old style "byte I/O"
devices and also helps virtualisation and smart devices where large blocks of
data suddenly materialise and need storing.
So far so good. Lots of drivers reference tty->flip.*. Several of them also
call directly and unsafely into function pointers it provides. This will all
break. Most drivers can use tty_insert_flip_char which can be kept as an API
but others need more.
At the moment I've added the following interfaces; if people think more will
be needed, now is a good time to say.
int tty_buffer_request_room(tty, size)
Try and ensure at least size bytes are available, returns actual room (may be
zero). At the moment it just uses the flipbuf space but that will change.
Repeated calls without characters being added are not cumulative. (ie if you
call it with 1, 1, 1, and then 4 you'll have four characters of space.) The
other functions will also try and grow buffers in future but this will be a
more efficient way when you know block sizes.
int tty_insert_flip_char(tty, ch, flag)
As before insert a character if there is room. Now returns 1 for success, 0
for failure.
int tty_insert_flip_string(tty, str, len)
Insert a block of non error characters. Returns the number inserted.
int tty_prepare_flip_string(tty, strptr, len)
Adjust the buffer to allow len characters to be added. Returns a buffer
pointer in strptr and the length available. This allows for hardware that
needs to use functions like insl or memcpy_fromio.
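The request-room and insert-string semantics described above can be modelled in userspace. This is a toy sketch, not the tty layer: struct toy_buf, toy_request_room() and toy_insert_string() are hypothetical names, realloc() stands in for the kernel allocator, and the limit field stands in for running out of memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct toy_buf {
	unsigned char *data;
	size_t used;	/* bytes already inserted */
	size_t size;	/* bytes currently allocated */
	size_t limit;	/* hard cap standing in for "out of memory" */
};

/* Try to ensure want free bytes; return the room granted (may be zero). */
static size_t toy_request_room(struct toy_buf *b, size_t want)
{
	size_t room;

	assert(b != NULL);
	room = b->size - b->used;
	if (room < want) {
		/* Grow towards used + want, so repeated calls do not add up. */
		size_t target = b->used + want;

		if (target > b->limit)
			target = b->limit;
		if (target > b->size) {
			unsigned char *p = realloc(b->data, target);

			if (p) {
				b->data = p;
				b->size = target;
			}
		}
		room = b->size - b->used;
	}
	return room < want ? room : want;
}

/* Insert up to len bytes; return how many were actually stored. */
static size_t toy_insert_string(struct toy_buf *b, const unsigned char *s,
				size_t len)
{
	size_t n = toy_request_room(b, len);

	if (n)
		memcpy(b->data + b->used, s, n);
	b->used += n;
	return n;
}
```

Starting from an empty buffer, requesting 1, 1, 1 and then 4 bytes leaves four bytes of room rather than seven, and an insert that hits the limit reports the shorter count instead of failing outright.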
Signed-off-by: Alan Cox <alan@redhat.com> Cc: Paul Fulghum <paulkf@microgate.com> Signed-off-by: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: John Hawkes <hawkes@sgi.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Jackson [Tue, 10 Jan 2006 04:54:08 +0000 (20:54 -0800)]
[PATCH] Serial: disable jsm in ppc64 defconfig
Changes to the serial driver to remove flip buffers have broken the serial
jsm driver. It doesn't even compile anymore. The jsm driver was enabled
in only one defconfig - ppc64. In order to keep defconfigs building,
disable CONFIG_SERIAL_JSM for the time being.
Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adrian Bunk [Tue, 10 Jan 2006 04:54:06 +0000 (20:54 -0800)]
[PATCH] fs/ext3/: small cleanups
This patch contains the following cleanups:
- there's no need for ext3_count_free() #ifndef EXT3FS_DEBUG
- having prototypes for ext3_count_free() in two different headers is
nonsense
Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>