Encapsulate the buddy allocator used for MTT segments. This cleans up the
code and also gets us ready to add FMR support.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <roland@topspin.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roland Dreier [Sat, 16 Apr 2005 22:26:24 +0000 (15:26 -0700)]
[PATCH] IB/mthca: fix MTT allocation in mem-free mode
Fix bug in MTT allocation in mem-free mode.
I misunderstood the MTT size value returned by the firmware -- it is really
the size of a single MTT entry, since mem-free mode does not segment the MTT
as the original firmware did. This meant that our MTT addresses ended up
being off by a factor of 8. This meant that our MTT allocations might
overlap, and so we could overwrite and corrupt earlier memory regions when
writing new MTT entries.
We fix this by always using our 64-byte MTT segment size. This allows some
simplification of the code as well, since there's no reason to put the MTT
segment size in a variable -- we can always use our enum value directly.
Signed-off-by: Roland Dreier <roland@topspin.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix error handling in MR allocation for mem-free mode: mthca_free must get an
MR index, not a key.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <roland@topspin.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roland Dreier [Sat, 16 Apr 2005 22:26:18 +0000 (15:26 -0700)]
[PATCH] IB/mthca: allocate correct number of doorbell pages
Doorbell record pages are allocated in HCA page size chunks (always 4096
bytes), so we need to divide by 4096 and not PAGE_SIZE when figuring out how
many pages we'll need space for.
Signed-off-by: Roland Dreier <roland@topspin.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roland Dreier [Sat, 16 Apr 2005 22:26:17 +0000 (15:26 -0700)]
[PATCH] IB/mthca: clean up mthca_dereg_mr()
It's cleaner to kfree mthca_mr, and not rely on the fact that ib_mr is the
first field in mthca_mr.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <roland@topspin.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The first buffer of a memory region is not required to be page-aligned, so
don't return an error if it's not.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <roland@topspin.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roland Dreier [Sat, 16 Apr 2005 22:26:16 +0000 (15:26 -0700)]
[PATCH] IB/mthca: fix posting sends with immediate data
When posting a work request with immediate data, put the immediate data in the
immediate data field of the hardware's work request (rather than overwriting
the flags field).
Signed-off-by: Roland Dreier <roland@topspin.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roland Dreier [Sat, 16 Apr 2005 22:26:13 +0000 (15:26 -0700)]
[PATCH] IB/mthca: map MPT/MTT context in mem-free mode
In mem-free mode, when allocating memory regions, make sure that the HCA has
context memory mapped to cover the virtual space used for the MPT and MTTs
being used.
Signed-off-by: Roland Dreier <roland@topspin.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roland Dreier [Sat, 16 Apr 2005 22:26:11 +0000 (15:26 -0700)]
[PATCH] IB: Fix user MAD registrations with class 0
Fix handling of MAD agent registrations with mgmt_class == 0. In this case
ib_umad should pass a NULL registration request to the MAD core rather than a
request with mgmt_class set to 0.
Signed-off-by: Roland Dreier <roland@topspin.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sean Hefty [Sat, 16 Apr 2005 22:26:08 +0000 (15:26 -0700)]
[PATCH] IB: Keep MAD work completion valid
Replace the *wc field in ib_mad_recv_wc from pointing to a structure on the
stack to one allocated with the received MAD buffer. This allows a client to
access the *wc field after their receive completion handler has returned.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@topspin.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roland Dreier [Sat, 16 Apr 2005 22:26:06 +0000 (15:26 -0700)]
[PATCH] IPoIB: fix static rate calculation
Correct and simplify calculation of static rate. We need to round up the
quotient of (local_rate - path_rate) / path_rate. To round up we add
(path_rate - 1) to the numerator, so the quotient simplifies to (local_rate -
1) / path_rate.
No idea how I came up with the old formula.
Signed-off-by: Roland Dreier <roland@topspin.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
gcc-4 warns with
include/linux/cpuset.h:21: warning: type qualifiers ignored on function
return type
cpuset_cpus_allowed is declared with const
extern const cpumask_t cpuset_cpus_allowed(const struct task_struct *p);
First const should be __attribute__((const)), but the gcc manual
explains that:
"Note that a function that has pointer arguments and examines the data
pointed to must not be declared const. Likewise, a function that calls a
non-const function usually must not be const. It does not make sense for
a const function to return void."
The following patch remove const from the function declaration.
Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org> Acked-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] pci enumeration on ixp2000: overflow in kernel/resource.c
IXP2000 (ARM-based) platforms use a separate 'struct resource' for PCI MEM
space. Resource allocation for PCI BARs always fails because the 'root'
resource (the IXP2000 PCI MEM resource) always has the entire address space
(00000000-ffffffff) free, and find_resource() calculates the size of that
range as ffffffff-00000000+1=0, so all allocations fail because it thinks
there is no space.
(akpm: pls. double-check)
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
James Bottomley [Sat, 16 Apr 2005 22:25:54 +0000 (15:25 -0700)]
[PATCH] add Big Endian variants of ioread/iowrite
In the new io infrastructure, all of our operators are expecting the
underlying device to be little endian (because the PCI bus, their main
consumer, is LE).
However, there are a fair few devices and busses in the world that are
actually Big Endian. There's even evidence that some of these BE bus and
chip types are attached to LE systems. Thus, there's a need for a BE
equivalent of our io{read,write}{16,32} operations.
The attached patch adds this as io{read,write}{16,32}be. When it's in,
I'll add the first consume (the 53c700 SCSI chip driver).
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Daniel McNeil [Sat, 16 Apr 2005 22:25:50 +0000 (15:25 -0700)]
[PATCH] Direct IO async short read fix
The direct I/O code is mapping the read request to the file system block. If
the file size was not on a block boundary, the result would show the the read
reading past EOF. This was only happening for the AIO case. The non-AIO case
truncates the result to match file size (in direct_io_worker). This patch
does the same thing for the AIO case, it truncates the result to match the
file size if the read reads past EOF.
When I/O completes the result can be truncated to match the file size
without using i_size_read(), thus the aio result now matches the number of
bytes read to the end of file.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These have been deprecated since ->compat_ioctl when in, thus only a short
deprecation period. There's four users left: i2o_config, s390/z90crypy,
s390/dasd and s390/zfcp and for the first two patches are about to be
submitted to get rid of it.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] quota: possible bug in quota format v2 support
Don't put root block of quota tree to the free list (when quota file is
completely empty). That should not actually happen anyway (somebody should
get accounted for the filesystem root and so quota file should never be
empty) but better prevent it here than solve magical quota file
corruption.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Bernard Blackham [Sat, 16 Apr 2005 22:25:45 +0000 (15:25 -0700)]
[PATCH] ext2 corruption - regression between 2.6.9 and 2.6.10
Whilst trying to stress test a Promise SX8 card, we stumbled across
some nasty filesystem corruption in ext2. Our tests involved
creating an ext2 partition, mounting, running several concurrent
fsx's over it, umounting, and fsck'ing, all scripted[1]. The fsck
would always return with errors.
This regression was traced back to a change between 2.6.9 and
2.6.10, which moves the functionality of ext2_put_inode into
ext2_clear_inode. The attached patch reverses this change, and
eliminated the source of corruption.
Mingming Cao <cmm@us.ibm.com> said:
I think his patch for ext2 is correct. The corruption on ext3 is not the same
issue he saw on ext2. I believe that's the race between discard reservation
and reservation in-use that we already fixed it in 2.6.12- rc1.
For the problem related to ext2, at the time when we design reservation for
ext3, we decide we only need to discard the reservation at the last file
close, so we have ext3_discard_reservation on iput_final- >ext3_clear_inode.
The ext2 handle discard preallocation differently at that time, it discard the
preallocation at each iput(), not in input_final(), so we think it's
unnecessary to thrash it so frequently, and the right thing to do, as we did
for ext3 reservation, discard preallocation on last iput(). So we moved the
ext2_discard_preallocation from ext2_put_inode(0 to ext2_clear_inode.
Since ext2 preallocation is doing pre-allocation on disk, so it is possible
that at the unmount time, someone is still hold the reference of the inode, so
the preallocation for a file is not discard yet, so we still mark those blocks
allocated on disk, while they are not actually in the inode's block map, so
fsck will catch/fix that error later.
This is not a issue for ext3, as ext3 reservation(pre-allocation) is done in
memory.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- plug various leaks and use after frees in the remove and
initialization failure path (some still left)
- remove useless global list of boards and use pci_set_drvdata instead
- unobsfucate init path by merging functions together
- kill various totally useless state variables
- .. probably more I forgot
Note that the tty part still generates lots of sparse warnings and there's
still a totally useless layer of function pointer indirections, but maybe
someone else will fix that bit up.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ken Chen [Sat, 16 Apr 2005 22:25:43 +0000 (15:25 -0700)]
[PATCH] use cheaper elv_queue_empty when unplug a device
In function __generic_unplug_device(), kernel can use a cheaper function
elv_queue_empty() instead of more expensive elv_next_request to find
whether the queue is empty or not. blk_run_queue can also made conditional
on whether queue's emptiness before calling request_fn().
Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fix 3 calls to module_param_string() in
driver/media/video/tuner-core.c and drivers/media/video/tda9887.c. In all
three places, the len and the perm parameter was switched.
[PATCH] kernel/param.c: don't use .max when .num is NULL in param_array_set()
there seems to be a bug, at least for me, in kernel/param.c for arrays with
.num == NULL. If .num == NULL, the function param_array_set() uses &.max
for the call to param_array(), wich alters the .max value to the number of
arguments. The result is, you can't set more array arguments as the last
time you set the parameter.
example:
# a module 'example' with
# static int array[10] = { 0, };
# module_param_array(array, int, NULL, 0644);
$ insmod example.ko array=1,2,3
$ cat /sys/module/example/parameters/array
1,2,3
$ echo "4,3,2,1" > /sys/module/example/parameters/array
$ dmesg | tail -n 1
kernel: array: can take only 3 arguments
A question on sigwaitinfo based IO mechanism in multithreaded applications.
I am trying to use RT signals to notify me of IO events using RT signals
instead of SIGIO in a multithreaded applications. I noticed that there was
some discussion on lkml during november 1999 with the subject of the
discussion as "Signal driven IO". In the thread I noticed that RT signals
were being delivered to the worker thread. I am running 2.6.10 kernel and
I am trying to use the very same mechanism and I find that only SIGIO being
propogated to the worker threads and RT signals only being propogated to
the main thread and not the worker threads where I actually want them to be
propogated too. On further inspection I found that the following patch
which I have attached solves the problem.
I am not sure if this is a bug or feature in the kernel.
Roland McGrath <roland@redhat.com> said:
This relates only to fcntl F_SETSIG, which is a Linux extension. So there is
no POSIX issue. When changing various things like the normal SIGIO signalling
to do group signals, I was concerned strictly with the POSIX semantics and
generally avoided touching things in the domain of Linux inventions. That's
why I didn't change this when I changed the call right next to it. There is
no reason I can see that F_SETSIG-requested signals shouldn't use a group
signal like normal SIGIO does. I'm happy to ACK this patch, there is nothing
wrong with its change to the semantics in my book. But neither POSIX nor I
care a whit what F_SETSIG does.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is a possibility that a bio will be accessed after it has been freed
on SCSI. It happens if you submit a bio with BIO_SYNC marked and the
auto-unplugging kicks the request_fn, SCSI re-enables interrupts in-between
so if the request completes between the add_request() in __make_request()
and the bio_sync() call, we could be looking at a dead bio. It's a slim
race, but it has been triggered in the Real World.
Pavel Machek [Sat, 16 Apr 2005 22:25:34 +0000 (15:25 -0700)]
[PATCH] u32 vs. pm_message_t in ppc and radeon
This fixes pm_message_t vs. u32 confusion in ppc and aty (I *hope* that's
basically radeon code...). I was not able to test most of these, but I'm
not really changing anything, so it should be okay.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pavel Machek [Sat, 16 Apr 2005 22:25:32 +0000 (15:25 -0700)]
[PATCH] fix u32 vs. pm_message_t in drivers/macintosh
I thought I'm done with fixing u32 vs. pm_message_t ... unfortunately that
turned out not to be the case as Russel King pointed out. Here are fixes for
drivers/macintosh.
Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pavel Machek [Sat, 16 Apr 2005 22:25:30 +0000 (15:25 -0700)]
[PATCH] fix pm_message_t vs. u32 in alsa
I thought I'm done with fixing u32 vs. pm_message_t ... unfortunately that
turned out not to be the case as Russel King pointed out. This fixes last few
bits in alsa.
Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pavel Machek [Sat, 16 Apr 2005 22:25:24 +0000 (15:25 -0700)]
[PATCH] pm_message_t: more fixes in common and i386
I thought I'm done with fixing u32 vs. pm_message_t ... unfortunately
that turned out not to be the case as Russel King pointed out. Here are
fixes for Documentation and common code (mainly system devices).
Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] x86, x86_64: dual core proc-cpuinfo and sibling-map fix
- broken sibling_map setup in x86_64
- grouping all the core and HT related cpuinfo fields.
We are reasonably sure that adding new cpuinfo fields after "siblings" field,
will not cause any app failure. Thats because today's /proc/cpuinfo
format is completely different on x86, x86_64 and we haven't heard of any
x86 app breakage because of this issue. Grouping these fields will
result in more or less common format on all architectures (ia64, x86 and
x86_64) and will cause less confusion.
[PATCH] x86_64: Switch SMP bootup over to new CPU hotplug state machine
This will allow hotplug CPU in the future and in general cleans up a lot of
crufty code. It also should plug some races that the old hackish way
introduces. Remove one old race workaround in NMI watchdog setup that is not
needed anymore.
I removed the old total sum of bogomips reporting code. The brag value of
BogoMips has been greatly devalued in the last years on the open market.
Real CPU hotplug will need some more work, but the infrastructure for it is
there now.
One drawback: the new TSC sync algorithm is less accurate than before. The
old way of zeroing TSCs is too intrusive to do later. Instead the TSC of the
BP is duplicated now, which is less accurate.
akpm:
- sync_tsc_bp_init seems to have the sense of `init' inverted.
- SPIN_LOCK_UNLOCKED is deprecated - use DEFINE_SPINLOCK.
[PATCH] x86_64: Rename the extended cpuid level field
It was confusingly named.
Signed-off-by: Andi Kleen <ak@suse.de>
DESC
x86_64: Switch SMP bootup over to new CPU hotplug state machine
EDESC
From: "Andi Kleen" <ak@suse.de>
This will allow hotplug CPU in the future and in general cleans up a lot of
crufty code. It also should plug some races that the old hackish way
introduces. Remove one old race workaround in NMI watchdog setup that is not
needed anymore.
I removed the old total sum of bogomips reporting code. The brag value of
BogoMips has been greatly devalued in the last years on the open market.
Real CPU hotplug will need some more work, but the infrastructure for it is
there now.
One drawback: the new TSC sync algorithm is less accurate than before. The
old way of zeroing TSCs is too intrusive to do later. Instead the TSC of the
BP is duplicated now, which is less accurate.
Exceptions and hardware interrupts can, to a certain degree, nest, so when
attempting to follow the sequence of stacks used in order to dump their
contents this has to be accounted for. Also, IST stacks have their tops
stored in the TSS, so there's no need to add the stack size to get to their
ends.
Minor changes from AK.
Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clean up the code greatly. Now uses the infrastructure from the Intel dual
core patch Should fix a final bug noticed by Tyan of not detecting the nodes
correctly in some corner cases.
Patch for x86-64 and i386
Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] x86_64: add support for Intel dual-core detection and displaying
Appended patch adds the support for Intel dual-core detection and displaying
the core related information in /proc/cpuinfo.
It adds two new fields "core id" and "cpu cores" to x86 /proc/cpuinfo and the
"core id" field for x86_64("cpu cores" field is already present in x86_64).
Number of processor cores in a die is detected using cpuid(4) and this is
documented in IA-32 Intel Architecture Software Developer's Manual (vol 2a)
(http://developer.intel.com/design/pentium4/manuals/index_new.htm#sdm_vol2a)
This patch also adds cpu_core_map similar to cpu_sibling_map.
[PATCH] x86_64: Keep only a single debug notifier chain
Calling a notifier three times in the debug handler does not make much sense,
because a debugger can figure out the various conditions by itself. Remove
the additional calls to DIE_DEBUG and DIE_DEBUGSTEP completely.
This matches what i386 does now.
This also makes sure interrupts are always still disabled when calling a
debugger, which prevents:
BUG: using smp_processor_id() in preemptible [00000001] code: tpopf/1470
caller is post_kprobe_handler+0x9/0x70
Could lead to a lost reschedule event when the process already rescheduled on
exception exit, and needs it again while still being in the kernel. Unlikely
case though.
Also remove one redundant cli in another entry.S path.