Jan Beulich [Mon, 26 Jun 2006 11:59:02 +0000 (13:59 +0200)]
[PATCH] x86_64: miscellaneous mm/init.c fixes
- fix an off-by-one error in phys_pmd_init()
- prevent phys_pmd_init() from removing mappings established earlier
- fix the direct mapping early printk to in fact show the end of the range
- remove an apparently orphan comment
Jacob Shin [Mon, 26 Jun 2006 11:58:50 +0000 (13:58 +0200)]
[PATCH] x86_64: mce_amd relocate sysfs files
Get rid of /sys/devices/system/threshold directory and move
mce_amd thresholding files into the machine sysfs directory --
/sys/devices/system/machinecheck.
Vojtech Pavlik [Mon, 26 Jun 2006 11:58:35 +0000 (13:58 +0200)]
[PATCH] x86_64: Explain why HPET T0_CMP register is written twice
After writing the CFG register, the first value written to the T0_CMP
register is the value at which the next interrupt should be triggered; every
value written after that sets the period of the interrupt. For that reason, the
code needs to write the value twice: once to set the phase and once to set the period.
[AK: I had already figured it out by myself, but it's still useful
to have a comment for this.]
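The rule above can be illustrated with a simplified, hypothetical model in C. This is not the real HPET register interface: `struct hpet_t0_model`, `write_cfg_setval()`, `write_cmp()` and `hpet_start_periodic()` are illustrative names. The model also assumes the main counter was reset to 0, so the first compare value is also the time of the first interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of HPET timer 0 in periodic mode (not real MMIO). */
struct hpet_t0_model {
    bool setval_armed;   /* true right after a CFG write with "set value" */
    uint64_t next_fire;  /* counter value of the next interrupt (phase) */
    uint64_t period;     /* interval between periodic interrupts */
};

/* Models writing the CFG register with the "set value" bit. */
static void write_cfg_setval(struct hpet_t0_model *t)
{
    t->setval_armed = true;
}

/* Models a write to T0_CMP: the first write after CFG sets the phase,
 * every later write sets the period. */
static void write_cmp(struct hpet_t0_model *t, uint64_t value)
{
    if (t->setval_armed) {
        t->next_fire = value;
        t->setval_armed = false;
    } else {
        t->period = value;
    }
}

/* Why the value is written twice: once for phase, once for period. */
static void hpet_start_periodic(struct hpet_t0_model *t, uint64_t tick)
{
    write_cfg_setval(t);
    write_cmp(t, tick);  /* phase: first interrupt at 'tick' */
    write_cmp(t, tick);  /* period: every 'tick' thereafter */
}
```

With only one write, the model would leave the period unset, which is exactly the situation the comment in the patch guards against.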
Vojtech Pavlik [Mon, 26 Jun 2006 11:58:26 +0000 (13:58 +0200)]
[PATCH] x86_64: Add X86_FEATURE_RDTSCP, fix rdtscp in /proc/cpuinfo
This patch adds the X86_FEATURE_RDTSCP #define, so that kernel code can
check for the feature easily and also fixes the location of the "rdtscp"
string in the cpuinfo tables.
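A minimal sketch of how such a feature bit is typically tested, assuming the `word * 32 + bit` numbering used by the x86 cpufeature tables and that RDTSCP is reported in bit 27 of EDX for CPUID leaf 0x80000001 (word 1). `cpu_has()` and the macro names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names mirroring the word*32+bit encoding. */
#define CAP_WORD_80000001_EDX 1                          /* word 1 */
#define FEATURE_RDTSCP (CAP_WORD_80000001_EDX * 32 + 27) /* EDX bit 27 */

/* caps[] holds one 32-bit CPUID register value per word. */
static bool cpu_has(const uint32_t *caps, unsigned int feature)
{
    return (caps[feature / 32] >> (feature % 32)) & 1u;
}
```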
Vojtech Pavlik [Mon, 26 Jun 2006 11:58:20 +0000 (13:58 +0200)]
[PATCH] x86_64: Add useful constants to time.h
In timekeeping code, one often needs conversion constants. Naming
these leads to code that is easier to understand, because it shows the reader
which units the conversion is between.
Rohit Seth [Mon, 26 Jun 2006 11:58:17 +0000 (13:58 +0200)]
[PATCH] x86_64: moving phys_proc_id and cpu_core_id to cpuinfo_x86
Most of the fields of cpuinfo are defined in the cpuinfo_x86 structure.
This patch moves phys_proc_id and cpu_core_id for each processor into the
cpuinfo_x86 structure as well.
Jon Mason [Mon, 26 Jun 2006 11:58:14 +0000 (13:58 +0200)]
[PATCH] x86_64: Calgary IOMMU - Calgary specific bits
This patch hooks Calgary into the build, the x86-64 IOMMU
initialization paths, and introduces the Calgary specific bits. The
implementation draws inspiration from both PPC (which has support for
the same chip but requires firmware support which we don't have on
x86-64) and gart. Calgary differs from gart in that it supports a
translation table per PHB, as opposed to the single gart aperture.
Changes from previous version:
* Addition of boot-time disablement for bus-level translation/isolation
(e.g., enable userspace DMA for things like X)
* Usage of newer IOMMU abstraction functions
This patch creates a new interface for IOMMUs by adding a centralized
location for IOMMU allocation (for translation tables/apertures) and
IOMMU initialization. In creating these, code was moved around for
abstraction, uniformity, and conciseness.
Take note of the move of the iommu_setup bootarg parsing code to
__setup. This is enabled by moving back the location of the aperture
allocation/detection to mem init (which while ugly, was already the
location of the swiotlb_init).
While a slight departure from the previous patch, I believe this provides
the true intention of the previous versions of the patch which changed
this code. It also makes the addition of the upcoming calgary code much
cleaner than previous patches.
[AK: Removed one broken change. iommu_setup still has to be called
early]
swiotlb relies on the gart specific iommu_aperture variable to know if
we discovered a hardware IOMMU before swiotlb initialization. Introduce
iommu_detected to do the same thing, but in a HW IOMMU neutral manner,
in preparation for adding the Calgary HW IOMMU.
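A sketch of the idea in plain C: every hardware IOMMU probe sets one neutral flag, and the swiotlb decision only looks at that flag. The probe functions, the `no_iommu` flag and the 4 GiB cut-off are assumptions made for illustration, not a copy of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical globals; only iommu_detected is named in the changelog. */
static bool iommu_detected; /* set by any HW IOMMU probe (GART, Calgary) */
static bool no_iommu;       /* user asked for no IOMMU at all */
static bool swiotlb;        /* use software bounce buffering */

#define DMA32_LIMIT_PFN (1ULL << (32 - 12)) /* 4 GiB in 4 KiB pages */

/* Each HW IOMMU driver reports itself the same way. */
static void gart_probe(bool found)    { if (found) iommu_detected = true; }
static void calgary_probe(bool found) { if (found) iommu_detected = true; }

/* swiotlb no longer needs to know which IOMMU was found. */
static void swiotlb_init_sketch(uint64_t end_pfn)
{
    if (!iommu_detected && !no_iommu && end_pfn > DMA32_LIMIT_PFN)
        swiotlb = true;
}
```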
Jan Beulich [Mon, 26 Jun 2006 11:57:41 +0000 (13:57 +0200)]
[PATCH] i386: reliable stack trace support (i386)
These are the i386-specific pieces to enable reliable stack traces. This is
going to be even more useful once CFI annotations get added to the assembly
code, namely to entry.S.
Jan Beulich [Mon, 26 Jun 2006 11:57:38 +0000 (13:57 +0200)]
[PATCH] x86_64: reliable stack trace support (x86-64 syscall
Adjust the CFA offset for 64- and 32-bit syscall entries so that the five
slots pre-subtracted from the stack pointer do not appear to reside outside
of the current frame.
Jan Beulich [Mon, 26 Jun 2006 11:57:32 +0000 (13:57 +0200)]
[PATCH] x86_64: reliable stack trace support (x86-64)
These are the x86_64-specific pieces to enable reliable stack traces. The
only restriction with this is that it currently cannot unwind across the
interrupt->normal stack boundary, as that transition is lacking proper
annotation.
Jan Beulich [Mon, 26 Jun 2006 11:57:28 +0000 (13:57 +0200)]
[PATCH] x86_64: reliable stack trace support
These are the generic bits needed to enable reliable stack traces based
on Dwarf2-like (.eh_frame) unwind information. Subsequent patches will
enable x86-64 and i386 to make use of this.
Thanks to Andi Kleen and Ingo Molnar, who pointed out several possibilities
for improvement.
bibo,mao [Mon, 26 Jun 2006 11:57:25 +0000 (13:57 +0200)]
[PATCH] x86_64: x86_64 MSI misses one entry handler
On x86_64, if a device driver using MSI gets vector 0xee from
assign_irq_vector(), the system will crash when that interrupt fires,
because the 0xee interrupt entry is empty. This patch fills in the missing
entry. It is based on 2.6.17-rc6.
Andi Kleen [Mon, 26 Jun 2006 11:57:22 +0000 (13:57 +0200)]
[PATCH] x86_64: Rename IOMMU option, fix help and mark option embedded.
- Rename the GART_IOMMU option to IOMMU to make clear it's not
just for AMD
- Rewrite the help text to better emphasize this fact
- Make it an embedded option because too many people get it wrong.
To my astonishment I discovered the aacraid driver tests this
symbol directly. This looks quite broken to me - it's an internal
implementation detail of the PCI DMA API. Can the maintainer
please clarify what this test was intended to do?
Ingo Molnar [Mon, 26 Jun 2006 11:57:16 +0000 (13:57 +0200)]
[PATCH] x86_64: fix vector_lock deadlock in io_apic.c
Fix a potential deadlock scenario introduced by io_apic.c's new vector_lock
on i386 and x86_64.
Found by the locking correctness validator. The patch was boot-tested on
x86. For details of the deadlock scenario, see the validator output:
======================================================
[ BUG: hard-safe -> hard-unsafe lock order detected! ]
------------------------------------------------------
idle/1 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
(msi_lock){....}, at: [<c04ff8d2>] startup_msi_irq_wo_maskbit+0x10/0x35
and this task is already holding:
(&irq_desc[i].lock){++..}, at: [<c015b924>] probe_irq_on+0x36/0x107
which would create a new lock dependency:
(&irq_desc[i].lock){++..} -> (msi_lock){....}
but this new dependency connects a hard-irq-safe lock:
(&irq_desc[i].lock){++..}
... which became hard-irq-safe at:
[<c01468c4>] lockdep_acquire+0x68/0x84
[<c10485e9>] _spin_lock+0x21/0x2f
[<c015aff5>] __do_IRQ+0x3d/0x113
[<c01062d3>] do_IRQ+0x8c/0xad
to a hard-irq-unsafe lock:
(vector_lock){--..}
... which became hard-irq-unsafe at:
... [<c01468c4>] lockdep_acquire+0x68/0x84
[<c10485e9>] _spin_lock+0x21/0x2f
[<c011b5e8>] assign_irq_vector+0x34/0xc8
[<c1aa82fa>] setup_IO_APIC+0x45a/0xcff
[<c1aa56e3>] smp_prepare_cpus+0x5ea/0x8aa
[<c010033f>] init+0x32/0x2cb
[<c0102005>] kernel_thread_helper+0x5/0xb
which could potentially lead to deadlocks!
other info that might help us debug this:
3 locks held by idle/1:
#0: (port_mutex){--..}, at: [<c067070d>] uart_add_one_port+0x61/0x289
#1: (&state->mutex){--..}, at: [<c067071f>] uart_add_one_port+0x73/0x289
#2: (&irq_desc[i].lock){++..}, at: [<c015b924>] probe_irq_on+0x36/0x107
Jon Mason [Mon, 26 Jun 2006 11:57:13 +0000 (13:57 +0200)]
[PATCH] x86_64: remove unused gart header file
include/asm-x86_64/gart-mapping.h is only ever used in
arch/x86_64/kernel/setup.c and none of its contents are referenced.
Looks to be leftover cruft not removed in the dma_ops patch.
Andi Kleen [Mon, 26 Jun 2006 11:57:07 +0000 (13:57 +0200)]
[PATCH] x86_64: Remove ia32_sys_call_table export
It was originally added for 2.4 oprofile, but 2.6 oprofile doesn't
need it anymore. There shouldn't be any users in tree anymore, and it doesn't
make much sense to export the ia32 syscalls when the main syscalls
are not exported.
I think Adrian Bunk asked for removing it several times.
Also included a hunk from Adrian to remove the .globl ia32_sys_call_table.
Andi Kleen [Mon, 26 Jun 2006 11:56:52 +0000 (13:56 +0200)]
[PATCH] x86_64: Add compat_printk and sysctl to turn off compat layer warnings
Sometimes, e.g. with crashme, the compat layer warnings can be noisy.
Add a way to turn them off by gating all output through compat_printk
that checks a global sysctl. The default is not changed.
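A user-space sketch of the gating described above. `compat_log` is a hypothetical stand-in for the sysctl (its real name is not given here), and output goes to stderr via `vfprintf()` rather than the kernel's `vprintk()`. As in the patch, the default leaves the warnings enabled.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for the sysctl; non-zero means "print". */
static int compat_log = 1;

/* All compat-layer warnings go through here, so one switch silences them.
 * Returns the number of characters printed, or 0 when disabled. */
static int compat_printk(const char *fmt, ...)
{
    va_list ap;
    int ret;

    if (!compat_log)
        return 0;
    va_start(ap, fmt);
    ret = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return ret;
}
```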
Andi Kleen [Mon, 26 Jun 2006 11:56:40 +0000 (13:56 +0200)]
[PATCH] x86_64: Clean up and enhance K8 northbridge access code
- Factor out the duplicated access/cache code into a single file
* Shared between i386/x86-64.
- Share flush code between AGP and IOMMU
* Fix a bug: AGP didn't wait for end of flush before
- Drop 8 northbridges limit and allocate dynamically
- Add lock to serialize AGP and IOMMU GART flushes
- Add PCI ID for next AMD northbridge
- Random related cleanups
The old K8 NUMA discovery code is unchanged. New systems
should all use SRAT for this.
Cc: "Navin Boppuri" <navin.boppuri@newisys.com> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jon Mason [Mon, 26 Jun 2006 11:56:37 +0000 (13:56 +0200)]
[PATCH] x86_64: trivial gart clean-up
A trivial change to have gart_unmap_sg call gart_unmap_single directly,
instead of bouncing through the dma_unmap_single wrapper in
dma-mapping.h.
This change required moving the gart_unmap_single above gart_unmap_sg,
and under gart_map_single (which seems a more logical place than its
current location IMHO).
Ingo Molnar [Mon, 26 Jun 2006 11:56:25 +0000 (13:56 +0200)]
[PATCH] x86_64: x86_64-enable-large-bzImage.patch
Enable large bzImages on x86_64 (the fix is from x86's build.c). Using this
patch I have successfully built and booted an allyesconfig 13MB+ bzImage
on x86_64 too.
Gerd Hoffmann [Mon, 26 Jun 2006 11:56:16 +0000 (13:56 +0200)]
[PATCH] x86_64: x86_64 version of the smp alternative patch.
Changes are largely identical to the i386 version:
* alternative #defines are moved to the new alternative.h file.
* one new elf section with pointers to the lock prefixes which can be
nop'ed out for non-smp.
* two new elf sections similar to the "classic" alternatives to
replace SMP code with simpler UP code.
* fixup headers to use alternative.h instead of defining their own
LOCK / LOCK_PREFIX macros.
The patch reuses the i386 version of the alternatives code to avoid code
duplication. The code in alternatives.c was shuffled around a bit to
reduce the number of #ifdefs needed. It also got some tweaks needed for
x86_64 (vsyscall page handling) and new features (noreplacement option
which was x86_64 only up to now). Debug printks are changed from
compile-time to runtime.
Loosely based on an early version from Bastian Blank <waldi@debian.org>
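The lock-prefix section can be pictured with the following sketch: a list of offsets records where LOCK prefixes (0xf0) sit in a code buffer, and switching to UP replaces each with a one-byte NOP (0x90). It operates on an ordinary byte array; the real code also has to handle write-protected kernel text and the other alternative sections, which this sketch ignores. `smp_to_up()` and `up_to_smp()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_PREFIX_BYTE 0xf0 /* x86 LOCK prefix */
#define NOP_BYTE         0x90 /* one-byte NOP */

/* Replace each recorded LOCK prefix with a NOP (SMP -> UP).
 * 'sites' plays the role of the ELF section listing lock prefixes.
 * Returns the number of bytes changed. */
static size_t smp_to_up(uint8_t *text, const size_t *sites, size_t nsites)
{
    size_t changed = 0;
    for (size_t i = 0; i < nsites; i++) {
        if (text[sites[i]] == LOCK_PREFIX_BYTE) {
            text[sites[i]] = NOP_BYTE;
            changed++;
        }
    }
    return changed;
}

/* Restore the LOCK prefixes (UP -> SMP), e.g. when a second CPU appears. */
static size_t up_to_smp(uint8_t *text, const size_t *sites, size_t nsites)
{
    size_t changed = 0;
    for (size_t i = 0; i < nsites; i++) {
        if (text[sites[i]] == NOP_BYTE) {
            text[sites[i]] = LOCK_PREFIX_BYTE;
            changed++;
        }
    }
    return changed;
}
```

For example, the bytes `f0 0f c1 07` (`lock xadd %eax,(%rdi)`) become `90 0f c1 07` after `smp_to_up()`, leaving a plain `xadd` preceded by a NOP.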
Andi Kleen [Mon, 26 Jun 2006 11:56:13 +0000 (13:56 +0200)]
[PATCH] i386/x86-64: Emulate CPUID4 on AMD
Intel systems report the cache level data from CPUID 4 in sysfs.
Add a CPUID 4 emulation for AMD CPUs to report the same
information for them. This allows programs to read this
information in a uniform way.
The AMD way to report this is less flexible, so some assumptions
are hardcoded (e.g. no L3).
Andi Kleen [Mon, 26 Jun 2006 11:56:10 +0000 (13:56 +0200)]
[PATCH] i386/x86-64: Use new official CPUID to get APICID/core split on AMD platforms
Previously the apicid<->coreid split was computed based on the max
number of cores. Now use a new CPUID field that AMD defined for that. On most
current systems it should be 0, in which case the old method is used.
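A sketch of the two ways of computing the split, assuming AMD's layout of CPUID 0x80000008 ECX: ApicIdCoreIdSize in bits 15:12 and NC (number of cores minus one) in bits 7:0. When the new field reads 0, the sketch falls back to rounding the core count up to a power of two. `amd_core_bits()`, `core_id()` and `package_id()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* ecx: value of ECX from CPUID 0x80000008 (assumed layout, see above).
 * Returns the number of low APIC ID bits that select the core. */
static unsigned int amd_core_bits(uint32_t ecx)
{
    unsigned int bits = (ecx >> 12) & 0xf;     /* ApicIdCoreIdSize */
    if (bits == 0) {
        /* Old method: derive the width from the max number of cores. */
        unsigned int cores = (ecx & 0xff) + 1; /* NC + 1 */
        while ((1u << bits) < cores)
            bits++;
    }
    return bits;
}

/* Core within the package: the low 'bits' bits of the APIC ID. */
static unsigned int core_id(uint32_t apicid, unsigned int bits)
{
    return apicid & ((1u << bits) - 1);
}

/* Physical package: the remaining high bits. */
static unsigned int package_id(uint32_t apicid, unsigned int bits)
{
    return apicid >> bits;
}
```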
[PATCH] x86_64: Use local APIC ID from local APIC instead of CPUID
vSMPowered systems use apic_cluster too. Forcing apic_physflat works
on these systems too, but only if we change phys_pkg_id to use
hard_smp_processor_id() instead of cpuid_ebx. I am guessing other
multi-chassis cluster systems would need this too.
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC]: Add iomap interfaces.
[OPENPROM]: Rewrite driver to use in-kernel device tree.
[OPENPROMFS]: Rewrite using in-kernel device tree and seq_file.
[SPARC]: Add unique device_node IDs and a ".node" property.
[SPARC]: Add of_set_property() interface.
[SPARC64]: Export auxio_register to modules.
[SPARC64]: Add missing interfaces to dma-mapping.h
[SPARC64]: Export _PAGE_IE to modules.
[SPARC64]: Allow floppy driver to build modular.
[SPARC]: Export x_bus_type to modules.
[RIOWATCHDOG]: Fix the build.
[CPWATCHDOG]: Fix the build.
[PARPORT] sunbpp: Fix typo.
[MTD] sun_uflash: Port to new EBUS device layer.
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
[IOAT]: Do not dereference THIS_MODULE directly to set unsafe.
[NETROM]: Fix possible null pointer dereference.
[NET] netpoll: break recursive loop in netpoll rx path
[NET] netpoll: don't spin forever sending to stopped queues
[IRDA]: add some IBM think pads
[ATM]: atm/mpc.c warning fix
[NET]: skb_find_text ignores to argument
[NET]: make net/core/dev.c:netdev_nit static
[NET]: Fix GSO problems in dev_hard_start_xmit()
[NET]: Fix CHECKSUM_HW GSO problems.
[TIPC]: Fix incorrect correction to discovery timer frequency computation.
[TIPC]: Get rid of dynamically allocated arrays in broadcast code.
[TIPC]: Fixed link switchover bugs
[TIPC]: Enhanced & cleaned up system messages; fixed 2 obscure memory leaks.
[TIPC]: First phase of assert() cleanup
[TIPC]: Disallow config operations that aren't supported in certain modes.
[TIPC]: Fixed memory leak in tipc_link_send() when destination is unreachable
[TIPC]: Added missing warning for out-of-memory condition
[TIPC]: Withdrawing all names from nameless port now returns success, not error
[TIPC]: Optimized argument validation done by connect().
...
Peter Williams [Mon, 26 Jun 2006 06:58:00 +0000 (16:58 +1000)]
[PATCH] sched: fix SCHED_FIFO bug in sys_sched_rr_get_interval()
The introduction of SCHED_BATCH scheduling class with a value of 3 means
that the expression (p->policy & SCHED_FIFO) will return true if policy
is SCHED_BATCH or SCHED_FIFO.
Unfortunately, this expression is used in sys_sched_rr_get_interval()
and in the absence of a comment to say that this is intentional I
presume that it is unintentional and erroneous.
The fix is to change the expression to (p->policy == SCHED_FIFO).
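The bug is easy to reproduce with plain integers. The enum below uses local, hypothetical names with the Linux policy values (SCHED_OTHER = 0, SCHED_FIFO = 1, SCHED_RR = 2, SCHED_BATCH = 3) so it does not depend on any particular `<sched.h>`.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the policy values (hypothetical names). */
enum { POLICY_OTHER = 0, POLICY_FIFO = 1, POLICY_RR = 2, POLICY_BATCH = 3 };

/* Buggy test: 3 & 1 == 1, so SCHED_BATCH is mistaken for SCHED_FIFO. */
static bool is_fifo_buggy(int policy) { return (policy & POLICY_FIFO) != 0; }

/* Fixed test, as in the patch: policies are values, not bit flags. */
static bool is_fifo_fixed(int policy) { return policy == POLICY_FIFO; }
```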
Jesper Juhl [Mon, 26 Jun 2006 17:01:01 +0000 (19:01 +0200)]
Clean up 'inline is not at beginning' warnings for usb storage
Usually we don't care much about 'gcc -W' warnings, but some of us do build
kernels that way to look for problems, and then the fewer warnings we have
to wade through the better. Especially when they are very easy and
non-intrusive to clean up. Which is the case for the following warnings
spewed by drivers/usb/storage/usb.h :
drivers/usb/storage/usb.h:163: warning: `inline' is not at beginning of declaration
drivers/usb/storage/usb.h:166: warning: `inline' is not at beginning of declaration
There's also some precedent for cleaning up these warnings. I've had
a few patches merged in the past that remove exactly this class of
warnings.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
H. Peter Anvin [Mon, 26 Jun 2006 07:28:02 +0000 (00:28 -0700)]
[PATCH] initramfs overwrite fix
This patch ensures that initramfs overwrites work correctly, even when dealing
with device nodes of different types. Furthermore, when replacing a file
which already exists, we must make very certain that we truncate the existing
file.
Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Michael Neuling <mikey@neuling.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adrian Bunk [Mon, 26 Jun 2006 07:28:01 +0000 (00:28 -0700)]
[PATCH] drivers/md/md.c: make code static
Make needlessly global code static.
Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Mon, 26 Jun 2006 07:27:58 +0000 (00:27 -0700)]
[PATCH] md: Set/get state of array via sysfs
This allows the state of an md array to be directly controlled via sysfs and
adds the ability to stop an array without tearing it down.
Array states/settings:
clear
No devices, no size, no level
Equivalent to STOP_ARRAY ioctl
inactive
May have some settings, but array is not active
all IO results in error
When written, doesn't tear down array, but just stops it
suspended (not supported yet)
All IO requests will block. The array can be reconfigured.
Writing this, if accepted, will block until array is quiescent
readonly
no resync can happen. no superblocks get written.
write requests fail
read-auto
like readonly, but behaves like 'clean' on a write request.
clean
no pending writes, but otherwise active.
When written to inactive array, starts without resync
If a write request arrives then
if metadata is known, mark 'dirty' and switch to 'active'.
if not known, block and switch to write-pending
If written to an active array that has pending writes, then fails.
active
fully active: IO and resync can be happening.
When written to inactive array, starts with resync
write-pending (not supported yet)
clean, but writes are blocked waiting for 'active' to be written.
active-idle
like active, but no writes have been seen for a while (100msec).
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
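A hedged sketch of driving this interface from C. The attribute is typically found at `/sys/block/md0/md/array_state` (the device name is only an example). The whitelist assumes that "suspended" and "write-pending" (not supported yet) and "active-idle" cannot be written; the kernel remains the final authority and may still reject a write, which with stdio buffering surfaces at `fclose()`. `write_array_state()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* States assumed to be writable (see the list above). */
static const char *const writable_states[] = {
    "clear", "inactive", "readonly", "read-auto", "clean", "active",
};

/* Write 'state' to an md array_state attribute at 'path'.
 * Returns 0 on success, -1 for an unknown/unwritable state or I/O error. */
static int write_array_state(const char *path, const char *state)
{
    size_t n = sizeof(writable_states) / sizeof(writable_states[0]);
    int known = 0;

    for (size_t i = 0; i < n; i++)
        if (strcmp(state, writable_states[i]) == 0)
            known = 1;
    if (!known)
        return -1;

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int rc = fputs(state, f) < 0 ? -1 : 0;
    /* stdio buffers the data, so a rejection by the kernel shows up here. */
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

For example, `write_array_state("/sys/block/md0/md/array_state", "readonly")` would ask md to switch the array to read-only.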
NeilBrown [Mon, 26 Jun 2006 07:27:57 +0000 (00:27 -0700)]
[PATCH] md: Don't write dirty/clean update to spares - leave them alone
- record the 'event' count on each individual device (they
might sometimes be slightly different now)
- add a new value for 'sb_dirty': '3' means that the super
block only needs to be updated to record a clean<->dirty
transition.
- Prefer odd event numbers for dirty states and even numbers
for clean states
- Using all the above, don't update the superblock on
a spare device if the update is just doing a clean-dirty
transition. To accommodate this, a transition from
dirty back to clean might now decrement the events counter
if nothing else has changed.
The net effect of this is that spare drives will not see any IO requests
during normal running of the array, so they can go to sleep if that is what
they want to do.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
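The event-count rule can be modelled in a few lines. This is a simplified sketch, not md's superblock code: `struct md_events` and its `other_change` flag are hypothetical, and the flag stands for any update other than a clean<->dirty transition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-array event counter. */
struct md_events {
    uint64_t events;    /* odd = dirty, even = clean */
    bool other_change;  /* some update besides clean<->dirty happened */
};

static bool is_dirty(uint64_t events) { return (events & 1) != 0; }

static void mark_dirty(struct md_events *m)
{
    if (!is_dirty(m->events))
        m->events++;            /* even -> odd */
}

static void mark_clean(struct md_events *m)
{
    if (!is_dirty(m->events))
        return;
    if (m->other_change)
        m->events++;            /* odd -> next even: a real change happened */
    else
        m->events--;            /* pure dirty->clean: back to the spares' value */
    m->other_change = false;
}
```

Because a pure clean->dirty->clean cycle returns to the same even number, a spare that skipped both superblock updates still carries a current event count.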
NeilBrown [Mon, 26 Jun 2006 07:27:56 +0000 (00:27 -0700)]
[PATCH] md: Allow re-add to work on array without bitmaps
When an array has a bitmap, a device can be removed and re-added and only
blocks changed since the removal (as recorded in the bitmap) will be resynced.
It should be possible to do a similar thing to arrays without bitmaps. i.e.
if a device is removed and re-added and *no* changes have been made in the
interim, then the add should not require a resync.
This patch allows that option. This means that when assembling an array one
device at a time (e.g. during device discovery) the array can be enabled
read-only as soon as enough devices are available, but extra devices can still
be added without causing a resync.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
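A simplified sketch of the decision, assuming it is made by comparing the re-added device's superblock event count with the array's. The enum and `readd_resync()` are hypothetical; the real code performs further checks before skipping a resync.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum resync_kind { RESYNC_NONE, RESYNC_BITMAP, RESYNC_FULL };

/* How much must be resynced when a removed device is re-added? */
static enum resync_kind readd_resync(uint64_t dev_events,
                                     uint64_t array_events,
                                     bool has_bitmap)
{
    if (dev_events == array_events)
        return RESYNC_NONE;   /* nothing was written in the interim */
    if (has_bitmap)
        return RESYNC_BITMAP; /* only blocks recorded in the bitmap */
    return RESYNC_FULL;       /* no record of what changed */
}
```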
NeilBrown [Mon, 26 Jun 2006 07:27:55 +0000 (00:27 -0700)]
[PATCH] md: Fix bug that stops raid5 resync from happening
As data_disks is *less* than raid_disks, the current test here is obviously
wrong. And as the difference is already available in conf->max_degraded, it
makes much more sense to use that.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Mon, 26 Jun 2006 07:27:49 +0000 (00:27 -0700)]
[PATCH] md: Change md/bitmap file handling to use bmap to file blocks-fix
Fix problems with new bmap based access to bitmap files.
1/ When not using a file based bitmap, attach a NULL list of buffers
to each page so the common free_buffer routine can cope.
2/ Use submit_bh to read as well as write, rather than vfs_read.
This makes read and write more symmetric.
3/ sync the file before reading, to ensure that the page cache has no
dirty pages that might get written out later.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Mon, 26 Jun 2006 07:27:48 +0000 (00:27 -0700)]
[PATCH] md/bitmap: change md/bitmap file handling to use bmap to file blocks
If md is asked to store a bitmap in a file, it tries to hold onto the page
cache pages for that file, manipulate them directly, and call a cocktail of
operations to write the file out. I don't believe this is a supportable
approach.
This patch switches to the same approach as swap files, i.e.
bmap is used to enumerate the block addresses of all parts of the file, and we
write directly to those blocks of the device.
swapfile only uses parts of the file that provide full pages at contiguous
addresses. We don't have that luxury, so we have to cope with pages that are
non-contiguous in storage. To handle this we attach buffers to each page, and
store the addresses in those buffers.
With this approach the pagecache may contain data which is inconsistent with
what is on disk. To alleviate the problems this can cause, md invalidates the
pagecache when releasing the file. If the file is to be examined while the
array is active (a non-critical but occasionally useful function), O_DIRECT IO
must be used. New versions of mdadm will have support for this.
This approach simplifies a lot of code:
- we no longer need to keep a list of pages which we need to wait for,
as the b_endio function can keep track of how many outstanding
writes there are. This saves a mempool.
- -EAGAIN returns from write_page are no longer possible (not sure if
they ever were actually).
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
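A user-space sketch of the per-page mapping, with a made-up 4 KiB page / 1 KiB block geometry and a hypothetical `example_bmap()` standing in for `bmap()`. It shows why the bitmap code keeps one device address per block instead of requiring, as swapfile does, that a whole page be contiguous on disk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES      4096u  /* assumed page size */
#define BLOCK_BYTES     1024u  /* assumed filesystem block size */
#define BLOCKS_PER_PAGE (PAGE_BYTES / BLOCK_BYTES)

/* Stand-in for bmap(): file block index -> device block number. */
typedef uint64_t (*bmap_fn)(uint64_t file_block);

/* One "buffer" per block in the page, each remembering its device block. */
struct page_buffers {
    uint64_t dev_block[BLOCKS_PER_PAGE];
};

static void map_page(struct page_buffers *pb, uint64_t page_index, bmap_fn bmap)
{
    for (size_t i = 0; i < BLOCKS_PER_PAGE; i++)
        pb->dev_block[i] = bmap(page_index * BLOCKS_PER_PAGE + i);
}

/* Would swapfile accept this page? (The bitmap code does not need it.) */
static int page_is_contiguous(const struct page_buffers *pb)
{
    for (size_t i = 1; i < BLOCKS_PER_PAGE; i++)
        if (pb->dev_block[i] != pb->dev_block[0] + i)
            return 0;
    return 1;
}

/* Hypothetical fragmented file: blocks 0-5 at 100-105, the rest at 500+. */
static uint64_t example_bmap(uint64_t file_block)
{
    return file_block < 6 ? 100 + file_block : 500 + (file_block - 6);
}
```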
NeilBrown [Mon, 26 Jun 2006 07:27:47 +0000 (00:27 -0700)]
[PATCH] md/bitmap: tidy up i_writecount handling in md/bitmap
md/bitmap modifies i_writecount of a bitmap file to make sure that no-one else
writes to it. The reverting of the change is sometimes done twice, and there
is one error path where it is omitted.
This patch tidies that up.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Mon, 26 Jun 2006 07:27:46 +0000 (00:27 -0700)]
[PATCH] md/bitmap: remove unnecessary page reference manipulations from md/bitmap code
md/bitmap gets a collection of pages representing the bitmap when it
initialises the bitmap, and puts all the references when discarding the
bitmap.
It also occasionally takes extra references without any good reason, and
sometimes drops them ... though it doesn't always drop them, which can result
in a memory leak.
This patch removes the unnecessary 'get_page' calls, and the corresponding
'put_page' calls.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Mon, 26 Jun 2006 07:27:45 +0000 (00:27 -0700)]
[PATCH] md/bitmap: cleaner separation of page attribute handlers in md/bitmap
md/bitmap has some per-page attributes. Handling of these attributes is
largely abstracted in set_page_attr and clear_page_attr. However
get_page_attr exposes the format used to store them. So prior to changing
that format, introduce test_page_attr instead of get_page_attr, and make
appropriate usage changes.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Mon, 26 Jun 2006 07:27:44 +0000 (00:27 -0700)]
[PATCH] md/bitmap: remove bitmap writeback daemon
md/bitmap currently has a separate thread to wait for writes to the bitmap
file to complete (as we cannot get a callback on that action).
However this isn't needed as bitmap_unplug is called from process context and
waits for the writeback thread to do its work. The same result can be
achieved by doing the waiting directly in bitmap_unplug.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Mon, 26 Jun 2006 07:27:43 +0000 (00:27 -0700)]
[PATCH] md/bitmap: fix online removal of file-backed bitmaps
When "mdadm --grow /dev/mdX --bitmap=none" is used to remove a file-backed
bitmap, the bitmap was disconnected from the array, but the file wasn't closed
(until the array was stopped).
The file also wasn't closed if adding the bitmap file failed.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adrian Bunk [Mon, 26 Jun 2006 07:27:42 +0000 (00:27 -0700)]
[PATCH] md: make md_print_devices() static
This patch makes the needlessly global md_print_devices() static.
Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Mon, 26 Jun 2006 07:27:41 +0000 (00:27 -0700)]
[PATCH] md: support stripe/offset mode in raid10
The "industry standard" DDF format allows for a stripe/offset layout where
data is duplicated on different stripes. e.g.
A B C D
D A B C
E F G H
H E F G
(columns are drives, rows are stripes, LETTERS are chunks of data).
This is similar to raid10's 'far' mode, but not quite the same. So enhance
'far' mode with a 'far/offset' option which follows the layout of DDF's
stripe/offset.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
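The stripe/offset diagram above can be regenerated from a simple rule: the second copy of each chunk sits on the next row, shifted one drive to the right. This sketch assumes exactly two copies (as in the diagram) and at most 8 drives; `offset_location()` and `build_layout()` are hypothetical, and this is not raid10's actual mapping code.

```c
#include <assert.h>

#define MAX_DISKS 8

/* Drive and row (stripe) holding copy 'copy' (0 or 1) of data chunk
 * 'chunk' in a 2-copy offset layout across 'ndisks' drives. */
static void offset_location(int chunk, int copy, int ndisks, int *disk, int *row)
{
    int stripe = chunk / ndisks;   /* which group of ndisks chunks */
    int col = chunk % ndisks;      /* position of the first copy */
    *row = 2 * stripe + copy;      /* second copy on the next row */
    *disk = (col + copy) % ndisks; /* shifted one drive to the right */
}

/* Fill grid[row][disk] with chunk letters 'A', 'B', ...
 * The caller must provide at least 2 * ceil(nchunks / ndisks) rows. */
static void build_layout(char grid[][MAX_DISKS], int nchunks, int ndisks)
{
    assert(ndisks > 0 && ndisks <= MAX_DISKS);
    for (int c = 0; c < nchunks; c++)
        for (int copy = 0; copy < 2; copy++) {
            int disk, row;
            offset_location(c, copy, ndisks, &disk, &row);
            grid[row][disk] = (char)('A' + c);
        }
}
```

With 8 chunks on 4 drives this reproduces the four rows shown above (A B C D, D A B C, E F G H, H E F G).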
NeilBrown [Mon, 26 Jun 2006 07:27:40 +0000 (00:27 -0700)]
[PATCH] md: allow checkpoint of recovery with version-1 superblock
For a while we have had checkpointing of resync. The version-1 superblock
allows recovery to be checkpointed as well, and this patch implements that.
Due to early carelessness we need to add a feature flag to signal that the
recovery_offset field is in use, otherwise older kernels would assume that a
partially recovered array is in fact fully recovered.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Mon, 26 Jun 2006 07:27:38 +0000 (00:27 -0700)]
[PATCH] md: merge raid5 and raid6 code
There is a lot of commonality between raid5.c and raid6main.c. This patch
merges both into one module called raid456. This saves a lot of code, and
paves the way for online raid5->raid6 migrations.
There is still duplication, e.g. between handle_stripe5 and handle_stripe6.
This will probably be cleaned up later.
Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Mon, 26 Jun 2006 07:27:37 +0000 (00:27 -0700)]
[PATCH] md: increase the delay before marking metadata clean, and make it configurable
When a md array has been idle (no writes) for 20msecs it is marked as 'clean'.
This delay turns out to be too short for some real workloads. So increase it
to 200msec (the time to update the metadata should be a tiny fraction of that)
and make it sysfs-configurable.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Mon, 26 Jun 2006 07:27:37 +0000 (00:27 -0700)]
[PATCH] md: remove useless ioctl warning
This warning was slightly useful back in 2.2 days, but is more an annoyance
now. It makes it awkward to add new ioctls (not that we are likely to do that
in the current climate, but it is possible).
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Mon, 26 Jun 2006 07:27:36 +0000 (00:27 -0700)]
[PATCH] md: remove arbitrary limit on chunk size
The largest chunk size the code can support without substantial surgery is
2^30 bytes, so make that the limit instead of an arbitrary 4Meg. Some day,
the 'chunksize' should change to a sector-shift instead of a byte-count. Then
no limit would be needed.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
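A sketch of the new bound and of the sector-shift representation mentioned above, assuming power-of-two chunk sizes and 512-byte sectors. `chunk_size_ok()` and `chunk_sector_shift()` are hypothetical helpers, not md's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT    9             /* 512-byte sectors */
#define MAX_CHUNK_BYTES (1ULL << 30)  /* new limit; previously 4 MiB */

/* Accept power-of-two chunk sizes from one sector up to 2^30 bytes. */
static bool chunk_size_ok(uint64_t bytes)
{
    return bytes >= (1ULL << SECTOR_SHIFT) &&
           bytes <= MAX_CHUNK_BYTES &&
           (bytes & (bytes - 1)) == 0;  /* power of two */
}

/* The "sector-shift" form: log2 of the chunk size in sectors. */
static unsigned int chunk_sector_shift(uint64_t bytes)
{
    unsigned int shift = 0;
    uint64_t sectors = bytes >> SECTOR_SHIFT;
    while ((1ULL << shift) < sectors)
        shift++;
    return shift;
}
```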